The Opt-Out Movement: Analyzing the Underlying Factors Influencing Students’ Decision to Skip New York State Tests

Appendix to report

Data collection and cleaning

library(tidyverse)
library(janitor)
library(dplyr)
library(readxl)

# Read the ELA sheet; skip the first row (column names are on the second row)
G3_8_ELA_REFUSALS <- read_excel(
  "data/3-8-ELA-MATH-REFUSALS (1).xlsx",
  sheet = "ELA", skip = 1) |>
  # Label each count/percent pair with its student group and
  # give the identifier columns more intuitive names
  rename(
    total_count_all = TOTAL_COUNT...5,
    pct_refused_all = "%_REFUSED...6",
    total_count_ELL = TOTAL_COUNT...7,
    pct_refused_ELL = "%_REFUSED...8",
    total_count_SWD = TOTAL_COUNT...9,
    pct_refused_SWD = "%_REFUSED...10",
    total_count_ED = TOTAL_COUNT...11,
    pct_refused_ED = "%_REFUSED...12",
    school_ID = INSTITUTION_ID,
    entity_CD = ENTITY_CD,
    district_name = ENTITY_NAME,
    subject = SUBJECT
  )
# Replace missing values with 0
G3_8_ELA_REFUSALS[is.na(G3_8_ELA_REFUSALS)] <- 0
  
# Read the MATH sheet; skip the first row (column names are on the second row)
G3_8_MATH_REFUSALS <- read_excel(
  "data/3-8-ELA-MATH-REFUSALS (1).xlsx",
  sheet = "MATH", skip = 1) |>
  # Label each count/percent pair with its student group and
  # give the identifier columns more intuitive names
  rename(
    total_count_all = TOTAL_COUNT...5,
    pct_refused_all = "%_REFUSED...6",
    total_count_ELL = TOTAL_COUNT...7,
    pct_refused_ELL = "%_REFUSED...8",
    total_count_SWD = TOTAL_COUNT...9,
    pct_refused_SWD = "%_REFUSED...10",
    total_count_ED = TOTAL_COUNT...11,
    pct_refused_ED = "%_REFUSED...12",
    school_ID = INSTITUTION_ID,
    entity_CD = ENTITY_CD,
    district_name = ENTITY_NAME,
    subject = SUBJECT
  )
# Replace missing values with 0
G3_8_MATH_REFUSALS[is.na(G3_8_MATH_REFUSALS)] <- 0

# Write each cleaned dataset to its own .csv file in the data folder
write_csv(G3_8_ELA_REFUSALS, file = "data/ELA-refusals.csv")
write_csv(G3_8_MATH_REFUSALS, file = "data/Math-refusals.csv")

We downloaded the data from https://data.nysed.gov/downloads.php, selecting the Districts, Charters Grades 3-8 ELA and Math Refusals dataset for the 2021-2022 academic year. We imported the Excel file into R using the read_excel() function from the readxl package. We worked with the ELA data first, which is on the ELA sheet of the workbook. We selected that sheet and skipped the first row, which contained neither column names nor data (the column names were on the second row), and named the resulting data frame G3_8_ELA_REFUSALS.

Next, we renamed the columns to match the sections they correspond to on the Excel sheet. Columns 5 and 6 hold the total count and percent refused for all students (denoted as “all”), columns 7 and 8 for English Language Learners (denoted as ELL), columns 9 and 10 for Students with Disabilities (denoted as SWD), and columns 11 and 12 for Economically Disadvantaged students (denoted as ED). We also gave the remaining variables more intuitive names: INSTITUTION_ID became school_ID, ENTITY_NAME became district_name, ENTITY_CD became entity_CD, and SUBJECT became subject. After renaming, we replaced the NA values with 0 so that every observation is included in our calculations and visualizations.

We repeated these steps for the Math opt-out data, which we named G3_8_MATH_REFUSALS. Finally, we wrote each data frame to its own .csv file using the write_csv() function: ELA-refusals.csv contains the ELA data and Math-refusals.csv contains the Math data. Both files are stored in the data folder of the project repository.
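Because the ELA and Math sheets go through identical steps, the cleaning could also be wrapped in a single function. The sketch below is one possible way to do that, not the code we ran: clean_refusals() is a hypothetical helper, and the toy rows are made up purely to show the renaming and NA replacement (they are not NYSED data). It assumes the imported columns carry the names shown in the script above.

```r
library(dplyr)
library(tibble)

# Hypothetical helper (not part of the original script): applies the
# renaming and NA replacement described above to one imported sheet.
clean_refusals <- function(raw) {
  raw |>
    rename(
      total_count_all = `TOTAL_COUNT...5`,
      pct_refused_all = `%_REFUSED...6`,
      total_count_ELL = `TOTAL_COUNT...7`,
      pct_refused_ELL = `%_REFUSED...8`,
      total_count_SWD = `TOTAL_COUNT...9`,
      pct_refused_SWD = `%_REFUSED...10`,
      total_count_ED  = `TOTAL_COUNT...11`,
      pct_refused_ED  = `%_REFUSED...12`,
      school_ID       = INSTITUTION_ID,
      entity_CD       = ENTITY_CD,
      district_name   = ENTITY_NAME,
      subject         = SUBJECT
    ) |>
    # Replace every NA with 0, column by column
    mutate(across(everything(), \(x) replace(x, is.na(x), 0)))
}

# Toy input with the same column names as the imported sheet.
# All values here are invented for illustration only.
toy <- tibble(
  INSTITUTION_ID     = c("A1", "B2"),
  ENTITY_CD          = c("001", "002"),
  ENTITY_NAME        = c("District A", "District B"),
  SUBJECT            = c("ELA", "ELA"),
  `TOTAL_COUNT...5`  = c(100, 50),
  `%_REFUSED...6`    = c(10, NA),
  `TOTAL_COUNT...7`  = c(NA, 5),
  `%_REFUSED...8`    = c(NA, 20),
  `TOTAL_COUNT...9`  = c(12, 8),
  `%_REFUSED...10`   = c(25, NA),
  `TOTAL_COUNT...11` = c(40, 30),
  `%_REFUSED...12`   = c(5, 0)
)

cleaned <- clean_refusals(toy)

# With the real file, the same helper would be applied to each sheet, e.g.
# read_excel("data/3-8-ELA-MATH-REFUSALS (1).xlsx", sheet = "ELA", skip = 1) |>
#   clean_refusals()
```

Keeping the steps in one function means that any later change to the column names or to the handling of missing values only has to be made once for both subjects.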