Impact of Education Level on Mental Health during COVID

Appendix to report

Data cleaning

library(dplyr)


Attaching package: 'dplyr'

The following objects are masked from 'package:stats':

    filter, lag

The following objects are masked from 'package:base':

    intersect, setdiff, setequal, union

library(tidyverse)

── Attaching packages ─────────────────────────────────────── tidyverse 1.3.2 ──

✔ ggplot2 3.4.0     ✔ purrr   1.0.0
✔ tibble  3.2.1     ✔ stringr 1.5.0
✔ tidyr   1.2.1     ✔ forcats 0.5.2
✔ readr   2.1.3     
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()

library(lubridate)

Loading required package: timechange

Attaching package: 'lubridate'

The following objects are masked from 'package:base':

    date, intersect, setdiff, union

mental_health <- read_csv("data/Mental_Health_Care_in_the_Last_4_Weeks.csv")

Rows: 10404 Columns: 15
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (10): Indicator, Group, State, Subgroup, Phase, Time Period Label, Time ...
dbl  (5): Time Period, Value, LowCI, HighCI, Suppression Flag

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

# Keep only the education breakdown, drop columns not needed for the analysis,
# parse the date columns (stored as month/day/year text), and rename Subgroup.
mental_health_clean <- mental_health |>
  filter(Group == "By Education") |>
  select(-c(`Suppression Flag`, `Quartile Range`, Group, State, Phase,
            `Confidence Interval`, `Time Period Label`)) |>
  mutate(
    `Time Period Start Date` = mdy(`Time Period Start Date`),
    `Time Period End Date` = mdy(`Time Period End Date`)
  ) |>
  rename(`Education Level` = Subgroup)

mental_health_clean

# A tibble: 608 × 8
   Indicator              `Education Level` `Time Period` Time Period Start Da…¹
   <chr>                  <chr>                     <dbl> <date>                
 1 Took Prescription Med… Less than a high…            13 2020-08-19            
 2 Took Prescription Med… High school dipl…            13 2020-08-19            
 3 Took Prescription Med… Some college/Ass…            13 2020-08-19            
 4 Took Prescription Med… Bachelor's degre…            13 2020-08-19            
 5 Received Counseling o… Less than a high…            13 2020-08-19            
 6 Received Counseling o… High school dipl…            13 2020-08-19            
 7 Received Counseling o… Some college/Ass…            13 2020-08-19            
 8 Received Counseling o… Bachelor's degre…            13 2020-08-19            
 9 Took Prescription Med… Less than a high…            13 2020-08-19            
10 Took Prescription Med… High school dipl…            13 2020-08-19            
# ℹ 598 more rows
# ℹ abbreviated name: ¹`Time Period Start Date`
# ℹ 4 more variables: `Time Period End Date` <date>, Value <dbl>, LowCI <dbl>,
#   HighCI <dbl>

Downloaded the data from Data.gov.
Read the CSV file into a tibble using read_csv.
Kept only the rows grouped by education level (Group == "By Education"), dropping the rows that did not indicate a person’s education level.
Selected only the columns relevant to our research question.
Converted the start and end date columns from character variables to date variables.
Renamed the column indicating education from “Subgroup” to “Education Level”.
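
The cleaning steps listed above can be checked on a small example. The sketch below is illustrative only: the tibble `toy_raw`, its values, and the subgroup labels are made up and are not taken from the real data set. It assumes the raw dates are stored as month/day/year text, which is the format `mdy()` parses.

```r
library(dplyr)
library(lubridate)

# Hypothetical stand-in for the raw file; all values are invented for illustration.
toy_raw <- tibble(
  Group = c("By Education", "By Age", "By Education"),
  Subgroup = c("Education level A", "Age group A", "Education level B"),
  `Time Period Start Date` = c("08/19/2020", "08/19/2020", "09/02/2020"),
  Value = c(20.1, 18.3, 17.5)
)

# Same sequence of steps as the cleaning code: filter, select, parse dates, rename.
toy_clean <- toy_raw |>
  filter(Group == "By Education") |>
  select(-Group) |>
  mutate(`Time Period Start Date` = mdy(`Time Period Start Date`)) |>
  rename(`Education Level` = Subgroup)

# Sanity checks: only education rows remain, dates parsed without NAs,
# and the column was renamed.
stopifnot(
  nrow(toy_clean) == 2,
  inherits(toy_clean$`Time Period Start Date`, "Date"),
  !anyNA(toy_clean$`Time Period Start Date`),
  "Education Level" %in% names(toy_clean)
)
```

Running the same `stopifnot()` checks against `mental_health_clean` (with the row count adjusted) would be one way to confirm that no dates failed to parse, since `mdy()` returns NA with a warning for text it cannot read.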

Other appendices (as necessary)