Impact of Education Level on Mental Health during COVID

Appendix to report

Data cleaning

library(dplyr)

Attaching package: 'dplyr'
The following objects are masked from 'package:stats':

    filter, lag
The following objects are masked from 'package:base':

    intersect, setdiff, setequal, union
library(tidyverse)
── Attaching packages ─────────────────────────────────────── tidyverse 1.3.2 ──
✔ ggplot2 3.4.0     ✔ purrr   1.0.0
✔ tibble  3.2.1     ✔ stringr 1.5.0
✔ tidyr   1.2.1     ✔ forcats 0.5.2
✔ readr   2.1.3     
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
library(lubridate)
Loading required package: timechange

Attaching package: 'lubridate'

The following objects are masked from 'package:base':

    date, intersect, setdiff, union
mental_health <- read_csv("data/Mental_Health_Care_in_the_Last_4_Weeks.csv")
Rows: 10404 Columns: 15
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (10): Indicator, Group, State, Subgroup, Phase, Time Period Label, Time ...
dbl  (5): Time Period, Value, LowCI, HighCI, Suppression Flag

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
mental_health_clean <- mental_health |>
  # keep only the rows broken down by education level
  filter(Group == "By Education") |>
  # drop the columns not needed for our research question
  select(-`Suppression Flag`, -`Quartile Range`, -Group, -State, -Phase,
         -`Confidence Interval`, -`Time Period Label`) |>
  # parse the month/day/year strings into Date values
  mutate(across(c(`Time Period Start Date`, `Time Period End Date`), mdy)) |>
  # give the education column a descriptive name
  rename(`Education Level` = Subgroup)

mental_health_clean
# A tibble: 608 × 8
   Indicator              `Education Level` `Time Period` Time Period Start Da…¹
   <chr>                  <chr>                     <dbl> <date>                
 1 Took Prescription Med… Less than a high…            13 2020-08-19            
 2 Took Prescription Med… High school dipl…            13 2020-08-19            
 3 Took Prescription Med… Some college/Ass…            13 2020-08-19            
 4 Took Prescription Med… Bachelor's degre…            13 2020-08-19            
 5 Received Counseling o… Less than a high…            13 2020-08-19            
 6 Received Counseling o… High school dipl…            13 2020-08-19            
 7 Received Counseling o… Some college/Ass…            13 2020-08-19            
 8 Received Counseling o… Bachelor's degre…            13 2020-08-19            
 9 Took Prescription Med… Less than a high…            13 2020-08-19            
10 Took Prescription Med… High school dipl…            13 2020-08-19            
# ℹ 598 more rows
# ℹ abbreviated name: ¹​`Time Period Start Date`
# ℹ 4 more variables: `Time Period End Date` <date>, Value <dbl>, LowCI <dbl>,
#   HighCI <dbl>
  • Downloaded the data from Data.gov as a CSV file

  • Read the CSV file into a tibble using read_csv

  • Kept only the rows broken down by education level (Group == "By Education")

  • Dropped the columns not relevant to our research question (Suppression Flag, Quartile Range, Group, State, Phase, Confidence Interval, Time Period Label)

  • Converted the start and end date columns from character variables to date variables using mdy

  • Renamed the column indicating education level from “Subgroup” to “Education Level”
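
The steps listed above can be sanity-checked on a small stand-in data set. The sketch below is illustrative only: `toy_raw` and `toy_clean` are hypothetical names, every value in the toy rows is made up (only the column names and the "By Education" group come from the real file), and the month/day/year text format of the raw dates is an assumption. It runs the same filter, column drop, date conversion, and rename, then asserts that the result has the expected shape.

```r
library(dplyr)
library(tibble)
library(lubridate)

# Hypothetical toy rows that mimic the raw file's 15 column names.
# All values are invented for illustration; the date strings assume the
# raw file stores dates as month/day/year text.
toy_raw <- tibble(
  Indicator = "Indicator A",
  Group = c("By Education", "By Education", "By Age"),
  State = "United States",
  Subgroup = c("Education level 1", "Education level 2", "Age band 1"),
  Phase = "2",
  `Time Period` = 13,
  `Time Period Label` = "Period 13",
  `Time Period Start Date` = "08/19/2020",
  `Time Period End Date` = "08/31/2020",
  Value = c(10.5, 12.0, 9.8),
  LowCI = c(9.0, 11.0, 8.5),
  HighCI = c(12.0, 13.0, 11.1),
  `Confidence Interval` = c("9.0 - 12.0", "11.0 - 13.0", "8.5 - 11.1"),
  `Quartile Range` = NA_character_,
  `Suppression Flag` = NA_real_
)

# Same cleaning steps as in the list above, written as one pipeline.
toy_clean <- toy_raw |>
  filter(Group == "By Education") |>
  select(-`Suppression Flag`, -`Quartile Range`, -Group, -State, -Phase,
         -`Confidence Interval`, -`Time Period Label`) |>
  mutate(across(c(`Time Period Start Date`, `Time Period End Date`), mdy)) |>
  rename(`Education Level` = Subgroup)

# Checks: only education rows remain, 8 columns are kept, the dates parsed
# without producing NA, and the renamed column is present.
stopifnot(
  nrow(toy_clean) == 2,
  ncol(toy_clean) == 8,
  inherits(toy_clean$`Time Period Start Date`, "Date"),
  inherits(toy_clean$`Time Period End Date`, "Date"),
  !anyNA(toy_clean$`Time Period Start Date`),
  !anyNA(toy_clean$`Time Period End Date`),
  "Education Level" %in% names(toy_clean)
)
```

Running the same `stopifnot()` checks against `mental_health_clean` (with the row count adjusted) would be one way to confirm that `mdy` parsed every date, since a date string in an unexpected format is converted to `NA` with a warning rather than stopping the script.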

Other appendices (as necessary)