Covid Policies Data Across the United States

Proposal

Data 1 - Pediatric Data

Introduction and data

  • Identify the source of the data.

This data was found on healthdata.gov. The specific link is https://healthdata.gov/Hospital/Pediatric-COVID-19-Hospitalizations-by-State/n5sm-z9rn.

  • State when and how it was originally collected (by the original data curator, not necessarily how you found the data).

This data was collected from January 2020 through November 2022. It comes from three main sources: HHS TeleTracking, reports provided directly to HHS Protect by state and territorial health departments on behalf of their healthcare facilities, and the National Healthcare Safety Network.

  • Write a brief description of the observations.

The observations (focusing on May 2022 - November 2022) show that the number of pediatric beds in use across all states fluctuates continuously but has a slight overall upward trend in this period. The data set also includes observations about critical staffing shortages, both those occurring currently and those anticipated within the week.

Research question

  • A well formulated research question. (You may include more than one research question if you want to receive feedback on different ideas for your project. However, one per data set is required.)

How does the percentage of critical staffing shortages change over time?

  • A description of the research topic along with a concise statement of your hypotheses on this topic.

The research topic for this question is the state of the healthcare industry and the effect of the COVID-19 pandemic on it. We expect that the percentage of staffing shortages increased over time because the COVID-19 pandemic caused a decrease in the number of healthcare workers and an increase in the number of hospitalizations.

  • Identify the types of variables in your research question. Categorical? Quantitative?

This question looks at two variables: date, which is a qualitative variable, and the percentage of critical staffing shortages, which is quantitative. We will likely use the variables critical_staffing_shortage_today_yes, critical_staffing_shortage_today_no, and critical_staffing_shortage_today_not_reported to determine the percentage of staffing shortages on each date.
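
One way the percentage could be derived from these counts is sketched below in Python. The function name shortage_percentage and the choice of denominator are assumptions made for illustration, not a method the proposal has settled on. By default, hospitals that did not report are excluded from the denominator; the include_not_reported flag (also hypothetical) adds them back in. The yes and no counts are copied from the glimpse of the data in this section, which does not display the not_reported column.

```python
def shortage_percentage(yes, no, not_reported=0, include_not_reported=False):
    """Return the percent of hospitals reporting a critical staffing shortage.

    By default the denominator is only hospitals that answered yes or no;
    set include_not_reported=True to count non-reporting hospitals as well.
    Returns None when the denominator is zero.
    """
    denominator = yes + no
    if include_not_reported:
        denominator += not_reported
    if denominator == 0:
        return None
    return 100.0 * yes / denominator


# (state, date, yes, no) copied from the glimpse of the data; the
# not_reported column is not displayed there, so it is left out here.
glimpse_rows = [
    ("MA", "2021-02-23", 9, 69),
    ("SD", "2021-02-18", 2, 60),
    ("RI", "2021-02-13", 4, 10),
    ("DC", "2021-02-05", 0, 12),
    ("MA", "2021-02-02", 8, 70),
    ("KS", "2021-01-26", 10, 128),
]

for state, date, yes, no in glimpse_rows:
    pct = shortage_percentage(yes, no)
    print(f"{state} {date}: {pct:.1f}% of reporting hospitals")
```

Which denominator is appropriate depends on how non-reporting hospitals should be treated, so that choice would need to be stated alongside any trend we report.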

Glimpse of data

# A tibble: 6 × 135
  state date       critical_staffing_shortage_today_yes critical_staffing_shor…¹
  <chr> <date>                                    <dbl>                    <dbl>
1 MA    2021-02-23                                    9                       69
2 SD    2021-02-18                                    2                       60
3 RI    2021-02-13                                    4                       10
4 DC    2021-02-05                                    0                       12
5 MA    2021-02-02                                    8                       70
6 KS    2021-01-26                                   10                      128
# ℹ abbreviated name: ¹​critical_staffing_shortage_today_no
# ℹ 131 more variables: critical_staffing_shortage_today_not_reported <dbl>,
#   critical_staffing_shortage_anticipated_within_week_yes <dbl>,
#   critical_staffing_shortage_anticipated_within_week_no <dbl>,
#   critical_staffing_shortage_anticipated_within_week_not_reported <dbl>,
#   hospital_onset_covid <dbl>, hospital_onset_covid_coverage <dbl>,
#   inpatient_beds <dbl>, inpatient_beds_coverage <dbl>, …

Data 2 - COVID-19 Data

Introduction and data

  • Identify the source of the data.

This data was found on healthdata.gov. The link to the data can be found here: https://healthdata.gov/dataset/COVID-19-State-and-County-Policy-Orders/gyqz-9u7n

  • State when and how it was originally collected (by the original data curator, not necessarily how you found the data).

The data was collected by the U.S. Department of Health and Human Services and transferred over to the HHS Office of the Chief Data Officer. The data comes from the BU COVID-19 State Policy Database and “Stay at Home Policies” from Wikidata. The data was also curated manually by Virtual Student Federal Service Interns.

  • Write a brief description of the observations.

This dataset contains 11 columns and 4,218 rows. Each observation (row) represents a COVID-19 policy. Every policy appears twice, because the dataset records both the start and the end of every state and county COVID-19 policy from March 2020 to December 2022.

Research question

  • A well formulated research question. (You may include more than one research question if you want to receive feedback on different ideas for your project. However, one per data set is required.)

Is there a correlation between a region’s political stance and the number of COVID-19 policies it has implemented?

Do different regions lean towards different policies?

Do certain regions tend towards state or county policies?

  • A description of the research topic along with a concise statement of your hypotheses on this topic.

The research topic focuses on monthly averages of hospital staffing shortages, which can be grouped by major US geographic regions (e.g., West Coast vs. East Coast) to help answer the research question. Our hypothesis is that certain regions, such as the Southern US, are more likely to have hospital staffing shortages than other regions of the US.

  • Identify the types of variables in your research question. Categorical? Quantitative?

The columns of interest for our analysis are state_id, which denotes the state in which a policy occurred; county, which identifies the county in which a policy occurred; policy_level, which denotes whether a policy was implemented at the state or county level; date, which is the date on which a policy was implemented; policy_type, which describes the details of the COVID-19 policy; start_stop, which indicates whether the row is the start or end of a policy; comments, which holds additional comments about the policy; and total_phases, which shows how many times the policy was reimplemented or changed.
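
Because each policy appears once as a start row and once as a stop row, a raw row count would double-count policies. A minimal Python sketch of one way to count policies per state and policy level follows. It assumes that start rows carry the value "start" in start_stop (the glimpse only shows "stop") and that every policy has a start row. The function name count_policies is hypothetical, and the sample rows are made up for illustration rather than taken from the dataset.

```python
from collections import Counter


def count_policies(rows):
    """Count policies per (state_id, policy_level) using start rows only."""
    counts = Counter()
    for row in rows:
        # Skip stop rows so that each policy is counted once.
        if row["start_stop"] == "start":
            counts[(row["state_id"], row["policy_level"])] += 1
    return counts


# Illustrative rows only: invented for this sketch, not real observations.
sample_rows = [
    {"state_id": "DE", "policy_level": "state", "start_stop": "start"},
    {"state_id": "DE", "policy_level": "state", "start_stop": "stop"},
    {"state_id": "DE", "policy_level": "state", "start_stop": "start"},
    {"state_id": "GA", "policy_level": "county", "start_stop": "start"},
    {"state_id": "GA", "policy_level": "county", "start_stop": "stop"},
]

policy_counts = count_policies(sample_rows)
for (state_id, policy_level), n in sorted(policy_counts.items()):
    print(state_id, policy_level, n)
```

The same counts, split by policy_level, could also speak to whether a region tends towards state or county policies.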

Glimpse of data

# A tibble: 6 × 11
  state_id county    fips_code policy_level date       policy_type    start_stop
  <chr>    <chr>     <chr>     <chr>        <date>     <chr>          <chr>     
1 DE       <NA>      <NA>      state        2020-07-06 Phase 1        stop      
2 MS       Sunflower 28133     county       2020-07-20 Outdoor and R… stop      
3 PR       <NA>      <NA>      state        2020-06-15 Non-Essential… stop      
4 MO       <NA>      <NA>      state        2020-06-15 Non-Essential… stop      
5 DE       <NA>      <NA>      state        2020-09-05 Phase 2        stop      
6 GA       Fulton    13121     county       2020-04-30 Childcare (K-… stop      
# ℹ 4 more variables: comments <chr>, source <chr>, total_phases <dbl>,
#   geocoded_state <chr>

Data 3 - Alzheimer’s Data

Introduction and data

  • Identify the source of the data.

    The source of the data is Data.gov and the specific link is: https://catalog.data.gov/dataset/alzheimers-disease-and-healthy-aging-data

  • State when and how it was originally collected (by the original data curator, not necessarily how you found the data).

    The data was collected from 2015 to 2022 through the Behavioral Risk Factor Surveillance System (BRFSS) by the CDC; the researcher(s) curated the data through surveying.

  • Write a brief description of the observations.

    The observations address different factors that could contribute to developing Alzheimer’s disease, and they track the age group, race, and gender of the respondents for these potential factors.

Research question

  • A well formulated research question. (You may include more than one research question if you want to receive feedback on different ideas for your project. However, one per data set is required.)

    Which factor most accurately identifies individuals who have Alzheimer’s disease?

  • A description of the research topic along with a concise statement of your hypotheses on this topic.

    The research topic examines each of the factors that the dataset contains and determines which factor(s) have a confidence limit range that can be used to correlate the factor with whether an individual has Alzheimer’s. The hypothesis is: if an individual reports having “frequent mental distress,” then they are more likely to have Alzheimer’s.

  • Identify the types of variables in your research question. Categorical? Quantitative?

    The variables that we are looking at are Question, which is categorical (this variable describes the different potential factors that could contribute to Alzheimer’s), and Data_Value, which gives a statistic for people with Alzheimer’s.

    Note: The variable Data_Value_Type will also need to be considered, which describes what type of statistic the Data_Value variable is; this variable type is categorical.
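
Since Data_Value can hold different kinds of statistics, values should only be compared within a single Data_Value_Type. The Python sketch below shows one way to do that: keep rows of one type, then average Data_Value for each Question. The function name mean_value_by_question, the type labels "Percentage" and "Mean", the question labels, and every number in the sample are assumptions for illustration and are not drawn from the dataset.

```python
from collections import defaultdict


def mean_value_by_question(rows, value_type):
    """Average Data_Value per Question, keeping only one Data_Value_Type."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for row in rows:
        # Skip rows of another statistic type and rows with a missing value.
        if row["Data_Value_Type"] != value_type or row["Data_Value"] is None:
            continue
        sums[row["Question"]] += row["Data_Value"]
        counts[row["Question"]] += 1
    return {question: sums[question] / counts[question] for question in sums}


# Hypothetical rows: labels and numbers are placeholders, not real data.
sample_rows = [
    {"Question": "Question A", "Data_Value_Type": "Percentage", "Data_Value": 10.0},
    {"Question": "Question A", "Data_Value_Type": "Percentage", "Data_Value": 20.0},
    {"Question": "Question A", "Data_Value_Type": "Mean", "Data_Value": 3.0},
    {"Question": "Question B", "Data_Value_Type": "Percentage", "Data_Value": None},
    {"Question": "Question B", "Data_Value_Type": "Percentage", "Data_Value": 30.0},
]

averages = mean_value_by_question(sample_rows, "Percentage")
for question, value in sorted(averages.items()):
    print(question, value)
```

An unweighted average like this ignores sample sizes and the confidence limits, so it would be only a first look before a more careful comparison.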

Glimpse of data

# A tibble: 6 × 39
  RowId YearStart YearEnd LocationAbbr LocationDesc Datasource Class       Topic
  <lgl>     <dbl>   <dbl> <chr>        <chr>        <chr>      <chr>       <chr>
1 NA         2020    2020 HI           Hawaii       BRFSS      Overall He… Arth…
2 NA         2017    2017 ID           Idaho        BRFSS      Mental Hea… Life…
3 NA         2017    2017 ID           Idaho        BRFSS      Overall He… Arth…
4 NA         2018    2018 ID           Idaho        BRFSS      Overall He… Phys…
5 NA         2020    2020 IN           Indiana      BRFSS      Mental Hea… Life…
6 NA         2020    2020 IA           Iowa         BRFSS      Overall He… Prev…
# ℹ 31 more variables: Question <chr>, Response <lgl>, Data_Value_Unit <chr>,
#   DataValueTypeID <chr>, Data_Value_Type <chr>, Data_Value <dbl>,
#   Data_Value_Alt <dbl>, Data_Value_Footnote_Symbol <chr>,
#   Data_Value_Footnote <chr>, Low_Confidence_Limit <chr>,
#   High_Confidence_Limit <chr>, Sample_Size <lgl>,
#   StratificationCategory1 <chr>, Stratification1 <chr>,
#   StratificationCategory2 <chr>, Stratification2 <chr>, …