Covid Policies Data Across the United States
Proposal
Data 1 - Pediatric Data
Introduction and data
- Identify the source of the data.
This data was found on healthdata.gov. The specific link is https://healthdata.gov/Hospital/Pediatric-COVID-19-Hospitalizations-by-State/n5sm-z9rn.
- State when and how it was originally collected (by the original data curator, not necessarily how you found the data).
This data was collected from January 2020 until November 2022 from three main sources: HHS TeleTracking; reporting provided directly to HHS Protect by state and territorial health departments on behalf of their healthcare facilities; and the National Healthcare Safety Network (NHSN).
- Write a brief description of the observations.
The observations (focusing on May 2022 to November 2022) show that the number of pediatric beds in use across all states fluctuates up and down but trends slightly upward over this period. The dataset also includes observations about critical staffing shortages, recording whether each shortage is occurring currently or is anticipated within the week.
Research question
- A well formulated research question. (You may include more than one research question if you want to receive feedback on different ideas for your project. However, one per data set is required.)
How does the percentage of critical staffing shortages change over time?
- A description of the research topic along with a concise statement of your hypotheses on this topic.
The research topic for this question is the state of the healthcare industry and the effect of the Covid-19 pandemic on it. We expect the percentage of hospitals reporting critical staffing shortages to have increased over time, because the Covid-19 pandemic reduced the number of healthcare workers while increasing the number of hospitalizations.
- Identify the types of variables in your research question. Categorical? Quantitative?
This question looks at two variables: date, which is a time variable (not a qualitative one), and the percentage of hospitals reporting critical staffing shortages, which is quantitative. We will likely use the variables critical_staffing_shortage_today_yes, critical_staffing_shortage_today_no, and critical_staffing_shortage_today_not_reported, which count hospitals in each response category, to calculate the percentage of staffing shortages on each date.
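A minimal Python sketch of how the percentage and its trend could be computed. It assumes each row gives counts of hospitals answering yes, no, or not reported, and that the default denominator is yes + no; whether non-reporters belong in the denominator is still a choice the analysis has to make. The rows are invented toy values and the function names (shortage_pct, monthly_pct, ols_slope) are hypothetical. A least-squares slope over monthly percentages is one simple way to check whether shortages increased.

```python
from collections import defaultdict
from datetime import date

# Made-up toy rows shaped like the pediatric data (not real values):
# (state, date, today_yes, today_no, today_not_reported).
rows = [
    ("MA", date(2021, 1, 5), 8, 72, 0),
    ("KS", date(2021, 1, 26), 12, 108, 3),
    ("MA", date(2021, 2, 2), 16, 64, 0),
    ("SD", date(2021, 2, 18), 4, 16, 1),
    ("RI", date(2021, 3, 1), 3, 7, 0),
]


def shortage_pct(yes, no, not_reported, include_not_reported=False):
    """Percent of hospitals reporting a critical staffing shortage.

    Assumption: by default only hospitals that answered (yes + no) form the
    denominator; pass include_not_reported=True to count non-reporters too.
    Returns None when the denominator is zero.
    """
    denom = yes + no + (not_reported if include_not_reported else 0)
    return 100 * yes / denom if denom else None


def monthly_pct(rows):
    """Pool yes/no counts across states by (year, month), then take one percentage per month."""
    totals = defaultdict(lambda: [0, 0])  # (year, month) -> [yes, yes + no]
    for _state, day, yes, no, _not_reported in rows:
        key = (day.year, day.month)
        totals[key][0] += yes
        totals[key][1] += yes + no
    return {key: 100 * y / n for key, (y, n) in sorted(totals.items()) if n}


def ols_slope(ys):
    """Least-squares slope of ys against 0, 1, 2, ...

    Assumes ys are consecutive months with no gaps (and at least two of them),
    so the slope is in percentage points per month.
    """
    n = len(ys)
    x_bar = (n - 1) / 2
    y_bar = sum(ys) / n
    num = sum((x - x_bar) * (y - y_bar) for x, y in enumerate(ys))
    den = sum((x - x_bar) ** 2 for x in range(n))
    return num / den


monthly = monthly_pct(rows)
print(monthly)  # {(2021, 1): 10.0, (2021, 2): 20.0, (2021, 3): 30.0}
print(ols_slope(list(monthly.values())))  # 10.0
```

Pooling counts before dividing weights each state by its number of reporting hospitals; averaging per-state percentages instead would give small states the same weight as large ones.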
Glimpse of data
# A tibble: 6 × 135
state date critical_staffing_shortage_today_yes critical_staffing_shor…¹
<chr> <date> <dbl> <dbl>
1 MA 2021-02-23 9 69
2 SD 2021-02-18 2 60
3 RI 2021-02-13 4 10
4 DC 2021-02-05 0 12
5 MA 2021-02-02 8 70
6 KS 2021-01-26 10 128
# ℹ abbreviated name: ¹critical_staffing_shortage_today_no
# ℹ 131 more variables: critical_staffing_shortage_today_not_reported <dbl>,
# critical_staffing_shortage_anticipated_within_week_yes <dbl>,
# critical_staffing_shortage_anticipated_within_week_no <dbl>,
# critical_staffing_shortage_anticipated_within_week_not_reported <dbl>,
# hospital_onset_covid <dbl>, hospital_onset_covid_coverage <dbl>,
# inpatient_beds <dbl>, inpatient_beds_coverage <dbl>, …
Data 2 - COVID-19 Data
Introduction and data
- Identify the source of the data.
This data was found on healthdata.gov. The link to the data is https://healthdata.gov/dataset/COVID-19-State-and-County-Policy-Orders/gyqz-9u7n
- State when and how it was originally collected (by the original data curator, not necessarily how you found the data).
The data was collected by the U.S. Department of Health and Human Services (HHS) and transferred to the HHS Office of the Chief Data Officer. It draws on the BU COVID-19 State Policy Database and “Stay at Home Policies” from Wikidata, and it was also curated manually by Virtual Student Federal Service interns.
- Write a brief description of the observations.
This dataset contains 11 columns and 4,218 rows. Each row records either the start or the end of a state or county COVID-19 policy between March 2020 and December 2022, so each policy appears twice: once for its start and once for its end.
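Because each policy is split across a start row and a stop row, estimating how long policies lasted requires pairing the two. A minimal Python sketch, assuming (hypothetically) that state_id, county, and policy_type together identify one policy and that a stop row closes the earliest still-open start with the same key; the rows below are made up, not taken from the dataset. Rows that cannot be paired are returned separately rather than silently dropped.

```python
from collections import defaultdict, deque
from datetime import date

# Made-up rows for illustration: (state_id, county, policy_type, date, start_stop).
rows = [
    ("DE", None, "Phase 1", date(2020, 5, 1), "start"),
    ("DE", None, "Phase 1", date(2020, 6, 5), "stop"),
    ("GA", "Fulton", "Childcare", date(2020, 3, 20), "start"),
    ("GA", "Fulton", "Childcare", date(2020, 4, 20), "stop"),
    ("MO", None, "Non-Essential Businesses", date(2020, 4, 6), "start"),  # no stop row
]


def pair_policies(rows):
    """Pair each stop row with the earliest open start row sharing its key.

    Assumption: (state_id, county, policy_type) identifies one policy.
    Returns (matched, unmatched); matched holds (key, start, stop, days).
    """
    open_starts = defaultdict(deque)
    matched, unmatched = [], []
    # Walk rows in date order; on the same date, starts come before stops.
    for state, county, ptype, day, flag in sorted(rows, key=lambda r: (r[3], r[4] != "start")):
        key = (state, county, ptype)
        if flag == "start":
            open_starts[key].append(day)
        elif open_starts[key]:
            start = open_starts[key].popleft()
            matched.append((key, start, day, (day - start).days))
        else:
            unmatched.append((state, county, ptype, day, flag))  # stop with no open start
    # Any start still open never received a stop row.
    for key, starts in open_starts.items():
        unmatched.extend((*key, day, "start") for day in starts)
    return matched, unmatched


matched, unmatched = pair_policies(rows)
for key, start, stop, days in matched:
    print(key, days)
```

First-in, first-out matching keeps repeated phases of the same policy in chronological order, but overlapping policies that share a key would still need a closer look.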
Research question
- A well formulated research question. (You may include more than one research question if you want to receive feedback on different ideas for your project. However, one per data set is required.)
Is there a correlation between a region’s political stance and the number of COVID-19 policies it has implemented?
Do different regions lean towards different policies?
Do certain regions tend towards state or county policies?
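For these questions, a first step is counting policies per state and the share issued at the county level within each region. A minimal Python sketch: it counts only start rows so that each policy is counted once, and both the REGION lookup and the rows are hypothetical placeholders, since the proposal has not fixed a region scheme or a measure of political stance. These counts could then be compared against an external measure of each state's political stance.

```python
from collections import Counter, defaultdict

# Made-up rows for illustration: (state_id, policy_level, start_stop).
rows = [
    ("GA", "county", "start"), ("GA", "county", "stop"),
    ("GA", "state", "start"), ("GA", "state", "stop"),
    ("MS", "county", "start"), ("MS", "county", "stop"),
    ("DE", "state", "start"), ("DE", "state", "stop"),
    ("MA", "state", "start"), ("MA", "state", "stop"),
]

# Hypothetical state -> region lookup (here following U.S. Census regions);
# the proposal has not settled on a region scheme.
REGION = {"GA": "South", "MS": "South", "DE": "South", "MA": "Northeast"}


def policies_per_state(rows):
    """Count each policy once by keeping only its start row."""
    return Counter(state for state, _level, flag in rows if flag == "start")


def county_share_by_region(rows, region=REGION):
    """Fraction of each region's policies (start rows) issued at the county level."""
    counts = defaultdict(Counter)
    for state, level, flag in rows:
        if flag == "start":
            counts[region.get(state, "Other")][level] += 1
    return {r: c["county"] / sum(c.values()) for r, c in counts.items()}


print(policies_per_state(rows))
print(county_share_by_region(rows))
```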
- A description of the research topic along with a concise statement of your hypotheses on this topic.
The research topic focuses on monthly averages of hospital staffing shortages, which can be grouped by major U.S. geographic regions (e.g., West Coast vs. East Coast) to help answer the research question. My hypothesis is that certain regions, such as the Southern U.S., are more likely than others to have hospital staffing shortages.
- Identify the types of variables in your research question. Categorical? Quantitative?
The columns of interest for our analysis are state_id, which denotes the state a policy occurred in; county, which identifies the county a policy occurred in; policy_level, which denotes whether a policy was implemented at the state or county level; date, the date on which the policy started or stopped (depending on start_stop); policy_type, which describes the COVID-19 policy; start_stop, which indicates whether the row marks the start or end of a policy; comments, which holds additional comments about the policy; and total_phases, which shows how many times the policy was reimplemented or changed. Of these, date is a date variable and total_phases is quantitative; the remaining columns are categorical, except comments, which is free text.
Glimpse of data
# A tibble: 6 × 11
state_id county fips_code policy_level date policy_type start_stop
<chr> <chr> <chr> <chr> <date> <chr> <chr>
1 DE <NA> <NA> state 2020-07-06 Phase 1 stop
2 MS Sunflower 28133 county 2020-07-20 Outdoor and R… stop
3 PR <NA> <NA> state 2020-06-15 Non-Essential… stop
4 MO <NA> <NA> state 2020-06-15 Non-Essential… stop
5 DE <NA> <NA> state 2020-09-05 Phase 2 stop
6 GA Fulton 13121 county 2020-04-30 Childcare (K-… stop
# ℹ 4 more variables: comments <chr>, source <chr>, total_phases <dbl>,
# geocoded_state <chr>
Data 3 - Alzheimer’s Data
Introduction and data
- Identify the source of the data.
The source of the data is Data.gov, and the specific link is: https://catalog.data.gov/dataset/alzheimers-disease-and-healthy-aging-data
- State when and how it was originally collected (by the original data curator, not necessarily how you found the data).
The CDC collected the data from 2015 to 2022 through surveys conducted as part of the Behavioral Risk Factor Surveillance System (BRFSS).
- Write a brief description of the observations.
The observations address different factors that could contribute to developing Alzheimer’s disease, and they break down the responses for each potential factor by age group, race, and gender.
Research question
- A well formulated research question. (You may include more than one research question if you want to receive feedback on different ideas for your project. However, one per data set is required.)
Which factor is the most accurate indicator that an individual has Alzheimer’s disease?
- A description of the research topic along with a concise statement of your hypotheses on this topic.
The research topic examines each factor in the dataset and determines which factors have confidence limits (Low_Confidence_Limit to High_Confidence_Limit) that can be used to relate the factor to whether an individual has Alzheimer’s. The hypothesis is: if an individual reports having “frequent mental distress,” then they are more likely to have Alzheimer’s.
- Identify the types of variables in your research question. Categorical? Quantitative?
The variables we are looking at are Question, which is categorical (it describes the different potential factors that could contribute to Alzheimer’s), and Data_Value, which is quantitative and gives the statistic reported for each factor. Note: the variable Data_Value_Type, which describes what type of statistic Data_Value is, will also need to be considered; this variable is categorical.
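One possible way to use the confidence limits is to compare interval widths across questions within a single Data_Value_Type, so that different kinds of statistics are never mixed. A minimal Python sketch with made-up rows and shortened, hypothetical question labels: because the glimpse shows Low_Confidence_Limit and High_Confidence_Limit as character columns, they are converted to numbers first, and rows with blank limits are skipped. A narrower average interval only means a more precise estimate; it does not by itself show that a factor is linked to Alzheimer’s.

```python
from collections import defaultdict

# Made-up rows with hypothetical question labels:
# (Question, Data_Value_Type, Data_Value, Low_Confidence_Limit, High_Confidence_Limit).
# The limits are strings because the glimpse shows them as <chr>.
rows = [
    ("Frequent mental distress", "Percentage", 12.0, "10.0", "14.0"),
    ("Frequent mental distress", "Percentage", 11.0, "9.5", "12.5"),
    ("Obesity", "Percentage", 30.0, "25.0", "35.0"),
    ("Obesity", "Percentage", 28.0, "", ""),  # blank limits: skipped
    ("Obesity", "Mean", 5.0, "4.0", "6.0"),   # different statistic type: skipped
]


def to_float(text):
    """Convert a confidence-limit string to float; blanks or junk become None."""
    try:
        return float(text)
    except (TypeError, ValueError):
        return None


def mean_ci_width(rows, value_type="Percentage"):
    """Average confidence-interval width per Question for one Data_Value_Type."""
    widths = defaultdict(list)
    for question, vtype, _value, low, high in rows:
        lo, hi = to_float(low), to_float(high)
        if vtype == value_type and lo is not None and hi is not None:
            widths[question].append(hi - lo)
    return {q: sum(w) / len(w) for q, w in widths.items()}


widths = mean_ci_width(rows)
ranked = sorted(widths, key=widths.get)  # narrowest average interval first
print(widths)
print(ranked)
```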
Glimpse of data
# A tibble: 6 × 39
RowId YearStart YearEnd LocationAbbr LocationDesc Datasource Class Topic
<lgl> <dbl> <dbl> <chr> <chr> <chr> <chr> <chr>
1 NA 2020 2020 HI Hawaii BRFSS Overall He… Arth…
2 NA 2017 2017 ID Idaho BRFSS Mental Hea… Life…
3 NA 2017 2017 ID Idaho BRFSS Overall He… Arth…
4 NA 2018 2018 ID Idaho BRFSS Overall He… Phys…
5 NA 2020 2020 IN Indiana BRFSS Mental Hea… Life…
6 NA 2020 2020 IA Iowa BRFSS Overall He… Prev…
# ℹ 31 more variables: Question <chr>, Response <lgl>, Data_Value_Unit <chr>,
# DataValueTypeID <chr>, Data_Value_Type <chr>, Data_Value <dbl>,
# Data_Value_Alt <dbl>, Data_Value_Footnote_Symbol <chr>,
# Data_Value_Footnote <chr>, Low_Confidence_Limit <chr>,
# High_Confidence_Limit <chr>, Sample_Size <lgl>,
# StratificationCategory1 <chr>, Stratification1 <chr>,
# StratificationCategory2 <chr>, Stratification2 <chr>, …