library(tidyverse)
library(skimr)
Project Brilliant Togepi
Proposal
Data 1
Introduction and data
Identify the source of the data.
- Source: CDC
State when and how it was originally collected (by the original data curator, not necessarily how you found the data).
The CDC compiles the death counts from the cause-of-death section of each death certificate, as reported by physicians, medical examiners, or coroners.
Data on vaccinations are obtained by the CDC through all vaccination partners including jurisdictional partner clinics, retail pharmacies, long-term care facilities, dialysis centers, Federal Emergency Management Agency and Health Resources and Services Administration partner sites, and federal entity facilities.
Write a brief description of the observations.
Based on the data set, the focus appears to be the period when COVID-19 most affected the US.
The data itself is untidy and not organized in any particular pattern, so in its current state it is difficult to spot trends in the COVID-19 data.
Most of the CSV files include a state column, so combining these data sets for further analysis should not be too difficult.
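As a rough illustration of how the state-level tables might be combined, here is a minimal sketch on toy data. The column names (`submission_date`, `state`, `new_case`, `Date`, `Location`, `Administered`) come from the skim output in this proposal; the toy values, the month/day/year date format, and the choice of `inner_join()` are assumptions, not a final design.

```r
library(dplyr)

# Toy stand-ins for the covid and vaccine tables (all values are made up).
covid_toy <- data.frame(
  submission_date = c("01/15/2021", "01/16/2021", "01/15/2021"),
  state           = c("NY", "NY", "CA"),
  new_case        = c(100, 120, 300)
)
vaccine_toy <- data.frame(
  Date         = c("01/15/2021", "01/16/2021", "01/15/2021"),
  Location     = c("NY", "NY", "CA"),
  Administered = c(5000, 5600, 9000)
)

# Assumption: both date columns are month/day/year strings.
covid_clean <- covid_toy %>%
  mutate(date = as.Date(submission_date, format = "%m/%d/%Y")) %>%
  select(state, date, new_case)

vaccine_clean <- vaccine_toy %>%
  mutate(date = as.Date(Date, format = "%m/%d/%Y")) %>%
  select(state = Location, date, Administered)

# Keep only the state-date pairs that appear in both tables.
state_daily <- inner_join(covid_clean, vaccine_clean, by = c("state", "date"))
state_daily
```

Giving both tables the same key names (`state`, `date`) before joining keeps the join explicit and makes mismatched dates easy to spot.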
Research question
- A well formulated research question. (You may include more than one research question if you want to receive feedback on different ideas for your project. However, one per data set is required.)
Do required mask mandates or vaccines have a larger impact on confirmed COVID cases?
As the percentage of counties (per state) requiring masks in public increases, how does the number of confirmed COVID cases change?
- A description of the research topic along with a concise statement of your hypotheses on this topic.
- We plan to combine variables from each of these datasets into a much tidier dataset and use it to analyze COVID transmission. Because COVID was politicized and misleading data spread across the internet during the pandemic, it will be interesting to use this CDC data to form our own interpretations rather than simply trusting the opinions of those with large reach online. We hypothesize that as vaccination and mask mandates increased in states, the number of confirmed COVID cases decreased. Additionally, we are interested in the correlation between COVID cases and the number of page views for the CDC website.
- Identify the types of variables in your research question. Categorical? Quantitative?
- Our research question has both categorical and quantitative variables. For example, the brand of vaccine, the county within a state, and whether or not masks are required in public are categorical variables. The number of confirmed COVID cases and the number of vaccines distributed are quantitative variables.
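The mask question depends on a state-level percentage of counties requiring masks, which is not a column in the raw data. A minimal sketch of how it might be derived is below. The column names are taken from the skim output for `masks`; the toy rows and the assumption that `Face_Masks_Required_in_Public` holds "Yes"/"No" values are ours.

```r
library(dplyr)

# Toy stand-in for the county-level masks table (values are made up).
masks_toy <- data.frame(
  State_Tribe_Territory = c("NY", "NY", "NY", "CA", "CA"),
  County_Name = c("Albany County", "Kings County", "Erie County",
                  "Kern County", "Yolo County"),
  date = rep("4/10/2020", 5),
  Face_Masks_Required_in_Public = c("Yes", "No", "Yes", "Yes", NA)
)

# Percentage of reporting counties per state and date that require masks.
# Counties with a missing value are excluded from the denominator.
mask_pct <- masks_toy %>%
  group_by(State_Tribe_Territory, date) %>%
  summarise(
    n_reported   = sum(!is.na(Face_Masks_Required_in_Public)),
    pct_required = 100 * mean(Face_Masks_Required_in_Public == "Yes", na.rm = TRUE),
    .groups = "drop"
  )
mask_pct
```

Reporting `n_reported` alongside the percentage matters here because roughly 38% of the mask values are missing, so some state-date percentages may rest on few counties.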
Glimpse of data
# will become one dataset
masks <- read.csv("data/us_mask.csv")
covid <- read.csv("data/us_covid.csv")
vaccine <- read.csv("data/us_vaccine.csv")
pageviews <- read_csv("data/cdc_monthly_page_views.csv")
Rows: 228 Columns: 5
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (1): Month
dbl (3): Sort, Year, Page Views
num (1): Page Visits
ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
skimr::skim(masks)
Name | masks |
Number of rows | 1593869 |
Number of columns | 10 |
_______________________ | |
Column type frequency: | |
character | 7 |
numeric | 3 |
________________________ | |
Group variables | None |
Variable type: character
skim_variable | n_missing | complete_rate | min | max | empty | n_unique | whitespace |
---|---|---|---|---|---|---|---|
State_Tribe_Territory | 0 | 1.00 | 2 | 2 | 0 | 56 | 0 |
County_Name | 0 | 1.00 | 4 | 33 | 0 | 1968 | 0 |
date | 0 | 1.00 | 8 | 10 | 0 | 493 | 0 |
Face_Masks_Required_in_Public | 606314 | 0.62 | 2 | 3 | 0 | 2 | 0 |
Source_of_Action | 606314 | 0.62 | 8 | 13 | 0 | 2 | 0 |
URL | 651574 | 0.59 | 43 | 540 | 0 | 548 | 0 |
Citation | 616596 | 0.61 | 14 | 161 | 0 | 629 | 0 |
Variable type: numeric
skim_variable | n_missing | complete_rate | mean | sd | p0 | p25 | p50 | p75 | p100 | hist |
---|---|---|---|---|---|---|---|---|---|---|
FIPS_State | 0 | 1 | 31.44 | 16.41 | 1 | 19 | 30 | 46 | 78 | ▅▇▆▅▁ |
FIPS_County | 0 | 1 | 102.70 | 106.55 | 1 | 35 | 79 | 133 | 840 | ▇▁▁▁▁ |
order_code | 0 | 1 | 1.54 | 0.50 | 1 | 1 | 2 | 2 | 2 | ▇▁▁▁▇ |
skimr::skim(covid)
Name | covid |
Number of rows | 60060 |
Number of columns | 15 |
_______________________ | |
Column type frequency: | |
character | 5 |
numeric | 10 |
________________________ | |
Group variables | None |
Variable type: character
skim_variable | n_missing | complete_rate | min | max | empty | n_unique | whitespace |
---|---|---|---|---|---|---|---|
submission_date | 0 | 1 | 10 | 10 | 0 | 1001 | 0 |
state | 0 | 1 | 2 | 3 | 0 | 60 | 0 |
created_at | 0 | 1 | 22 | 22 | 0 | 2224 | 0 |
consent_cases | 0 | 1 | 0 | 9 | 4009 | 4 | 0 |
consent_deaths | 0 | 1 | 0 | 9 | 5005 | 4 | 0 |
Variable type: numeric
skim_variable | n_missing | complete_rate | mean | sd | p0 | p25 | p50 | p75 | p100 | hist |
---|---|---|---|---|---|---|---|---|---|---|
tot_cases | 0 | 1.00 | 656964.11 | 1173489.80 | 0 | 18303.25 | 222841.5 | 815855.25 | 11309237 | ▇▁▁▁▁ |
conf_cases | 26026 | 0.57 | 652799.39 | 1077693.49 | 0 | 65122.75 | 299246.0 | 842673.25 | 10458792 | ▇▁▁▁▁ |
prob_cases | 26098 | 0.57 | 107357.46 | 157946.54 | 0 | 169.25 | 32175.0 | 150251.25 | 850445 | ▇▁▁▁▁ |
new_case | 0 | 1.00 | 1601.41 | 5074.26 | -10199 | 3.00 | 344.0 | 1435.00 | 319809 | ▇▁▁▁▁ |
pnew_case | 3526 | 0.94 | 267.29 | 1439.17 | -171804 | 0.00 | 1.0 | 175.00 | 171617 | ▁▁▇▁▁ |
tot_death | 0 | 1.00 | 9351.24 | 14591.37 | 0 | 361.00 | 3241.0 | 12353.25 | 95604 | ▇▁▁▁▁ |
conf_death | 26787 | 0.55 | 9015.95 | 10431.92 | 0 | 1377.00 | 5193.0 | 13720.00 | 71408 | ▇▂▁▁▁ |
prob_death | 26787 | 0.55 | 1093.25 | 1549.19 | 0 | 0.00 | 309.0 | 1691.00 | 7889 | ▇▂▁▁▁ |
new_death | 0 | 1.00 | 17.37 | 43.50 | -352 | 0.00 | 3.0 | 16.00 | 1178 | ▁▇▁▁▁ |
pnew_death | 3494 | 0.94 | 1.83 | 24.53 | -2594 | 0.00 | 0.0 | 1.00 | 2919 | ▁▁▇▁▁ |
skimr::skim(vaccine)
Name | vaccine |
Number of rows | 37912 |
Number of columns | 109 |
_______________________ | |
Column type frequency: | |
character | 2 |
numeric | 107 |
________________________ | |
Group variables | None |
Variable type: character
skim_variable | n_missing | complete_rate | min | max | empty | n_unique | whitespace |
---|---|---|---|---|---|---|---|
Date | 0 | 1 | 10 | 10 | 0 | 589 | 0 |
Location | 0 | 1 | 2 | 3 | 0 | 66 | 0 |
Variable type: numeric
skim_variable | n_missing | complete_rate | mean | sd | p0 | p25 | p50 | p75 | p100 | hist |
---|---|---|---|---|---|---|---|---|---|---|
MMWR_week | 0 | 1.00 | 23.56 | 15.48 | 1 | 10.00 | 21.00 | 37.00 | 53.0 | ▇▇▅▃▅ |
Distributed | 0 | 1.00 | 15203017.79 | 66346868.42 | 0 | 960725.00 | 3784610.00 | 9932098.75 | 967430045.0 | ▇▁▁▁▁ |
Distributed_Janssen | 0 | 1.00 | 684828.11 | 2964960.72 | 0 | 22500.00 | 171400.00 | 455800.00 | 32496900.0 | ▇▁▁▁▁ |
Distributed_Moderna | 0 | 1.00 | 5736583.13 | 25045480.19 | 0 | 228700.00 | 1468850.00 | 3926785.00 | 351003220.0 | ▇▁▁▁▁ |
Distributed_Pfizer | 0 | 1.00 | 8577346.08 | 38413907.57 | 0 | 226785.00 | 1989892.50 | 5450246.25 | 583678325.0 | ▇▁▁▁▁ |
Distributed_Novavax | 35800 | 0.06 | 28174.67 | 113286.64 | 0 | 2800.00 | 7800.00 | 21200.00 | 1196500.0 | ▇▁▁▁▁ |
Distributed_Unk_Manuf | 0 | 1.00 | 2113.54 | 83905.10 | 0 | 0.00 | 0.00 | 0.00 | 8282150.0 | ▇▁▁▁▁ |
Dist_Per_100K | 0 | 1.00 | 131261.63 | 79999.75 | 0 | 76030.00 | 135741.00 | 192751.50 | 405539.0 | ▆▇▇▂▁ |
Distributed_Per_100k_5Plus | 448 | 0.99 | 92229.58 | 111904.57 | 0 | 0.00 | 0.00 | 206543.50 | 425336.0 | ▇▁▅▁▁ |
Distributed_Per_100k_12Plus | 0 | 1.00 | 140163.46 | 107719.30 | 0 | 0.00 | 156914.50 | 227754.25 | 459301.0 | ▇▇▇▂▁ |
Distributed_Per_100k_18Plus | 0 | 1.00 | 166453.12 | 106763.72 | 0 | 88099.00 | 173357.50 | 251018.50 | 496196.0 | ▆▇▇▂▁ |
Distributed_Per_100k_65Plus | 0 | 1.00 | 829696.82 | 653184.74 | 0 | 405963.50 | 804765.00 | 1166627.50 | 6036990.0 | ▇▂▁▁▁ |
Administered | 0 | 1.00 | 11865368.11 | 52516352.87 | 0 | 729973.75 | 2929770.50 | 7797573.50 | 672537312.0 | ▇▁▁▁▁ |
Administered_5Plus | 448 | 0.99 | 7842623.34 | 46782774.14 | 0 | 0.00 | 0.00 | 4636601.00 | 668381561.0 | ▇▁▁▁▁ |
Administered_12Plus | 0 | 1.00 | 10803682.68 | 50827438.41 | 0 | 0.00 | 2151088.00 | 7230964.25 | 644700162.0 | ▇▁▁▁▁ |
Administered_18Plus | 0 | 1.00 | 10879838.42 | 48194401.00 | 0 | 561477.50 | 2658485.50 | 7187786.75 | 604314715.0 | ▇▁▁▁▁ |
Administered_65Plus | 0 | 1.00 | 3141122.34 | 13751149.80 | 0 | 50634.00 | 802419.00 | 2242111.50 | 182650661.0 | ▇▁▁▁▁ |
Administered_Janssen | 0 | 1.00 | 398282.62 | 1785837.10 | 0 | 12261.00 | 90187.50 | 260288.75 | 18985124.0 | ▇▁▁▁▁ |
Administered_Moderna | 0 | 1.00 | 4609474.92 | 20110316.58 | 0 | 316509.75 | 1202395.50 | 3039758.25 | 251560771.0 | ▇▁▁▁▁ |
Administered_Pfizer | 0 | 1.00 | 6845996.77 | 30606111.34 | 0 | 385234.50 | 1613378.00 | 4566258.50 | 401070068.0 | ▇▁▁▁▁ |
Administered_Novavax | 35807 | 0.06 | 1558.44 | 6881.34 | 0 | 79.00 | 343.00 | 1102.00 | 81544.0 | ▇▁▁▁▁ |
Administered_Unk_Manuf | 3 | 1.00 | 11528.18 | 53396.17 | 0 | 60.00 | 688.00 | 3577.00 | 839805.0 | ▇▁▁▁▁ |
Admin_Per_100K | 0 | 1.00 | 104728.23 | 64625.70 | 0 | 55922.50 | 111335.00 | 151618.25 | 298212.0 | ▆▇▇▃▁ |
Admin_Per_100k_5Plus | 448 | 0.99 | 73078.28 | 88849.00 | 0 | 0.00 | 0.00 | 161421.75 | 314272.0 | ▇▁▃▂▁ |
Admin_Per_100k_12Plus | 0 | 1.00 | 110140.86 | 83249.26 | 0 | 0.00 | 129922.00 | 174328.00 | 326935.0 | ▇▅▇▃▁ |
Admin_Per_100k_18Plus | 0 | 1.00 | 122769.77 | 76177.65 | 0 | 64419.25 | 134569.00 | 179839.25 | 329135.0 | ▆▆▇▃▁ |
Admin_Per_100k_65Plus | 0 | 1.00 | 162655.70 | 100831.76 | 0 | 99692.50 | 174850.50 | 243213.50 | 451103.0 | ▆▇▇▂▁ |
Recip_Administered | 0 | 1.00 | 11753657.29 | 52523589.66 | 0 | 556689.75 | 2836631.50 | 7609341.50 | 672537312.0 | ▇▁▁▁▁ |
Administered_Dose1_Recip | 0 | 1.00 | 5809640.28 | 25229047.65 | 0 | 323842.00 | 1478779.50 | 3872836.25 | 269650596.0 | ▇▁▁▁▁ |
Administered_Dose1_Pop_Pct | 0 | 1.00 | 51.24 | 28.79 | 0 | 33.50 | 58.70 | 72.00 | 100.0 | ▅▂▆▇▃ |
Administered_Dose1_Recip_5Plus | 448 | 0.99 | 3589049.33 | 21290413.36 | 0 | 0.00 | 0.00 | 2138729.25 | 267572623.0 | ▇▁▁▁▁ |
Administered_Dose1_Recip_5PlusPop_Pct | 448 | 0.99 | 33.03 | 39.57 | 0 | 0.00 | 0.00 | 74.40 | 99.9 | ▇▁▁▃▃ |
Administered_Dose1_Recip_12Plus | 0 | 1.00 | 5225542.09 | 24176524.62 | 0 | 0.00 | 1068491.00 | 3534497.00 | 256128740.0 | ▇▁▁▁▁ |
Administered_Dose1_Recip_12PlusPop_Pct | 0 | 1.00 | 52.30 | 37.27 | 0 | 0.00 | 68.30 | 82.50 | 99.9 | ▇▁▂▇▇ |
Administered_Dose1_Recip_18Plus | 0 | 1.00 | 5343397.08 | 23033494.46 | 0 | 308029.00 | 1414983.00 | 3606240.25 | 237904856.0 | ▇▁▁▁▁ |
Administered_Dose1_Recip_18PlusPop_Pct | 0 | 1.00 | 59.12 | 32.51 | 0 | 39.70 | 70.70 | 84.30 | 99.9 | ▅▂▃▇▇ |
Administered_Dose1_Recip_65Plus | 0 | 1.00 | 1412111.95 | 6002240.99 | 0 | 19319.00 | 378694.00 | 996032.75 | 58778173.0 | ▇▁▁▁▁ |
Administered_Dose1_Recip_65PlusPop_Pct | 0 | 1.00 | 69.38 | 37.42 | 0 | 58.30 | 89.80 | 95.00 | 109.0 | ▃▁▁▃▇ |
Series_Complete_Yes | 0 | 1.00 | 4811571.56 | 21300590.45 | 0 | 134682.75 | 1123323.00 | 3235252.75 | 230142115.0 | ▇▁▁▁▁ |
Series_Complete_Pop_Pct | 0 | 1.00 | 42.83 | 25.92 | 0 | 23.00 | 50.60 | 61.90 | 90.6 | ▆▂▆▇▂ |
Series_Complete_5Plus | 448 | 0.99 | 3054455.99 | 18126984.91 | 0 | 0.00 | 0.00 | 1819987.25 | 229040889.0 | ▇▁▁▁▁ |
Series_Complete_5PlusPop_Pct | 448 | 0.99 | 28.54 | 34.32 | 0 | 0.00 | 0.00 | 63.80 | 95.0 | ▇▁▁▃▂ |
Series_Complete_12Plus | 0 | 1.00 | 4457378.28 | 20640205.74 | 0 | 0.00 | 901074.00 | 3061199.75 | 219636602.0 | ▇▁▁▁▁ |
Series_Complete_12PlusPop_Pct | 0 | 1.00 | 45.57 | 32.85 | 0 | 0.00 | 59.00 | 71.20 | 100.0 | ▇▁▃▇▂ |
Series_Complete_18Plus | 0 | 1.00 | 4442850.68 | 19528361.93 | 0 | 127746.00 | 1068303.00 | 3054443.00 | 204032234.0 | ▇▁▁▁▁ |
Series_Complete_18PlusPop_Pct | 0 | 1.00 | 50.31 | 30.22 | 0 | 25.10 | 61.60 | 72.90 | 99.9 | ▅▂▃▇▂ |
Series_Complete_65Plus | 0 | 1.00 | 1211994.82 | 5181904.93 | 0 | 16652.75 | 318236.00 | 894077.75 | 51663242.0 | ▇▁▁▁▁ |
Series_Complete_65PlusPop_Pct | 0 | 1.00 | 62.88 | 35.25 | 0 | 37.90 | 81.00 | 88.20 | 99.9 | ▃▁▁▃▇ |
Series_Complete_Janssen | 0 | 1.00 | 374893.97 | 1671841.22 | 0 | 10865.75 | 84187.00 | 243465.75 | 17173853.0 | ▇▁▁▁▁ |
Series_Complete_Moderna | 0 | 1.00 | 1764095.79 | 7698539.35 | 0 | 51193.00 | 443592.00 | 1244783.00 | 79821310.0 | ▇▁▁▁▁ |
Series_Complete_Pfizer | 0 | 1.00 | 2669103.51 | 11933130.57 | 0 | 70964.25 | 608221.00 | 1801014.50 | 132517282.0 | ▇▁▁▁▁ |
Series_Complete_Novavax | 35808 | 0.06 | 471.48 | 2072.67 | 0 | 20.00 | 108.50 | 336.00 | 24759.0 | ▇▁▁▁▁ |
Series_Complete_Unk_Manuf | 4 | 1.00 | 3067.09 | 14762.28 | 0 | 3.00 | 204.00 | 1001.25 | 236584.0 | ▇▁▁▁▁ |
Series_Complete_Janssen_5Plus | 21016 | 0.45 | 525470.91 | 2063970.01 | 0 | 53934.00 | 164280.00 | 323872.25 | 17170667.0 | ▇▁▁▁▁ |
Series_Complete_Moderna_5Plus | 21016 | 0.45 | 2388243.61 | 9317660.24 | 0 | 238992.50 | 878268.00 | 1538076.50 | 79243124.0 | ▇▁▁▁▁ |
Series_Complete_Pfizer_5Plus | 21016 | 0.45 | 3854179.61 | 15125490.36 | 0 | 374123.75 | 1238119.00 | 2510312.25 | 132367780.0 | ▇▁▁▁▁ |
Series_Complete_Unk_Manuf_5Plus | 21020 | 0.45 | 4784.22 | 19653.25 | 0 | 115.00 | 590.00 | 2149.00 | 234718.0 | ▇▁▁▁▁ |
Series_Complete_Janssen_12Plus | 0 | 1.00 | 355781.05 | 1655509.10 | 0 | 0.00 | 63889.50 | 235776.00 | 17168556.0 | ▇▁▁▁▁ |
Series_Complete_Moderna_12Plus | 0 | 1.00 | 1651079.33 | 7596147.09 | 0 | 0.00 | 354799.50 | 1177058.75 | 79176937.0 | ▇▁▁▁▁ |
Series_Complete_Pfizer_12Plus | 0 | 1.00 | 2447594.96 | 11381545.82 | 0 | 0.00 | 485916.50 | 1684953.25 | 123043366.0 | ▇▁▁▁▁ |
Series_Complete_Unk_Manuf_12Plus | 4 | 1.00 | 2897.84 | 14472.82 | 0 | 0.00 | 124.00 | 880.00 | 223757.0 | ▇▁▁▁▁ |
Series_Complete_Janssen_18Plus | 0 | 1.00 | 373721.75 | 1666833.70 | 0 | 10849.00 | 84067.50 | 242791.50 | 17140407.0 | ▇▁▁▁▁ |
Series_Complete_Moderna_18Plus | 0 | 1.00 | 1759153.35 | 7676566.60 | 0 | 51193.00 | 443397.00 | 1243635.00 | 79066584.0 | ▇▁▁▁▁ |
Series_Complete_Pfizer_18Plus | 0 | 1.00 | 2307006.73 | 10176032.38 | 0 | 66427.75 | 549402.50 | 1629128.00 | 107596454.0 | ▇▁▁▁▁ |
Series_Complete_Unk_Manuf_18Plus | 4 | 1.00 | 2936.55 | 14039.12 | 0 | 3.00 | 191.00 | 924.00 | 206017.0 | ▇▁▁▁▁ |
Series_Complete_Janssen_65Plus | 0 | 1.00 | 56190.36 | 244398.56 | 0 | 632.00 | 13760.50 | 37361.75 | 2371698.0 | ▇▁▁▁▁ |
Series_Complete_Moderna_65Plus | 0 | 1.00 | 579595.47 | 2476550.17 | 0 | 6803.00 | 149418.00 | 427474.25 | 26213750.0 | ▇▁▁▁▁ |
Series_Complete_Pfizer_65Plus | 0 | 1.00 | 576257.90 | 2469478.29 | 0 | 9172.75 | 150715.00 | 436997.75 | 27935710.0 | ▇▁▁▁▁ |
Series_Complete_Unk_Manuf_65Plus | 9 | 1.00 | 1526.87 | 24691.99 | 0 | 1.00 | 71.00 | 422.00 | 2349816.0 | ▇▁▁▁▁ |
Additional_Doses | 16348 | 0.57 | 2152481.68 | 9931737.00 | 0 | 18553.50 | 467227.50 | 1478154.50 | 117621762.0 | ▇▁▁▁▁ |
Additional_Doses_Vax_Pct | 325 | 0.99 | 17.42 | 21.10 | 0 | 0.00 | 0.00 | 39.20 | 67.5 | ▇▁▂▂▁ |
Additional_Doses_5Plus | 35544 | 0.06 | 3548888.77 | 13956156.16 | 10232 | 344607.00 | 1025834.50 | 2404023.50 | 117547717.0 | ▇▁▁▁▁ |
Additional_Doses_5Plus_Vax_Pct | 35544 | 0.06 | 49.23 | 8.35 | 24 | 43.90 | 49.15 | 55.70 | 67.6 | ▁▂▇▇▂ |
Additional_Doses_12Plus | 26456 | 0.30 | 3163981.13 | 12497912.61 | 1411 | 293103.25 | 911145.00 | 2153299.25 | 115318712.0 | ▇▁▁▁▁ |
Additional_Doses_12Plus_Vax_Pct | 26456 | 0.30 | 46.04 | 10.87 | 0 | 40.40 | 46.50 | 53.50 | 71.0 | ▁▁▅▇▂ |
Additional_Doses_18Plus | 325 | 0.99 | 1192369.29 | 7318199.28 | 0 | 0.00 | 0.00 | 605249.00 | 110159095.0 | ▇▁▁▁▁ |
Additional_Doses_18Plus_Vax_Pct | 325 | 0.99 | 18.77 | 22.76 | 0 | 0.00 | 0.00 | 42.10 | 72.5 | ▇▁▂▂▁ |
Additional_Doses_50Plus | 325 | 0.99 | 778479.96 | 4695513.82 | 0 | 0.00 | 0.00 | 426214.00 | 68922968.0 | ▇▁▁▁▁ |
Additional_Doses_50Plus_Vax_Pct | 0 | 1.00 | 23.60 | 28.05 | 0 | 0.00 | 0.00 | 54.20 | 81.3 | ▇▁▁▃▂ |
Additional_Doses_65Plus | 325 | 0.99 | 445618.68 | 2643441.66 | 0 | 0.00 | 0.00 | 254436.00 | 38027526.0 | ▇▁▁▁▁ |
Additional_Doses_65Plus_Vax_Pct | 325 | 0.99 | 27.66 | 32.03 | 0 | 0.00 | 0.00 | 63.40 | 88.5 | ▇▁▁▃▂ |
Additional_Doses_Moderna | 325 | 0.99 | 526482.41 | 3259401.25 | 0 | 0.00 | 0.00 | 266539.50 | 48675059.0 | ▇▁▁▁▁ |
Additional_Doses_Pfizer | 325 | 0.99 | 688401.31 | 4227459.88 | 0 | 0.00 | 0.00 | 350871.50 | 67306254.0 | ▇▁▁▁▁ |
Additional_Doses_Janssen | 327 | 0.99 | 17887.45 | 111904.69 | 0 | 0.00 | 0.00 | 8202.00 | 1561830.0 | ▇▁▁▁▁ |
Additional_Doses_Unk_Manuf | 331 | 0.99 | 395.49 | 2685.55 | 0 | 0.00 | 0.00 | 70.00 | 72042.0 | ▇▁▁▁▁ |
Second_Booster | 37818 | 0.00 | 20940412.98 | 12993892.37 | 6065193 | 11604388.25 | 15724217.00 | 26300459.00 | 47866632.0 | ▇▆▂▁▃ |
Second_Booster_50Plus | 31896 | 0.16 | 572656.05 | 2549983.61 | 0 | 46138.00 | 139643.00 | 383135.50 | 36171359.0 | ▇▁▁▁▁ |
Second_Booster_50Plus_Vax_Pct | 31896 | 0.16 | 25.65 | 14.66 | 0 | 14.80 | 22.30 | 34.42 | 67.2 | ▃▇▃▂▁ |
Second_Booster_65Plus | 31896 | 0.16 | 385270.86 | 1674480.08 | 0 | 32933.75 | 99922.00 | 268246.75 | 22753682.0 | ▇▁▁▁▁ |
Second_Booster_65Plus_Vax_Pct | 31896 | 0.16 | 31.36 | 16.35 | 0 | 19.10 | 28.40 | 41.70 | 75.1 | ▃▇▅▃▂ |
Second_Booster_Janssen | 31905 | 0.16 | 535.09 | 2172.94 | 0 | 45.00 | 118.00 | 313.00 | 23472.0 | ▇▁▁▁▁ |
Second_Booster_Moderna | 31896 | 0.16 | 296702.65 | 1345796.53 | 1 | 22581.25 | 67113.00 | 186591.75 | 19829102.0 | ▇▁▁▁▁ |
Second_Booster_Pfizer | 31896 | 0.16 | 361232.53 | 1734306.44 | 4 | 26174.50 | 82005.50 | 220708.50 | 27976130.0 | ▇▁▁▁▁ |
Second_Booster_Unk_Manuf | 31907 | 0.16 | 486.88 | 2158.09 | 0 | 4.00 | 28.00 | 180.00 | 32190.0 | ▇▁▁▁▁ |
Administered_Bivalent | 36248 | 0.04 | 1158600.53 | 5048831.90 | 0 | 80719.25 | 272271.00 | 808310.25 | 54419971.0 | ▇▁▁▁▁ |
Admin_Bivalent_PFR | 36312 | 0.04 | 767149.06 | 3282941.69 | 0 | 60857.50 | 182003.00 | 537557.50 | 34747465.0 | ▇▁▁▁▁ |
Admin_Bivalent_MOD | 36312 | 0.04 | 435866.28 | 1861447.24 | 0 | 29521.25 | 105033.00 | 313338.75 | 19672506.0 | ▇▁▁▁▁ |
Dist_Bivalent_PFR | 36312 | 0.04 | 1901436.75 | 7808853.62 | 300 | 212070.00 | 499870.00 | 1421172.50 | 81485210.0 | ▇▁▁▁▁ |
Dist_Bivalent_MOD | 36312 | 0.04 | 893595.25 | 3700494.38 | 200 | 88900.00 | 227700.00 | 627075.00 | 38435900.0 | ▇▁▁▁▁ |
Bivalent_Booster_5Plus | 36568 | 0.04 | 1365374.26 | 5532552.15 | 0 | 141230.25 | 343484.50 | 1022303.25 | 53910391.0 | ▇▁▁▁▁ |
Bivalent_Booster_5Plus_Pop_Pct | 36568 | 0.04 | 12.65 | 7.58 | 0 | 7.50 | 11.90 | 17.50 | 34.3 | ▅▇▆▃▁ |
Bivalent_Booster_12Plus | 36504 | 0.04 | 1300581.72 | 5316625.55 | 0 | 130808.50 | 331617.00 | 945788.50 | 52668775.0 | ▇▁▁▁▁ |
Bivalent_Booster_12Plus_Pop_Pct | 36504 | 0.04 | 13.27 | 8.08 | 0 | 7.50 | 12.45 | 18.52 | 35.9 | ▅▇▆▃▁ |
Bivalent_Booster_18Plus | 36504 | 0.04 | 1258938.12 | 5141296.47 | 0 | 126968.50 | 322174.00 | 911081.75 | 50821425.0 | ▇▁▁▁▁ |
Bivalent_Booster_18Plus_Pop_Pct | 36504 | 0.04 | 14.07 | 8.43 | 0 | 8.10 | 13.30 | 19.60 | 37.1 | ▅▇▆▃▁ |
Bivalent_Booster_65Plus | 36504 | 0.04 | 583440.70 | 2356210.02 | 0 | 54719.50 | 165515.50 | 438194.75 | 22796124.0 | ▇▁▁▁▁ |
Bivalent_Booster_65Plus_Pop_Pct | 36504 | 0.04 | 30.49 | 16.76 | 0 | 18.88 | 31.60 | 42.92 | 68.1 | ▅▆▇▆▂ |
skimr::skim(pageviews)
Name | pageviews |
Number of rows | 228 |
Number of columns | 5 |
_______________________ | |
Column type frequency: | |
character | 1 |
numeric | 4 |
________________________ | |
Group variables | None |
Variable type: character
skim_variable | n_missing | complete_rate | min | max | empty | n_unique | whitespace |
---|---|---|---|---|---|---|---|
Month | 0 | 1 | 3 | 9 | 0 | 12 | 0 |
Variable type: numeric
skim_variable | n_missing | complete_rate | mean | sd | p0 | p25 | p50 | p75 | p100 | hist |
---|---|---|---|---|---|---|---|---|---|---|
Sort | 0 | 1.0 | 6.5 | 3.46 | 1 | 3.75 | 6.5 | 9.25 | 12 | ▇▅▅▅▇ |
Year | 0 | 1.0 | 2012.5 | 5.51 | 2003 | 2008.00 | 2012.5 | 2017.00 | 2022 | ▇▇▇▇▇ |
Page Views | 1 | 1.0 | 83473966.2 | 95509590.57 | 1415875 | 39897754.50 | 63277568.0 | 87652029.00 | 1075594920 | ▇▁▁▁▁ |
Page Visits | 22 | 0.9 | 35932832.5 | 55929589.20 | 1484766 | 9634804.50 | 18757177.0 | 30696210.00 | 464305872 | ▇▁▁▁▁ |
Data 2
Introduction and data
The data was published by the Mexican government and covers more than 4,900 divorce cases.
It is drawn from divorce case records in Xalapa, Mexico.
Each record includes the date of the divorce, the type of divorce, the ages of the two people getting divorced, and other factors. There are 41 columns and more than 4,900 rows, each an independent observation. Note that although some columns in the table are in Spanish, they can be easily translated to English.
Research question
Questions:
1.) What is the leading cause of divorce in the city of Xalapa, Mexico?
2.) What factor has the biggest impact on divorce rates in Xalapa, Mexico?
3.) Is there a correlation between large age gaps and divorce?
We are considering analyzing divorce trends within the data. The data gives us many details about each of the people getting a divorce, which we can then use to connect them with similar cases. We believe that certain circumstances may be correlated with divorce rates. One circumstance we are considering is the age gap: we hypothesize a positive correlation between the age gap within a married couple and the divorce rate among such couples.
For this specific research question, we would use a variable representing the age gap, which is quantitative, and the divorce rate, which is also quantitative.
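The age gap is not a column in the data, so it would need to be derived. A minimal sketch of one way to do that is below, assuming the gap is the absolute difference between `Age_partner_man` and `Age_partner_woman` (both names are from the skim output). The toy rows are made up, and dropping records with a missing age is an assumption.

```r
library(dplyr)

# Toy stand-in for the divorces table (values are made up).
divorces_toy <- data.frame(
  Age_partner_man   = c(40, 28, NA, 55),
  Age_partner_woman = c(35, 30, 41, 30)
)

# Drop records where either age is missing, then compute the gap in years.
age_gaps <- divorces_toy %>%
  filter(!is.na(Age_partner_man), !is.na(Age_partner_woman)) %>%
  mutate(age_gap = abs(Age_partner_man - Age_partner_woman))

# Summarise the distribution of the derived variable.
age_gaps %>%
  summarise(mean_gap = mean(age_gap), median_gap = median(age_gap))
```

Using the absolute difference treats a gap the same regardless of which partner is older; a signed difference would be an alternative if direction turns out to matter.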
Glimpse of data
divorces <- read.csv("data/divorces_2000-2015_translated.csv")
skimr::skim(divorces)
Name | divorces |
Number of rows | 4923 |
Number of columns | 41 |
_______________________ | |
Column type frequency: | |
character | 34 |
numeric | 7 |
________________________ | |
Group variables | None |
Variable type: character
skim_variable | n_missing | complete_rate | min | max | empty | n_unique | whitespace |
---|---|---|---|---|---|---|---|
Divorce_date | 0 | 1 | 6 | 8 | 0 | 2596 | 0 |
Type_of_divorce | 0 | 1 | 9 | 10 | 0 | 2 | 0 |
Nationality_partner_man | 0 | 1 | 0 | 14 | 1 | 20 | 0 |
DOB_partner_man | 0 | 1 | 0 | 8 | 381 | 3849 | 0 |
Place_of_birth_partner_man | 0 | 1 | 0 | 28 | 126 | 670 | 0 |
Birth_municipality_of_partner_man | 0 | 1 | 0 | 32 | 129 | 425 | 0 |
Birth_federal_partner_man | 0 | 1 | 0 | 24 | 128 | 77 | 0 |
Birth_country_partner_man | 0 | 1 | 0 | 25 | 127 | 22 | 0 |
Residence_municipality_partner_man | 0 | 1 | 0 | 26 | 324 | 176 | 0 |
Residence_federal_partner_man | 0 | 1 | 0 | 25 | 323 | 43 | 0 |
Residence_country_partner_man | 0 | 1 | 0 | 25 | 324 | 6 | 0 |
Occupation_partner_man | 0 | 1 | 0 | 32 | 529 | 222 | 0 |
Place_of_residence_partner_man | 0 | 1 | 0 | 26 | 321 | 230 | 0 |
Nationality_partner_woman | 0 | 1 | 0 | 14 | 3 | 17 | 0 |
DOB_partner_woman | 0 | 1 | 0 | 8 | 452 | 3766 | 0 |
DOB_registration_date_partner_woman | 0 | 1 | 0 | 10 | 2679 | 2004 | 0 |
Place_of_birth_partner_woman | 0 | 1 | 0 | 31 | 140 | 654 | 0 |
Birth_municipality_of_partner_woman | 0 | 1 | 0 | 26 | 139 | 405 | 0 |
Birth_federal_partner_woman | 0 | 1 | 0 | 20 | 140 | 73 | 0 |
Birth_country_partner_woman | 0 | 1 | 0 | 25 | 139 | 21 | 0 |
Place_of_residence_partner_woman | 0 | 1 | 0 | 25 | 307 | 175 | 0 |
Residence_municipality_partner_woman | 0 | 1 | 0 | 22 | 307 | 132 | 0 |
Residence_federal_partner_woman | 0 | 1 | 0 | 20 | 305 | 34 | 0 |
Residence_country_partner_woman | 0 | 1 | 0 | 14 | 305 | 5 | 0 |
Occupation_partner_woman | 0 | 1 | 0 | 32 | 578 | 157 | 0 |
Date_of_marriage | 0 | 1 | 6 | 8 | 0 | 3651 | 0 |
Marriage_certificate_place | 0 | 1 | 4 | 25 | 0 | 216 | 0 |
Marriage_certificate_municipality | 0 | 1 | 4 | 25 | 0 | 194 | 0 |
Marriage_certificate_federal | 0 | 1 | 4 | 20 | 0 | 33 | 0 |
Level_of_education_partner_man | 0 | 1 | 0 | 15 | 304 | 7 | 0 |
Employment_status_partner_man | 0 | 1 | 0 | 48 | 356 | 11 | 0 |
Level_of_education_partner_woman | 0 | 1 | 0 | 15 | 380 | 7 | 0 |
Employment_status_partner_woman | 0 | 1 | 0 | 44 | 417 | 11 | 0 |
Custody | 0 | 1 | 0 | 5 | 2851 | 4 | 0 |
Variable type: numeric
skim_variable | n_missing | complete_rate | mean | sd | p0 | p25 | p50 | p75 | p100 | hist |
---|---|---|---|---|---|---|---|---|---|---|
Age_partner_man | 107 | 0.98 | 39.44 | 10.40 | 19.0 | 32 | 38 | 46 | 91 | ▆▇▃▁▁ |
Monthly_income_partner_man_peso | 1419 | 0.71 | 10920.11 | 70569.76 | 2.4 | 3000 | 5600 | 10000 | 3150242 | ▇▁▁▁▁ |
Age_partner_woman | 151 | 0.97 | 36.96 | 9.93 | 17.0 | 29 | 35 | 43 | 84 | ▅▇▃▁▁ |
Monthly_income_partner_woman_peso | 2119 | 0.57 | 7374.25 | 16337.05 | 3.5 | 3000 | 5000 | 8000 | 708652 | ▇▁▁▁▁ |
Marriage_duration | 235 | 0.95 | 11.72 | 9.30 | 1.0 | 4 | 9 | 17 | 61 | ▇▃▁▁▁ |
Marriage_duration_months | 3368 | 0.32 | 6.29 | 3.87 | 0.0 | 4 | 6 | 9 | 93 | ▇▁▁▁▁ |
Num_Children | 1912 | 0.61 | 1.82 | 0.93 | 1.0 | 1 | 2 | 2 | 10 | ▇▂▁▁▁ |
Data 3
Introduction and data
Identify the source of the data.
- The source of the data is Inside Airbnb.
State when and how it was originally collected (by the original data curator, not necessarily how you found the data).
- The data was collected in 2019 by dgomonov, who gathered it from Inside Airbnb. It was originally collected to show listing activity and metrics for NYC in 2019.
Write a brief description of the observations.
- Based on the data set, it describes hosts, availability in different NYC neighborhoods, room type, and pricing.
- The data itself is untidy, as there is no consistent pattern in how it was entered.
- With this data, it will be interesting to see, for example, which hosts are busiest and which neighborhoods are preferred.
Research question
- A well formulated research question. (You may include more than one research question if you want to receive feedback on different ideas for your project. However, one per data set is required.)
How do a listing’s cancellation policy and minimum nights correlate with the number of reviews that it receives?
What is the largest factor in a listing’s price and service fee?
- A description of the research topic along with a concise statement of your hypotheses on this topic.
The research topic of the first question is how the strictness of a listing’s policies affects the number of people who review it. Many factors might lead somebody to review a listing, and some of those likely relate to how strict the policies are.
We hypothesize a negative correlation between how strict a listing’s policies are and how many reviews it receives, possibly because fewer people are willing to rent those listings.
- Identify the types of variables in your research question. Categorical? Quantitative?
The cancellation policy is categorical, but the minimum nights and number of reviews are quantitative.
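Because the variables are of mixed types, the first question might be explored in two parts: a grouped summary for the categorical policy and a rank correlation for the two quantitative variables. The sketch below uses toy data. The column names are from the skim output; the policy labels ("strict", "moderate", "flexible"), the choice of Spearman correlation, and the rule of dropping `minimum nights` values below 1 (the skim output shows negative values) are assumptions.

```r
library(dplyr)

# Toy stand-in for the airbnb table (values are made up).
# check.names = FALSE keeps the spaces in the original column names.
airbnb_toy <- data.frame(
  cancellation_policy = c("strict", "flexible", "moderate", "strict", "flexible"),
  `minimum nights`    = c(30, 1, 2, -5, 3),
  `number of reviews` = c(2, 40, 15, 8, 25),
  check.names = FALSE
)

# Assumption: a minimum stay below one night is a data entry error.
airbnb_clean <- airbnb_toy %>%
  filter(!is.na(`minimum nights`), `minimum nights` >= 1)

# Categorical predictor: compare typical review counts across policies.
reviews_by_policy <- airbnb_clean %>%
  group_by(cancellation_policy) %>%
  summarise(
    median_reviews = median(`number of reviews`, na.rm = TRUE),
    n = n(),
    .groups = "drop"
  )

# Quantitative predictor: rank correlation, which is less sensitive to the
# heavy right skew seen in both variables.
nights_reviews_cor <- cor(
  airbnb_clean$`minimum nights`,
  airbnb_clean$`number of reviews`,
  method = "spearman"
)
```

The median is used rather than the mean because the review counts are strongly right-skewed in the skim output.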
Glimpse of data
airbnb <- read_csv("data/airbnb_open_data.csv")
Warning: One or more parsing issues, call `problems()` on your data frame for details,
e.g.:
dat <- vroom(...)
problems(dat)
Rows: 102599 Columns: 26
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (13): NAME, host_identity_verified, host name, neighbourhood group, neig...
dbl (11): id, host id, lat, long, Construction year, minimum nights, number ...
lgl (2): instant_bookable, license
ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
skimr::skim(airbnb)
Name | airbnb |
Number of rows | 102599 |
Number of columns | 26 |
_______________________ | |
Column type frequency: | |
character | 13 |
logical | 2 |
numeric | 11 |
________________________ | |
Group variables | None |
Variable type: character
skim_variable | n_missing | complete_rate | min | max | empty | n_unique | whitespace |
---|---|---|---|---|---|---|---|
NAME | 249 | 1.00 | 1 | 248 | 0 | 61281 | 0 |
host_identity_verified | 289 | 1.00 | 8 | 11 | 0 | 2 | 0 |
host name | 406 | 1.00 | 1 | 35 | 0 | 13190 | 0 |
neighbourhood group | 29 | 1.00 | 5 | 13 | 0 | 7 | 0 |
neighbourhood | 16 | 1.00 | 4 | 26 | 0 | 224 | 0 |
country | 532 | 0.99 | 13 | 13 | 0 | 1 | 0 |
country code | 131 | 1.00 | 2 | 2 | 0 | 1 | 0 |
cancellation_policy | 76 | 1.00 | 6 | 8 | 0 | 3 | 0 |
room type | 0 | 1.00 | 10 | 15 | 0 | 4 | 0 |
price | 247 | 1.00 | 3 | 6 | 0 | 1151 | 0 |
service fee | 273 | 1.00 | 3 | 4 | 0 | 231 | 0 |
last review | 15893 | 0.85 | 8 | 10 | 0 | 2477 | 0 |
house_rules | 52131 | 0.49 | 6 | 1001 | 0 | 1964 | 0 |
Variable type: logical
skim_variable | n_missing | complete_rate | mean | count |
---|---|---|---|---|
instant_bookable | 105 | 1 | 0.5 | FAL: 51474, TRU: 51020 |
license | 102599 | 0 | NaN | : |
Variable type: numeric
skim_variable | n_missing | complete_rate | mean | sd | p0 | p25 | p50 | p75 | p100 | hist |
---|---|---|---|---|---|---|---|---|---|---|
id | 0 | 1.00 | 2.914623e+07 | 1.625751e+07 | 1001254.00 | 1.508581e+07 | 2.913660e+07 | 4.32012e+07 | 5.736742e+07 | ▇▇▇▇▇ |
host id | 0 | 1.00 | 4.925411e+10 | 2.853900e+10 | 123600518.00 | 2.458333e+10 | 4.911774e+10 | 7.39965e+10 | 9.876313e+10 | ▇▇▇▇▇ |
lat | 8 | 1.00 | 4.073000e+01 | 6.000000e-02 | 40.50 | 4.069000e+01 | 4.072000e+01 | 4.07600e+01 | 4.092000e+01 | ▁▂▇▅▁ |
long | 8 | 1.00 | -7.395000e+01 | 5.000000e-02 | -74.25 | -7.398000e+01 | -7.395000e+01 | -7.39300e+01 | -7.371000e+01 | ▁▁▇▂▁ |
Construction year | 214 | 1.00 | 2.012490e+03 | 5.770000e+00 | 2003.00 | 2.007000e+03 | 2.012000e+03 | 2.01700e+03 | 2.022000e+03 | ▇▇▇▇▇ |
minimum nights | 409 | 1.00 | 8.140000e+00 | 3.055000e+01 | -1223.00 | 2.000000e+00 | 3.000000e+00 | 5.00000e+00 | 5.645000e+03 | ▇▁▁▁▁ |
number of reviews | 183 | 1.00 | 2.748000e+01 | 4.951000e+01 | 0.00 | 1.000000e+00 | 7.000000e+00 | 3.00000e+01 | 1.024000e+03 | ▇▁▁▁▁ |
reviews per month | 15879 | 0.85 | 1.370000e+00 | 1.750000e+00 | 0.01 | 2.200000e-01 | 7.400000e-01 | 2.00000e+00 | 9.000000e+01 | ▇▁▁▁▁ |
review rate number | 326 | 1.00 | 3.280000e+00 | 1.280000e+00 | 1.00 | 2.000000e+00 | 3.000000e+00 | 4.00000e+00 | 5.000000e+00 | ▃▇▇▇▇ |
calculated host listings count | 319 | 1.00 | 7.940000e+00 | 3.222000e+01 | 1.00 | 1.000000e+00 | 1.000000e+00 | 2.00000e+00 | 3.320000e+02 | ▇▁▁▁▁ |
availability 365 | 448 | 1.00 | 1.411300e+02 | 1.354400e+02 | -10.00 | 3.000000e+00 | 9.600000e+01 | 2.69000e+02 | 3.677000e+03 | ▇▁▁▁▁ |