library(tidyverse)
library(skimr)
library(rvest)
library(robotstxt)
library(openintro)
library(lubridate)
Top Video Games
Proposal
Data 1
Introduction and data
The source of the data is the FiveThirtyEight API at https://data.fivethirtyeight.com/.
FiveThirtyEight collects data by compiling polling data from other organizations that pass a minimum bar of being representative, professional, and without bias. Our specific data was collected from a poll conducted by Ipsos for FiveThirtyEight from September 15 to 25.
Each observation is a person who has been eligible to vote in at least four elections. Each record includes voter-file information as well as the respondent's voting frequency.
Research question
A well formulated research question. (You may include more than one research question if you want to receive feedback on different ideas for your project. However, one per data set is required.)
Why do many Americans not vote?
What is the relationship between a respondent’s voting status and their residential region?
Is there a correlation between a respondent’s age and their voting status?
A description of the research topic along with a concise statement of your hypotheses on this topic.
These data are sourced from FiveThirtyEight. Using each respondent’s information (age, education, race, gender, income category, and more), we can compare the characteristics of the 5,836 respondents to their voting status. The relationship between a respondent’s age and voting frequency will help us visualize trends in who votes and who does not.
We hypothesize that younger respondents, specifically those under 25 years old, will vote less frequently. Older respondents, specifically those 65 and older, will be more active, with a higher voting frequency and a longer voting history.
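A minimal sketch of how this hypothesis could be checked, written in base R. The toy rows are invented for illustration; the column names `ppage` and `voter_category` come from the skim output, while the category labels ("always", "sporadic", "rarely/never") and the age cut-offs are assumptions.

```r
# Toy stand-in for vote_data; the rows below are invented for illustration only.
toy_votes <- data.frame(
  ppage = c(23, 24, 31, 47, 58, 66, 71, 80),
  voter_category = c("rarely/never", "sporadic", "sporadic", "always",
                     "sporadic", "always", "always", "sporadic")
)

# Bin age into the groups named in the hypothesis (assumed cut-offs: 24 and 64).
toy_votes$age_group <- cut(
  toy_votes$ppage,
  breaks = c(-Inf, 24, 64, Inf),
  labels = c("under 25", "25-64", "65 and older")
)

# Row-wise proportions: share of each voter category within each age group.
share_by_age <- prop.table(
  table(toy_votes$age_group, toy_votes$voter_category),
  margin = 1
)
print(round(share_by_age, 2))
```

With the real data, the same table could be built from `vote_data`, and the survey `weight` column may need to be taken into account before drawing conclusions.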
Identify the types of variables in your research question. Categorical? Quantitative?
Categorical: voter category (voter_category), education (educ), race, gender, income category (income_cat)
Quantitative: age (ppage)
Glimpse of data
# add code here
vote_data <- read.csv("data/nonvoters_data.csv")
skimr::skim(vote_data)
Name | vote_data |
Number of rows | 5836 |
Number of columns | 119 |
_______________________ | |
Column type frequency: | |
character | 5 |
numeric | 114 |
________________________ | |
Group variables | None |
Variable type: character
skim_variable | n_missing | complete_rate | min | max | empty | n_unique | whitespace |
---|---|---|---|---|---|---|---|
educ | 0 | 1 | 7 | 19 | 0 | 3 | 0 |
race | 0 | 1 | 5 | 11 | 0 | 4 | 0 |
gender | 0 | 1 | 4 | 6 | 0 | 2 | 0 |
income_cat | 0 | 1 | 7 | 14 | 0 | 4 | 0 |
voter_category | 0 | 1 | 6 | 12 | 0 | 3 | 0 |
Variable type: numeric
skim_variable | n_missing | complete_rate | mean | sd | p0 | p25 | p50 | p75 | p100 | hist |
---|---|---|---|---|---|---|---|---|---|---|
RespId | 0 | 1.00 | 474654.00 | 3628.48 | 470001.00 | 472069.75 | 474152.00 | 476217.50 | 488325.00 | ▇▇▁▁▁ |
weight | 0 | 1.00 | 0.99 | 0.35 | 0.23 | 0.79 | 0.97 | 1.17 | 3.04 | ▃▇▁▁▁ |
Q1 | 0 | 1.00 | 1.00 | 0.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | ▁▁▇▁▁ |
Q2_1 | 0 | 1.00 | 1.25 | 0.66 | -1.00 | 1.00 | 1.00 | 1.00 | 4.00 | ▁▇▁▁▁ |
Q2_2 | 0 | 1.00 | 1.71 | 0.87 | -1.00 | 1.00 | 2.00 | 2.00 | 4.00 | ▁▇▆▂▁ |
Q2_3 | 0 | 1.00 | 1.64 | 0.77 | -1.00 | 1.00 | 2.00 | 2.00 | 4.00 | ▁▇▇▁▁ |
Q2_4 | 0 | 1.00 | 2.18 | 1.09 | -1.00 | 1.00 | 2.00 | 3.00 | 4.00 | ▁▇▆▆▃ |
Q2_5 | 0 | 1.00 | 1.28 | 0.63 | -1.00 | 1.00 | 1.00 | 1.00 | 4.00 | ▁▇▂▁▁ |
Q2_6 | 0 | 1.00 | 1.81 | 1.01 | -1.00 | 1.00 | 1.00 | 2.00 | 4.00 | ▁▇▃▂▂ |
Q2_7 | 0 | 1.00 | 1.49 | 0.81 | -1.00 | 1.00 | 1.00 | 2.00 | 4.00 | ▁▇▃▁▁ |
Q2_8 | 0 | 1.00 | 1.46 | 0.67 | -1.00 | 1.00 | 1.00 | 2.00 | 4.00 | ▁▇▅▁▁ |
Q2_9 | 0 | 1.00 | 2.09 | 1.25 | -1.00 | 1.00 | 2.00 | 3.00 | 4.00 | ▁▇▃▂▃ |
Q2_10 | 0 | 1.00 | 2.02 | 0.95 | -1.00 | 1.00 | 2.00 | 3.00 | 4.00 | ▁▆▇▃▂ |
Q3_1 | 0 | 1.00 | 1.90 | 1.07 | -1.00 | 1.00 | 2.00 | 3.00 | 4.00 | ▁▇▅▂▂ |
Q3_2 | 0 | 1.00 | 2.38 | 1.19 | -1.00 | 1.00 | 2.00 | 3.00 | 4.00 | ▁▇▆▆▇ |
Q3_3 | 0 | 1.00 | 2.63 | 1.12 | -1.00 | 2.00 | 3.00 | 4.00 | 4.00 | ▁▅▇▇▇ |
Q3_4 | 0 | 1.00 | 1.86 | 0.99 | -1.00 | 1.00 | 2.00 | 2.00 | 4.00 | ▁▇▆▃▂ |
Q3_5 | 0 | 1.00 | 2.02 | 0.87 | -1.00 | 1.00 | 2.00 | 3.00 | 4.00 | ▁▅▇▃▁ |
Q3_6 | 0 | 1.00 | 2.07 | 0.97 | -1.00 | 1.00 | 2.00 | 3.00 | 4.00 | ▁▆▇▃▂ |
Q4_1 | 0 | 1.00 | 1.80 | 0.92 | -1.00 | 1.00 | 2.00 | 2.00 | 4.00 | ▁▇▆▂▁ |
Q4_2 | 0 | 1.00 | 1.83 | 0.89 | -1.00 | 1.00 | 2.00 | 2.00 | 4.00 | ▁▇▇▃▁ |
Q4_3 | 0 | 1.00 | 2.00 | 0.93 | -1.00 | 1.00 | 2.00 | 3.00 | 4.00 | ▁▆▇▃▂ |
Q4_4 | 0 | 1.00 | 2.43 | 1.03 | -1.00 | 2.00 | 2.00 | 3.00 | 4.00 | ▁▅▇▆▅ |
Q4_5 | 0 | 1.00 | 2.34 | 1.04 | -1.00 | 2.00 | 2.00 | 3.00 | 4.00 | ▁▅▇▅▃ |
Q4_6 | 0 | 1.00 | 2.03 | 0.97 | -1.00 | 1.00 | 2.00 | 3.00 | 4.00 | ▁▇▇▅▂ |
Q5 | 0 | 1.00 | 1.16 | 0.41 | -1.00 | 1.00 | 1.00 | 1.00 | 2.00 | ▁▁▁▇▂ |
Q6 | 0 | 1.00 | 2.85 | 0.87 | -1.00 | 2.00 | 3.00 | 3.00 | 4.00 | ▁▁▅▇▃ |
Q7 | 0 | 1.00 | 1.17 | 0.45 | -1.00 | 1.00 | 1.00 | 1.00 | 2.00 | ▁▁▁▇▂ |
Q8_1 | 0 | 1.00 | 2.75 | 1.17 | -1.00 | 2.00 | 3.00 | 4.00 | 4.00 | ▁▃▆▅▇ |
Q8_2 | 0 | 1.00 | 2.75 | 0.88 | -1.00 | 2.00 | 3.00 | 3.00 | 4.00 | ▁▁▇▇▅ |
Q8_3 | 0 | 1.00 | 2.12 | 0.84 | -1.00 | 2.00 | 2.00 | 3.00 | 4.00 | ▁▃▇▃▁ |
Q8_4 | 0 | 1.00 | 2.00 | 0.90 | -1.00 | 1.00 | 2.00 | 2.00 | 4.00 | ▁▅▇▃▁ |
Q8_5 | 0 | 1.00 | 2.46 | 0.89 | -1.00 | 2.00 | 2.00 | 3.00 | 4.00 | ▁▂▇▆▂ |
Q8_6 | 0 | 1.00 | 2.23 | 0.89 | -1.00 | 2.00 | 2.00 | 3.00 | 4.00 | ▁▃▇▃▂ |
Q8_7 | 0 | 1.00 | 2.79 | 1.00 | -1.00 | 2.00 | 3.00 | 4.00 | 4.00 | ▁▂▇▆▇ |
Q8_8 | 0 | 1.00 | 1.99 | 0.91 | -1.00 | 1.00 | 2.00 | 2.00 | 4.00 | ▁▆▇▃▂ |
Q8_9 | 0 | 1.00 | 1.81 | 0.82 | -1.00 | 1.00 | 2.00 | 2.00 | 4.00 | ▁▆▇▂▁ |
Q9_1 | 0 | 1.00 | 1.52 | 0.77 | -1.00 | 1.00 | 1.00 | 2.00 | 4.00 | ▁▇▅▁▁ |
Q9_2 | 0 | 1.00 | 2.54 | 1.04 | -1.00 | 2.00 | 2.00 | 3.00 | 4.00 | ▁▃▇▆▅ |
Q9_3 | 0 | 1.00 | 3.19 | 1.03 | -1.00 | 3.00 | 4.00 | 4.00 | 4.00 | ▁▁▂▅▇ |
Q9_4 | 0 | 1.00 | 3.44 | 0.94 | -1.00 | 3.00 | 4.00 | 4.00 | 4.00 | ▁▁▁▂▇ |
Q10_1 | 0 | 1.00 | 1.91 | 0.32 | -1.00 | 2.00 | 2.00 | 2.00 | 2.00 | ▁▁▁▁▇ |
Q10_2 | 0 | 1.00 | 1.78 | 0.44 | -1.00 | 2.00 | 2.00 | 2.00 | 2.00 | ▁▁▁▂▇ |
Q10_3 | 0 | 1.00 | 1.82 | 0.42 | -1.00 | 2.00 | 2.00 | 2.00 | 2.00 | ▁▁▁▂▇ |
Q10_4 | 0 | 1.00 | 1.98 | 0.18 | -1.00 | 2.00 | 2.00 | 2.00 | 2.00 | ▁▁▁▁▇ |
Q11_1 | 0 | 1.00 | 1.86 | 0.38 | -1.00 | 2.00 | 2.00 | 2.00 | 2.00 | ▁▁▁▁▇ |
Q11_2 | 0 | 1.00 | 1.97 | 0.24 | -1.00 | 2.00 | 2.00 | 2.00 | 2.00 | ▁▁▁▁▇ |
Q11_3 | 0 | 1.00 | 1.60 | 0.51 | -1.00 | 1.00 | 2.00 | 2.00 | 2.00 | ▁▁▁▅▇ |
Q11_4 | 0 | 1.00 | 1.86 | 0.37 | -1.00 | 2.00 | 2.00 | 2.00 | 2.00 | ▁▁▁▁▇ |
Q11_5 | 0 | 1.00 | 1.77 | 0.44 | -1.00 | 2.00 | 2.00 | 2.00 | 2.00 | ▁▁▁▂▇ |
Q11_6 | 0 | 1.00 | 1.97 | 0.23 | -1.00 | 2.00 | 2.00 | 2.00 | 2.00 | ▁▁▁▁▇ |
Q14 | 0 | 1.00 | 2.65 | 1.74 | -1.00 | 1.00 | 2.00 | 5.00 | 5.00 | ▁▇▅▃▇ |
Q15 | 0 | 1.00 | 2.21 | 1.52 | -1.00 | 1.00 | 2.00 | 3.00 | 5.00 | ▁▇▅▂▅ |
Q16 | 0 | 1.00 | 1.62 | 0.84 | -1.00 | 1.00 | 1.00 | 2.00 | 4.00 | ▁▇▅▂▁ |
Q17_1 | 0 | 1.00 | 1.64 | 0.82 | -1.00 | 1.00 | 1.00 | 2.00 | 4.00 | ▁▇▆▁▁ |
Q17_2 | 0 | 1.00 | 1.68 | 0.88 | -1.00 | 1.00 | 2.00 | 2.00 | 4.00 | ▁▇▆▂▁ |
Q17_3 | 0 | 1.00 | 2.32 | 1.13 | -1.00 | 1.00 | 2.00 | 3.00 | 4.00 | ▁▇▇▅▅ |
Q17_4 | 0 | 1.00 | 2.71 | 1.07 | -1.00 | 2.00 | 3.00 | 4.00 | 4.00 | ▁▃▇▇▇ |
Q18_1 | 0 | 1.00 | 1.95 | 0.32 | -1.00 | 2.00 | 2.00 | 2.00 | 2.00 | ▁▁▁▁▇ |
Q18_2 | 0 | 1.00 | 1.92 | 0.36 | -1.00 | 2.00 | 2.00 | 2.00 | 2.00 | ▁▁▁▁▇ |
Q18_3 | 0 | 1.00 | 1.92 | 0.35 | -1.00 | 2.00 | 2.00 | 2.00 | 2.00 | ▁▁▁▁▇ |
Q18_4 | 0 | 1.00 | 1.93 | 0.35 | -1.00 | 2.00 | 2.00 | 2.00 | 2.00 | ▁▁▁▁▇ |
Q18_5 | 0 | 1.00 | 1.96 | 0.31 | -1.00 | 2.00 | 2.00 | 2.00 | 2.00 | ▁▁▁▁▇ |
Q18_6 | 0 | 1.00 | 1.94 | 0.33 | -1.00 | 2.00 | 2.00 | 2.00 | 2.00 | ▁▁▁▁▇ |
Q18_7 | 0 | 1.00 | 1.88 | 0.41 | -1.00 | 2.00 | 2.00 | 2.00 | 2.00 | ▁▁▁▁▇ |
Q18_8 | 0 | 1.00 | 1.78 | 0.47 | -1.00 | 2.00 | 2.00 | 2.00 | 2.00 | ▁▁▁▂▇ |
Q18_9 | 0 | 1.00 | 1.93 | 0.34 | -1.00 | 2.00 | 2.00 | 2.00 | 2.00 | ▁▁▁▁▇ |
Q18_10 | 0 | 1.00 | 1.92 | 0.36 | -1.00 | 2.00 | 2.00 | 2.00 | 2.00 | ▁▁▁▁▇ |
Q19_1 | 0 | 1.00 | -0.42 | 0.91 | -1.00 | -1.00 | -1.00 | 1.00 | 1.00 | ▇▁▁▁▃ |
Q19_2 | 0 | 1.00 | 0.03 | 1.00 | -1.00 | -1.00 | 1.00 | 1.00 | 1.00 | ▇▁▁▁▇ |
Q19_3 | 0 | 1.00 | 0.05 | 1.00 | -1.00 | -1.00 | 1.00 | 1.00 | 1.00 | ▇▁▁▁▇ |
Q19_4 | 0 | 1.00 | -0.23 | 0.97 | -1.00 | -1.00 | -1.00 | 1.00 | 1.00 | ▇▁▁▁▅ |
Q19_5 | 0 | 1.00 | -0.20 | 0.98 | -1.00 | -1.00 | -1.00 | 1.00 | 1.00 | ▇▁▁▁▆ |
Q19_6 | 0 | 1.00 | 0.07 | 1.00 | -1.00 | -1.00 | 1.00 | 1.00 | 1.00 | ▇▁▁▁▇ |
Q19_7 | 0 | 1.00 | -0.25 | 0.97 | -1.00 | -1.00 | -1.00 | 1.00 | 1.00 | ▇▁▁▁▅ |
Q19_8 | 0 | 1.00 | -0.43 | 0.90 | -1.00 | -1.00 | -1.00 | 1.00 | 1.00 | ▇▁▁▁▃ |
Q19_9 | 0 | 1.00 | -0.51 | 0.86 | -1.00 | -1.00 | -1.00 | -1.00 | 1.00 | ▇▁▁▁▂ |
Q19_10 | 0 | 1.00 | -0.89 | 0.45 | -1.00 | -1.00 | -1.00 | -1.00 | 1.00 | ▇▁▁▁▁ |
Q20 | 0 | 1.00 | 1.08 | 0.30 | -1.00 | 1.00 | 1.00 | 1.00 | 2.00 | ▁▁▁▇▁ |
Q21 | 0 | 1.00 | 1.21 | 0.57 | -1.00 | 1.00 | 1.00 | 1.00 | 3.00 | ▁▁▇▁▁ |
Q22 | 5350 | 0.08 | 4.21 | 2.11 | -1.00 | 2.00 | 4.00 | 6.00 | 7.00 | ▁▆▁▆▇ |
Q23 | 0 | 1.00 | 1.80 | 0.80 | -1.00 | 1.00 | 2.00 | 2.00 | 3.00 | ▁▁▆▇▃ |
Q24 | 0 | 1.00 | 1.97 | 0.98 | -1.00 | 1.00 | 2.00 | 3.00 | 4.00 | ▁▇▅▇▁ |
Q25 | 0 | 1.00 | 1.84 | 0.93 | -1.00 | 1.00 | 2.00 | 2.00 | 4.00 | ▁▇▇▃▁ |
Q26 | 0 | 1.00 | 1.45 | 0.96 | -1.00 | 1.00 | 1.00 | 1.00 | 4.00 | ▁▇▁▁▁ |
Q27_1 | 0 | 1.00 | 1.25 | 0.51 | -1.00 | 1.00 | 1.00 | 2.00 | 2.00 | ▁▁▁▇▃ |
Q27_2 | 0 | 1.00 | 1.15 | 0.44 | -1.00 | 1.00 | 1.00 | 1.00 | 2.00 | ▁▁▁▇▂ |
Q27_3 | 0 | 1.00 | 1.31 | 0.55 | -1.00 | 1.00 | 1.00 | 2.00 | 2.00 | ▁▁▁▇▅ |
Q27_4 | 0 | 1.00 | 1.21 | 0.47 | -1.00 | 1.00 | 1.00 | 1.00 | 2.00 | ▁▁▁▇▂ |
Q27_5 | 0 | 1.00 | 1.36 | 0.56 | -1.00 | 1.00 | 1.00 | 2.00 | 2.00 | ▁▁▁▇▅ |
Q27_6 | 0 | 1.00 | 1.27 | 0.50 | -1.00 | 1.00 | 1.00 | 2.00 | 2.00 | ▁▁▁▇▃ |
Q28_1 | 534 | 0.91 | 0.53 | 0.85 | -1.00 | 1.00 | 1.00 | 1.00 | 1.00 | ▂▁▁▁▇ |
Q28_2 | 534 | 0.91 | -0.27 | 0.96 | -1.00 | -1.00 | -1.00 | 1.00 | 1.00 | ▇▁▁▁▅ |
Q28_3 | 534 | 0.91 | -0.27 | 0.96 | -1.00 | -1.00 | -1.00 | 1.00 | 1.00 | ▇▁▁▁▅ |
Q28_4 | 534 | 0.91 | -0.30 | 0.95 | -1.00 | -1.00 | -1.00 | 1.00 | 1.00 | ▇▁▁▁▅ |
Q28_5 | 534 | 0.91 | -0.28 | 0.96 | -1.00 | -1.00 | -1.00 | 1.00 | 1.00 | ▇▁▁▁▅ |
Q28_6 | 534 | 0.91 | -0.43 | 0.90 | -1.00 | -1.00 | -1.00 | 1.00 | 1.00 | ▇▁▁▁▃ |
Q28_7 | 534 | 0.91 | -0.26 | 0.97 | -1.00 | -1.00 | -1.00 | 1.00 | 1.00 | ▇▁▁▁▅ |
Q28_8 | 534 | 0.91 | -0.95 | 0.30 | -1.00 | -1.00 | -1.00 | -1.00 | 1.00 | ▇▁▁▁▁ |
Q29_1 | 4494 | 0.23 | -0.46 | 0.89 | -1.00 | -1.00 | -1.00 | 1.00 | 1.00 | ▇▁▁▁▃ |
Q29_2 | 4494 | 0.23 | -0.76 | 0.65 | -1.00 | -1.00 | -1.00 | -1.00 | 1.00 | ▇▁▁▁▁ |
Q29_3 | 4494 | 0.23 | -0.40 | 0.92 | -1.00 | -1.00 | -1.00 | 1.00 | 1.00 | ▇▁▁▁▃ |
Q29_4 | 4494 | 0.23 | -0.60 | 0.80 | -1.00 | -1.00 | -1.00 | -1.00 | 1.00 | ▇▁▁▁▂ |
Q29_5 | 4494 | 0.23 | -0.66 | 0.75 | -1.00 | -1.00 | -1.00 | -1.00 | 1.00 | ▇▁▁▁▂ |
Q29_6 | 4494 | 0.23 | -0.93 | 0.38 | -1.00 | -1.00 | -1.00 | -1.00 | 1.00 | ▇▁▁▁▁ |
Q29_7 | 4494 | 0.23 | -0.76 | 0.65 | -1.00 | -1.00 | -1.00 | -1.00 | 1.00 | ▇▁▁▁▁ |
Q29_8 | 4494 | 0.23 | -0.70 | 0.72 | -1.00 | -1.00 | -1.00 | -1.00 | 1.00 | ▇▁▁▁▂ |
Q29_9 | 4494 | 0.23 | -0.81 | 0.59 | -1.00 | -1.00 | -1.00 | -1.00 | 1.00 | ▇▁▁▁▁ |
Q29_10 | 4494 | 0.23 | -0.70 | 0.71 | -1.00 | -1.00 | -1.00 | -1.00 | 1.00 | ▇▁▁▁▂ |
Q30 | 0 | 1.00 | 2.33 | 1.26 | -1.00 | 1.00 | 2.00 | 3.00 | 5.00 | ▁▆▇▆▃ |
Q31 | 4244 | 0.27 | 1.36 | 0.52 | -1.00 | 1.00 | 1.00 | 2.00 | 2.00 | ▁▁▁▇▅ |
Q32 | 3834 | 0.34 | 1.37 | 0.50 | -1.00 | 1.00 | 1.00 | 2.00 | 2.00 | ▁▁▁▇▅ |
Q33 | 3594 | 0.38 | 1.22 | 0.96 | -1.00 | 1.00 | 1.00 | 2.00 | 2.00 | ▂▁▁▇▇ |
ppage | 0 | 1.00 | 51.69 | 17.07 | 22.00 | 36.00 | 54.00 | 65.00 | 94.00 | ▆▅▇▆▁ |
Data 2
Introduction and data
Identify the source of the data.
- The CDC (Centers for Disease Control and Prevention)
State when and how it was originally collected (by the original data curator, not necessarily how you found the data).
- The data were collected from December 14, 2020 to March 7, 2023, from the CDC workers who administered the vaccine doses to civilians.
Write a brief description of the observations.
- The number of daily vaccine doses rose from December through April, when quarantine was at its height. It then dropped significantly from May to July, to almost half of the doses people had been getting. After July it began to stabilize but never returned to its peak.
Research question
- A well formulated research question. (You may include more than one research question if you want to receive feedback on different ideas for your project. However, one per data set is required.)
How has the number of administered vaccinations (dose 1, dose 2, booster) changed over time?
How has the daily number of administered vaccinations (dose 1, dose 2, booster) changed over time by state?
- A description of the research topic along with a concise statement of your hypotheses on this topic.
Research with this dataset would delve into trends regarding the number of administered COVID-19 vaccinations across the United States. Another potential avenue of research is to join this dataset with another relevant one provided by the CDC in order to investigate different variables related to vaccination trends.
We hypothesize that the number of administered dose 1 COVID-19 vaccinations will display a decreasing trend over time (especially as the United States lifted restrictions) because more pro-vaccine individuals would have already gotten their first dose earlier.
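The trend in this hypothesis could be examined with a monthly summary of first doses. The sketch below uses base R on invented rows; the column names `Date`, `Location`, and `Admin_Dose_1_Daily` match the skim output, while the month/day/year date format and the use of "US" as the national total are assumptions to verify against the file.

```r
# Toy stand-in for cdc_data; values are invented, column names follow the skim output.
toy_cdc <- data.frame(
  Date = c("12/14/2020", "12/15/2020", "01/04/2021", "01/05/2021", "02/01/2021"),
  Location = c("US", "US", "US", "NY", "US"),
  Admin_Dose_1_Daily = c(100, 300, 900, 50, 600)
)

# Assumption: Date is stored as month/day/year text.
toy_cdc$Date <- as.Date(toy_cdc$Date, format = "%m/%d/%Y")

# Assumption: "US" marks the national total rather than a single state.
national <- toy_cdc[toy_cdc$Location == "US", ]
national$month <- format(national$Date, "%Y-%m")

# Average daily first doses per month.
monthly_dose1 <- aggregate(Admin_Dose_1_Daily ~ month, data = national, FUN = mean)
print(monthly_dose1)
```

The skim output for this data set lists negative minimum values for the daily counts, so those rows would need to be checked before any trend is interpreted.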
- Identify the types of variables in your research question. Categorical? Quantitative?
Categorical: location, date, date type
Quantitative: The number of administered vaccinations (cumulative and daily, by location)
Glimpse of data
# add code here
cdc_data <- read.csv("data/cdc-covid-19-vaccinations.csv")
skimr::skim(cdc_data)
Name | cdc_data |
Number of rows | 84240 |
Number of columns | 29 |
_______________________ | |
Column type frequency: | |
character | 3 |
numeric | 26 |
________________________ | |
Group variables | None |
Variable type: character
skim_variable | n_missing | complete_rate | min | max | empty | n_unique | whitespace |
---|---|---|---|---|---|---|---|
Date | 0 | 1 | 10 | 10 | 0 | 816 | 0 |
date_type | 0 | 1 | 5 | 6 | 0 | 2 | 0 |
Location | 0 | 1 | 2 | 2 | 0 | 60 | 0 |
Variable type: numeric
skim_variable | n_missing | complete_rate | mean | sd | p0 | p25 | p50 | p75 | p100 | hist |
---|---|---|---|---|---|---|---|---|---|---|
MMWR_week | 0 | 1.00 | 24.68 | 15.80 | 1 | 10.0 | 23.0 | 38.00 | 53.0 | ▇▆▆▅▆ |
Administered_Daily | 0 | 1.00 | 31836.78 | 170917.12 | -1593072 | 893.0 | 5071.0 | 17215.25 | 5610894.0 | ▁▇▁▁▁ |
Administered_Cumulative | 0 | 1.00 | 14021962.56 | 60092860.31 | 0 | 930069.8 | 3465873.5 | 9412128.75 | 672537312.0 | ▇▁▁▁▁ |
Administered_7_Day_Rolling_Average | 2280 | 0.97 | 30765.17 | 156108.89 | -138218 | 1530.0 | 5953.0 | 17195.25 | 3506170.0 | ▇▁▁▁▁ |
Admin_Dose_1_Daily | 0 | 1.00 | 12774.09 | 134918.19 | -2468411 | 175.0 | 1084.5 | 5339.00 | 30150525.0 | ▇▁▁▁▁ |
Admin_Dose_1_Cumulative | 0 | 1.00 | 6620169.69 | 27594866.18 | 0 | 497187.5 | 1807394.5 | 4273760.00 | 269650596.0 | ▇▁▁▁▁ |
Admin_Dose_1_Day_Rolling_Average | 2280 | 0.97 | 12874.30 | 88699.12 | -326573 | 245.0 | 1280.5 | 5744.25 | 5011136.0 | ▇▁▁▁▁ |
Administered_Dose1_Pop_Pct | 0 | 1.00 | 59.11 | 25.71 | 0 | 46.7 | 64.4 | 77.00 | 100.0 | ▂▂▅▇▅ |
Administered_daily_change_report | 16740 | 0.80 | 17742.62 | 129779.67 | 0 | 0.0 | 1.0 | 6758.00 | 4575683.0 | ▇▁▁▁▁ |
Administered_daily_change_report_7dayroll | 17820 | 0.79 | 35692.98 | 171664.92 | -138218 | 2048.0 | 7787.5 | 20553.00 | 3506170.0 | ▇▁▁▁▁ |
Series_Complete_Daily | 0 | 1.00 | 10904.38 | 123334.70 | -523379 | 101.0 | 761.0 | 4124.00 | 28709428.0 | ▇▁▁▁▁ |
Series_Complete_Cumulative | 0 | 1.00 | 5561849.68 | 23535246.70 | 0 | 347879.0 | 1467425.0 | 3790923.75 | 230142115.0 | ▇▁▁▁▁ |
Series_Complete_Day_Rolling_Average | 2280 | 0.97 | 11005.08 | 80152.04 | -71931 | 160.0 | 906.0 | 4492.25 | 4837589.0 | ▇▁▁▁▁ |
Series_Complete_Pop_Pct | 0 | 1.00 | 49.73 | 24.22 | 0 | 38.0 | 55.7 | 66.50 | 90.6 | ▃▂▅▇▃ |
Booster_Daily | 0 | 1.00 | 5577.35 | 47225.59 | -751692 | 0.0 | 60.5 | 1281.00 | 2748886.0 | ▁▇▁▁▁ |
Booster_Cumulative | 0 | 1.00 | 1730063.67 | 9382469.98 | 0 | 0.0 | 71530.5 | 1005061.75 | 117621762.0 | ▇▁▁▁▁ |
Booster_7_Day_Rolling_Average | 2280 | 0.97 | 5412.89 | 42074.55 | -2097 | 0.0 | 107.0 | 1382.00 | 1257266.0 | ▇▁▁▁▁ |
Additional_Doses_Vax_Pct | 0 | 1.00 | 23.49 | 23.17 | 0 | 0.0 | 20.0 | 46.20 | 67.5 | ▇▁▂▅▂ |
Second_Booster_50Plus_Daily | 0 | 1.00 | 1716.21 | 44865.68 | -40176 | 0.0 | 0.0 | 124.00 | 11835302.0 | ▇▁▁▁▁ |
Second_Booster_50Plus_Cumulative | 0 | 1.00 | 233289.99 | 1814678.63 | 0 | 0.0 | 0.0 | 36473.50 | 36171359.0 | ▇▁▁▁▁ |
Second_Booster_50Plus_7_Day_Rolling_Average | 2280 | 0.97 | 1264.51 | 18881.48 | -170 | 0.0 | 0.0 | 131.00 | 1855467.0 | ▇▁▁▁▁ |
Second_Booster_50Plus_Vax_Pct | 0 | 1.00 | 10.05 | 17.47 | 0 | 0.0 | 0.0 | 16.10 | 67.2 | ▇▁▁▁▁ |
Bivalent_Booster_Daily | 0 | 1.00 | 2560.51 | 67066.01 | -1 | 0.0 | 0.0 | 0.00 | 14759770.0 | ▇▁▁▁▁ |
Bivalent_Booster_Cumulative | 0 | 1.00 | 182246.09 | 2057578.02 | 0 | 0.0 | 0.0 | 0.00 | 53980763.0 | ▇▁▁▁▁ |
Bivalent_Booster_7_Day_Rolling_Average | 2280 | 0.97 | 1314.40 | 17268.64 | 0 | 0.0 | 0.0 | 0.00 | 615222.0 | ▇▁▁▁▁ |
Bivalent_Booster_Pop_Pct | 0 | 1.00 | 1.65 | 4.87 | 0 | 0.0 | 0.0 | 0.00 | 32.8 | ▇▁▁▁▁ |
Data 3
Introduction and data
- Identify the source of the data.
- The data come from www.metacritic.com.
- State when and how it was originally collected (by the original data curator, not necessarily how you found the data).
- We scraped the website to collect the data.
- Write a brief description of the observations.
- The observations are video games that have been ranked on Metacritic. Each observation includes the video game title, developer, release date, scores, etc.
Research question
A well formulated research question. (You may include more than one research question if you want to receive feedback on different ideas for your project. However, one per data set is required.)
How are games rated based on who developed them?
Have there been changes in the scores of different games over the years?
How does the platform the game is on affect the scores of the games?
Do the top games of all time (games with high critic scores) also have high user scores?
What variables have influenced the utilization of the Reddit API over time, and how has it changed?
A description of the research topic along with a concise statement of your hypotheses on this topic.
- These data were web scraped from Metacritic. Using each video game’s information (title, developer, platform, date, critic scores, user scores, etc.), we can look for relationships between these variables and create data visualizations that show interesting patterns.
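One way to approach the question about critic and user scores is a simple correlation plus a comparison of group means. This is a sketch on invented values; only the column names `metascore` and `user_score` are taken from the scraped data, and splitting at the median metascore is an illustrative choice, not a fixed method.

```r
# Toy stand-in for games_scores; all values are invented for illustration.
toy_scores <- data.frame(
  metascore = c(99, 97, 95, 93, 91, 89),
  user_score = c(9.1, 8.0, 8.8, 8.4, 7.9, 8.2)
)

# Pearson correlation between critic and user scores.
score_cor <- cor(toy_scores$metascore, toy_scores$user_score)

# Mean user score for games at or above the median metascore versus below it.
toy_scores$top_half <- toy_scores$metascore >= median(toy_scores$metascore)
mean_by_half <- tapply(toy_scores$user_score, toy_scores$top_half, mean)

print(score_cor)
print(mean_by_half)
```

Because the scraped list only covers highly rated games, the metascore range is narrow, which may weaken any correlation seen in the real data.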
Glimpse of data
paths_allowed("https://www.metacritic.com/browse/games/score/metascore/all/all/filtered?sort=desc")
scrape_title <- function(url){
  #pause for a couple of seconds
  Sys.sleep(2)
  # read the first page
  page <- read_html(url)
  # extract desired components
  title <- html_elements(x = page, css = ".title h3") |>
    html_text2() |>
    tolower() |>
    str_replace_all(" ", "-") |>
    str_replace_all("[':]", "") |>
    str_replace_all("[.]", "") |>
    str_replace_all("-&", "") |>
    str_replace_all("[(]", "") |>
    str_replace_all("[)]", "") |>
    str_replace_all("-/", "")
  platform <- html_elements(
    x = page,
    css = ".platform .data") |>
    html_text2() |>
    tolower() |>
    str_replace_all(" ", "-")
  user_score <- html_elements(
    x = page,
    css = ".user") |>
    html_text2()
  metascore <- html_elements(
    x = page,
    css = ".clamp-metascore .positive") |>
    html_text2()
  # create a tibble with this data
  game_raw <- tibble(
    title = title,
    platform = platform,
    user_score = user_score,
    metascore = metascore
  )
}
page_nums <- 0:5
title_urls <- str_glue("https://www.metacritic.com/browse/games/score/metascore/all/all/filtered?sort=desc&page={page_nums}")
meta_titles <- map(.x = title_urls, .f = scrape_title) |> list_rbind()
write_csv(x = meta_titles, file = "data/meta-titles.csv")
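The title cleaning inside `scrape_title()` turns a displayed title into a URL-style slug. The sketch below restates those same replacement steps in base R so they can be tried on sample strings without scraping. `slugify_title` is a hypothetical helper name and the sample titles are invented.

```r
# Hypothetical helper mirroring the cleaning steps inside scrape_title().
slugify_title <- function(x) {
  x <- tolower(x)
  x <- gsub(" ", "-", x, fixed = TRUE)   # spaces become hyphens
  x <- gsub("[':]", "", x)               # drop apostrophes and colons
  x <- gsub(".", "", x, fixed = TRUE)    # drop periods
  x <- gsub("-&", "", x, fixed = TRUE)   # drop a hyphen followed by an ampersand
  x <- gsub("(", "", x, fixed = TRUE)    # drop opening parentheses
  x <- gsub(")", "", x, fixed = TRUE)    # drop closing parentheses
  x <- gsub("-/", "", x, fixed = TRUE)   # drop a hyphen followed by a slash
  x
}

# Invented sample titles, not taken from the scraped data.
sample_titles <- c("Dr. Example's Quest: Part 2", "Cats & Dogs (Remastered)")
slugs <- slugify_title(sample_titles)
print(slugs)
```

Checking a few slugs by hand like this can help spot titles whose punctuation is not covered by the replacement rules before the slugs are used to build page URLs.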
scrape_game <- function(url){
  #pause for a couple of seconds
  Sys.sleep(2)
  # read the page
  page <- read_html(url)

  # extract desired components
  titles <- html_elements(x = page, css = "h1") |> html_text2()

  platform <- html_elements(
    x = page,
    css = ".platform"
    ) |>
    html_text2()
  developers <- html_elements(
    x = page,
    css = ".developer") |>
    html_text2()
  release_dates <- html_elements(
    x = page,
    css = ".release_data .data") |>
    html_text2()
  genres <- html_elements(
    x = page,
    css = ".product_genre") |>
    html_text2()
  num_players <- html_elements(
    x = page,
    css = ".product_players .data") |>
    html_text2()
  age_rating <- html_elements(
    x = page,
    css = ".product_rating .data"
    ) |>
    html_text2()
  num_critic_reviews <- html_elements(
    x = page,
    css = ".count a span"
    ) |>
    html_text2()
  num_user_reviews <- html_elements(
    x = page,
    css = ".feature_userscore .count a"
    ) |>
    html_text2()
  # replace empty results with NA so every field has a value
  if(length(titles) == 0) { titles = NA }
  if(length(platform) == 0) { platform = NA }
  # developers had no guard; added so a missing developer does not break the tibble
  if(length(developers) == 0) { developers = NA }
  if(length(release_dates) == 0) { release_dates = NA }
  if(length(genres) == 0) { genres = NA }
  if(length(num_players) == 0) { num_players = NA }
  if(length(age_rating) == 0) { age_rating = NA }
  if(length(num_critic_reviews) == 0) { num_critic_reviews = NA }
  if(length(num_user_reviews) == 0) { num_user_reviews = NA }
  # create a tibble with this data
  game_raw <- tibble(
    title = titles,
    platform = platform,
    developer = developers,
    date = release_dates,
    genre = genres,
    number_of_players = num_players,
    rating = age_rating,
    critic_reviews = num_critic_reviews,
    user_reviews = num_user_reviews,
  )
  # clean up the data
  #game_raw |> mutate(date = mdy(date))
}
url_titles <- meta_titles$title[1:600]
url_platforms <- meta_titles$platform[1:600]
meta_urls <- str_glue("https://www.metacritic.com/game/{url_platforms}/{url_titles}")
meta_games <- map(.x = meta_urls, .f = scrape_game) |> list_rbind()

write_csv(x = meta_games, file = "data/meta.csv")
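The repeated length checks in `scrape_game()` exist so that a selector that matches nothing still yields one value per field, and the game keeps its single row. As a sketch, the same idea can be written once as a small helper and applied to every field. `na_if_empty` is a hypothetical name, and `character(0)` stands in for what a text extraction returns when no element matches.

```r
# Hypothetical helper: return NA when a selector matched nothing, otherwise the value.
na_if_empty <- function(x) {
  if (length(x) == 0) NA_character_ else x
}

# Invented field values; character(0) stands in for an empty scrape result.
fields <- list(
  title = "Example Game",
  developer = character(0),
  rating = "E"
)

# Apply the guard to every field, then build a one-row data frame.
cleaned <- lapply(fields, na_if_empty)
game_row <- as.data.frame(cleaned, stringsAsFactors = FALSE)
print(game_row)
```

Applying one helper to all fields avoids the risk of forgetting a guard for an individual variable.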
games <- read_csv("data/meta.csv")
Rows: 600 Columns: 9
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (8): title, platform, developer, date, genre, number_of_players, rating,...
dbl (1): critic_reviews
ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
games_scores <- read_csv("data/meta-titles.csv")
Rows: 600 Columns: 4
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (2): title, platform
dbl (2): user_score, metascore
ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
skimr::skim(games)
Name | games |
Number of rows | 600 |
Number of columns | 9 |
_______________________ | |
Column type frequency: | |
character | 8 |
numeric | 1 |
________________________ | |
Group variables | None |
Variable type: character
skim_variable | n_missing | complete_rate | min | max | empty | n_unique | whitespace |
---|---|---|---|---|---|---|---|
title | 0 | 1.00 | 3 | 68 | 0 | 435 | 0 |
platform | 0 | 1.00 | 2 | 16 | 0 | 21 | 0 |
developer | 0 | 1.00 | 15 | 65 | 0 | 220 | 0 |
date | 0 | 1.00 | 11 | 12 | 0 | 473 | 0 |
genre | 0 | 1.00 | 25 | 127 | 0 | 198 | 0 |
number_of_players | 72 | 0.88 | 1 | 21 | 0 | 31 | 0 |
rating | 27 | 0.96 | 1 | 4 | 0 | 6 | 0 |
user_reviews | 0 | 1.00 | 9 | 14 | 0 | 495 | 0 |
Variable type: numeric
skim_variable | n_missing | complete_rate | mean | sd | p0 | p25 | p50 | p75 | p100 | hist |
---|---|---|---|---|---|---|---|---|---|---|
critic_reviews | 0 | 1 | 39.88 | 27.24 | 7 | 18 | 31 | 59 | 141 | ▇▃▂▁▁ |
skimr::skim(games_scores)
Name | games_scores |
Number of rows | 600 |
Number of columns | 4 |
_______________________ | |
Column type frequency: | |
character | 2 |
numeric | 2 |
________________________ | |
Group variables | None |
Variable type: character
skim_variable | n_missing | complete_rate | min | max | empty | n_unique | whitespace |
---|---|---|---|---|---|---|---|
title | 0 | 1 | 3 | 66 | 0 | 434 | 0 |
platform | 0 | 1 | 2 | 16 | 0 | 21 | 0 |
Variable type: numeric
skim_variable | n_missing | complete_rate | mean | sd | p0 | p25 | p50 | p75 | p100 | hist |
---|---|---|---|---|---|---|---|---|---|---|
user_score | 0 | 1 | 8.22 | 0.84 | 3.3 | 7.9 | 8.4 | 8.8 | 9.7 | ▁▁▁▇▇ |
metascore | 0 | 1 | 91.93 | 2.05 | 89.0 | 90.0 | 91.0 | 93.0 | 99.0 | ▇▅▂▁▁ |