Top Video Games

Proposal

library(tidyverse)
library(skimr)
library(rvest)
library(robotstxt)
library(openintro)
library(lubridate)

Data 1

Introduction and data

  • The source of the data is the FiveThirtyEight API at https://data.fivethirtyeight.com/.

  • FiveThirtyEight collects data by compiling polls from other organizations that meet a minimum standard of being representative, professional, and unbiased. Our data comes from a poll that Ipsos conducted for FiveThirtyEight from September 15 to 25.

  • Each observation is a person who has been eligible to vote in at least four elections. Each record includes information from the person's voter file as well as their voting frequency.

Research question

  • A well formulated research question. (You may include more than one research question if you want to receive feedback on different ideas for your project. However, one per data set is required.)

    • Why do many Americans not vote?

    • What is the relationship between a respondent’s voting status and their residential region?

    • Is there a correlation between a respondent’s age and their voting status?

  • A description of the research topic along with a concise statement of your hypotheses on this topic.

    • This data is sourced from FiveThirtyEight. Using respondents’ information (age, education, race, gender, income category, and more), we can compare the characteristics of the 5,836 respondents to their voting status. The relationship between a respondent’s age and voting frequency will help us visualize trends in why some people vote and others do not.

    • We expect younger respondents, specifically those under 25 years old, to vote less often. Older respondents, specifically those 65 and older, will be more active, with a higher voting frequency and a longer voting history.

  • Identify the types of variables in your research question. Categorical? Quantitative?

    • Categorical: voter category (voter_category), education (educ), race, gender, income category (income_cat)

    • Quantitative: age (ppage)

Glimpse of data

# load the nonvoters survey data and summarize every column
vote_data <- read.csv("data/nonvoters_data.csv")
skimr::skim(vote_data)
Data summary
Name vote_data
Number of rows 5836
Number of columns 119
_______________________
Column type frequency:
character 5
numeric 114
________________________
Group variables None

Variable type: character

skim_variable n_missing complete_rate min max empty n_unique whitespace
educ 0 1 7 19 0 3 0
race 0 1 5 11 0 4 0
gender 0 1 4 6 0 2 0
income_cat 0 1 7 14 0 4 0
voter_category 0 1 6 12 0 3 0

Variable type: numeric

skim_variable n_missing complete_rate mean sd p0 p25 p50 p75 p100 hist
RespId 0 1.00 474654.00 3628.48 470001.00 472069.75 474152.00 476217.50 488325.00 ▇▇▁▁▁
weight 0 1.00 0.99 0.35 0.23 0.79 0.97 1.17 3.04 ▃▇▁▁▁
Q1 0 1.00 1.00 0.00 1.00 1.00 1.00 1.00 1.00 ▁▁▇▁▁
Q2_1 0 1.00 1.25 0.66 -1.00 1.00 1.00 1.00 4.00 ▁▇▁▁▁
Q2_2 0 1.00 1.71 0.87 -1.00 1.00 2.00 2.00 4.00 ▁▇▆▂▁
Q2_3 0 1.00 1.64 0.77 -1.00 1.00 2.00 2.00 4.00 ▁▇▇▁▁
Q2_4 0 1.00 2.18 1.09 -1.00 1.00 2.00 3.00 4.00 ▁▇▆▆▃
Q2_5 0 1.00 1.28 0.63 -1.00 1.00 1.00 1.00 4.00 ▁▇▂▁▁
Q2_6 0 1.00 1.81 1.01 -1.00 1.00 1.00 2.00 4.00 ▁▇▃▂▂
Q2_7 0 1.00 1.49 0.81 -1.00 1.00 1.00 2.00 4.00 ▁▇▃▁▁
Q2_8 0 1.00 1.46 0.67 -1.00 1.00 1.00 2.00 4.00 ▁▇▅▁▁
Q2_9 0 1.00 2.09 1.25 -1.00 1.00 2.00 3.00 4.00 ▁▇▃▂▃
Q2_10 0 1.00 2.02 0.95 -1.00 1.00 2.00 3.00 4.00 ▁▆▇▃▂
Q3_1 0 1.00 1.90 1.07 -1.00 1.00 2.00 3.00 4.00 ▁▇▅▂▂
Q3_2 0 1.00 2.38 1.19 -1.00 1.00 2.00 3.00 4.00 ▁▇▆▆▇
Q3_3 0 1.00 2.63 1.12 -1.00 2.00 3.00 4.00 4.00 ▁▅▇▇▇
Q3_4 0 1.00 1.86 0.99 -1.00 1.00 2.00 2.00 4.00 ▁▇▆▃▂
Q3_5 0 1.00 2.02 0.87 -1.00 1.00 2.00 3.00 4.00 ▁▅▇▃▁
Q3_6 0 1.00 2.07 0.97 -1.00 1.00 2.00 3.00 4.00 ▁▆▇▃▂
Q4_1 0 1.00 1.80 0.92 -1.00 1.00 2.00 2.00 4.00 ▁▇▆▂▁
Q4_2 0 1.00 1.83 0.89 -1.00 1.00 2.00 2.00 4.00 ▁▇▇▃▁
Q4_3 0 1.00 2.00 0.93 -1.00 1.00 2.00 3.00 4.00 ▁▆▇▃▂
Q4_4 0 1.00 2.43 1.03 -1.00 2.00 2.00 3.00 4.00 ▁▅▇▆▅
Q4_5 0 1.00 2.34 1.04 -1.00 2.00 2.00 3.00 4.00 ▁▅▇▅▃
Q4_6 0 1.00 2.03 0.97 -1.00 1.00 2.00 3.00 4.00 ▁▇▇▅▂
Q5 0 1.00 1.16 0.41 -1.00 1.00 1.00 1.00 2.00 ▁▁▁▇▂
Q6 0 1.00 2.85 0.87 -1.00 2.00 3.00 3.00 4.00 ▁▁▅▇▃
Q7 0 1.00 1.17 0.45 -1.00 1.00 1.00 1.00 2.00 ▁▁▁▇▂
Q8_1 0 1.00 2.75 1.17 -1.00 2.00 3.00 4.00 4.00 ▁▃▆▅▇
Q8_2 0 1.00 2.75 0.88 -1.00 2.00 3.00 3.00 4.00 ▁▁▇▇▅
Q8_3 0 1.00 2.12 0.84 -1.00 2.00 2.00 3.00 4.00 ▁▃▇▃▁
Q8_4 0 1.00 2.00 0.90 -1.00 1.00 2.00 2.00 4.00 ▁▅▇▃▁
Q8_5 0 1.00 2.46 0.89 -1.00 2.00 2.00 3.00 4.00 ▁▂▇▆▂
Q8_6 0 1.00 2.23 0.89 -1.00 2.00 2.00 3.00 4.00 ▁▃▇▃▂
Q8_7 0 1.00 2.79 1.00 -1.00 2.00 3.00 4.00 4.00 ▁▂▇▆▇
Q8_8 0 1.00 1.99 0.91 -1.00 1.00 2.00 2.00 4.00 ▁▆▇▃▂
Q8_9 0 1.00 1.81 0.82 -1.00 1.00 2.00 2.00 4.00 ▁▆▇▂▁
Q9_1 0 1.00 1.52 0.77 -1.00 1.00 1.00 2.00 4.00 ▁▇▅▁▁
Q9_2 0 1.00 2.54 1.04 -1.00 2.00 2.00 3.00 4.00 ▁▃▇▆▅
Q9_3 0 1.00 3.19 1.03 -1.00 3.00 4.00 4.00 4.00 ▁▁▂▅▇
Q9_4 0 1.00 3.44 0.94 -1.00 3.00 4.00 4.00 4.00 ▁▁▁▂▇
Q10_1 0 1.00 1.91 0.32 -1.00 2.00 2.00 2.00 2.00 ▁▁▁▁▇
Q10_2 0 1.00 1.78 0.44 -1.00 2.00 2.00 2.00 2.00 ▁▁▁▂▇
Q10_3 0 1.00 1.82 0.42 -1.00 2.00 2.00 2.00 2.00 ▁▁▁▂▇
Q10_4 0 1.00 1.98 0.18 -1.00 2.00 2.00 2.00 2.00 ▁▁▁▁▇
Q11_1 0 1.00 1.86 0.38 -1.00 2.00 2.00 2.00 2.00 ▁▁▁▁▇
Q11_2 0 1.00 1.97 0.24 -1.00 2.00 2.00 2.00 2.00 ▁▁▁▁▇
Q11_3 0 1.00 1.60 0.51 -1.00 1.00 2.00 2.00 2.00 ▁▁▁▅▇
Q11_4 0 1.00 1.86 0.37 -1.00 2.00 2.00 2.00 2.00 ▁▁▁▁▇
Q11_5 0 1.00 1.77 0.44 -1.00 2.00 2.00 2.00 2.00 ▁▁▁▂▇
Q11_6 0 1.00 1.97 0.23 -1.00 2.00 2.00 2.00 2.00 ▁▁▁▁▇
Q14 0 1.00 2.65 1.74 -1.00 1.00 2.00 5.00 5.00 ▁▇▅▃▇
Q15 0 1.00 2.21 1.52 -1.00 1.00 2.00 3.00 5.00 ▁▇▅▂▅
Q16 0 1.00 1.62 0.84 -1.00 1.00 1.00 2.00 4.00 ▁▇▅▂▁
Q17_1 0 1.00 1.64 0.82 -1.00 1.00 1.00 2.00 4.00 ▁▇▆▁▁
Q17_2 0 1.00 1.68 0.88 -1.00 1.00 2.00 2.00 4.00 ▁▇▆▂▁
Q17_3 0 1.00 2.32 1.13 -1.00 1.00 2.00 3.00 4.00 ▁▇▇▅▅
Q17_4 0 1.00 2.71 1.07 -1.00 2.00 3.00 4.00 4.00 ▁▃▇▇▇
Q18_1 0 1.00 1.95 0.32 -1.00 2.00 2.00 2.00 2.00 ▁▁▁▁▇
Q18_2 0 1.00 1.92 0.36 -1.00 2.00 2.00 2.00 2.00 ▁▁▁▁▇
Q18_3 0 1.00 1.92 0.35 -1.00 2.00 2.00 2.00 2.00 ▁▁▁▁▇
Q18_4 0 1.00 1.93 0.35 -1.00 2.00 2.00 2.00 2.00 ▁▁▁▁▇
Q18_5 0 1.00 1.96 0.31 -1.00 2.00 2.00 2.00 2.00 ▁▁▁▁▇
Q18_6 0 1.00 1.94 0.33 -1.00 2.00 2.00 2.00 2.00 ▁▁▁▁▇
Q18_7 0 1.00 1.88 0.41 -1.00 2.00 2.00 2.00 2.00 ▁▁▁▁▇
Q18_8 0 1.00 1.78 0.47 -1.00 2.00 2.00 2.00 2.00 ▁▁▁▂▇
Q18_9 0 1.00 1.93 0.34 -1.00 2.00 2.00 2.00 2.00 ▁▁▁▁▇
Q18_10 0 1.00 1.92 0.36 -1.00 2.00 2.00 2.00 2.00 ▁▁▁▁▇
Q19_1 0 1.00 -0.42 0.91 -1.00 -1.00 -1.00 1.00 1.00 ▇▁▁▁▃
Q19_2 0 1.00 0.03 1.00 -1.00 -1.00 1.00 1.00 1.00 ▇▁▁▁▇
Q19_3 0 1.00 0.05 1.00 -1.00 -1.00 1.00 1.00 1.00 ▇▁▁▁▇
Q19_4 0 1.00 -0.23 0.97 -1.00 -1.00 -1.00 1.00 1.00 ▇▁▁▁▅
Q19_5 0 1.00 -0.20 0.98 -1.00 -1.00 -1.00 1.00 1.00 ▇▁▁▁▆
Q19_6 0 1.00 0.07 1.00 -1.00 -1.00 1.00 1.00 1.00 ▇▁▁▁▇
Q19_7 0 1.00 -0.25 0.97 -1.00 -1.00 -1.00 1.00 1.00 ▇▁▁▁▅
Q19_8 0 1.00 -0.43 0.90 -1.00 -1.00 -1.00 1.00 1.00 ▇▁▁▁▃
Q19_9 0 1.00 -0.51 0.86 -1.00 -1.00 -1.00 -1.00 1.00 ▇▁▁▁▂
Q19_10 0 1.00 -0.89 0.45 -1.00 -1.00 -1.00 -1.00 1.00 ▇▁▁▁▁
Q20 0 1.00 1.08 0.30 -1.00 1.00 1.00 1.00 2.00 ▁▁▁▇▁
Q21 0 1.00 1.21 0.57 -1.00 1.00 1.00 1.00 3.00 ▁▁▇▁▁
Q22 5350 0.08 4.21 2.11 -1.00 2.00 4.00 6.00 7.00 ▁▆▁▆▇
Q23 0 1.00 1.80 0.80 -1.00 1.00 2.00 2.00 3.00 ▁▁▆▇▃
Q24 0 1.00 1.97 0.98 -1.00 1.00 2.00 3.00 4.00 ▁▇▅▇▁
Q25 0 1.00 1.84 0.93 -1.00 1.00 2.00 2.00 4.00 ▁▇▇▃▁
Q26 0 1.00 1.45 0.96 -1.00 1.00 1.00 1.00 4.00 ▁▇▁▁▁
Q27_1 0 1.00 1.25 0.51 -1.00 1.00 1.00 2.00 2.00 ▁▁▁▇▃
Q27_2 0 1.00 1.15 0.44 -1.00 1.00 1.00 1.00 2.00 ▁▁▁▇▂
Q27_3 0 1.00 1.31 0.55 -1.00 1.00 1.00 2.00 2.00 ▁▁▁▇▅
Q27_4 0 1.00 1.21 0.47 -1.00 1.00 1.00 1.00 2.00 ▁▁▁▇▂
Q27_5 0 1.00 1.36 0.56 -1.00 1.00 1.00 2.00 2.00 ▁▁▁▇▅
Q27_6 0 1.00 1.27 0.50 -1.00 1.00 1.00 2.00 2.00 ▁▁▁▇▃
Q28_1 534 0.91 0.53 0.85 -1.00 1.00 1.00 1.00 1.00 ▂▁▁▁▇
Q28_2 534 0.91 -0.27 0.96 -1.00 -1.00 -1.00 1.00 1.00 ▇▁▁▁▅
Q28_3 534 0.91 -0.27 0.96 -1.00 -1.00 -1.00 1.00 1.00 ▇▁▁▁▅
Q28_4 534 0.91 -0.30 0.95 -1.00 -1.00 -1.00 1.00 1.00 ▇▁▁▁▅
Q28_5 534 0.91 -0.28 0.96 -1.00 -1.00 -1.00 1.00 1.00 ▇▁▁▁▅
Q28_6 534 0.91 -0.43 0.90 -1.00 -1.00 -1.00 1.00 1.00 ▇▁▁▁▃
Q28_7 534 0.91 -0.26 0.97 -1.00 -1.00 -1.00 1.00 1.00 ▇▁▁▁▅
Q28_8 534 0.91 -0.95 0.30 -1.00 -1.00 -1.00 -1.00 1.00 ▇▁▁▁▁
Q29_1 4494 0.23 -0.46 0.89 -1.00 -1.00 -1.00 1.00 1.00 ▇▁▁▁▃
Q29_2 4494 0.23 -0.76 0.65 -1.00 -1.00 -1.00 -1.00 1.00 ▇▁▁▁▁
Q29_3 4494 0.23 -0.40 0.92 -1.00 -1.00 -1.00 1.00 1.00 ▇▁▁▁▃
Q29_4 4494 0.23 -0.60 0.80 -1.00 -1.00 -1.00 -1.00 1.00 ▇▁▁▁▂
Q29_5 4494 0.23 -0.66 0.75 -1.00 -1.00 -1.00 -1.00 1.00 ▇▁▁▁▂
Q29_6 4494 0.23 -0.93 0.38 -1.00 -1.00 -1.00 -1.00 1.00 ▇▁▁▁▁
Q29_7 4494 0.23 -0.76 0.65 -1.00 -1.00 -1.00 -1.00 1.00 ▇▁▁▁▁
Q29_8 4494 0.23 -0.70 0.72 -1.00 -1.00 -1.00 -1.00 1.00 ▇▁▁▁▂
Q29_9 4494 0.23 -0.81 0.59 -1.00 -1.00 -1.00 -1.00 1.00 ▇▁▁▁▁
Q29_10 4494 0.23 -0.70 0.71 -1.00 -1.00 -1.00 -1.00 1.00 ▇▁▁▁▂
Q30 0 1.00 2.33 1.26 -1.00 1.00 2.00 3.00 5.00 ▁▆▇▆▃
Q31 4244 0.27 1.36 0.52 -1.00 1.00 1.00 2.00 2.00 ▁▁▁▇▅
Q32 3834 0.34 1.37 0.50 -1.00 1.00 1.00 2.00 2.00 ▁▁▁▇▅
Q33 3594 0.38 1.22 0.96 -1.00 1.00 1.00 2.00 2.00 ▂▁▁▇▇
ppage 0 1.00 51.69 17.07 22.00 36.00 54.00 65.00 94.00 ▆▅▇▆▁
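
The age hypothesis above could be checked with a grouped summary along the following lines. This is only a minimal sketch: `toy_votes` is a made-up stand-in that reuses the `ppage` and `voter_category` column names from the skim output, the three category labels are assumed rather than confirmed from the data, and the age cutoffs (under 25, 65 and older) come from the hypothesis.

```r
library(dplyr)

# Toy stand-in for vote_data. The column names match the skim output,
# but every value here is invented for illustration, and the
# voter_category labels are an assumption.
toy_votes <- data.frame(
  ppage = c(22, 24, 30, 45, 66, 70, 81, 23),
  voter_category = c("rarely/never", "sporadic", "sporadic", "always",
                     "always", "always", "sporadic", "rarely/never")
)

age_summary <- toy_votes |>
  # bin ages using the cutoffs named in the hypothesis
  mutate(
    age_group = case_when(
      ppage < 25 ~ "under 25",
      ppage >= 65 ~ "65 and older",
      TRUE ~ "25 to 64"
    )
  ) |>
  # count respondents in each age group / voter category pair
  count(age_group, voter_category) |>
  # convert counts to within-age-group proportions
  group_by(age_group) |>
  mutate(prop = n / sum(n)) |>
  ungroup()

age_summary
```

With the real data, `toy_votes` would be replaced by `vote_data`. Comparing `prop` across age groups would show whether younger respondents fall more often into the less frequent voting categories.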

Data 2

Introduction and data

  • Identify the source of the data.

    • The Centers for Disease Control and Prevention (CDC)
  • State when and how it was originally collected (by the original data curator, not necessarily how you found the data).

    • The data was collected from December 14, 2020 through March 7, 2023. It was reported by the CDC workers who administered the vaccine doses to the public.
  • Write a brief description of the observations.

    • The number of daily vaccine doses rose from December through April, when quarantine was at its strictest. It then dropped significantly from May to July, to almost half the earlier number of daily doses. After July it began to stabilize, but it never reached its peak again.

Research question

  • A well formulated research question. (You may include more than one research question if you want to receive feedback on different ideas for your project. However, one per data set is required.)
    • How has the number of administered vaccinations (dose 1, dose 2, booster) changed over time?

    • How has the daily number of administered vaccinations (dose 1, dose 2, booster) changed over time by state?

  • A description of the research topic along with a concise statement of your hypotheses on this topic.
    • Research with this dataset would delve into trends regarding the number of administered COVID-19 vaccinations across the United States. Another potential avenue of research is to join this dataset with another relevant one provided by the CDC in order to investigate different variables related to vaccination trends.

    • We hypothesize that the number of administered dose 1 COVID-19 vaccinations will display a decreasing trend over time (especially as the United States lifted restrictions) because more pro-vaccine individuals would have already gotten their first dose earlier.

  • Identify the types of variables in your research question. Categorical? Quantitative?
    • Categorical: location, date, date type

    • Quantitative: The number of administered vaccinations (cumulative and daily, by location)

Glimpse of data

# load the CDC vaccination data and summarize every column
cdc_data <- read.csv("data/cdc-covid-19-vaccinations.csv")
skimr::skim(cdc_data)
Data summary
Name cdc_data
Number of rows 84240
Number of columns 29
_______________________
Column type frequency:
character 3
numeric 26
________________________
Group variables None

Variable type: character

skim_variable n_missing complete_rate min max empty n_unique whitespace
Date 0 1 10 10 0 816 0
date_type 0 1 5 6 0 2 0
Location 0 1 2 2 0 60 0

Variable type: numeric

skim_variable n_missing complete_rate mean sd p0 p25 p50 p75 p100 hist
MMWR_week 0 1.00 24.68 15.80 1 10.0 23.0 38.00 53.0 ▇▆▆▅▆
Administered_Daily 0 1.00 31836.78 170917.12 -1593072 893.0 5071.0 17215.25 5610894.0 ▁▇▁▁▁
Administered_Cumulative 0 1.00 14021962.56 60092860.31 0 930069.8 3465873.5 9412128.75 672537312.0 ▇▁▁▁▁
Administered_7_Day_Rolling_Average 2280 0.97 30765.17 156108.89 -138218 1530.0 5953.0 17195.25 3506170.0 ▇▁▁▁▁
Admin_Dose_1_Daily 0 1.00 12774.09 134918.19 -2468411 175.0 1084.5 5339.00 30150525.0 ▇▁▁▁▁
Admin_Dose_1_Cumulative 0 1.00 6620169.69 27594866.18 0 497187.5 1807394.5 4273760.00 269650596.0 ▇▁▁▁▁
Admin_Dose_1_Day_Rolling_Average 2280 0.97 12874.30 88699.12 -326573 245.0 1280.5 5744.25 5011136.0 ▇▁▁▁▁
Administered_Dose1_Pop_Pct 0 1.00 59.11 25.71 0 46.7 64.4 77.00 100.0 ▂▂▅▇▅
Administered_daily_change_report 16740 0.80 17742.62 129779.67 0 0.0 1.0 6758.00 4575683.0 ▇▁▁▁▁
Administered_daily_change_report_7dayroll 17820 0.79 35692.98 171664.92 -138218 2048.0 7787.5 20553.00 3506170.0 ▇▁▁▁▁
Series_Complete_Daily 0 1.00 10904.38 123334.70 -523379 101.0 761.0 4124.00 28709428.0 ▇▁▁▁▁
Series_Complete_Cumulative 0 1.00 5561849.68 23535246.70 0 347879.0 1467425.0 3790923.75 230142115.0 ▇▁▁▁▁
Series_Complete_Day_Rolling_Average 2280 0.97 11005.08 80152.04 -71931 160.0 906.0 4492.25 4837589.0 ▇▁▁▁▁
Series_Complete_Pop_Pct 0 1.00 49.73 24.22 0 38.0 55.7 66.50 90.6 ▃▂▅▇▃
Booster_Daily 0 1.00 5577.35 47225.59 -751692 0.0 60.5 1281.00 2748886.0 ▁▇▁▁▁
Booster_Cumulative 0 1.00 1730063.67 9382469.98 0 0.0 71530.5 1005061.75 117621762.0 ▇▁▁▁▁
Booster_7_Day_Rolling_Average 2280 0.97 5412.89 42074.55 -2097 0.0 107.0 1382.00 1257266.0 ▇▁▁▁▁
Additional_Doses_Vax_Pct 0 1.00 23.49 23.17 0 0.0 20.0 46.20 67.5 ▇▁▂▅▂
Second_Booster_50Plus_Daily 0 1.00 1716.21 44865.68 -40176 0.0 0.0 124.00 11835302.0 ▇▁▁▁▁
Second_Booster_50Plus_Cumulative 0 1.00 233289.99 1814678.63 0 0.0 0.0 36473.50 36171359.0 ▇▁▁▁▁
Second_Booster_50Plus_7_Day_Rolling_Average 2280 0.97 1264.51 18881.48 -170 0.0 0.0 131.00 1855467.0 ▇▁▁▁▁
Second_Booster_50Plus_Vax_Pct 0 1.00 10.05 17.47 0 0.0 0.0 16.10 67.2 ▇▁▁▁▁
Bivalent_Booster_Daily 0 1.00 2560.51 67066.01 -1 0.0 0.0 0.00 14759770.0 ▇▁▁▁▁
Bivalent_Booster_Cumulative 0 1.00 182246.09 2057578.02 0 0.0 0.0 0.00 53980763.0 ▇▁▁▁▁
Bivalent_Booster_7_Day_Rolling_Average 2280 0.97 1314.40 17268.64 0 0.0 0.0 0.00 615222.0 ▇▁▁▁▁
Bivalent_Booster_Pop_Pct 0 1.00 1.65 4.87 0 0.0 0.0 0.00 32.8 ▇▁▁▁▁
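
To look at how first-dose counts change over time, the character `Date` column would first need to be parsed and the rows restricted to a single `date_type`. The sketch below shows one way to do that on made-up rows. It assumes dates are written month/day/year, that `date_type` takes the values "Admin" and "Report", and that "US" is the national location code; none of these are confirmed by the skim output, which only shows the string lengths.

```r
library(dplyr)
library(lubridate)

# Toy stand-in for cdc_data. Column names match the skim output;
# the values, the date format, and the "Admin"/"Report"/"US" codes
# are assumptions made for illustration.
toy_cdc <- data.frame(
  Date = c("01/15/2021", "01/16/2021", "02/01/2021", "02/02/2021", "02/02/2021"),
  date_type = c("Admin", "Admin", "Admin", "Admin", "Report"),
  Location = c("US", "US", "US", "US", "US"),
  Admin_Dose_1_Daily = c(100, 150, 300, -20, 999)
)

monthly_dose1 <- toy_cdc |>
  # keep one date type so each day is counted once
  filter(date_type == "Admin", Location == "US") |>
  mutate(
    Date = mdy(Date),                          # parse text into a Date
    month = floor_date(Date, unit = "month")   # first day of that month
  ) |>
  group_by(month) |>
  summarize(dose1 = sum(Admin_Dose_1_Daily), .groups = "drop")

monthly_dose1
```

The skim output shows negative minimums for several daily columns. These may reflect reporting corrections, so they are summed here rather than dropped. Whether to keep them is a decision to make after inspecting the real data.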

Data 3

Introduction and data

  • Identify the source of the data.
    • The data comes from www.metacritic.com.
  • State when and how it was originally collected (by the original data curator, not necessarily how you found the data).
    • We scraped the website to collect the data.
  • Write a brief description of the observations.
    • The observations are video games that have been ranked on Metacritic. Each observation includes the video game title, developer, release date, scores, etc.

Research question

  • A well formulated research question. (You may include more than one research question if you want to receive feedback on different ideas for your project. However, one per data set is required.)

    • How are games rated based on who developed them?

    • Have there been changes in the scores of different games over the years?

    • How does the platform the game is on affect the scores of the games?

    • Do the top games of all time (games with high critic scores) also have high user scores?

    • What variables have influenced the utilization of the Reddit API over time, and how has it changed?

  • A description of the research topic along with a concise statement of your hypotheses on this topic.

    • This data was web scraped from Metacritic. Using each video game’s information (title, developer, platform, date, critic scores, user scores, etc.), we can look for relationships between these variables and create data visualizations that show interesting patterns.

Glimpse of data

paths_allowed("https://www.metacritic.com/browse/games/score/metascore/all/all/filtered?sort=desc")
scrape_title <- function(url){
  # pause for a couple of seconds between requests
  Sys.sleep(2)
  # read the page 
  page <- read_html(url)
  # extract desired components; titles are turned into URL slugs
  title <- html_elements(x = page, css = ".title h3") |> 
    html_text2() |> 
    tolower() |> 
    str_replace_all(" ", "-") |> 
    str_replace_all("[':]", "") |>
    str_replace_all("[.]", "") |>
    str_replace_all("-&", "") |>
    str_replace_all("[(]", "") |>
    str_replace_all("[)]", "") |>
    str_replace_all("-/", "")
  
  platform <- html_elements(
    x = page, 
    css = ".platform .data") |> 
    html_text2() |> 
    tolower() |>
    str_replace_all(" ", "-") 
  
  user_score <- html_elements(
    x = page, 
    css = ".user") |> 
    html_text2()
  
  metascore <- html_elements(
    x = page, 
    css = ".clamp-metascore .positive") |> 
    html_text2()
  
  # create a tibble with this data and return it
  tibble(
    title = title, 
    platform = platform, 
    user_score = user_score, 
    metascore = metascore
  )
}
page_nums <- 0:5
title_urls <- str_glue("https://www.metacritic.com/browse/games/score/metascore/all/all/filtered?sort=desc&page={page_nums}")
meta_titles <- map(.x = title_urls, .f = scrape_title) |> list_rbind()
write_csv(x = meta_titles, file = "data/meta-titles.csv")
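
The title cleaning inside scrape_title() turns a display title into the slug used later to build each game's URL. The sketch below repeats those same replacement steps in a standalone helper so their effect can be checked on a couple of titles. `make_slug` is a hypothetical name introduced here, and the two example titles are chosen only for illustration.

```r
library(stringr)

# Hypothetical helper that repeats the cleaning steps from scrape_title()
make_slug <- function(title) {
  title |>
    tolower() |>
    str_replace_all(" ", "-") |>     # spaces become hyphens
    str_replace_all("[':]", "") |>   # drop apostrophes and colons
    str_replace_all("[.]", "") |>    # drop periods
    str_replace_all("-&", "") |>     # drop ampersands and their leading hyphen
    str_replace_all("[(]", "") |>    # drop opening parentheses
    str_replace_all("[)]", "") |>    # drop closing parentheses
    str_replace_all("-/", "")        # drop slashes and their leading hyphen
}

make_slug("The Legend of Zelda: Ocarina of Time")
# "the-legend-of-zelda-ocarina-of-time"
make_slug("Ratchet & Clank (2016)")
# "ratchet-clank-2016"
```

Titles with characters that these steps do not cover (for example commas or accented letters) would pass through unchanged, so the resulting URLs may need spot checks.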
scrape_game <- function(url){
  # pause for a couple of seconds between requests
  Sys.sleep(2)
  
  # read the page 
  page <- read_html(url)
  
  # extract desired components 
  titles <- html_elements(x = page, css = "h1") |> html_text2()
  
  platform <- html_elements(
    x = page,
    css = ".platform"
  ) |>
    html_text2()
  developers <- html_elements(
    x = page, 
    css = ".developer") |> 
    html_text2()
  release_dates <- html_elements(
    x = page, 
    css = ".release_data .data") |> 
    html_text2()
  genres <- html_elements(
    x = page, 
    css = ".product_genre") |> 
    html_text2()
  num_players <- html_elements(
    x = page, 
    css = ".product_players .data") |> 
    html_text2() 
  age_rating <- html_elements(
    x = page,
    css = ".product_rating .data"
  ) |>
    html_text2()
  num_critic_reviews <- html_elements(
    x = page,
    css = ".count a span"
  ) |>
    html_text2()
  num_user_reviews <- html_elements(
    x = page,
    css = ".feature_userscore .count a"
  ) |>
    html_text2()
  
  # replace any component missing from the page with NA
  # (developers was previously unchecked, which could silently drop a game)
  if (length(titles) == 0) { titles <- NA }
  if (length(platform) == 0) { platform <- NA }
  if (length(developers) == 0) { developers <- NA }
  if (length(release_dates) == 0) { release_dates <- NA }
  if (length(genres) == 0) { genres <- NA }
  if (length(num_players) == 0) { num_players <- NA }
  if (length(age_rating) == 0) { age_rating <- NA }
  if (length(num_critic_reviews) == 0) { num_critic_reviews <- NA }
  if (length(num_user_reviews) == 0) { num_user_reviews <- NA }
  
  # create a tibble with this data and return it
  # dates are left as text here; they can be parsed later with mdy(date)
  tibble(
    title = titles, 
    platform = platform,
    developer = developers, 
    date = release_dates, 
    genre = genres,
    number_of_players = num_players,
    rating = age_rating,
    critic_reviews = num_critic_reviews,
    user_reviews = num_user_reviews
  )
}

url_titles <- meta_titles$title[1:600]
url_platforms <- meta_titles$platform[1:600]
meta_urls <- str_glue("https://www.metacritic.com/game/{url_platforms}/{url_titles}") 
meta_games <- map(.x = meta_urls, .f = scrape_game) |> list_rbind()


write_csv(x = meta_games, file = "data/meta.csv")
games <- read_csv("data/meta.csv")
Rows: 600 Columns: 9
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (8): title, platform, developer, date, genre, number_of_players, rating,...
dbl (1): critic_reviews

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
games_scores <- read_csv("data/meta-titles.csv")
Rows: 600 Columns: 4
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (2): title, platform
dbl (2): user_score, metascore

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
skimr::skim(games)
Data summary
Name games
Number of rows 600
Number of columns 9
_______________________
Column type frequency:
character 8
numeric 1
________________________
Group variables None

Variable type: character

skim_variable n_missing complete_rate min max empty n_unique whitespace
title 0 1.00 3 68 0 435 0
platform 0 1.00 2 16 0 21 0
developer 0 1.00 15 65 0 220 0
date 0 1.00 11 12 0 473 0
genre 0 1.00 25 127 0 198 0
number_of_players 72 0.88 1 21 0 31 0
rating 27 0.96 1 4 0 6 0
user_reviews 0 1.00 9 14 0 495 0

Variable type: numeric

skim_variable n_missing complete_rate mean sd p0 p25 p50 p75 p100 hist
critic_reviews 0 1 39.88 27.24 7 18 31 59 141 ▇▃▂▁▁
skimr::skim(games_scores)
Data summary
Name games_scores
Number of rows 600
Number of columns 4
_______________________
Column type frequency:
character 2
numeric 2
________________________
Group variables None

Variable type: character

skim_variable n_missing complete_rate min max empty n_unique whitespace
title 0 1 3 66 0 434 0
platform 0 1 2 16 0 21 0

Variable type: numeric

skim_variable n_missing complete_rate mean sd p0 p25 p50 p75 p100 hist
user_score 0 1 8.22 0.84 3.3 7.9 8.4 8.8 9.7 ▁▁▁▇▇
metascore 0 1 91.93 2.05 89.0 90.0 91.0 93.0 99.0 ▇▅▂▁▁
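
The question of whether top critic scores go along with high user scores could start from a comparison like the one below. It is a rough sketch on invented rows that reuse the `user_score` and `metascore` column names from `games_scores`. It assumes user scores are on a 10-point scale and metascores on a 100-point scale, which the ranges in the skim output suggest but do not confirm.

```r
library(dplyr)

# Toy stand-in for games_scores; titles and scores are invented.
toy_scores <- data.frame(
  title = c("game-a", "game-b", "game-c", "game-d"),
  user_score = c(9.0, 8.5, 7.0, 8.0),
  metascore = c(97, 94, 90, 92)
)

toy_scores <- toy_scores |>
  mutate(
    user_score_100 = user_score * 10,   # put both scores on a 100-point scale (assumed)
    gap = metascore - user_score_100    # positive gap: critics rate higher than users
  )

# linear association between critic and user scores
score_cor <- cor(toy_scores$metascore, toy_scores$user_score)

toy_scores
score_cor
```

Because the scraped sample only contains games with a metascore of 89 or higher, any correlation computed on it describes the top of the ranking and may understate the relationship across all games.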