Data 1

Introduction and data

The source of the data is the FiveThirtyEight API at https://data.fivethirtyeight.com/
FiveThirtyEight compiles polling data from other organizations whose polls meet a minimum bar of being representative, professional, and unbiased. Our specific data comes from a poll conducted by Ipsos for FiveThirtyEight from September 15 to 25.
Each observation is a person who has been eligible to vote in at least four elections, and includes voter file information as well as that person's voting frequency.

Research question

A well formulated research question. (You may include more than one research question if you want to receive feedback on different ideas for your project. However, one per data set is required.)
- Why do many Americans not vote?
- What is the relationship between a respondent’s voting status and their residential region?
- Is there a correlation between a respondent’s age and their voting status?
A description of the research topic along with a concise statement of your hypotheses on this topic.
- This data is sourced from FiveThirtyEight. Using respondents’ information (age, education, race, gender, income category, and more), one can compare the characteristics of the 5,836 respondents to their voting status. The relationship between a respondent’s characteristics and voting frequency will help visualize trends in why some people vote and others do not.
- We hypothesize that voting activity will be lower at younger ages, specifically under 25 years old. Older respondents, specifically those 65 years old and older, will be more active, with a higher voting frequency and a longer voting history.
Identify the types of variables in your research question. Categorical? Quantitative?
- Categorical: education (educ), race, gender, income category (income_cat), voter category (voter_category)
- Quantitative: age (ppage)
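
The age hypothesis above could be checked with a simple cross-tabulation. The following is a minimal sketch on a made-up toy data frame, not on the real survey: the column names `ppage` and `voter_category` match the data summary, but the category labels, the toy values, and the age cut points (25 and 65, taken from the hypothesis) are assumptions for illustration.

```r
# Hypothetical toy data; the real columns would come from vote_data.
toy_votes <- data.frame(
  ppage = c(22, 24, 30, 45, 52, 66, 70, 81),
  voter_category = c("rarely/never", "sporadic", "sporadic", "always",
                     "sporadic", "always", "always", "always")
)

# Bin age into the groups named in the hypothesis.
# right = FALSE gives left-closed intervals: [0, 25), [25, 65), [65, Inf).
toy_votes$age_group <- cut(
  toy_votes$ppage,
  breaks = c(0, 25, 65, Inf),
  labels = c("under 25", "25 to 64", "65 and older"),
  right = FALSE
)

# Row-wise proportions: share of each voter category within an age group.
age_by_vote <- prop.table(
  table(toy_votes$age_group, toy_votes$voter_category),
  margin = 1
)
print(round(age_by_vote, 2))
```

Because the survey includes a `weight` column, an unweighted table like this one is only a first look; a weighted version may give different proportions.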

Glimpse of data

# load the survey data and summarize every variable
vote_data <- read.csv("data/nonvoters_data.csv")
skimr::skim(vote_data)

Data summary
Name	vote_data
Number of rows	5836
Number of columns	119
_______________________
Column type frequency:
character	5
numeric	114
________________________
Group variables	None

Variable type: character

skim_variable	complete_rate	min	max	n_unique
educ	1	7	19	3
race	1	5	11	4
gender	1	4	6	2
income_cat	1	7	14	4
voter_category	1	6	12	3

Variable type: numeric

skim_variable	n_missing	complete_rate	mean	sd	p0	p25	p50	p75	p100	hist
RespId	0	1.00	474654.00	3628.48	470001.00	472069.75	474152.00	476217.50	488325.00	▇▇▁▁▁
weight	0	1.00	0.99	0.35	0.23	0.79	0.97	1.17	3.04	▃▇▁▁▁
Q1	0	1.00	1.00	0.00	1.00	1.00	1.00	1.00	1.00	▁▁▇▁▁
Q2_1	0	1.00	1.25	0.66	-1.00	1.00	1.00	1.00	4.00	▁▇▁▁▁
Q2_2	0	1.00	1.71	0.87	-1.00	1.00	2.00	2.00	4.00	▁▇▆▂▁
Q2_3	0	1.00	1.64	0.77	-1.00	1.00	2.00	2.00	4.00	▁▇▇▁▁
Q2_4	0	1.00	2.18	1.09	-1.00	1.00	2.00	3.00	4.00	▁▇▆▆▃
Q2_5	0	1.00	1.28	0.63	-1.00	1.00	1.00	1.00	4.00	▁▇▂▁▁
Q2_6	0	1.00	1.81	1.01	-1.00	1.00	1.00	2.00	4.00	▁▇▃▂▂
Q2_7	0	1.00	1.49	0.81	-1.00	1.00	1.00	2.00	4.00	▁▇▃▁▁
Q2_8	0	1.00	1.46	0.67	-1.00	1.00	1.00	2.00	4.00	▁▇▅▁▁
Q2_9	0	1.00	2.09	1.25	-1.00	1.00	2.00	3.00	4.00	▁▇▃▂▃
Q2_10	0	1.00	2.02	0.95	-1.00	1.00	2.00	3.00	4.00	▁▆▇▃▂
Q3_1	0	1.00	1.90	1.07	-1.00	1.00	2.00	3.00	4.00	▁▇▅▂▂
Q3_2	0	1.00	2.38	1.19	-1.00	1.00	2.00	3.00	4.00	▁▇▆▆▇
Q3_3	0	1.00	2.63	1.12	-1.00	2.00	3.00	4.00	4.00	▁▅▇▇▇
Q3_4	0	1.00	1.86	0.99	-1.00	1.00	2.00	2.00	4.00	▁▇▆▃▂
Q3_5	0	1.00	2.02	0.87	-1.00	1.00	2.00	3.00	4.00	▁▅▇▃▁
Q3_6	0	1.00	2.07	0.97	-1.00	1.00	2.00	3.00	4.00	▁▆▇▃▂
Q4_1	0	1.00	1.80	0.92	-1.00	1.00	2.00	2.00	4.00	▁▇▆▂▁
Q4_2	0	1.00	1.83	0.89	-1.00	1.00	2.00	2.00	4.00	▁▇▇▃▁
Q4_3	0	1.00	2.00	0.93	-1.00	1.00	2.00	3.00	4.00	▁▆▇▃▂
Q4_4	0	1.00	2.43	1.03	-1.00	2.00	2.00	3.00	4.00	▁▅▇▆▅
Q4_5	0	1.00	2.34	1.04	-1.00	2.00	2.00	3.00	4.00	▁▅▇▅▃
Q4_6	0	1.00	2.03	0.97	-1.00	1.00	2.00	3.00	4.00	▁▇▇▅▂
Q5	0	1.00	1.16	0.41	-1.00	1.00	1.00	1.00	2.00	▁▁▁▇▂
Q6	0	1.00	2.85	0.87	-1.00	2.00	3.00	3.00	4.00	▁▁▅▇▃
Q7	0	1.00	1.17	0.45	-1.00	1.00	1.00	1.00	2.00	▁▁▁▇▂
Q8_1	0	1.00	2.75	1.17	-1.00	2.00	3.00	4.00	4.00	▁▃▆▅▇
Q8_2	0	1.00	2.75	0.88	-1.00	2.00	3.00	3.00	4.00	▁▁▇▇▅
Q8_3	0	1.00	2.12	0.84	-1.00	2.00	2.00	3.00	4.00	▁▃▇▃▁
Q8_4	0	1.00	2.00	0.90	-1.00	1.00	2.00	2.00	4.00	▁▅▇▃▁
Q8_5	0	1.00	2.46	0.89	-1.00	2.00	2.00	3.00	4.00	▁▂▇▆▂
Q8_6	0	1.00	2.23	0.89	-1.00	2.00	2.00	3.00	4.00	▁▃▇▃▂
Q8_7	0	1.00	2.79	1.00	-1.00	2.00	3.00	4.00	4.00	▁▂▇▆▇
Q8_8	0	1.00	1.99	0.91	-1.00	1.00	2.00	2.00	4.00	▁▆▇▃▂
Q8_9	0	1.00	1.81	0.82	-1.00	1.00	2.00	2.00	4.00	▁▆▇▂▁
Q9_1	0	1.00	1.52	0.77	-1.00	1.00	1.00	2.00	4.00	▁▇▅▁▁
Q9_2	0	1.00	2.54	1.04	-1.00	2.00	2.00	3.00	4.00	▁▃▇▆▅
Q9_3	0	1.00	3.19	1.03	-1.00	3.00	4.00	4.00	4.00	▁▁▂▅▇
Q9_4	0	1.00	3.44	0.94	-1.00	3.00	4.00	4.00	4.00	▁▁▁▂▇
Q10_1	0	1.00	1.91	0.32	-1.00	2.00	2.00	2.00	2.00	▁▁▁▁▇
Q10_2	0	1.00	1.78	0.44	-1.00	2.00	2.00	2.00	2.00	▁▁▁▂▇
Q10_3	0	1.00	1.82	0.42	-1.00	2.00	2.00	2.00	2.00	▁▁▁▂▇
Q10_4	0	1.00	1.98	0.18	-1.00	2.00	2.00	2.00	2.00	▁▁▁▁▇
Q11_1	0	1.00	1.86	0.38	-1.00	2.00	2.00	2.00	2.00	▁▁▁▁▇
Q11_2	0	1.00	1.97	0.24	-1.00	2.00	2.00	2.00	2.00	▁▁▁▁▇
Q11_3	0	1.00	1.60	0.51	-1.00	1.00	2.00	2.00	2.00	▁▁▁▅▇
Q11_4	0	1.00	1.86	0.37	-1.00	2.00	2.00	2.00	2.00	▁▁▁▁▇
Q11_5	0	1.00	1.77	0.44	-1.00	2.00	2.00	2.00	2.00	▁▁▁▂▇
Q11_6	0	1.00	1.97	0.23	-1.00	2.00	2.00	2.00	2.00	▁▁▁▁▇
Q14	0	1.00	2.65	1.74	-1.00	1.00	2.00	5.00	5.00	▁▇▅▃▇
Q15	0	1.00	2.21	1.52	-1.00	1.00	2.00	3.00	5.00	▁▇▅▂▅
Q16	0	1.00	1.62	0.84	-1.00	1.00	1.00	2.00	4.00	▁▇▅▂▁
Q17_1	0	1.00	1.64	0.82	-1.00	1.00	1.00	2.00	4.00	▁▇▆▁▁
Q17_2	0	1.00	1.68	0.88	-1.00	1.00	2.00	2.00	4.00	▁▇▆▂▁
Q17_3	0	1.00	2.32	1.13	-1.00	1.00	2.00	3.00	4.00	▁▇▇▅▅
Q17_4	0	1.00	2.71	1.07	-1.00	2.00	3.00	4.00	4.00	▁▃▇▇▇
Q18_1	0	1.00	1.95	0.32	-1.00	2.00	2.00	2.00	2.00	▁▁▁▁▇
Q18_2	0	1.00	1.92	0.36	-1.00	2.00	2.00	2.00	2.00	▁▁▁▁▇
Q18_3	0	1.00	1.92	0.35	-1.00	2.00	2.00	2.00	2.00	▁▁▁▁▇
Q18_4	0	1.00	1.93	0.35	-1.00	2.00	2.00	2.00	2.00	▁▁▁▁▇
Q18_5	0	1.00	1.96	0.31	-1.00	2.00	2.00	2.00	2.00	▁▁▁▁▇
Q18_6	0	1.00	1.94	0.33	-1.00	2.00	2.00	2.00	2.00	▁▁▁▁▇
Q18_7	0	1.00	1.88	0.41	-1.00	2.00	2.00	2.00	2.00	▁▁▁▁▇
Q18_8	0	1.00	1.78	0.47	-1.00	2.00	2.00	2.00	2.00	▁▁▁▂▇
Q18_9	0	1.00	1.93	0.34	-1.00	2.00	2.00	2.00	2.00	▁▁▁▁▇
Q18_10	0	1.00	1.92	0.36	-1.00	2.00	2.00	2.00	2.00	▁▁▁▁▇
Q19_1	0	1.00	-0.42	0.91	-1.00	-1.00	-1.00	1.00	1.00	▇▁▁▁▃
Q19_2	0	1.00	0.03	1.00	-1.00	-1.00	1.00	1.00	1.00	▇▁▁▁▇
Q19_3	0	1.00	0.05	1.00	-1.00	-1.00	1.00	1.00	1.00	▇▁▁▁▇
Q19_4	0	1.00	-0.23	0.97	-1.00	-1.00	-1.00	1.00	1.00	▇▁▁▁▅
Q19_5	0	1.00	-0.20	0.98	-1.00	-1.00	-1.00	1.00	1.00	▇▁▁▁▆
Q19_6	0	1.00	0.07	1.00	-1.00	-1.00	1.00	1.00	1.00	▇▁▁▁▇
Q19_7	0	1.00	-0.25	0.97	-1.00	-1.00	-1.00	1.00	1.00	▇▁▁▁▅
Q19_8	0	1.00	-0.43	0.90	-1.00	-1.00	-1.00	1.00	1.00	▇▁▁▁▃
Q19_9	0	1.00	-0.51	0.86	-1.00	-1.00	-1.00	-1.00	1.00	▇▁▁▁▂
Q19_10	0	1.00	-0.89	0.45	-1.00	-1.00	-1.00	-1.00	1.00	▇▁▁▁▁
Q20	0	1.00	1.08	0.30	-1.00	1.00	1.00	1.00	2.00	▁▁▁▇▁
Q21	0	1.00	1.21	0.57	-1.00	1.00	1.00	1.00	3.00	▁▁▇▁▁
Q22	5350	0.08	4.21	2.11	-1.00	2.00	4.00	6.00	7.00	▁▆▁▆▇
Q23	0	1.00	1.80	0.80	-1.00	1.00	2.00	2.00	3.00	▁▁▆▇▃
Q24	0	1.00	1.97	0.98	-1.00	1.00	2.00	3.00	4.00	▁▇▅▇▁
Q25	0	1.00	1.84	0.93	-1.00	1.00	2.00	2.00	4.00	▁▇▇▃▁
Q26	0	1.00	1.45	0.96	-1.00	1.00	1.00	1.00	4.00	▁▇▁▁▁
Q27_1	0	1.00	1.25	0.51	-1.00	1.00	1.00	2.00	2.00	▁▁▁▇▃
Q27_2	0	1.00	1.15	0.44	-1.00	1.00	1.00	1.00	2.00	▁▁▁▇▂
Q27_3	0	1.00	1.31	0.55	-1.00	1.00	1.00	2.00	2.00	▁▁▁▇▅
Q27_4	0	1.00	1.21	0.47	-1.00	1.00	1.00	1.00	2.00	▁▁▁▇▂
Q27_5	0	1.00	1.36	0.56	-1.00	1.00	1.00	2.00	2.00	▁▁▁▇▅
Q27_6	0	1.00	1.27	0.50	-1.00	1.00	1.00	2.00	2.00	▁▁▁▇▃
Q28_1	534	0.91	0.53	0.85	-1.00	1.00	1.00	1.00	1.00	▂▁▁▁▇
Q28_2	534	0.91	-0.27	0.96	-1.00	-1.00	-1.00	1.00	1.00	▇▁▁▁▅
Q28_3	534	0.91	-0.27	0.96	-1.00	-1.00	-1.00	1.00	1.00	▇▁▁▁▅
Q28_4	534	0.91	-0.30	0.95	-1.00	-1.00	-1.00	1.00	1.00	▇▁▁▁▅
Q28_5	534	0.91	-0.28	0.96	-1.00	-1.00	-1.00	1.00	1.00	▇▁▁▁▅
Q28_6	534	0.91	-0.43	0.90	-1.00	-1.00	-1.00	1.00	1.00	▇▁▁▁▃
Q28_7	534	0.91	-0.26	0.97	-1.00	-1.00	-1.00	1.00	1.00	▇▁▁▁▅
Q28_8	534	0.91	-0.95	0.30	-1.00	-1.00	-1.00	-1.00	1.00	▇▁▁▁▁
Q29_1	4494	0.23	-0.46	0.89	-1.00	-1.00	-1.00	1.00	1.00	▇▁▁▁▃
Q29_2	4494	0.23	-0.76	0.65	-1.00	-1.00	-1.00	-1.00	1.00	▇▁▁▁▁
Q29_3	4494	0.23	-0.40	0.92	-1.00	-1.00	-1.00	1.00	1.00	▇▁▁▁▃
Q29_4	4494	0.23	-0.60	0.80	-1.00	-1.00	-1.00	-1.00	1.00	▇▁▁▁▂
Q29_5	4494	0.23	-0.66	0.75	-1.00	-1.00	-1.00	-1.00	1.00	▇▁▁▁▂
Q29_6	4494	0.23	-0.93	0.38	-1.00	-1.00	-1.00	-1.00	1.00	▇▁▁▁▁
Q29_7	4494	0.23	-0.76	0.65	-1.00	-1.00	-1.00	-1.00	1.00	▇▁▁▁▁
Q29_8	4494	0.23	-0.70	0.72	-1.00	-1.00	-1.00	-1.00	1.00	▇▁▁▁▂
Q29_9	4494	0.23	-0.81	0.59	-1.00	-1.00	-1.00	-1.00	1.00	▇▁▁▁▁
Q29_10	4494	0.23	-0.70	0.71	-1.00	-1.00	-1.00	-1.00	1.00	▇▁▁▁▂
Q30	0	1.00	2.33	1.26	-1.00	1.00	2.00	3.00	5.00	▁▆▇▆▃
Q31	4244	0.27	1.36	0.52	-1.00	1.00	1.00	2.00	2.00	▁▁▁▇▅
Q32	3834	0.34	1.37	0.50	-1.00	1.00	1.00	2.00	2.00	▁▁▁▇▅
Q33	3594	0.38	1.22	0.96	-1.00	1.00	1.00	2.00	2.00	▂▁▁▇▇
ppage	0	1.00	51.69	17.07	22.00	36.00	54.00	65.00	94.00	▆▅▇▆▁

Data 2

Introduction and data

Identify the source of the data.
- The Centers for Disease Control and Prevention (CDC)
State when and how it was originally collected (by the original data curator, not necessarily how you found the data).
- The data was collected from December 14, 2020 to March 7, 2023, from the CDC workers who administered the vaccine doses to civilians.
Write a brief description of the observations.
- The number of daily vaccine doses rose from December through April, when quarantine was at its height. It then dropped significantly from May to July, to almost half the doses people had been getting. After July it stabilized but never returned to its peak.

Research question

A well formulated research question. (You may include more than one research question if you want to receive feedback on different ideas for your project. However, one per data set is required.)
- How has the number of administered vaccinations (dose 1, dose 2, booster) changed over time?
- How has the daily number of administered vaccinations (dose 1, dose 2, booster) changed over time by state?
A description of the research topic along with a concise statement of your hypotheses on this topic.
- Research with this dataset would delve into trends regarding the number of administered COVID-19 vaccinations across the United States. Another potential avenue of research is to join this dataset with another relevant one provided by the CDC in order to investigate different variables related to vaccination trends.
- We hypothesize that the number of administered dose 1 COVID-19 vaccinations will display a decreasing trend over time (especially as the United States lifted restrictions) because more pro-vaccine individuals would have already gotten their first dose earlier.
Identify the types of variables in your research question. Categorical? Quantitative?
- Categorical: location, date, date type
- Quantitative: The number of administered vaccinations (cumulative and daily, by location)
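
One way the dose 1 hypothesis could be examined is by summing daily counts per month and looking at the month-to-month change. This is a minimal sketch on made-up toy rows: the column names `Date`, `Location`, and `Admin_Dose_1_Daily` come from the data summary, while the toy values and the `MM/DD/YYYY` date format are assumptions that should be checked against the actual file.

```r
# Hypothetical toy rows that mimic three columns of cdc_data.
toy_cdc <- data.frame(
  Date = c("01/15/2021", "01/16/2021", "02/15/2021", "02/16/2021",
           "03/15/2021", "03/16/2021"),
  Location = "US",
  Admin_Dose_1_Daily = c(900, 1100, 700, 500, 300, 100)
)

# Parse the dates (the format string is an assumption) and label each month.
toy_cdc$Date <- as.Date(toy_cdc$Date, format = "%m/%d/%Y")
toy_cdc$month <- format(toy_cdc$Date, "%Y-%m")

# Total first doses administered in each month.
monthly <- aggregate(Admin_Dose_1_Daily ~ month, data = toy_cdc, FUN = sum)

# Month-to-month change; negative values indicate a decreasing trend.
monthly_change <- diff(monthly$Admin_Dose_1_Daily)

print(monthly)
print(monthly_change)
```

For the by-state question, the same call could group on `Location` as well, for example with the formula `Admin_Dose_1_Daily ~ month + Location`.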

Glimpse of data

# load the vaccination data and summarize every variable
cdc_data <- read.csv("data/cdc-covid-19-vaccinations.csv")
skimr::skim(cdc_data)

Data summary
Name	cdc_data
Number of rows	84240
Number of columns	29
_______________________
Column type frequency:
character	3
numeric	26
________________________
Group variables	None

Variable type: character

skim_variable	complete_rate	min	max	n_unique
Date	1	10	10	816
date_type	1	5	6	2
Location	1	2	2	60

Variable type: numeric

skim_variable	n_missing	complete_rate	mean	sd	p0	p25	p50	p75	p100	hist
MMWR_week	0	1.00	24.68	15.80	1	10.0	23.0	38.00	53.0	▇▆▆▅▆
Administered_Daily	0	1.00	31836.78	170917.12	-1593072	893.0	5071.0	17215.25	5610894.0	▁▇▁▁▁
Administered_Cumulative	0	1.00	14021962.56	60092860.31	0	930069.8	3465873.5	9412128.75	672537312.0	▇▁▁▁▁
Administered_7_Day_Rolling_Average	2280	0.97	30765.17	156108.89	-138218	1530.0	5953.0	17195.25	3506170.0	▇▁▁▁▁
Admin_Dose_1_Daily	0	1.00	12774.09	134918.19	-2468411	175.0	1084.5	5339.00	30150525.0	▇▁▁▁▁
Admin_Dose_1_Cumulative	0	1.00	6620169.69	27594866.18	0	497187.5	1807394.5	4273760.00	269650596.0	▇▁▁▁▁
Admin_Dose_1_Day_Rolling_Average	2280	0.97	12874.30	88699.12	-326573	245.0	1280.5	5744.25	5011136.0	▇▁▁▁▁
Administered_Dose1_Pop_Pct	0	1.00	59.11	25.71	0	46.7	64.4	77.00	100.0	▂▂▅▇▅
Administered_daily_change_report	16740	0.80	17742.62	129779.67	0	0.0	1.0	6758.00	4575683.0	▇▁▁▁▁
Administered_daily_change_report_7dayroll	17820	0.79	35692.98	171664.92	-138218	2048.0	7787.5	20553.00	3506170.0	▇▁▁▁▁
Series_Complete_Daily	0	1.00	10904.38	123334.70	-523379	101.0	761.0	4124.00	28709428.0	▇▁▁▁▁
Series_Complete_Cumulative	0	1.00	5561849.68	23535246.70	0	347879.0	1467425.0	3790923.75	230142115.0	▇▁▁▁▁
Series_Complete_Day_Rolling_Average	2280	0.97	11005.08	80152.04	-71931	160.0	906.0	4492.25	4837589.0	▇▁▁▁▁
Series_Complete_Pop_Pct	0	1.00	49.73	24.22	0	38.0	55.7	66.50	90.6	▃▂▅▇▃
Booster_Daily	0	1.00	5577.35	47225.59	-751692	0.0	60.5	1281.00	2748886.0	▁▇▁▁▁
Booster_Cumulative	0	1.00	1730063.67	9382469.98	0	0.0	71530.5	1005061.75	117621762.0	▇▁▁▁▁
Booster_7_Day_Rolling_Average	2280	0.97	5412.89	42074.55	-2097	0.0	107.0	1382.00	1257266.0	▇▁▁▁▁
Additional_Doses_Vax_Pct	0	1.00	23.49	23.17	0	0.0	20.0	46.20	67.5	▇▁▂▅▂
Second_Booster_50Plus_Daily	0	1.00	1716.21	44865.68	-40176	0.0	0.0	124.00	11835302.0	▇▁▁▁▁
Second_Booster_50Plus_Cumulative	0	1.00	233289.99	1814678.63	0	0.0	0.0	36473.50	36171359.0	▇▁▁▁▁
Second_Booster_50Plus_7_Day_Rolling_Average	2280	0.97	1264.51	18881.48	-170	0.0	0.0	131.00	1855467.0	▇▁▁▁▁
Second_Booster_50Plus_Vax_Pct	0	1.00	10.05	17.47	0	0.0	0.0	16.10	67.2	▇▁▁▁▁
Bivalent_Booster_Daily	0	1.00	2560.51	67066.01	-1	0.0	0.0	0.00	14759770.0	▇▁▁▁▁
Bivalent_Booster_Cumulative	0	1.00	182246.09	2057578.02	0	0.0	0.0	0.00	53980763.0	▇▁▁▁▁
Bivalent_Booster_7_Day_Rolling_Average	2280	0.97	1314.40	17268.64	0	0.0	0.0	0.00	615222.0	▇▁▁▁▁
Bivalent_Booster_Pop_Pct	0	1.00	1.65	4.87	0	0.0	0.0	0.00	32.8	▇▁▁▁▁

Data 3

Introduction and data

Identify the source of the data.
- The data comes from www.metacritic.com.
State when and how it was originally collected (by the original data curator, not necessarily how you found the data).
- We scraped the website to collect the data.
Write a brief description of the observations.
- The observations are video games that have been ranked on Metacritic. Each observation includes the video game's title, developer, release date, scores, etc.

Research question

A well formulated research question. (You may include more than one research question if you want to receive feedback on different ideas for your project. However, one per data set is required.)
- How are games rated based on who developed them?
- Have there been changes in the scores of different games over the years?
- How does the platform the game is on affect the scores of the games?
- Do the top games of all time (games with high critic scores) also have high user scores?
- What variables have influenced the utilization of the Reddit API over time, and how has it changed?
A description of the research topic along with a concise statement of your hypotheses on this topic.
- This data was web scraped from Metacritic. Using each video game’s information (title, developer, platform, date, critic scores, user scores, etc.), we can find relationships between these variables and create data visualizations that show interesting patterns.
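
The question about whether games with high critic scores also have high user scores could be approached with a rank correlation. Below is a minimal sketch on made-up toy values: the column names `metascore` and `user_score` match the scraped data, and the rescaling assumes user scores are on a 0 to 10 scale and metascores on a 0 to 100 scale, as the data summary suggests.

```r
# Hypothetical toy scores; real values would come from data/meta-titles.csv.
toy_scores <- data.frame(
  metascore = c(99, 97, 95, 93, 91, 89),
  user_score = c(9.1, 8.4, 8.9, 7.9, 8.2, 7.5)
)

# Put user scores on the same 0-100 scale as the metascore.
toy_scores$user_score_100 <- toy_scores$user_score * 10

# Spearman correlation compares rankings, so the scale difference does not matter.
score_cor <- cor(toy_scores$metascore, toy_scores$user_score, method = "spearman")

# Gap between critics and users for each game, in metascore points.
score_gap <- toy_scores$metascore - toy_scores$user_score_100

print(score_cor)
print(score_gap)
```

Since the scraped sample only contains top-ranked games (metascores of 89 and above), any correlation found this way describes that narrow range and may not generalize to all games.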

Glimpse of data

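# packages the scraping code relies on
library(tidyverse)
library(rvest)
library(robotstxt)

# check that the site permits scraping this path
paths_allowed("https://www.metacritic.com/browse/games/score/metascore/all/all/filtered?sort=desc")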
scrape_title <- function(url){
  #pause for a couple of seconds 
  Sys.sleep(2)
  # read the first page 
  page <- read_html(url)
  # extract desired components  
  title <- html_elements(x = page, css = ".title h3") |> 
    html_text2() |> 
    tolower() |> 
    str_replace_all(" ", "-") |> 
    str_replace_all("[':]", "") |>
    str_replace_all("[.]", "") |>
    str_replace_all("-&", "") |>
    str_replace_all("[(]", "") |>
    str_replace_all("[)]", "") |>
    str_replace_all("-/", "")
  
  platform <- html_elements(
    x = page, 
    css = ".platform .data") |> 
    html_text2() |> 
    tolower() |>
    str_replace_all(" ", "-") 
  
  user_score <- html_elements(
    x = page, 
    css = ".user") |> 
    html_text2()
  
  metascore <- html_elements(
    x = page, 
    css = ".clamp-metascore .positive") |> 
    html_text2()
  
  # create a tibble with this data and return it
  tibble(
    title = title, 
    platform = platform, 
    user_score = user_score, 
    metascore = metascore
  )
}
page_nums <- 0:5
title_urls <- str_glue("https://www.metacritic.com/browse/games/score/metascore/all/all/filtered?sort=desc&page={page_nums}")
meta_titles <- map(.x = title_urls, .f = scrape_title) |> list_rbind()
write_csv(x = meta_titles, file = "data/meta-titles.csv")

scrape_game <- function(url){
  # pause for a couple of seconds 
  Sys.sleep(2)
  
  # read the page 
  page <- read_html(url)
  
  # extract desired components 
  titles <- html_elements(x = page, css = "h1") |> 
    html_text2()
  platform <- html_elements(x = page, css = ".platform") |> 
    html_text2()
  developers <- html_elements(x = page, css = ".developer") |> 
    html_text2()
  release_dates <- html_elements(x = page, css = ".release_data .data") |> 
    html_text2()
  genres <- html_elements(x = page, css = ".product_genre") |> 
    html_text2()
  num_players <- html_elements(x = page, css = ".product_players .data") |> 
    html_text2() 
  age_rating <- html_elements(x = page, css = ".product_rating .data") |> 
    html_text2()
  num_critic_reviews <- html_elements(x = page, css = ".count a span") |> 
    html_text2()
  num_user_reviews <- html_elements(x = page, css = ".feature_userscore .count a") |> 
    html_text2()
  
  # use NA for any component missing from the page so tibble() gets one value per column
  if(length(titles) == 0) { titles <- NA }
  if(length(platform) == 0) { platform <- NA }
  if(length(developers) == 0) { developers <- NA }
  if(length(release_dates) == 0) { release_dates <- NA }
  if(length(genres) == 0) { genres <- NA }
  if(length(num_players) == 0) { num_players <- NA }
  if(length(age_rating) == 0) { age_rating <- NA }
  if(length(num_critic_reviews) == 0) { num_critic_reviews <- NA }
  if(length(num_user_reviews) == 0) { num_user_reviews <- NA }
  
  # create a tibble with this data 
  game_raw <- tibble(
    title = titles, 
    platform = platform,
    developer = developers, 
    date = release_dates, 
    genre = genres,
    number_of_players = num_players,
    rating = age_rating,
    critic_reviews = num_critic_reviews,
    user_reviews = num_user_reviews
  )
  
  # clean up the data 
  # game_raw |> mutate(date = mdy(date)) 
  
  # return the scraped row
  game_raw
}

url_titles <- meta_titles$title[1:600]
url_platforms <- meta_titles$platform[1:600]
meta_urls <- str_glue("https://www.metacritic.com/game/{url_platforms}/{url_titles}") 
meta_games <- map(.x = meta_urls, .f = scrape_game) |> list_rbind()


write_csv(x = meta_games, file = "data/meta.csv")

games <- read_csv("data/meta.csv")

Rows: 600 Columns: 9
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (8): title, platform, developer, date, genre, number_of_players, rating,...
dbl (1): critic_reviews

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

games_scores <- read_csv("data/meta-titles.csv")

Rows: 600 Columns: 4
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (2): title, platform
dbl (2): user_score, metascore

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
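
The scores live in `games_scores` and the developer information in `games`, so relating the two requires a join. Since the listing scraper stored titles and platforms as lowercase, hyphenated slugs, one option is to rebuild the same slugs from the detail data and merge on both keys (both are needed because the same title appears on more than one platform). The sketch below uses made-up toy rows; `slugify()` is a hypothetical helper that mirrors the replacement steps in `scrape_title()` with base R, and it assumes the detail-page titles and platforms differ from the slugs only by those steps.

```r
# Hypothetical helper: repeats the string cleaning used in scrape_title().
slugify <- function(x) {
  x <- tolower(x)
  x <- gsub(" ", "-", x, fixed = TRUE)
  x <- gsub("[':]", "", x)
  x <- gsub(".", "", x, fixed = TRUE)
  x <- gsub("-&", "", x, fixed = TRUE)
  x <- gsub("(", "", x, fixed = TRUE)
  x <- gsub(")", "", x, fixed = TRUE)
  x <- gsub("-/", "", x, fixed = TRUE)
  x
}

# Made-up rows standing in for games (detail pages) and games_scores (listing).
toy_games <- data.frame(
  title = c("Star Quest: The Return", "Dr. Puzzle's Lab (Remix)"),
  platform = c("Nintendo 64", "PC"),
  developer = c("Studio A", "Studio B")
)
toy_game_scores <- data.frame(
  title = c("star-quest-the-return", "dr-puzzles-lab-remix"),
  platform = c("nintendo-64", "pc"),
  metascore = c(95, 90),
  user_score = c(8.8, 8.1)
)

# Build join keys in the same format as the listing data.
toy_games$title_slug <- slugify(toy_games$title)
toy_games$platform_slug <- gsub(" ", "-", tolower(toy_games$platform), fixed = TRUE)

# Join on title and platform together.
joined <- merge(
  toy_games, toy_game_scores,
  by.x = c("title_slug", "platform_slug"),
  by.y = c("title", "platform")
)

# Average metascore per developer.
dev_scores <- aggregate(metascore ~ developer, data = joined, FUN = mean)
print(dev_scores)
```

Rows that fail to match after such a join would point to titles where the page heading and the listing slug differ in ways this helper does not cover.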

skimr::skim(games)

Data summary
Name	games
Number of rows	600
Number of columns	9
_______________________
Column type frequency:
character	8
numeric	1
________________________
Group variables	None

Variable type: character

skim_variable	n_missing	complete_rate	min	max	n_unique
title	0	1.00	3	68	435
platform	0	1.00	2	16	21
developer	0	1.00	15	65	220
date	0	1.00	11	12	473
genre	0	1.00	25	127	198
number_of_players	72	0.88	1	21	31
rating	27	0.96	1	4	6
user_reviews	0	1.00	9	14	495

Variable type: numeric

skim_variable	n_missing	complete_rate	mean	sd	p0	p25	p50	p75	p100	hist
critic_reviews	0	1	39.88	27.24	7	18	31	59	141	▇▃▂▁▁

skimr::skim(games_scores)

Data summary
Name	games_scores
Number of rows	600
Number of columns	4
_______________________
Column type frequency:
character	2
numeric	2
________________________
Group variables	None

Variable type: character

skim_variable	n_missing	complete_rate	min	max	empty	n_unique	whitespace
title	0	1	3	66	0	434	0
platform	0	1	2	16	0	21	0

Variable type: numeric

skim_variable	n_missing	complete_rate	mean	sd	p0	p25	p50	p75	p100	hist
user_score	0	1	8.22	0.84	3.3	7.9	8.4	8.8	9.7	▁▁▁▇▇
metascore	0	1	91.93	2.05	89.0	90.0	91.0	93.0	99.0	▇▅▂▁▁