Impact of Education Level on Change in Mental Health during COVID

Proposal

Data 1 - Mental Health Post-COVID

Introduction and data

This data comes from Data.gov.
It was created two years ago by the U.S. Census Bureau and five other federal agencies through the Household Pulse Survey.
Each observation covers a 12-day period and reports a mental health indicator for one group, defined by age, sex, region, race, and other characteristics.

Research question

Does post-COVID mental health differ by region (state)?
COVID-19 affected people's mental wellness, but the impact may have varied from region to region in the U.S. We hypothesize that regions with good weather, such as California, were the least affected by COVID-19 in terms of mental health.
Region (state) is a categorical variable, while the mental health index (score) is quantitative.
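
As a rough illustration of how a categorical region can be compared on a quantitative score, the sketch below averages `Value` within each `State`. It is written in Python for illustration only. The column names come from the data summary, but the rows are made up and the helper `mean_value_by_state` is hypothetical, not part of the proposal's analysis.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical rows for illustration only; these are not survey values.
rows = [
    {"State": "California", "Value": 20.0},
    {"State": "California", "Value": 24.0},
    {"State": "New York", "Value": 30.0},
    {"State": "New York", "Value": None},  # a missing estimate
    {"State": "New York", "Value": 26.0},
]


def mean_value_by_state(rows):
    """Average the quantitative Value within each categorical State."""
    groups = defaultdict(list)
    for row in rows:
        if row["Value"] is not None:  # skip missing estimates
            groups[row["State"]].append(row["Value"])
    return {state: mean(values) for state, values in groups.items()}


state_means = mean_value_by_state(rows)
print(state_means)
```

Comparing group means like this is only a first look; a formal comparison would also need to account for the confidence intervals reported with each estimate.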

Glimpse of data

Rows: 10404 Columns: 15
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (10): Indicator, Group, State, Subgroup, Phase, Time Period Label, Time ...
dbl  (5): Time Period, Value, LowCI, HighCI, Suppression Flag

Data summary
Name	mental_health
Number of rows	10404
Number of columns	15
_______________________
Column type frequency:
character	10
numeric	5
________________________
Group variables	None

Variable type: character

skim_variable	n_missing	complete_rate	min	max	n_unique
Indicator	0	1.00	44	98	4
Group	0	1.00	6	45	10
State	0	1.00	4	20	52
Subgroup	0	1.00	4	69	80
Phase	0	1.00	1	19	8
Time Period Label	0	1.00	20	27	38
Time Period Start Date	0	1.00	10	10	38
Time Period End Date	0	1.00	10	10	38
Confidence Interval	490	0.95	9	11	7709
Quartile Range	3672	0.65	5	22	500

Variable type: numeric

skim_variable	n_missing	complete_rate	mean	sd	p0	p25	p50	p75	p100	hist
Time Period	0	1.00	28.13	11.04	1.0	20.0	29.0	37.0	45.0	▁▅▇▇▇
Value	490	0.95	17.45	8.27	1.4	10.3	16.2	24.0	62.9	▇▇▃▁▁
LowCI	490	0.95	14.77	7.66	0.8	8.0	13.9	20.8	53.2	▇▆▃▁▁
HighCI	490	0.95	20.48	9.05	2.0	12.9	19.2	27.4	71.9	▇▇▃▁▁
Suppression Flag	10382	0.00	1.00	0.00	1.0	1.0	1.0	1.0	1.0	▁▁▇▁▁

Data 2 - Music Popularity

Introduction and data

This dataset is sourced from the Million Song Dataset, a collection created by The Echo Nest, a music data company, and LabROSA, a research lab at Columbia University.
The Echo Nest collected the data with the aim of gathering statistics on one million popular, contemporary songs, while LabROSA studied machine learning in music.

The project received funding from the U.S. National Science Foundation to create a dataset for evaluating the composition of commercially successful songs.
Observations include the song name, artist, song length, and release year of popular, contemporary songs. There is also some composition and mixing information, such as the timing of musical bars and the song's fade-in.

Research question

How does the popularity of contemporary songs relate to their artists? How does a song's initial popularity relate to its later growth in popularity?

Music has become an important part of people's lives, as it offers relief from daily stressors. Internet platforms have also made music more accessible to listeners across the globe. We hypothesize that artist familiarity is positively correlated with the popularity of an artist's songs.
Song popularity and artist familiarity/hotness are all quantitative variables. Artist terms are categorical and include genres such as Latin jazz and heavy metal.
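
One way to check a hypothesis about two quantitative variables is a Pearson correlation. The sketch below is a minimal Python illustration using made-up pairs of `artist.familiarity` and `song.hotttnesss`. It assumes that negative hotness values are placeholders for missing data (the summary shows a minimum of -1.00), which should be confirmed against the dataset documentation. The helper `pearson_r` is hypothetical.

```python
from math import sqrt

# Hypothetical (artist.familiarity, song.hotttnesss) pairs, not real data.
pairs = [(0.4, 0.2), (0.6, 0.4), (0.8, 0.6), (0.5, -1.0)]

# Assumption: a negative hotness value marks a missing score, so drop it.
clean = [(fam, hot) for fam, hot in pairs if hot >= 0]


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


r = pearson_r([fam for fam, _ in clean], [hot for _, hot in clean])
print(round(r, 3))
```

Leaving the placeholder values in would pull the correlation toward whatever pattern the missing rows happen to follow, so the filtering step matters as much as the coefficient itself.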

Glimpse of data

Rows: 10000 Columns: 35
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr  (4): artist.id, artist.name, artist.terms, song.id
dbl (31): artist.familiarity, artist.hotttnesss, artist.latitude, artist.loc...

Data summary
Name	music
Number of rows	10000
Number of columns	35
_______________________
Column type frequency:
character	4
numeric	31
________________________
Group variables	None

Variable type: character

skim_variable	n_missing	complete_rate	min	max	n_unique
artist.id	0	1	18	18	3888
artist.name	0	1	1	255	4412
artist.terms	5	1	2	40	458
song.id	0	1	18	51	10000

Variable type: numeric

skim_variable	complete_rate	mean	sd	p0	p25	p50	p75	p100	hist
artist.familiarity	1	0.57	0.16	0.00	0.47	0.56	0.67	1.00	▁▂▇▅▂
artist.hotttnesss	1	0.39	0.14	0.00	0.33	0.38	0.45	1.08	▁▇▃▁▁
artist.latitude	1	13.90	20.36	-41.28	0.00	0.00	34.42	69.65	▁▇▁▃▁
artist.location	1	0.08	7.80	0.00	0.00	0.00	0.00	780.00	▇▁▁▁▁
artist.longitude	1	-23.92	43.72	-162.44	-73.95	0.00	0.00	174.77	▁▂▇▁▁
artist.similar	1	0.00	0.00	0.00	0.00	0.00	0.00	0.00	▁▁▇▁▁
artist.terms_freq	1	224.89	22392.16	0.00	0.95	1.00	1.00	2239217.00	▇▁▁▁▁
release.id	1	371024.06	236777.83	0.00	172858.00	333103.00	573532.50	823599.00	▇▇▅▆▅
release.name	1	23.10	1322.90	0.00	0.00	0.00	0.00	85555.00	▇▁▁▁▁
song.artist_mbtags	1	0.00	0.00	0.00	0.00	0.00	0.00	0.33	▇▁▁▁▁
song.artist_mbtags_count	1	0.52	0.88	0.00	0.00	0.00	1.00	9.00	▇▁▁▁▁
song.bars_confidence	1	0.24	0.29	0.00	0.04	0.12	0.35	8.86	▇▁▁▁▁
song.bars_start	1	1.07	1.72	0.00	0.44	0.79	1.22	59.74	▇▁▁▁▁
song.beats_confidence	1	0.61	0.32	0.00	0.41	0.69	0.88	1.00	▃▂▃▆▇
song.beats_start	1	0.43	0.81	-60.00	0.19	0.33	0.50	12.25	▁▁▁▁▇
song.duration	1	240.62	246.08	1.04	176.03	223.06	276.38	22050.00	▇▁▁▁▁
song.end_of_fade_in	1	0.76	1.86	0.00	0.00	0.20	0.42	43.12	▇▁▁▁▁
song.hotttnesss	1	-0.24	0.69	-1.00	-1.00	0.00	0.41	1.00	▇▁▃▆▂
song.key	1	5.37	9.67	0.00	2.00	5.00	8.00	904.80	▇▁▁▁▁
song.key_confidence	1	0.45	0.33	0.00	0.22	0.47	0.66	19.08	▇▁▁▁▁
song.loudness	1	-10.48	5.40	-51.64	-13.16	-9.38	-6.53	0.57	▁▁▁▆▇
song.mode	1	0.69	0.46	0.00	0.00	1.00	1.00	1.00	▃▁▁▁▇
song.mode_confidence	1	0.48	0.19	0.00	0.36	0.49	0.61	1.00	▂▅▇▅▁
song.start_of_fade_out	1	229.88	112.02	-21.39	168.86	213.86	266.27	1813.43	▇▁▁▁▁
song.tatums_confidence	1	0.51	0.33	0.00	0.24	0.50	0.77	9.23	▇▁▁▁▁
song.tatums_start	1	0.30	0.51	0.00	0.11	0.19	0.29	12.25	▇▁▁▁▁
song.tempo	1	122.90	35.20	0.00	96.96	120.16	144.01	262.83	▁▆▇▂▁
song.time_signature	1	3.56	1.27	0.00	3.00	4.00	4.00	7.00	▂▁▇▁▁
song.time_signature_confidence	1	0.60	8.99	0.00	0.10	0.55	0.86	898.89	▇▁▁▁▁
song.title	1	10.01	945.49	0.00	0.00	0.00	0.00	94496.00	▇▁▁▁▁
song.year	1	934.70	996.65	0.00	0.00	0.00	2000.00	2010.00	▇▁▁▁▇

Data 3 - Coffee Quality

Introduction and data

This dataset comes from the CORGIS Dataset Project, where it was prepared by Sam Donald.
The underlying data was collected from the Coffee Quality Institute's review pages in January 2018 by BuzzFeed data scientist James LeDoux.
It covers both Arabica and Robusta beans from many countries, professionally rated on a 0-100 scale. It also includes individual scores for attributes such as acidity, sweetness, fragrance, and balance.

Research question

Which region (continent) produces the best coffee?
Coffee has been introduced to almost all tropical regions of the world, but which region produces the highest-quality coffee? We hypothesize that South American countries produce the best coffee.
Region is categorical, while coffee quality (score) is quantitative.
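
The data records `Location.Country` rather than a continent, so answering this question needs a country-to-continent lookup before scores can be compared. The sketch below is a minimal Python illustration with made-up scores. The `CONTINENT` table is a partial, hypothetical lookup, and treating a total score of 0 as a data-entry problem is an assumption based on the minimum shown in the summary.

```python
from collections import defaultdict
from statistics import median

# Partial, hypothetical lookup; the full dataset lists 32 countries.
CONTINENT = {
    "Brazil": "South America",
    "Colombia": "South America",
    "Ethiopia": "Africa",
    "Kenya": "Africa",
}

# Hypothetical rows for illustration only; not values from the dataset.
rows = [
    {"Location.Country": "Brazil", "Data.Scores.Total": 82.0},
    {"Location.Country": "Colombia", "Data.Scores.Total": 84.0},
    {"Location.Country": "Ethiopia", "Data.Scores.Total": 86.0},
    {"Location.Country": "Kenya", "Data.Scores.Total": 85.0},
    {"Location.Country": "Kenya", "Data.Scores.Total": 0.0},  # assumed invalid
]


def median_total_by_continent(rows, lookup):
    """Median total score per continent, skipping unmapped or zero rows."""
    groups = defaultdict(list)
    for row in rows:
        continent = lookup.get(row["Location.Country"])
        score = row["Data.Scores.Total"]
        if continent is None or score <= 0:
            continue
        groups[continent].append(score)
    return {name: median(scores) for name, scores in groups.items()}


medians = median_total_by_continent(rows, CONTINENT)
best = max(medians, key=medians.get)
print(medians, best)
```

The median is used here instead of the mean so that a few extreme scores have less influence on each continent's summary.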

Glimpse of data

Data summary
Name	coffee
Number of rows	989
Number of columns	23
_______________________
Column type frequency:
character	7
numeric	16
________________________
Group variables	None

Variable type: character

skim_variable	complete_rate	min	max	n_unique
Location.Country	1	4	28	32
Location.Region	1	3	76	278
Data.Owner	1	3	50	263
Data.Type.Species	1	7	7	2
Data.Type.Variety	1	3	21	28
Data.Type.Processing.method	1	3	25	6
Data.Color	1	4	12	5

Variable type: numeric

skim_variable	complete_rate	mean	sd	p0	p25	p50	p75	p100	hist
Location.Altitude.Min	1	1640.08	9192.52	0	905.00	1300.00	1550.00	190164.00	▇▁▁▁▁
Location.Altitude.Max	1	1675.93	9191.96	0	950.00	1310.00	1600.00	190164.00	▇▁▁▁▁
Location.Altitude.Average	1	1658.00	9192.06	0	950.00	1300.00	1600.00	190164.00	▇▁▁▁▁
Year	1	2013.55	1.66	2010	2012.00	2013.00	2015.00	2018.00	▁▇▃▃▁
Data.Production.Number.of.bags	1	151.76	125.67	1	15.00	170.00	275.00	600.00	▇▁▇▁▁
Data.Production.Bag.weight	1	210.49	1666.71	0	1.00	60.00	69.00	19200.00	▇▁▁▁▁
Data.Scores.Aroma	1	7.57	0.40	0	7.42	7.58	7.75	8.75	▁▁▁▁▇
Data.Scores.Flavor	1	7.52	0.42	0	7.33	7.50	7.75	8.83	▁▁▁▁▇
Data.Scores.Aftertaste	1	7.39	0.43	0	7.25	7.42	7.58	8.67	▁▁▁▁▇
Data.Scores.Acidity	1	7.54	0.40	0	7.33	7.58	7.75	8.75	▁▁▁▁▇
Data.Scores.Body	1	7.51	0.39	0	7.33	7.50	7.67	8.50	▁▁▁▁▇
Data.Scores.Balance	1	7.50	0.43	0	7.33	7.50	7.75	8.58	▁▁▁▁▇
Data.Scores.Uniformity	1	9.82	0.59	0	10.00	10.00	10.00	10.00	▁▁▁▁▇
Data.Scores.Sweetness	1	9.83	0.69	0	10.00	10.00	10.00	10.00	▁▁▁▁▇
Data.Scores.Moisture	1	0.09	0.04	0	0.10	0.11	0.12	0.28	▃▇▆▁▁
Data.Scores.Total	1	81.97	3.86	0	81.08	82.50	83.58	90.58	▁▁▁▁▇