Impact of Education Level on Change in Mental Health during COVID
Proposal
Data 1 - Mental Health Post-COVID
Introduction and data
This data comes from Data.gov.
It was originally created two years ago by the U.S. Census Bureau and five other federal agencies through the Household Pulse Survey.
Each observation covers a 12-day survey period and reports a mental health indicator broken down by groups such as age, sex, region, and race.
Research question
- Does post-COVID mental health vary by region (state)?
- COVID-19 affected people’s mental wellness, but the impact may have varied from region to region in the US. We hypothesize that regions with good weather, such as California, were the least affected by COVID-19 in terms of mental health.
- Region (state) is categorical, while the mental health indicator (the Value column) is quantitative.
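The state-level comparison could start by averaging the Value column per state for one indicator. The sketch below does this in Python (standard library only) on a few made-up rows, not the real data.

It assumes two things that should be checked against the actual file:
- State-level rows are labeled `Group == "By State"`.
- A blank Value means the estimate is missing. The glimpse below shows 490 missing Values.

```python
import csv
import io
from collections import defaultdict
from statistics import mean

# Made-up rows for illustration only; the real file has 15 columns.
SAMPLE_CSV = """Indicator,Group,State,Value
Example indicator,By State,California,30.0
Example indicator,By State,California,34.0
Example indicator,By State,Ohio,
Example indicator,By State,Ohio,28.0
Example indicator,National Estimate,United States,31.0
"""


def mean_value_by_state(rows, indicator):
    """Return {state: mean Value} for one indicator.

    Assumes state-level rows carry Group == "By State" (hypothetical
    label; verify against the real data). Blank Values are skipped.
    """
    values = defaultdict(list)
    for row in rows:
        if row["Indicator"] != indicator or row["Group"] != "By State":
            continue
        raw = row["Value"].strip()
        if not raw:  # missing estimate
            continue
        values[row["State"]].append(float(raw))
    return {state: mean(vals) for state, vals in values.items()}


rows = csv.DictReader(io.StringIO(SAMPLE_CSV))
print(mean_value_by_state(rows, "Example indicator"))
# {'California': 32.0, 'Ohio': 28.0}
```

Per-state means are only a first step. A formal comparison would also need to account for the LowCI and HighCI bounds reported with each estimate.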
Glimpse of data
Rows: 10404 Columns: 15
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (10): Indicator, Group, State, Subgroup, Phase, Time Period Label, Time ...
dbl (5): Time Period, Value, LowCI, HighCI, Suppression Flag
Data summary | |
---|---|
Name | mental_health |
Number of rows | 10404 |
Number of columns | 15 |
Column type frequency: | |
character | 10 |
numeric | 5 |
Group variables | None |
Variable type: character
skim_variable | n_missing | complete_rate | min | max | empty | n_unique | whitespace |
---|---|---|---|---|---|---|---|
Indicator | 0 | 1.00 | 44 | 98 | 0 | 4 | 0 |
Group | 0 | 1.00 | 6 | 45 | 0 | 10 | 0 |
State | 0 | 1.00 | 4 | 20 | 0 | 52 | 0 |
Subgroup | 0 | 1.00 | 4 | 69 | 0 | 80 | 0 |
Phase | 0 | 1.00 | 1 | 19 | 0 | 8 | 0 |
Time Period Label | 0 | 1.00 | 20 | 27 | 0 | 38 | 0 |
Time Period Start Date | 0 | 1.00 | 10 | 10 | 0 | 38 | 0 |
Time Period End Date | 0 | 1.00 | 10 | 10 | 0 | 38 | 0 |
Confidence Interval | 490 | 0.95 | 9 | 11 | 0 | 7709 | 0 |
Quartile Range | 3672 | 0.65 | 5 | 22 | 0 | 500 | 0 |
Variable type: numeric
skim_variable | n_missing | complete_rate | mean | sd | p0 | p25 | p50 | p75 | p100 | hist |
---|---|---|---|---|---|---|---|---|---|---|
Time Period | 0 | 1.00 | 28.13 | 11.04 | 1.0 | 20.0 | 29.0 | 37.0 | 45.0 | ▁▅▇▇▇ |
Value | 490 | 0.95 | 17.45 | 8.27 | 1.4 | 10.3 | 16.2 | 24.0 | 62.9 | ▇▇▃▁▁ |
LowCI | 490 | 0.95 | 14.77 | 7.66 | 0.8 | 8.0 | 13.9 | 20.8 | 53.2 | ▇▆▃▁▁ |
HighCI | 490 | 0.95 | 20.48 | 9.05 | 2.0 | 12.9 | 19.2 | 27.4 | 71.9 | ▇▇▃▁▁ |
Suppression Flag | 10382 | 0.00 | 1.00 | 0.00 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | ▁▁▇▁▁ |
Data 2 - Music Popularity
Introduction and data
This dataset is sourced from the Million Song Dataset, a collection created by The Echo Nest, a music-data company, and LabROSA, a music and audio research lab at Columbia University.
The Echo Nest aimed to collect statistics on one million popular, contemporary songs, while LabROSA contributed its research on machine learning in music. The file used here contains a 10,000-song subset.
The project was funded by the U.S. National Science Foundation to create a dataset for evaluating the composition of commercially successful tunes.
Observations include the song title, artist, song duration, and release year of each song. There is also some composition and mixing information, such as the timing of musical bars and the end of the song’s fade-in.
Research question
How does the popularity of contemporary songs relate to their artists? How does a song’s initial popularity relate to its popularity growth?
Music has become an important part of people’s lives, as it relieves stress and provides an outlet from daily pressures. Internet platforms have also made music more accessible to listeners across the globe. We hypothesize that artist familiarity is positively correlated with the popularity of the artist’s songs.
Song popularity (song.hotttnesss) and artist familiarity and hotness (artist.familiarity, artist.hotttnesss) are quantitative variables. Artist terms (artist.terms) are categorical, with values such as latin jazz and heavy metal.
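The familiarity hypothesis could be tested with a Pearson correlation between artist.familiarity and song.hotttnesss. The sketch below computes it in plain Python.

The summary table in the glimpse below shows song.hotttnesss ranging from -1 to 1, with a 25th percentile of -1. This suggests -1 may be a placeholder. Treating negative values as unrated songs and dropping them is an assumption, not documented behavior. The rows are made up.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / sqrt(sxx * syy)


def familiarity_hotness_r(rows):
    """Correlate artist.familiarity with song.hotttnesss.

    Assumption: negative song.hotttnesss values are placeholders for
    unrated songs, so those rows are dropped before correlating.
    """
    kept = [r for r in rows if r["song.hotttnesss"] >= 0]
    xs = [r["artist.familiarity"] for r in kept]
    ys = [r["song.hotttnesss"] for r in kept]
    return pearson_r(xs, ys)


# Made-up rows for illustration only.
songs = [
    {"artist.familiarity": 0.2, "song.hotttnesss": 0.1},
    {"artist.familiarity": 0.4, "song.hotttnesss": 0.2},
    {"artist.familiarity": 0.6, "song.hotttnesss": 0.3},
    {"artist.familiarity": 0.9, "song.hotttnesss": -1.0},  # dropped
]
print(round(familiarity_hotness_r(songs), 6))  # 1.0
```

A positive r would be consistent with the hypothesis, but correlation alone does not show that familiarity drives popularity.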
Glimpse of data
Rows: 10000 Columns: 35
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (4): artist.id, artist.name, artist.terms, song.id
dbl (31): artist.familiarity, artist.hotttnesss, artist.latitude, artist.loc...
Data summary | |
---|---|
Name | music |
Number of rows | 10000 |
Number of columns | 35 |
Column type frequency: | |
character | 4 |
numeric | 31 |
Group variables | None |
Variable type: character
skim_variable | n_missing | complete_rate | min | max | empty | n_unique | whitespace |
---|---|---|---|---|---|---|---|
artist.id | 0 | 1 | 18 | 18 | 0 | 3888 | 0 |
artist.name | 0 | 1 | 1 | 255 | 0 | 4412 | 0 |
artist.terms | 5 | 1 | 2 | 40 | 0 | 458 | 0 |
song.id | 0 | 1 | 18 | 51 | 0 | 10000 | 0 |
Variable type: numeric
skim_variable | n_missing | complete_rate | mean | sd | p0 | p25 | p50 | p75 | p100 | hist |
---|---|---|---|---|---|---|---|---|---|---|
artist.familiarity | 0 | 1 | 0.57 | 0.16 | 0.00 | 0.47 | 0.56 | 0.67 | 1.00 | ▁▂▇▅▂ |
artist.hotttnesss | 0 | 1 | 0.39 | 0.14 | 0.00 | 0.33 | 0.38 | 0.45 | 1.08 | ▁▇▃▁▁ |
artist.latitude | 0 | 1 | 13.90 | 20.36 | -41.28 | 0.00 | 0.00 | 34.42 | 69.65 | ▁▇▁▃▁ |
artist.location | 0 | 1 | 0.08 | 7.80 | 0.00 | 0.00 | 0.00 | 0.00 | 780.00 | ▇▁▁▁▁ |
artist.longitude | 0 | 1 | -23.92 | 43.72 | -162.44 | -73.95 | 0.00 | 0.00 | 174.77 | ▁▂▇▁▁ |
artist.similar | 0 | 1 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | ▁▁▇▁▁ |
artist.terms_freq | 0 | 1 | 224.89 | 22392.16 | 0.00 | 0.95 | 1.00 | 1.00 | 2239217.00 | ▇▁▁▁▁ |
release.id | 0 | 1 | 371024.06 | 236777.83 | 0.00 | 172858.00 | 333103.00 | 573532.50 | 823599.00 | ▇▇▅▆▅ |
release.name | 0 | 1 | 23.10 | 1322.90 | 0.00 | 0.00 | 0.00 | 0.00 | 85555.00 | ▇▁▁▁▁ |
song.artist_mbtags | 0 | 1 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.33 | ▇▁▁▁▁ |
song.artist_mbtags_count | 0 | 1 | 0.52 | 0.88 | 0.00 | 0.00 | 0.00 | 1.00 | 9.00 | ▇▁▁▁▁ |
song.bars_confidence | 0 | 1 | 0.24 | 0.29 | 0.00 | 0.04 | 0.12 | 0.35 | 8.86 | ▇▁▁▁▁ |
song.bars_start | 0 | 1 | 1.07 | 1.72 | 0.00 | 0.44 | 0.79 | 1.22 | 59.74 | ▇▁▁▁▁ |
song.beats_confidence | 0 | 1 | 0.61 | 0.32 | 0.00 | 0.41 | 0.69 | 0.88 | 1.00 | ▃▂▃▆▇ |
song.beats_start | 0 | 1 | 0.43 | 0.81 | -60.00 | 0.19 | 0.33 | 0.50 | 12.25 | ▁▁▁▁▇ |
song.duration | 0 | 1 | 240.62 | 246.08 | 1.04 | 176.03 | 223.06 | 276.38 | 22050.00 | ▇▁▁▁▁ |
song.end_of_fade_in | 0 | 1 | 0.76 | 1.86 | 0.00 | 0.00 | 0.20 | 0.42 | 43.12 | ▇▁▁▁▁ |
song.hotttnesss | 0 | 1 | -0.24 | 0.69 | -1.00 | -1.00 | 0.00 | 0.41 | 1.00 | ▇▁▃▆▂ |
song.key | 0 | 1 | 5.37 | 9.67 | 0.00 | 2.00 | 5.00 | 8.00 | 904.80 | ▇▁▁▁▁ |
song.key_confidence | 0 | 1 | 0.45 | 0.33 | 0.00 | 0.22 | 0.47 | 0.66 | 19.08 | ▇▁▁▁▁ |
song.loudness | 0 | 1 | -10.48 | 5.40 | -51.64 | -13.16 | -9.38 | -6.53 | 0.57 | ▁▁▁▆▇ |
song.mode | 0 | 1 | 0.69 | 0.46 | 0.00 | 0.00 | 1.00 | 1.00 | 1.00 | ▃▁▁▁▇ |
song.mode_confidence | 0 | 1 | 0.48 | 0.19 | 0.00 | 0.36 | 0.49 | 0.61 | 1.00 | ▂▅▇▅▁ |
song.start_of_fade_out | 0 | 1 | 229.88 | 112.02 | -21.39 | 168.86 | 213.86 | 266.27 | 1813.43 | ▇▁▁▁▁ |
song.tatums_confidence | 0 | 1 | 0.51 | 0.33 | 0.00 | 0.24 | 0.50 | 0.77 | 9.23 | ▇▁▁▁▁ |
song.tatums_start | 0 | 1 | 0.30 | 0.51 | 0.00 | 0.11 | 0.19 | 0.29 | 12.25 | ▇▁▁▁▁ |
song.tempo | 0 | 1 | 122.90 | 35.20 | 0.00 | 96.96 | 120.16 | 144.01 | 262.83 | ▁▆▇▂▁ |
song.time_signature | 0 | 1 | 3.56 | 1.27 | 0.00 | 3.00 | 4.00 | 4.00 | 7.00 | ▂▁▇▁▁ |
song.time_signature_confidence | 0 | 1 | 0.60 | 8.99 | 0.00 | 0.10 | 0.55 | 0.86 | 898.89 | ▇▁▁▁▁ |
song.title | 0 | 1 | 10.01 | 945.49 | 0.00 | 0.00 | 0.00 | 0.00 | 94496.00 | ▇▁▁▁▁ |
song.year | 0 | 1 | 934.70 | 996.65 | 0.00 | 0.00 | 0.00 | 2000.00 | 2010.00 | ▇▁▁▁▇ |
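The song.year summary has a median of 0, so at least half of the 10,000 songs carry a year of 0. This most likely means the release year is unknown rather than literally year 0.

The sketch below separates known and unknown years before any time-based analysis. It assumes that 0 marks a missing year, and the values are made up.

```python
def split_by_known_year(years):
    """Split song.year values into (known_years, unknown_count).

    Assumption: a year of 0 means the release year is unknown.
    """
    known = [y for y in years if y > 0]
    return known, len(years) - len(known)


# Made-up values for illustration only.
known, unknown = split_by_known_year([0, 2005, 0, 1999, 2010])
print(known, unknown)  # [2005, 1999, 2010] 2
```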
Data 3 - Coffee Quality
Introduction and data
This dataset comes from the CORGIS Dataset Project, by Sam Donald.
The data was collected from the Coffee Quality Institute’s review pages in January 2018 by James LeDoux, a data scientist at BuzzFeed.
It covers both Arabica and Robusta beans from many countries, professionally rated on a 0-100 scale, with separate scores for attributes such as acidity, sweetness, fragrance, and balance.
Research question
- Which region (continent) produces the best coffee?
- Coffee has been introduced to almost every tropical region of the world, but which region produces the highest-quality coffee? We hypothesize that South American countries produce the best coffee.
- Region (continent) is categorical, while coffee quality (Data.Scores.Total) is quantitative. The data records Location.Country rather than continent, so continents would need to be derived from country.
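Because the data has Location.Country but no continent column, continents would have to be derived before comparing regions. The sketch below uses a small, hypothetical country-to-continent lookup. It covers only six countries and places Guatemala and Mexico under North America.

It then averages Data.Scores.Total per continent and skips countries missing from the lookup. The scores are made up.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical partial lookup; a real analysis would need all 32 countries.
CONTINENT = {
    "Brazil": "South America",
    "Colombia": "South America",
    "Ethiopia": "Africa",
    "Kenya": "Africa",
    "Guatemala": "North America",
    "Mexico": "North America",
}


def mean_total_by_continent(rows):
    """Average Data.Scores.Total per continent; unmapped countries are skipped."""
    scores = defaultdict(list)
    for country, total in rows:
        continent = CONTINENT.get(country)
        if continent is not None:
            scores[continent].append(total)
    return {c: mean(v) for c, v in scores.items()}


# (Location.Country, Data.Scores.Total) pairs; made-up values.
sample = [("Colombia", 83.0), ("Brazil", 81.0), ("Ethiopia", 85.0),
          ("Mexico", 80.0), ("Unmapped", 90.0)]
print(mean_total_by_continent(sample))
# {'South America': 82.0, 'Africa': 85.0, 'North America': 80.0}
```

Skipping unmapped countries keeps the sketch short. A full analysis should map every country so that no region is silently dropped.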
Glimpse of data
Data summary | |
---|---|
Name | coffee |
Number of rows | 989 |
Number of columns | 23 |
Column type frequency: | |
character | 7 |
numeric | 16 |
Group variables | None |
Variable type: character
skim_variable | n_missing | complete_rate | min | max | empty | n_unique | whitespace |
---|---|---|---|---|---|---|---|
Location.Country | 0 | 1 | 4 | 28 | 0 | 32 | 0 |
Location.Region | 0 | 1 | 3 | 76 | 0 | 278 | 0 |
Data.Owner | 0 | 1 | 3 | 50 | 0 | 263 | 0 |
Data.Type.Species | 0 | 1 | 7 | 7 | 0 | 2 | 0 |
Data.Type.Variety | 0 | 1 | 3 | 21 | 0 | 28 | 0 |
Data.Type.Processing.method | 0 | 1 | 3 | 25 | 0 | 6 | 0 |
Data.Color | 0 | 1 | 4 | 12 | 0 | 5 | 0 |
Variable type: numeric
skim_variable | n_missing | complete_rate | mean | sd | p0 | p25 | p50 | p75 | p100 | hist |
---|---|---|---|---|---|---|---|---|---|---|
Location.Altitude.Min | 0 | 1 | 1640.08 | 9192.52 | 0 | 905.00 | 1300.00 | 1550.00 | 190164.00 | ▇▁▁▁▁ |
Location.Altitude.Max | 0 | 1 | 1675.93 | 9191.96 | 0 | 950.00 | 1310.00 | 1600.00 | 190164.00 | ▇▁▁▁▁ |
Location.Altitude.Average | 0 | 1 | 1658.00 | 9192.06 | 0 | 950.00 | 1300.00 | 1600.00 | 190164.00 | ▇▁▁▁▁ |
Year | 0 | 1 | 2013.55 | 1.66 | 2010 | 2012.00 | 2013.00 | 2015.00 | 2018.00 | ▁▇▃▃▁ |
Data.Production.Number.of.bags | 0 | 1 | 151.76 | 125.67 | 1 | 15.00 | 170.00 | 275.00 | 600.00 | ▇▁▇▁▁ |
Data.Production.Bag.weight | 0 | 1 | 210.49 | 1666.71 | 0 | 1.00 | 60.00 | 69.00 | 19200.00 | ▇▁▁▁▁ |
Data.Scores.Aroma | 0 | 1 | 7.57 | 0.40 | 0 | 7.42 | 7.58 | 7.75 | 8.75 | ▁▁▁▁▇ |
Data.Scores.Flavor | 0 | 1 | 7.52 | 0.42 | 0 | 7.33 | 7.50 | 7.75 | 8.83 | ▁▁▁▁▇ |
Data.Scores.Aftertaste | 0 | 1 | 7.39 | 0.43 | 0 | 7.25 | 7.42 | 7.58 | 8.67 | ▁▁▁▁▇ |
Data.Scores.Acidity | 0 | 1 | 7.54 | 0.40 | 0 | 7.33 | 7.58 | 7.75 | 8.75 | ▁▁▁▁▇ |
Data.Scores.Body | 0 | 1 | 7.51 | 0.39 | 0 | 7.33 | 7.50 | 7.67 | 8.50 | ▁▁▁▁▇ |
Data.Scores.Balance | 0 | 1 | 7.50 | 0.43 | 0 | 7.33 | 7.50 | 7.75 | 8.58 | ▁▁▁▁▇ |
Data.Scores.Uniformity | 0 | 1 | 9.82 | 0.59 | 0 | 10.00 | 10.00 | 10.00 | 10.00 | ▁▁▁▁▇ |
Data.Scores.Sweetness | 0 | 1 | 9.83 | 0.69 | 0 | 10.00 | 10.00 | 10.00 | 10.00 | ▁▁▁▁▇ |
Data.Scores.Moisture | 0 | 1 | 0.09 | 0.04 | 0 | 0.10 | 0.11 | 0.12 | 0.28 | ▃▇▆▁▁ |
Data.Scores.Total | 0 | 1 | 81.97 | 3.86 | 0 | 81.08 | 82.50 | 83.58 | 90.58 | ▁▁▁▁▇ |
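Two values in this summary look like data-entry problems:
- The maximum altitude of 190,164 is far above the height of Mount Everest (about 8,849 m, or roughly 29,032 ft).
- Every score column, including Data.Scores.Total, has a minimum of 0, which suggests at least one unscored record.

The sketch below flags such rows before analysis. The 9,000 cutoff assumes altitudes are in meters and is an illustrative choice, not a documented rule. The rows are made up.

```python
MAX_PLAUSIBLE_ALTITUDE = 9000  # assumed meters; Everest is about 8,849 m


def flag_suspect_rows(rows):
    """Return indices of rows with an implausible altitude or a zero total score."""
    suspect = []
    for i, row in enumerate(rows):
        too_high = row["Location.Altitude.Average"] > MAX_PLAUSIBLE_ALTITUDE
        unscored = row["Data.Scores.Total"] == 0
        if too_high or unscored:
            suspect.append(i)
    return suspect


# Made-up rows for illustration only.
rows = [
    {"Location.Altitude.Average": 1300, "Data.Scores.Total": 82.5},
    {"Location.Altitude.Average": 190164, "Data.Scores.Total": 83.0},
    {"Location.Altitude.Average": 1500, "Data.Scores.Total": 0},
]
print(flag_suspect_rows(rows))  # [1, 2]
```

Whether to drop or correct flagged rows is a judgment call. Checking the original review entries would be safer than guessing a corrected value.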