library(tidyverse)
library(skimr)
How Song Characteristics Can Affect Song Popularity
Proposal
Data 1
Introduction and data
The data comes from the CORGIS Dataset Project: https://think.cs.vt.edu/corgis/csv/coffee/
The dataset was created by Sam Donald on 10/28/2022. It draws on professional coffee tasters from many countries, who rated each coffee on a 0-100 scale and scored attributes such as acidity and sweetness.
The variables cover where each coffee was produced, its total score, and the individual characteristic scores (acidity, sweetness, etc.) that make up the total.
Research question
- How do the flavor, acidity, and aroma of coffee compare across countries?
- As college students, we drink a lot of coffee to get through all-nighters, so understanding what makes a coffee appealing matters to us. It is also useful information for farmers, who could use it to grow more popular flavors of coffee.
- Our research topic explores how coffee flavor varies and how that variation relates to where the coffee is produced. We hypothesize that coffee from countries in the Americas is sweeter.
- Region and coffee species are categorical, while the flavor scores are floating-point numbers and therefore quantitative.
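As a rough illustration of how the sweetness hypothesis could be checked, the sketch below compares mean sweetness inside and outside the Americas. It runs on a made-up data frame, `coffee_toy`, that only borrows the column names from the dataset. The `americas` lookup vector is also an assumption, since the dataset has a country column but no continent column.

```r
# Toy rows that mimic the CORGIS column names; the values are made up.
coffee_toy <- data.frame(
  Location.Country = c("Brazil", "Brazil", "Ethiopia", "Colombia", "Kenya"),
  Data.Scores.Sweetness = c(10, 9.33, 10, 10, 9.33)
)

# Hypothetical lookup: we would have to list the American countries ourselves.
americas <- c("Brazil", "Colombia")

# Flag each row as inside (TRUE) or outside (FALSE) the Americas
coffee_toy$in_americas <- coffee_toy$Location.Country %in% americas

# Mean sweetness for each of the two groups
sweet_by_group <- tapply(
  coffee_toy$Data.Scores.Sweetness,
  coffee_toy$in_americas,
  mean
)
sweet_by_group
```

With the real data, the same grouping could be followed by a formal comparison of the two groups rather than a simple difference in means.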
Glimpse of data
# Read the coffee data and summarize every column with skimr
coffee <- read.csv("data/coffee.csv") |>
  skimr::skim()
coffee
Name | read.csv(“data/coffee.csv… |
Number of rows | 989 |
Number of columns | 23 |
_______________________ | |
Column type frequency: | |
character | 7 |
numeric | 16 |
________________________ | |
Group variables | None |
Variable type: character
skim_variable | n_missing | complete_rate | min | max | empty | n_unique | whitespace |
---|---|---|---|---|---|---|---|
Location.Country | 0 | 1 | 4 | 28 | 0 | 32 | 0 |
Location.Region | 0 | 1 | 3 | 76 | 0 | 278 | 0 |
Data.Owner | 0 | 1 | 3 | 50 | 0 | 263 | 0 |
Data.Type.Species | 0 | 1 | 7 | 7 | 0 | 2 | 0 |
Data.Type.Variety | 0 | 1 | 3 | 21 | 0 | 28 | 0 |
Data.Type.Processing.method | 0 | 1 | 3 | 25 | 0 | 6 | 0 |
Data.Color | 0 | 1 | 4 | 12 | 0 | 5 | 0 |
Variable type: numeric
skim_variable | n_missing | complete_rate | mean | sd | p0 | p25 | p50 | p75 | p100 | hist |
---|---|---|---|---|---|---|---|---|---|---|
Location.Altitude.Min | 0 | 1 | 1640.08 | 9192.52 | 0 | 905.00 | 1300.00 | 1550.00 | 190164.00 | ▇▁▁▁▁ |
Location.Altitude.Max | 0 | 1 | 1675.93 | 9191.96 | 0 | 950.00 | 1310.00 | 1600.00 | 190164.00 | ▇▁▁▁▁ |
Location.Altitude.Average | 0 | 1 | 1658.00 | 9192.06 | 0 | 950.00 | 1300.00 | 1600.00 | 190164.00 | ▇▁▁▁▁ |
Year | 0 | 1 | 2013.55 | 1.66 | 2010 | 2012.00 | 2013.00 | 2015.00 | 2018.00 | ▁▇▃▃▁ |
Data.Production.Number.of.bags | 0 | 1 | 151.76 | 125.67 | 1 | 15.00 | 170.00 | 275.00 | 600.00 | ▇▁▇▁▁ |
Data.Production.Bag.weight | 0 | 1 | 210.49 | 1666.71 | 0 | 1.00 | 60.00 | 69.00 | 19200.00 | ▇▁▁▁▁ |
Data.Scores.Aroma | 0 | 1 | 7.57 | 0.40 | 0 | 7.42 | 7.58 | 7.75 | 8.75 | ▁▁▁▁▇ |
Data.Scores.Flavor | 0 | 1 | 7.52 | 0.42 | 0 | 7.33 | 7.50 | 7.75 | 8.83 | ▁▁▁▁▇ |
Data.Scores.Aftertaste | 0 | 1 | 7.39 | 0.43 | 0 | 7.25 | 7.42 | 7.58 | 8.67 | ▁▁▁▁▇ |
Data.Scores.Acidity | 0 | 1 | 7.54 | 0.40 | 0 | 7.33 | 7.58 | 7.75 | 8.75 | ▁▁▁▁▇ |
Data.Scores.Body | 0 | 1 | 7.51 | 0.39 | 0 | 7.33 | 7.50 | 7.67 | 8.50 | ▁▁▁▁▇ |
Data.Scores.Balance | 0 | 1 | 7.50 | 0.43 | 0 | 7.33 | 7.50 | 7.75 | 8.58 | ▁▁▁▁▇ |
Data.Scores.Uniformity | 0 | 1 | 9.82 | 0.59 | 0 | 10.00 | 10.00 | 10.00 | 10.00 | ▁▁▁▁▇ |
Data.Scores.Sweetness | 0 | 1 | 9.83 | 0.69 | 0 | 10.00 | 10.00 | 10.00 | 10.00 | ▁▁▁▁▇ |
Data.Scores.Moisture | 0 | 1 | 0.09 | 0.04 | 0 | 0.10 | 0.11 | 0.12 | 0.28 | ▃▇▆▁▁ |
Data.Scores.Total | 0 | 1 | 81.97 | 3.86 | 0 | 81.08 | 82.50 | 83.58 | 90.58 | ▁▁▁▁▇ |
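The summary above reports no missing values, yet every score column has a minimum of 0 and the altitude columns reach 190164, which suggests a few placeholder or mis-entered rows. The sketch below shows one possible way to flag such rows before analysis. It uses a made-up `coffee_toy` data frame, and the cutoff `max_plausible_altitude` is our own assumption (it presumes altitude is recorded in meters, which the summary does not state).

```r
# Toy rows that mimic the CORGIS column names; the values are illustrative.
coffee_toy <- data.frame(
  Location.Altitude.Average = c(1300, 190164, 950, 0),
  Data.Scores.Total = c(82.5, 83.58, 0, 81.08)
)

# Assumed plausibility rules (our choice, not part of the dataset):
# a total score of 0 or an altitude above the cutoff is treated as suspect.
max_plausible_altitude <- 9000

suspect <- coffee_toy$Data.Scores.Total == 0 |
  coffee_toy$Location.Altitude.Average > max_plausible_altitude

# Keep only the rows that pass both checks
coffee_clean <- coffee_toy[!suspect, ]
nrow(coffee_clean)
```

Whether suspect rows should be dropped or corrected is a judgment call that depends on how many there are in the full dataset.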
Data 2
Introduction and data
The data comes from the CORGIS Dataset Project: https://think.cs.vt.edu/corgis/csv/music/. We believe this data was ethically sourced.
The data is from The Million Song Dataset created in 2011.
The dataset contains information on artists (e.g. location, demographics, and popularity) and on each artist's songs (e.g. title, year, length, and tempo in bpm).
Research question
- How do song characteristics (e.g. loudness, tempo, length) relate to song and artist popularity?
- College students are one of the largest demographics for listening to music. We are interested in researching the relationship between song characteristics and popularity and whether or not some aspects of songs or of the artist are more or less influential in this respect.
- Artist name, artist location, and song title are categorical. Song length, song hotttnesss, and artist hotttnesss are quantitative. We hypothesize that American artists with shorter, higher-bpm songs have the highest song and artist hotttnesss scores.
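One way to start on this question is to look at a simple correlation and then a linear model with several song characteristics as predictors. The sketch below is only an outline of that idea: `music_toy` is a made-up data frame that reuses three column names from the dataset, and the choice of predictors is an assumption rather than a final model.

```r
# Toy rows that mimic the CORGIS column names; the values are made up.
music_toy <- data.frame(
  song.tempo = c(96, 120, 144, 110, 130, 150),
  song.duration = c(276, 223, 176, 250, 200, 180),
  song.hotttnesss = c(0.20, 0.41, 0.60, 0.30, 0.50, 0.65)
)

# Pearson correlation between tempo and song hotttnesss
tempo_cor <- cor(music_toy$song.tempo, music_toy$song.hotttnesss)

# Linear model with tempo and duration as predictors of song hotttnesss
fit <- lm(song.hotttnesss ~ song.tempo + song.duration, data = music_toy)
coef(fit)
```

On the full dataset the coefficients would indicate the direction of each relationship, though tempo and duration may be correlated with each other and should be interpreted with care.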
Glimpse of data
# Read the music data and summarize every column with skimr
music <- read.csv("data/music.csv") |>
  skimr::skim()
music
Name | read.csv(“data/music.csv”… |
Number of rows | 10000 |
Number of columns | 35 |
_______________________ | |
Column type frequency: | |
character | 4 |
numeric | 31 |
________________________ | |
Group variables | None |
Variable type: character
skim_variable | n_missing | complete_rate | min | max | empty | n_unique | whitespace |
---|---|---|---|---|---|---|---|
artist.id | 0 | 1 | 18 | 18 | 0 | 3888 | 0 |
artist.name | 0 | 1 | 1 | 255 | 0 | 4412 | 0 |
artist.terms | 0 | 1 | 0 | 40 | 5 | 459 | 0 |
song.id | 0 | 1 | 18 | 52 | 0 | 10000 | 0 |
Variable type: numeric
skim_variable | n_missing | complete_rate | mean | sd | p0 | p25 | p50 | p75 | p100 | hist |
---|---|---|---|---|---|---|---|---|---|---|
artist.familiarity | 0 | 1 | 0.57 | 0.16 | 0.00 | 0.47 | 0.56 | 0.67 | 1.00 | ▁▂▇▅▂ |
artist.hotttnesss | 0 | 1 | 0.39 | 0.14 | 0.00 | 0.33 | 0.38 | 0.45 | 1.08 | ▁▇▃▁▁ |
artist.latitude | 0 | 1 | 13.90 | 20.36 | -41.28 | 0.00 | 0.00 | 34.42 | 69.65 | ▁▇▁▃▁ |
artist.location | 0 | 1 | 0.08 | 7.80 | 0.00 | 0.00 | 0.00 | 0.00 | 780.00 | ▇▁▁▁▁ |
artist.longitude | 0 | 1 | -23.92 | 43.72 | -162.44 | -73.95 | 0.00 | 0.00 | 174.77 | ▁▂▇▁▁ |
artist.similar | 0 | 1 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | ▁▁▇▁▁ |
artist.terms_freq | 0 | 1 | 224.89 | 22392.16 | 0.00 | 0.95 | 1.00 | 1.00 | 2239217.00 | ▇▁▁▁▁ |
release.id | 0 | 1 | 371024.06 | 236777.83 | 0.00 | 172858.00 | 333103.00 | 573532.50 | 823599.00 | ▇▇▅▆▅ |
release.name | 0 | 1 | 23.10 | 1322.90 | 0.00 | 0.00 | 0.00 | 0.00 | 85555.00 | ▇▁▁▁▁ |
song.artist_mbtags | 0 | 1 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.33 | ▇▁▁▁▁ |
song.artist_mbtags_count | 0 | 1 | 0.52 | 0.88 | 0.00 | 0.00 | 0.00 | 1.00 | 9.00 | ▇▁▁▁▁ |
song.bars_confidence | 0 | 1 | 0.24 | 0.29 | 0.00 | 0.04 | 0.12 | 0.35 | 8.86 | ▇▁▁▁▁ |
song.bars_start | 0 | 1 | 1.07 | 1.72 | 0.00 | 0.44 | 0.79 | 1.22 | 59.74 | ▇▁▁▁▁ |
song.beats_confidence | 0 | 1 | 0.61 | 0.32 | 0.00 | 0.41 | 0.69 | 0.88 | 1.00 | ▃▂▃▆▇ |
song.beats_start | 0 | 1 | 0.43 | 0.81 | -60.00 | 0.19 | 0.33 | 0.50 | 12.25 | ▁▁▁▁▇ |
song.duration | 0 | 1 | 240.62 | 246.08 | 1.04 | 176.03 | 223.06 | 276.38 | 22050.00 | ▇▁▁▁▁ |
song.end_of_fade_in | 0 | 1 | 0.76 | 1.86 | 0.00 | 0.00 | 0.20 | 0.42 | 43.12 | ▇▁▁▁▁ |
song.hotttnesss | 0 | 1 | -0.24 | 0.69 | -1.00 | -1.00 | 0.00 | 0.41 | 1.00 | ▇▁▃▆▂ |
song.key | 0 | 1 | 5.37 | 9.67 | 0.00 | 2.00 | 5.00 | 8.00 | 904.80 | ▇▁▁▁▁ |
song.key_confidence | 0 | 1 | 0.45 | 0.33 | 0.00 | 0.22 | 0.47 | 0.66 | 19.08 | ▇▁▁▁▁ |
song.loudness | 0 | 1 | -10.48 | 5.40 | -51.64 | -13.16 | -9.38 | -6.53 | 0.57 | ▁▁▁▆▇ |
song.mode | 0 | 1 | 0.69 | 0.46 | 0.00 | 0.00 | 1.00 | 1.00 | 1.00 | ▃▁▁▁▇ |
song.mode_confidence | 0 | 1 | 0.48 | 0.19 | 0.00 | 0.36 | 0.49 | 0.61 | 1.00 | ▂▅▇▅▁ |
song.start_of_fade_out | 0 | 1 | 229.88 | 112.02 | -21.39 | 168.86 | 213.86 | 266.27 | 1813.43 | ▇▁▁▁▁ |
song.tatums_confidence | 0 | 1 | 0.51 | 0.33 | 0.00 | 0.24 | 0.50 | 0.77 | 9.23 | ▇▁▁▁▁ |
song.tatums_start | 0 | 1 | 0.30 | 0.51 | 0.00 | 0.11 | 0.19 | 0.29 | 12.25 | ▇▁▁▁▁ |
song.tempo | 0 | 1 | 122.90 | 35.20 | 0.00 | 96.96 | 120.16 | 144.01 | 262.83 | ▁▆▇▂▁ |
song.time_signature | 0 | 1 | 3.56 | 1.27 | 0.00 | 3.00 | 4.00 | 4.00 | 7.00 | ▂▁▇▁▁ |
song.time_signature_confidence | 0 | 1 | 0.60 | 8.99 | 0.00 | 0.10 | 0.55 | 0.86 | 898.89 | ▇▁▁▁▁ |
song.title | 0 | 1 | 10.01 | 945.49 | 0.00 | 0.00 | 0.00 | 0.00 | 94496.00 | ▇▁▁▁▁ |
song.year | 0 | 1 | 934.70 | 996.65 | 0.00 | 0.00 | 0.00 | 2000.00 | 2010.00 | ▇▁▁▁▇ |
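Although the summary reports no missing values, the 25th percentile of song.hotttnesss is -1 and the median of song.year is 0, so these values look like placeholders rather than real measurements. The sketch below shows how they could be recoded to `NA` before computing statistics. It uses a made-up `music_toy` data frame, and treating -1 and 0 as missing-value markers is our assumption. Note also that artist.location and song.title appear as numeric columns in this summary, so they may need separate attention before being treated as categorical.

```r
# Toy rows that mimic the CORGIS column names; the values are illustrative.
music_toy <- data.frame(
  song.hotttnesss = c(-1, 0.41, 0, 1, -1),
  song.year = c(0, 2000, 2010, 0, 1995)
)

# Assumption: negative hotttnesss and a year of 0 mark missing values.
music_toy$song.hotttnesss[music_toy$song.hotttnesss < 0] <- NA
music_toy$song.year[music_toy$song.year == 0] <- NA

# Count the recoded values per column
na_counts <- colSums(is.na(music_toy))
na_counts

# Summary statistics now have to skip the NA entries explicitly
mean_hot <- mean(music_toy$song.hotttnesss, na.rm = TRUE)
mean_hot
```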
Data 3
Introduction and data
The data comes from the CORGIS Dataset Project on Billionaires:
This data was collected from the Forbes World’s Billionaires lists from 1996-2014, with additional data added by scholars at Peterson Institute for International Economics.
The data set shows various billionaires and information about their company, their demographics, and how they obtained their wealth.
Research question
- Our research question: How do the source and growth of billionaires' wealth relate to their rankings?
- We are researching the origin of billionaires' wealth (whether it was inherited, where the money came from, etc.) and how these factors may affect billionaire rankings. This topic is important because many billionaires in the data set have an economic impact on the companies and markets that people interact with daily, as well as on country GDPs. We hypothesize that inherited wealth is strongly associated with high billionaire rankings.
- Billionaire names are categorical (strings) and rankings are quantitative (integers). The variables relating to wealth origin are booleans and strings, so they are categorical.
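Because the inheritance variable is categorical and rank is numeric, a natural first look is to compare typical ranks between groups. The sketch below does this with medians on a made-up `billionaires_toy` data frame. The category labels are illustrative assumptions about how wealth.how.inherited is coded, and a smaller rank number means a wealthier billionaire.

```r
# Toy rows that mimic the CORGIS column names; values and labels are made up.
billionaires_toy <- data.frame(
  rank = c(1, 215, 430, 988, 1565, 50),
  wealth.how.inherited = c("not inherited", "father", "not inherited",
                           "father", "not inherited", "spouse/widow")
)

# Collapse the categories into inherited (TRUE) vs. not inherited (FALSE)
billionaires_toy$inherited <-
  billionaires_toy$wealth.how.inherited != "not inherited"

# Median rank within each group
median_rank <- tapply(billionaires_toy$rank, billionaires_toy$inherited, median)
median_rank
```

The median is used here instead of the mean because rank is skewed, but either choice is a design decision for the full analysis.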
Glimpse of data
# Read the billionaires data and summarize every column with skimr
billionaires <- read.csv("data/billionaires.csv") |>
  skimr::skim()
billionaires
Name | read.csv(“data/billionair… |
Number of rows | 2614 |
Number of columns | 22 |
_______________________ | |
Column type frequency: | |
character | 16 |
numeric | 6 |
________________________ | |
Group variables | None |
Variable type: character
skim_variable | n_missing | complete_rate | min | max | empty | n_unique | whitespace |
---|---|---|---|---|---|---|---|
name | 0 | 1 | 5 | 45 | 0 | 2077 | 0 |
company.name | 0 | 1 | 0 | 59 | 38 | 1578 | 0 |
company.relationship | 0 | 1 | 0 | 46 | 46 | 75 | 0 |
company.sector | 0 | 1 | 0 | 52 | 23 | 521 | 0 |
company.type | 0 | 1 | 0 | 22 | 36 | 19 | 0 |
demographics.gender | 0 | 1 | 0 | 14 | 34 | 4 | 0 |
location.citizenship | 0 | 1 | 4 | 20 | 0 | 73 | 0 |
location.country.code | 0 | 1 | 3 | 6 | 0 | 74 | 0 |
location.region | 0 | 1 | 1 | 24 | 0 | 8 | 0 |
wealth.type | 0 | 1 | 0 | 24 | 22 | 6 | 0 |
wealth.how.category | 0 | 1 | 0 | 18 | 1 | 10 | 0 |
wealth.how.from.emerging | 0 | 1 | 4 | 4 | 0 | 1 | 0 |
wealth.how.industry | 0 | 1 | 0 | 31 | 1 | 20 | 0 |
wealth.how.inherited | 0 | 1 | 6 | 24 | 0 | 6 | 0 |
wealth.how.was.founder | 0 | 1 | 4 | 4 | 0 | 1 | 0 |
wealth.how.was.political | 0 | 1 | 4 | 4 | 0 | 1 | 0 |
Variable type: numeric
skim_variable | n_missing | complete_rate | mean | sd | p0 | p25 | p50 | p75 | p100 | hist |
---|---|---|---|---|---|---|---|---|---|---|
rank | 0 | 1 | 5.996700e+02 | 4.678900e+02 | 1 | 215.0 | 430 | 9.880e+02 | 1.565e+03 | ▇▅▃▂▃ |
year | 0 | 1 | 2.008410e+03 | 7.480000e+00 | 1996 | 2001.0 | 2014 | 2.014e+03 | 2.014e+03 | ▂▂▁▁▇ |
company.founded | 0 | 1 | 1.924710e+03 | 2.437800e+02 | 0 | 1936.0 | 1963 | 1.985e+03 | 2.012e+03 | ▁▁▁▁▇ |
demographics.age | 0 | 1 | 5.334000e+01 | 2.533000e+01 | -42 | 47.0 | 59 | 7.000e+01 | 9.800e+01 | ▁▂▁▇▃ |
location.gdp | 0 | 1 | 1.769103e+12 | 3.547083e+12 | 0 | 0.0 | 0 | 7.250e+11 | 1.060e+13 | ▇▁▁▁▁ |
wealth.worth.in.billions | 0 | 1 | 3.530000e+00 | 5.090000e+00 | 1 | 1.4 | 2 | 3.500e+00 | 7.600e+01 | ▇▁▁▁▁ |