Exploring the World of Music

Author

Speedy Coders
Brooke, Akhil, Olivia, Hung, and Catherine

Published

May 5, 2023

Topic and motivation

We all have a passion for music!
Research question: are a song's duration and tempo, and its artist's familiarity and popularity, associated with the song's popularity?

The data

The dataset was created to analyze trends among music listeners and to encourage research on algorithms that scale to commercial sizes. We work with data points derived from 3,064 songs.
We collected the data from the Million Song Dataset, which was created by The Echo Nest and LabROSA (Columbia University). A large portion of our data may come only from Spotify.
The observations are songs, and the attributes describe the songs (song.id, song.year, song.tempo, song.hotttnesss) and their corresponding artists (artist.id, artist.hotttnesss, artist.familiarity).

Highlights from EDA

Data as loaded: 2,693 rows and 12 columns (5 character, 7 numeric).
[Figures not recoverable: four plots with fitted trend lines appeared here.]

Hypothesis Testing

We want to examine the relationship between song tempo and popularity.
We test for a significant difference between the proportions of songs with hotness greater than 0.8 inside and outside the 100-140 bpm tempo range.
We chose this range because it is the typical tempo range for a song, and we treat a hotness of 0.8 as the threshold for popularity.
The p-value is about 0.13, above the conventional 0.05 level, so the data do not provide evidence of a significant difference in popularity between songs with tempo inside and outside 100-140 bpm.
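The original analysis was run in R. As a rough illustration of the kind of test described above, here is a minimal standalone Python sketch of a permutation test for a difference in proportions. The function names, the number of shuffles, and the toy tempo and hotness values are all hypothetical; this is not the deck's actual code or data.

```python
import random


def diff_in_props(popular, in_range):
    """Proportion popular inside the tempo range minus the proportion outside it."""
    inside = [p for p, r in zip(popular, in_range) if r]
    outside = [p for p, r in zip(popular, in_range) if not r]
    return sum(inside) / len(inside) - sum(outside) / len(outside)


def permutation_p_value(popular, in_range, reps=2000, seed=1):
    """Two-sided p-value: share of shuffled differences at least as extreme as the observed one."""
    rng = random.Random(seed)
    observed = diff_in_props(popular, in_range)
    shuffled = list(popular)
    extreme = 0
    for _ in range(reps):
        # Shuffling the popularity labels breaks any link with tempo (the null hypothesis).
        rng.shuffle(shuffled)
        if abs(diff_in_props(shuffled, in_range)) >= abs(observed):
            extreme += 1
    return observed, extreme / reps


# Hypothetical toy values, NOT taken from the song dataset.
tempo = [95, 104, 118, 126, 133, 139, 88, 150, 162, 171, 110, 145]
hotness = [0.55, 0.85, 0.62, 0.91, 0.40, 0.83, 0.30, 0.82, 0.45, 0.20, 0.70, 0.50]

in_range = [100 <= t <= 140 for t in tempo]  # tempo within 100-140 bpm
popular = [h > 0.8 for h in hotness]         # hotness above the 0.8 threshold

observed, p_value = permutation_p_value(popular, in_range)
print(f"observed difference: {observed:.3f}, p-value: {p_value:.3f}")
```

Because the p-value comes from random shuffles, it changes slightly from run to run unless the seed is fixed.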

Hypothesis Testing

# A tibble: 1 × 1
  p_value
    <dbl>
1   0.132

# A tibble: 1 × 2
  lower_ci upper_ci
     <dbl>    <dbl>
1 -0.00160   0.0256
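The output above does not state how the interval was built or at what confidence level. As a sketch only, assuming a 95% bootstrap percentile interval for the difference in proportions, the idea could look like this in Python. The function names and the toy inputs are hypothetical; only the final check uses the interval reported above.

```python
import random


def bootstrap_ci(inside, outside, level=0.95, reps=2000, seed=1):
    """Percentile bootstrap interval for mean(inside) - mean(outside)."""
    rng = random.Random(seed)
    diffs = []
    for _ in range(reps):
        # Resample each group with replacement, keeping group sizes fixed.
        a = rng.choices(inside, k=len(inside))
        b = rng.choices(outside, k=len(outside))
        diffs.append(sum(a) / len(a) - sum(b) / len(b))
    diffs.sort()
    tail = (1 - level) / 2
    lower = diffs[int(tail * reps)]
    upper = diffs[int((1 - tail) * reps) - 1]
    return lower, upper


def contains_zero(lower, upper):
    """An interval that includes 0 is consistent with no difference between groups."""
    return lower <= 0 <= upper


# Hypothetical toy groups (1 = popular, 0 = not popular), NOT the song dataset.
inside = [1, 1, 1, 0, 0, 0]
outside = [1, 0, 0, 0, 0, 0]
lower, upper = bootstrap_ci(inside, outside)

# Interval reported in the output above.
reported_includes_zero = contains_zero(-0.00160, 0.0256)
print(lower, upper, reported_includes_zero)
```

The reported interval runs from just below zero to about 0.026, so it includes zero, which agrees with the non-significant p-value.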

Conclusions

The hottest songs are usually about 240 seconds long.
Song tempo and duration do not show a significant association with song popularity.
Artist popularity and familiarity do have a relationship with song popularity.
Artists and music labels should focus on building their popularity and listener retention to increase the popularity of their songs.