Harmonizing with Data

Exploring music through data science

Skillful Starmie
Jaden O’Brien, Matt Chin,
Julia Beitel, Bella Besuud, and Clara Lee

4/5/23

Introduce the topic and motivation

Topic and Motivation:

  • See if there are any patterns behind a song's or artist's popularity. Is there something specific that causes one song or artist to be more popular than another? What causes the ubiquity of certain songs?

Research Question:

  • How do song characteristics (e.g. loudness, tempo, and length) relate to song and artist popularity?

Introduce the data

  • The Million Song Dataset (2011)
  • Contains information on:
    • Artists (e.g. location, demographics, and popularity)
    • Each artist’s songs (e.g. title, year, length, tempo, bpm, and hotness)
  • Many variables, like song length and loudness, come from objective measurements of the songs and are therefore unlikely to be biased or influenced by other processes. Song and artist hotness, however, are likely to be influenced by 2010 social trends.

Highlights from EDA

Inference/modeling/other analysis

There was very little correlation between song tempo and song popularity, but we noticed a lot of variation when grouping by song genre.
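
One way to check an observation like this is to compare the overall tempo/popularity correlation with the same correlation computed within each genre. The sketch below is only an illustration, not the team's analysis: the field names ("genre", "tempo", "hotness") are hypothetical, and the rows are synthetic placeholders built so that two genres have opposite trends that cancel out overall. They are not values from the Million Song Dataset.

```python
import random
from collections import defaultdict
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def correlation_by_group(rows, group_key, x_key, y_key):
    """Compute the x/y correlation separately for each value of group_key."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[group_key]].append(row)
    return {
        name: pearson([r[x_key] for r in members], [r[y_key] for r in members])
        for name, members in groups.items()
    }


# Synthetic placeholder data (assumption): two made-up genres whose
# tempo/hotness trends point in opposite directions.
rng = random.Random(0)
rows = []
for genre, slope in [("genre_a", 0.004), ("genre_b", -0.004)]:
    for _ in range(200):
        tempo = rng.uniform(60, 180)
        hotness = 0.5 + slope * (tempo - 120) + rng.gauss(0, 0.05)
        rows.append({"genre": genre, "tempo": tempo, "hotness": hotness})

overall = pearson([r["tempo"] for r in rows], [r["hotness"] for r in rows])
per_genre = correlation_by_group(rows, "genre", "tempo", "hotness")

print(f"overall correlation: {overall:.3f}")
for name, r in per_genre.items():
    print(f"{name}: {r:.3f}")
```

On real data the per-genre values would come from the dataset itself; the point of the sketch is only that a weak pooled correlation can hide stronger, differing relationships inside groups.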

Significance Testing

Evaluating the effect of song tempo and song duration on popularity
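
The slides do not say which test was used, so the following is a minimal sketch that assumes a two-sample comparison of mean popularity with Welch's t-test (unequal variances). The "fast" and "slow" groups, their sizes, and their values are synthetic placeholders, and the p-value uses a normal approximation that is only reasonable for large samples.

```python
import random
from math import sqrt
from statistics import NormalDist, mean, variance


def welch_t(sample_a, sample_b):
    """Return Welch's t statistic and its approximate degrees of freedom."""
    n_a, n_b = len(sample_a), len(sample_b)
    se_a = variance(sample_a) / n_a  # squared standard error of mean A
    se_b = variance(sample_b) / n_b  # squared standard error of mean B
    t = (mean(sample_a) - mean(sample_b)) / sqrt(se_a + se_b)
    # Welch-Satterthwaite approximation for the degrees of freedom
    df = (se_a + se_b) ** 2 / (se_a ** 2 / (n_a - 1) + se_b ** 2 / (n_b - 1))
    return t, df


def approx_two_sided_p(t):
    """Two-sided p-value using a normal approximation (large samples only)."""
    return 2 * (1 - NormalDist().cdf(abs(t)))


# Synthetic placeholder data (assumption): hotness scores for songs
# labeled "fast" and "slow" by some hypothetical tempo cutoff.
rng = random.Random(1)
fast = [rng.gauss(0.55, 0.10) for _ in range(500)]
slow = [rng.gauss(0.50, 0.10) for _ in range(500)]

t_stat, dof = welch_t(fast, slow)
p_value = approx_two_sided_p(t_stat)
print(f"t = {t_stat:.2f}, df = {dof:.1f}, approx. p = {p_value:.3g}")
```

The same function could be reused for song duration by splitting songs into "long" and "short" groups instead. For small samples, a t-distribution p-value (for example from a statistics library) would be more appropriate than the normal approximation.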

Conclusions + future work

Significance tests:
  • The true mean popularity of fast songs ≠ the true mean popularity of slow songs

  • Insufficient evidence to disprove that song popularity differs by song length

Future work:
  • Drawing informative conclusions about how to make a song with the most hotness and popularity

    • Investigating song hotness with harmony, streaming platform exposure, and more!

    • Expanding the dataset to include a wider variety of genres and artists