Project proposal

Author

proud-seal (Max Savona, Morgan Stuart, Kamran Murray)

library(tidyverse)

billboard_df <- read.csv("data/billboard.csv")

Dataset

rows <- nrow(billboard_df)
cols <- ncol(billboard_df)

Dataset - Billboard Hot 100 Number Ones

This dataset originally comes from Billboard, which publishes weekly music charts based on sales, streaming, and radio airplay. It was then curated by Jen Richmond for TidyTuesday as one of the 2025 weekly datasets.

Current dimensions: 1,177 rows and 105 columns

We chose this dataset because every member of our group has a strong connection to music, and we want to explore trends in the music industry. Working with a medium we are all passionate about should strengthen our commitment to producing results throughout this project. The analysis will also deepen each of our individual understandings of music trends.

Questions

Question 1: Has it become easier or harder to dominate the Hot 100 over time? For this project, we measure a song’s dominance by the number of weeks it spends at number one.

Question 2: Do certain genres last longer at number one than others? Here, “last longer” refers specifically to the number of weeks a song remains at number one.

Analysis plan

For Question 1, we plan to use the ‘year’, ‘weeks_at_number_one’, ‘non_consecutive’, ‘song’, and ‘artist’ variables, with ‘weeks_at_number_one’ and ‘year’ as the key variables. With these, we will determine whether recent number-one songs tend to stay at number one for longer or shorter periods than songs in the past. We will compute the average number of weeks at number one for each year and compare these yearly averages over time. Because a single song that dominates a year can skew that year’s average, we will also examine the median and the overall distribution to ensure that trends are not driven by extreme cases.

For Question 2, we plan to use the ‘year’, ‘weeks_at_number_one’, ‘cdr_genre’, ‘cdr_style’, ‘song’, and ‘artist’ variables from billboard.csv, and may incorporate additional genre-related classifications from topics.csv if helpful. With these variables (key variables being ‘weeks_at_number_one’ and ‘cdr_genre’), we will analyze whether certain genres tend to remain at number one longer than others. We will compare genres using the average and median number of weeks at number one, and we will report the number of songs in each genre to avoid misleading conclusions driven by small sample sizes.

We recognize that Billboard’s chart methodology and genre classifications have evolved over time, which may affect comparisons across decades. We will take these factors into account when interpreting results.
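The per-year summary for Question 1 and the per-genre summary for Question 2 have the same shape: group the songs, count them, and take the mean and median of weeks at number one. One helper could therefore serve both questions. The following is a minimal base R sketch under stated assumptions. `summarise_weeks()` is a hypothetical name. It assumes one row per number-one song, with the ‘year’, ‘cdr_genre’, and ‘weeks_at_number_one’ columns named in the plan. The toy data frame holds made-up values, not real Billboard records. The sketch uses only base R so it runs without extra packages. A `dplyr` `group_by()` plus `summarise()` pipeline on `billboard_df` would be an equivalent tidyverse approach.

```r
# Sketch only: `summarise_weeks()` is a hypothetical helper, not part of the
# dataset or any package. It assumes one row per number-one song, a numeric
# `weeks_at_number_one` column, and a grouping column such as `year`
# (Question 1) or `cdr_genre` (Question 2).
summarise_weeks <- function(df, group_col) {
  # Drop songs with a missing weeks count or a missing group label
  keep <- !is.na(df$weeks_at_number_one) & !is.na(df[[group_col]])
  weeks <- as.numeric(df$weeks_at_number_one[keep])
  # One numeric vector of weeks per group, ordered by group value
  groups <- split(weeks, df[[group_col]][keep])
  out <- data.frame(
    group = names(groups),
    n_songs = unname(vapply(groups, length, integer(1))),
    mean_weeks = unname(vapply(groups, mean, numeric(1))),
    median_weeks = unname(vapply(groups, median, numeric(1))),
    stringsAsFactors = FALSE
  )
  # Name the first column after the grouping variable
  names(out)[1] <- group_col
  out
}

# Hypothetical toy data for illustration only (not real Billboard values)
toy <- data.frame(
  year = c(1990, 1990, 1991, 1991, 1991),
  cdr_genre = c("Pop", "Rock", "Pop", "Pop", "Rock"),
  weeks_at_number_one = c(2, 4, 1, 3, 10),
  stringsAsFactors = FALSE
)

by_year <- summarise_weeks(toy, "year")       # Question 1
by_genre <- summarise_weeks(toy, "cdr_genre") # Question 2
print(by_year)
print(by_genre)
```

In the toy 1991 group, one ten-week run pulls the mean up to about 4.67 weeks while the median stays at 3. The plan's median check is meant to catch this kind of gap. The grouping column comes back as character, so `as.integer(by_year$year)` would be needed before plotting years on a numeric axis. On the real data, assuming the column names above match billboard.csv, the calls would be `summarise_weeks(billboard_df, "year")` and `summarise_weeks(billboard_df, "cdr_genre")`.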