library(tidyverse)
library(skimr)
An Investigation of Song Popularity
Proposal
Data 1
Introduction and data
Identify the source of the data.
State when and how it was originally collected (by the original data curator, not necessarily how you found the data).
This dataset was created in 2017. Its playtime information comes from crowd-sourced data on the “How Long to Beat” website, which reports how long various video games take to play. The data appear to have been collected ethically because user participation on the website is voluntary.
Write a brief description of the observations.
Each observation is an individual video game. The columns record variables such as game features, release year, and the length of time needed to play specific parts of the game (average, fastest, slowest).
Research question
A well formulated research question. (You may include more than one research question if you want to receive feedback on different ideas for your project. However, one per data set is required.)
1) What makes a video game popular, and how does length of time played relate to a game’s popularity?
2) What is the most popular genre according to release year?
A description of the research topic along with a concise statement of your hypotheses on this topic.
The research topic explores the popularity of video games and how it correlates with length of time played, which sheds light on the gaming tendencies of the 21st-century generation. We hypothesize that the most popular games have the longest playing times.
Identify the types of variables in your research question. Categorical? Quantitative?
To measure a video game’s popularity, we can use the quantitative variables “Metrics.Review Score” and “Metrics.Sales”. For length of time played, we can use either the mean or the median playtime, both of which are quantitative. Categorical variables such as “Metadata.Genres” can be used to compare popularity across genres.
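As a rough illustration of how the hypothesis could be checked, the sketch below computes rank correlations between median playtime and the two popularity measures. It runs on a small made-up tibble named `games`, not on the real file; only the column names are taken from the dataset. Dropping games with a median length of 0 is our assumption that a zero means no playtime was reported, and the choice of Spearman correlation is ours as well, since sales and playtime look strongly right-skewed.

```r
library(dplyr)
library(tibble)

# Hypothetical toy data; column names match the dataset, values are invented.
games <- tibble(
  `Metrics.Review Score` = c(60, 70, 80, 90, 75),
  `Metrics.Sales` = c(0.1, 0.2, 0.5, 1.2, 0.3),
  `Length.All PlayStyles.Median` = c(3, 8, 12, 30, 0)
)

# Assumption: a median length of 0 means no playtime was reported,
# so those games are dropped before computing correlations.
games_with_length <- games %>%
  filter(`Length.All PlayStyles.Median` > 0)

# Spearman (rank) correlation between playtime and each popularity measure.
popularity_cor <- games_with_length %>%
  summarise(
    n_games = n(),
    score_vs_length = cor(`Metrics.Review Score`,
                          `Length.All PlayStyles.Median`,
                          method = "spearman"),
    sales_vs_length = cor(`Metrics.Sales`,
                          `Length.All PlayStyles.Median`,
                          method = "spearman")
  )

print(popularity_cor)
```

On the real data, the same pipeline would start from the loaded data frame instead of `games`.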
Glimpse of data
dataset1 <- read_csv("data/video_games.csv")
Rows: 1212 Columns: 36
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (5): Title, Metadata.Genres, Metadata.Publishers, Release.Console, Rele...
dbl (25): Features.Max Players, Metrics.Review Score, Metrics.Sales, Metrics...
lgl (6): Features.Handheld?, Features.Multiplatform?, Features.Online?, Met...
ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
skim(dataset1)
Name | dataset1 |
Number of rows | 1212 |
Number of columns | 36 |
_______________________ | |
Column type frequency: | |
character | 5 |
logical | 6 |
numeric | 25 |
________________________ | |
Group variables | None |
Variable type: character
skim_variable | n_missing | complete_rate | min | max | empty | n_unique | whitespace |
---|---|---|---|---|---|---|---|
Title | 0 | 1.00 | 2 | 52 | 0 | 900 | 0 |
Metadata.Genres | 0 | 1.00 | 6 | 52 | 0 | 48 | 0 |
Metadata.Publishers | 264 | 0.78 | 2 | 20 | 0 | 31 | 0 |
Release.Console | 0 | 1.00 | 4 | 13 | 0 | 5 | 0 |
Release.Rating | 0 | 1.00 | 1 | 1 | 0 | 3 | 0 |
Variable type: logical
skim_variable | n_missing | complete_rate | mean | count |
---|---|---|---|---|
Features.Handheld? | 0 | 1 | 1 | TRU: 1212 |
Features.Multiplatform? | 0 | 1 | 1 | TRU: 1212 |
Features.Online? | 0 | 1 | 1 | TRU: 1212 |
Metadata.Licensed? | 0 | 1 | 1 | TRU: 1212 |
Metadata.Sequel? | 0 | 1 | 1 | TRU: 1212 |
Release.Re-release? | 0 | 1 | 1 | TRU: 1212 |
Variable type: numeric
skim_variable | n_missing | complete_rate | mean | sd | p0 | p25 | p50 | p75 | p100 | hist |
---|---|---|---|---|---|---|---|---|---|---|
Features.Max Players | 0 | 1 | 1.66 | 1.20 | 1.00 | 1.00 | 1.00 | 2.00 | 8.00 | ▇▁▁▁▁ |
Metrics.Review Score | 0 | 1 | 68.83 | 12.96 | 19.00 | 60.00 | 70.00 | 79.00 | 98.00 | ▁▂▅▇▂ |
Metrics.Sales | 0 | 1 | 0.50 | 1.07 | 0.01 | 0.09 | 0.21 | 0.46 | 14.66 | ▇▁▁▁▁ |
Metrics.Used Price | 0 | 1 | 17.39 | 5.02 | 4.95 | 14.95 | 17.95 | 17.95 | 49.95 | ▂▇▁▁▁ |
Release.Year | 0 | 1 | 2006.82 | 1.05 | 2004.00 | 2006.00 | 2007.00 | 2008.00 | 2008.00 | ▁▂▅▇▇ |
Length.All PlayStyles.Average | 0 | 1 | 13.65 | 19.40 | 0.00 | 3.56 | 8.86 | 16.03 | 279.73 | ▇▁▁▁▁ |
Length.All PlayStyles.Leisure | 0 | 1 | 26.25 | 51.60 | 0.00 | 4.00 | 12.00 | 27.60 | 476.27 | ▇▁▁▁▁ |
Length.All PlayStyles.Median | 0 | 1 | 11.23 | 13.49 | 0.00 | 3.02 | 8.00 | 13.78 | 126.00 | ▇▁▁▁▁ |
Length.All PlayStyles.Polled | 0 | 1 | 44.42 | 154.84 | 0.00 | 1.00 | 6.00 | 25.00 | 2300.00 | ▇▁▁▁▁ |
Length.All PlayStyles.Rushed | 0 | 1 | 9.40 | 11.18 | 0.00 | 2.60 | 6.71 | 11.37 | 120.20 | ▇▁▁▁▁ |
Length.Completionists.Average | 0 | 1 | 19.81 | 46.63 | 0.00 | 0.00 | 6.00 | 21.55 | 683.13 | ▇▁▁▁▁ |
Length.Completionists.Leisure | 0 | 1 | 25.78 | 61.51 | 0.00 | 0.00 | 6.17 | 27.12 | 691.57 | ▇▁▁▁▁ |
Length.Completionists.Median | 0 | 1 | 18.80 | 44.04 | 0.00 | 0.00 | 6.00 | 20.35 | 683.13 | ▇▁▁▁▁ |
Length.Completionists.Polled | 0 | 1 | 5.66 | 19.70 | 0.00 | 0.00 | 1.00 | 3.00 | 379.00 | ▇▁▁▁▁ |
Length.Completionists.Rushed | 0 | 1 | 16.40 | 40.33 | 0.00 | 0.00 | 5.50 | 18.38 | 674.70 | ▇▁▁▁▁ |
Length.Main + Extras.Average | 0 | 1 | 12.73 | 23.98 | 0.00 | 0.00 | 7.29 | 16.11 | 291.00 | ▇▁▁▁▁ |
Length.Main + Extras.Leisure | 0 | 1 | 18.87 | 42.92 | 0.00 | 0.00 | 8.00 | 21.03 | 478.93 | ▇▁▁▁▁ |
Length.Main + Extras.Median | 0 | 1 | 12.10 | 23.36 | 0.00 | 0.00 | 7.00 | 15.00 | 291.00 | ▇▁▁▁▁ |
Length.Main + Extras.Polled | 0 | 1 | 14.00 | 57.33 | 0.00 | 0.00 | 1.00 | 7.00 | 1100.00 | ▇▁▁▁▁ |
Length.Main + Extras.Rushed | 0 | 1 | 10.32 | 20.90 | 0.00 | 0.00 | 6.28 | 12.94 | 291.00 | ▇▁▁▁▁ |
Length.Main Story.Average | 0 | 1 | 8.47 | 9.69 | 0.00 | 0.00 | 6.57 | 11.03 | 72.38 | ▇▁▁▁▁ |
Length.Main Story.Leisure | 0 | 1 | 11.05 | 14.09 | 0.00 | 0.00 | 8.00 | 14.51 | 135.58 | ▇▁▁▁▁ |
Length.Main Story.Median | 0 | 1 | 8.28 | 9.50 | 0.00 | 0.00 | 6.04 | 10.53 | 70.00 | ▇▁▁▁▁ |
Length.Main Story.Polled | 0 | 1 | 24.88 | 87.38 | 0.00 | 0.00 | 3.00 | 14.00 | 1100.00 | ▇▁▁▁▁ |
Length.Main Story.Rushed | 0 | 1 | 6.97 | 7.96 | 0.00 | 0.00 | 5.34 | 9.31 | 70.00 | ▇▁▁▁▁ |
Data 2
Introduction and data
Identify the source of the data.
Awesome Public Datasets (GitHub): https://github.com/JeffSackmann/tennis_atp/blob/master/atp_matches_2023.csv
State when and how it was originally collected (by the original data curator, not necessarily how you found the data).
This data was collected from ATP records by Jeff Sackmann, a GitHub user, in 2023, covering the start of the year through March 6. In terms of ethics, ATP records are public information, so this data was ethically collected.
Write a brief description of the observations.
Each observation in this dataset is a match in a tournament. Each match includes variables such as the tournament name, the competitors’ names, statistics on shots hit during the match, and the match outcome. Given this time span, the dataset covers ATP matches from roughly the first two months of 2023.
Research question
A well formulated research question. (You may include more than one research question if you want to receive feedback on different ideas for your project. However, one per data set is required.)
1) How is the win percentage of a competitor related to successful first serves?
2) How does a winner’s age relate to their match win percentage?
A description of the research topic along with a concise statement of your hypotheses on this topic.
The research topic explores the relationship between competitors’ wins and their successful first serves, and possibly a relationship with player age. We would need to calculate each player’s win percentage from the individual match results in each tournament. We hypothesize that more successful players make more first serves and that, in relation to age, the most successful players are around the middle of the age range.
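Because each row is a match rather than a player, win percentage has to be derived. One possible approach, sketched below on a small made-up tibble named `matches`, is to stack the winner and loser sides into one row per player per match and then summarise by player. The column names (`winner_name`, `loser_name`, `w_svpt`, `w_1stIn`, `l_svpt`, `l_1stIn`) come from the dataset; the values, the helper names, and the definition of first-serve percentage as first serves in divided by serve points are our assumptions. This version pools all tournaments; adding `tourney_name` to the grouping would give per-tournament figures.

```r
library(dplyr)
library(tibble)

# Hypothetical toy data; column names match the dataset, values are invented.
matches <- tibble(
  winner_name = c("Player A", "Player A", "Player B", "Player C"),
  loser_name  = c("Player B", "Player C", "Player C", "Player A"),
  w_svpt  = c(80, 60, 100, 90),
  w_1stIn = c(56, 36, 60, 63),
  l_svpt  = c(90, 70, 80, 100),
  l_1stIn = c(54, 35, 48, 50)
)

# Reshape to one row per player per match: winners first, then losers.
winners <- matches %>%
  transmute(player = winner_name, won = 1,
            first_in = w_1stIn, serve_pts = w_svpt)
losers <- matches %>%
  transmute(player = loser_name, won = 0,
            first_in = l_1stIn, serve_pts = l_svpt)

# Summarise by player. Matches with missing serve statistics still count
# toward win_pct but are ignored in the first-serve percentage.
player_summary <- bind_rows(winners, losers) %>%
  group_by(player) %>%
  summarise(
    matches_played = n(),
    win_pct = mean(won),
    first_serve_in_pct = sum(first_in, na.rm = TRUE) /
      sum(serve_pts, na.rm = TRUE),
    .groups = "drop"
  )

print(player_summary)
```

Players with very few matches will have unstable win percentages, so a minimum `matches_played` filter may be worth considering before plotting.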
Identify the types of variables in your research question. Categorical? Quantitative?
The winning competitor and tournament names are categorical variables, whereas win percentage and age are quantitative variables.
Glimpse of data
dataset2 <- read_csv("data/atp_matches_2023.csv")
Rows: 723 Columns: 49
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (16): tourney_id, tourney_name, surface, tourney_level, winner_seed, win...
dbl (33): draw_size, tourney_date, match_num, winner_id, winner_ht, winner_a...
ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
skim(dataset2)
Name | dataset2 |
Number of rows | 723 |
Number of columns | 49 |
_______________________ | |
Column type frequency: | |
character | 16 |
numeric | 33 |
________________________ | |
Group variables | None |
Variable type: character
skim_variable | n_missing | complete_rate | min | max | empty | n_unique | whitespace |
---|---|---|---|---|---|---|---|
tourney_id | 0 | 1.00 | 8 | 32 | 0 | 46 | 0 |
tourney_name | 0 | 1.00 | 4 | 28 | 0 | 46 | 0 |
surface | 0 | 1.00 | 4 | 4 | 0 | 2 | 0 |
tourney_level | 0 | 1.00 | 1 | 1 | 0 | 3 | 0 |
winner_seed | 438 | 0.39 | 1 | 2 | 0 | 30 | 0 |
winner_entry | 630 | 0.13 | 1 | 2 | 0 | 5 | 0 |
winner_name | 0 | 1.00 | 8 | 31 | 0 | 198 | 0 |
winner_hand | 0 | 1.00 | 1 | 1 | 0 | 3 | 0 |
winner_ioc | 0 | 1.00 | 3 | 3 | 0 | 60 | 0 |
loser_seed | 529 | 0.27 | 1 | 2 | 0 | 34 | 0 |
loser_entry | 589 | 0.19 | 1 | 2 | 0 | 5 | 0 |
loser_name | 0 | 1.00 | 7 | 31 | 0 | 261 | 0 |
loser_hand | 2 | 1.00 | 1 | 1 | 0 | 3 | 0 |
loser_ioc | 0 | 1.00 | 3 | 3 | 0 | 68 | 0 |
score | 0 | 1.00 | 3 | 29 | 0 | 414 | 0 |
round | 0 | 1.00 | 1 | 4 | 0 | 8 | 0 |
Variable type: numeric
skim_variable | n_missing | complete_rate | mean | sd | p0 | p25 | p50 | p75 | p100 | hist |
---|---|---|---|---|---|---|---|---|---|---|
draw_size | 0 | 1.00 | 44.22 | 39.86 | 4.0 | 32.00 | 32.0 | 32.00 | 128.0 | ▂▇▁▁▂ |
tourney_date | 0 | 1.00 | 20230172.40 | 51.67 | 20230102.0 | 20230116.00 | 20230204.0 | 20230213.00 | 20230227.0 | ▅▁▁▁▇ |
match_num | 0 | 1.00 | 224.45 | 99.74 | 1.0 | 184.50 | 277.0 | 290.00 | 300.0 | ▂▁▁▁▇ |
winner_id | 0 | 1.00 | 142151.56 | 41421.97 | 100644.0 | 106330.00 | 126203.0 | 200221.00 | 210234.0 | ▇▃▁▁▅ |
winner_ht | 51 | 0.93 | 187.11 | 6.41 | 170.0 | 183.00 | 188.0 | 191.00 | 206.0 | ▁▅▇▃▁ |
winner_age | 0 | 1.00 | 26.62 | 4.31 | 18.0 | 23.90 | 26.1 | 28.70 | 41.4 | ▃▇▃▂▁ |
loser_id | 0 | 1.00 | 140747.63 | 41432.55 | 100644.0 | 106148.00 | 124186.0 | 200221.00 | 212041.0 | ▇▅▁▁▅ |
loser_ht | 94 | 0.87 | 186.06 | 6.40 | 170.0 | 183.00 | 185.0 | 188.00 | 206.0 | ▁▅▇▃▁ |
loser_age | 5 | 0.99 | 27.05 | 4.42 | 17.6 | 24.10 | 26.5 | 29.60 | 43.0 | ▂▇▅▂▁ |
best_of | 0 | 1.00 | 3.35 | 0.76 | 3.0 | 3.00 | 3.0 | 3.00 | 5.0 | ▇▁▁▁▂ |
minutes | 102 | 0.86 | 121.60 | 46.87 | 0.0 | 88.00 | 115.0 | 148.00 | 345.0 | ▁▇▃▁▁ |
w_ace | 101 | 0.86 | 7.68 | 5.59 | 0.0 | 4.00 | 7.0 | 10.00 | 42.0 | ▇▃▁▁▁ |
w_df | 101 | 0.86 | 2.35 | 2.04 | 0.0 | 1.00 | 2.0 | 3.00 | 14.0 | ▇▃▁▁▁ |
w_svpt | 101 | 0.86 | 79.77 | 29.80 | 14.0 | 58.25 | 76.0 | 95.00 | 191.0 | ▂▇▅▁▁ |
w_1stIn | 101 | 0.86 | 51.13 | 20.16 | 8.0 | 37.00 | 48.0 | 61.00 | 128.0 | ▂▇▃▁▁ |
w_1stWon | 101 | 0.86 | 38.96 | 14.64 | 6.0 | 29.00 | 36.0 | 47.00 | 95.0 | ▂▇▅▁▁ |
w_2ndWon | 101 | 0.86 | 15.95 | 6.23 | 2.0 | 12.00 | 15.0 | 19.75 | 37.0 | ▂▇▅▂▁ |
w_SvGms | 101 | 0.86 | 12.89 | 4.36 | 2.0 | 10.00 | 12.0 | 15.00 | 28.0 | ▁▇▆▁▁ |
w_bpSaved | 101 | 0.86 | 3.36 | 3.16 | 0.0 | 1.00 | 3.0 | 5.00 | 22.0 | ▇▂▁▁▁ |
w_bpFaced | 101 | 0.86 | 4.84 | 4.06 | 0.0 | 2.00 | 4.0 | 7.00 | 26.0 | ▇▃▁▁▁ |
l_ace | 101 | 0.86 | 5.80 | 5.59 | 0.0 | 2.00 | 4.0 | 8.00 | 44.0 | ▇▂▁▁▁ |
l_df | 101 | 0.86 | 3.04 | 2.53 | 0.0 | 1.00 | 3.0 | 4.00 | 25.0 | ▇▁▁▁▁ |
l_svpt | 101 | 0.86 | 83.25 | 29.84 | 12.0 | 62.00 | 78.0 | 100.00 | 205.0 | ▂▇▅▁▁ |
l_1stIn | 101 | 0.86 | 52.01 | 19.89 | 7.0 | 37.00 | 48.0 | 63.00 | 143.0 | ▂▇▃▁▁ |
l_1stWon | 101 | 0.86 | 35.02 | 15.13 | 4.0 | 25.00 | 32.0 | 44.00 | 101.0 | ▃▇▃▁▁ |
l_2ndWon | 101 | 0.86 | 14.63 | 6.72 | 1.0 | 10.00 | 14.0 | 19.00 | 38.0 | ▃▇▆▂▁ |
l_SvGms | 101 | 0.86 | 12.67 | 4.30 | 2.0 | 10.00 | 12.0 | 15.00 | 27.0 | ▁▇▆▁▁ |
l_bpSaved | 101 | 0.86 | 4.67 | 3.17 | 0.0 | 2.00 | 4.0 | 6.00 | 17.0 | ▇▆▃▁▁ |
l_bpFaced | 101 | 0.86 | 8.32 | 4.02 | 0.0 | 5.00 | 8.0 | 11.00 | 23.0 | ▂▇▃▂▁ |
winner_rank | 9 | 0.99 | 99.95 | 176.53 | 1.0 | 21.25 | 56.0 | 94.75 | 1594.0 | ▇▁▁▁▁ |
winner_rank_points | 9 | 0.99 | 1425.10 | 1397.77 | 2.0 | 574.25 | 832.5 | 1835.00 | 6980.0 | ▇▂▁▁▁ |
loser_rank | 18 | 0.98 | 148.81 | 243.49 | 1.0 | 42.00 | 75.0 | 129.00 | 1859.0 | ▇▁▁▁▁ |
loser_rank_points | 18 | 0.98 | 929.36 | 971.21 | 1.0 | 435.00 | 695.0 | 971.00 | 6980.0 | ▇▁▁▁▁ |
Data 3
Introduction and data
Identify the source of the data.
State when and how it was originally collected (by the original data curator, not necessarily how you found the data).
The data is from a library called the Million Song Dataset. It is a collaboration between Echo Nest and LabROSA (a laboratory that works on intelligent machine listening). The dataset was published in 2011. In terms of ethics, this dataset was collected ethically because the statistics and information about artists and songs are publicly available.
Write a brief description of the observations.
Each observation in the dataset is a unique song. The full dataset contains 1 million songs; the file we load has 10,000 rows. Each row includes variables for the song’s popularity and its artist’s popularity.
Research question
A well formulated research question. (You may include more than one research question if you want to receive feedback on different ideas for your project. However, one per data set is required.)
1) What is the relationship between artist popularity, the year the song was released, and the artists’ genre?
A description of the research topic along with a concise statement of your hypotheses on this topic.
This research topic explores the popularity of artists, the year they released their songs, and their corresponding genre. We hypothesize that artist popularity has generally increased in more recent years due to the increased accessibility of music. Similarly, we expect different genres of music to peak in different ranges of years, following the popularity trends of the time.
Identify the types of variables in your research question. Categorical? Quantitative?
Artist popularity and the year the song was released are quantitative variables. The artist’s genre is a categorical variable.
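To make the planned comparison concrete, the sketch below averages artist popularity by genre and decade on a small made-up tibble named `songs`. The column names (`artist.terms`, `artist.hotttnesss`, `song.year`) come from the dataset; the values are invented. Treating a `song.year` of 0 as a missing release year and grouping years into decades are both our assumptions, not documented properties of the data.

```r
library(dplyr)
library(tibble)

# Hypothetical toy data; column names match the dataset, values are invented.
songs <- tibble(
  artist.terms = c("hip hop", "hip hop", "blues-rock", "blues-rock", "pop rock"),
  artist.hotttnesss = c(0.40, 0.50, 0.30, 0.35, 0.45),
  song.year = c(2005, 2008, 1975, 0, 0)
)

# Assumption: song.year == 0 marks an unknown release year, so drop those rows.
popularity_by_decade <- songs %>%
  filter(song.year > 0) %>%
  mutate(decade = floor(song.year / 10) * 10) %>%
  group_by(artist.terms, decade) %>%
  summarise(
    n_songs = n(),
    mean_hotttnesss = mean(artist.hotttnesss),
    .groups = "drop"
  )

print(popularity_by_decade)
```

With several hundred distinct genre terms in the real file, it would probably make sense to restrict this summary to the most frequent terms before plotting.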
Glimpse of data
dataset3 <- read_csv("data/music.csv")
Rows: 10000 Columns: 35
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (4): artist.id, artist.name, artist.terms, song.id
dbl (31): artist.familiarity, artist.hotttnesss, artist.latitude, artist.loc...
ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
skim(dataset3)
Name | dataset3 |
Number of rows | 10000 |
Number of columns | 35 |
_______________________ | |
Column type frequency: | |
character | 4 |
numeric | 31 |
________________________ | |
Group variables | None |
Variable type: character
skim_variable | n_missing | complete_rate | min | max | empty | n_unique | whitespace |
---|---|---|---|---|---|---|---|
artist.id | 0 | 1 | 18 | 18 | 0 | 3888 | 0 |
artist.name | 0 | 1 | 1 | 255 | 0 | 4412 | 0 |
artist.terms | 5 | 1 | 2 | 40 | 0 | 458 | 0 |
song.id | 0 | 1 | 18 | 51 | 0 | 10000 | 0 |
Variable type: numeric
skim_variable | n_missing | complete_rate | mean | sd | p0 | p25 | p50 | p75 | p100 | hist |
---|---|---|---|---|---|---|---|---|---|---|
artist.familiarity | 0 | 1 | 0.57 | 0.16 | 0.00 | 0.47 | 0.56 | 0.67 | 1.00 | ▁▂▇▅▂ |
artist.hotttnesss | 0 | 1 | 0.39 | 0.14 | 0.00 | 0.33 | 0.38 | 0.45 | 1.08 | ▁▇▃▁▁ |
artist.latitude | 0 | 1 | 13.90 | 20.36 | -41.28 | 0.00 | 0.00 | 34.42 | 69.65 | ▁▇▁▃▁ |
artist.location | 0 | 1 | 0.08 | 7.80 | 0.00 | 0.00 | 0.00 | 0.00 | 780.00 | ▇▁▁▁▁ |
artist.longitude | 0 | 1 | -23.92 | 43.72 | -162.44 | -73.95 | 0.00 | 0.00 | 174.77 | ▁▂▇▁▁ |
artist.similar | 0 | 1 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | ▁▁▇▁▁ |
artist.terms_freq | 0 | 1 | 224.89 | 22392.16 | 0.00 | 0.95 | 1.00 | 1.00 | 2239217.00 | ▇▁▁▁▁ |
release.id | 0 | 1 | 371024.06 | 236777.83 | 0.00 | 172858.00 | 333103.00 | 573532.50 | 823599.00 | ▇▇▅▆▅ |
release.name | 0 | 1 | 23.10 | 1322.90 | 0.00 | 0.00 | 0.00 | 0.00 | 85555.00 | ▇▁▁▁▁ |
song.artist_mbtags | 0 | 1 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.33 | ▇▁▁▁▁ |
song.artist_mbtags_count | 0 | 1 | 0.52 | 0.88 | 0.00 | 0.00 | 0.00 | 1.00 | 9.00 | ▇▁▁▁▁ |
song.bars_confidence | 0 | 1 | 0.24 | 0.29 | 0.00 | 0.04 | 0.12 | 0.35 | 8.86 | ▇▁▁▁▁ |
song.bars_start | 0 | 1 | 1.07 | 1.72 | 0.00 | 0.44 | 0.79 | 1.22 | 59.74 | ▇▁▁▁▁ |
song.beats_confidence | 0 | 1 | 0.61 | 0.32 | 0.00 | 0.41 | 0.69 | 0.88 | 1.00 | ▃▂▃▆▇ |
song.beats_start | 0 | 1 | 0.43 | 0.81 | -60.00 | 0.19 | 0.33 | 0.50 | 12.25 | ▁▁▁▁▇ |
song.duration | 0 | 1 | 240.62 | 246.08 | 1.04 | 176.03 | 223.06 | 276.38 | 22050.00 | ▇▁▁▁▁ |
song.end_of_fade_in | 0 | 1 | 0.76 | 1.86 | 0.00 | 0.00 | 0.20 | 0.42 | 43.12 | ▇▁▁▁▁ |
song.hotttnesss | 0 | 1 | -0.24 | 0.69 | -1.00 | -1.00 | 0.00 | 0.41 | 1.00 | ▇▁▃▆▂ |
song.key | 0 | 1 | 5.37 | 9.67 | 0.00 | 2.00 | 5.00 | 8.00 | 904.80 | ▇▁▁▁▁ |
song.key_confidence | 0 | 1 | 0.45 | 0.33 | 0.00 | 0.22 | 0.47 | 0.66 | 19.08 | ▇▁▁▁▁ |
song.loudness | 0 | 1 | -10.48 | 5.40 | -51.64 | -13.16 | -9.38 | -6.53 | 0.57 | ▁▁▁▆▇ |
song.mode | 0 | 1 | 0.69 | 0.46 | 0.00 | 0.00 | 1.00 | 1.00 | 1.00 | ▃▁▁▁▇ |
song.mode_confidence | 0 | 1 | 0.48 | 0.19 | 0.00 | 0.36 | 0.49 | 0.61 | 1.00 | ▂▅▇▅▁ |
song.start_of_fade_out | 0 | 1 | 229.88 | 112.02 | -21.39 | 168.86 | 213.86 | 266.27 | 1813.43 | ▇▁▁▁▁ |
song.tatums_confidence | 0 | 1 | 0.51 | 0.33 | 0.00 | 0.24 | 0.50 | 0.77 | 9.23 | ▇▁▁▁▁ |
song.tatums_start | 0 | 1 | 0.30 | 0.51 | 0.00 | 0.11 | 0.19 | 0.29 | 12.25 | ▇▁▁▁▁ |
song.tempo | 0 | 1 | 122.90 | 35.20 | 0.00 | 96.96 | 120.16 | 144.01 | 262.83 | ▁▆▇▂▁ |
song.time_signature | 0 | 1 | 3.56 | 1.27 | 0.00 | 3.00 | 4.00 | 4.00 | 7.00 | ▂▁▇▁▁ |
song.time_signature_confidence | 0 | 1 | 0.60 | 8.99 | 0.00 | 0.10 | 0.55 | 0.86 | 898.89 | ▇▁▁▁▁ |
song.title | 0 | 1 | 10.01 | 945.49 | 0.00 | 0.00 | 0.00 | 0.00 | 94496.00 | ▇▁▁▁▁ |
song.year | 0 | 1 | 934.70 | 996.65 | 0.00 | 0.00 | 0.00 | 2000.00 | 2010.00 | ▇▁▁▁▇ |
dataset3
# A tibble: 10,000 × 35
artist.familiarity artist.hotttnesss artist.id artist.latitude
<dbl> <dbl> <chr> <dbl>
1 0.582 0.402 ARD7TVE1187B99BFB1 0
2 0.631 0.417 ARMJAGH1187FB546F3 35.1
3 0.487 0.343 ARKRRTF1187B9984DA 0
4 0.630 0.454 AR7G5I41187FB4CE6C 0
5 0.651 0.402 ARXR32B1187FB57099 0
6 0.535 0.385 ARKFYS91187B98E58F 0
7 0.556 0.262 ARD0S291187B9B7BF5 0
8 0.801 0.606 AR10USD1187B99F3F1 0
9 0.427 0.332 AR8ZCNI1187B9A069B 0
10 0.551 0.423 ARNTLGG11E2835DDB9 0
# ℹ 9,990 more rows
# ℹ 31 more variables: artist.location <dbl>, artist.longitude <dbl>,
# artist.name <chr>, artist.similar <dbl>, artist.terms <chr>,
# artist.terms_freq <dbl>, release.id <dbl>, release.name <dbl>,
# song.artist_mbtags <dbl>, song.artist_mbtags_count <dbl>,
# song.bars_confidence <dbl>, song.bars_start <dbl>,
# song.beats_confidence <dbl>, song.beats_start <dbl>, song.duration <dbl>, …