library(tidyverse)
library(skimr)
Project title
Proposal
Data 1
Introduction and data
Identify the source of the data.
https://gist.github.com/armgilles/194bcff35001e7eb53a2a8b441e8b2c6
State when and how it was originally collected (by the original data curator, not necessarily how you found the data).
The original data curator most likely collected this data by organizing every known Pokemon up to the 6th generation pokedex.
Write a brief description of the observations.
This CSV file includes every Pokemon, including mega evolutions, up to the 6th generation (X & Y). It includes each Pokemon’s name, stats, generation, type, and whether it is legendary.
Research question
A well formulated research question. (You may include more than one research question if you want to receive feedback on different ideas for your project. However, one per data set is required.)
How does the generation of a Pokemon affect its stats and whether it is legendary?
A description of the research topic along with a concise statement of your hypotheses on this topic.
I think that later generations will have more legendary Pokemon, which will raise overall stats because legendaries typically have higher stats. I think this because later generations typically include more legendary Pokemon and more third-evolution Pokemon, which generally have higher stats than Pokemon with only two evolution stages or none.
Identify the types of variables in your research question. Categorical? Quantitative?
Name - Categorical
Legendary or Not - Categorical
Generation - Categorical
Type - Categorical
Stats of Pokemon - Quantitative
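One way to check this hypothesis is to summarise the stats and the share of legendaries by generation. The sketch below is illustrative only: it uses base R so it runs without extra packages, and `pokemon_sample` is a hypothetical data frame holding the `Total`, `Generation`, and `Legendary` values of the first eight rows of the data (all Generation 1, none legendary). On the full data the same call would return one row per generation.

```r
# First eight rows of the Pokemon data (values copied from the glimpse output)
pokemon_sample <- data.frame(
  Total      = c(318, 405, 525, 625, 309, 405, 534, 634),
  Generation = c(1, 1, 1, 1, 1, 1, 1, 1),
  Legendary  = c(FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE)
)

# Generation is stored as a number but treated as categorical in the proposal
pokemon_sample$Generation <- factor(pokemon_sample$Generation)

# Mean of Total and mean of Legendary (the proportion of legendaries)
# for each generation
by_generation <- aggregate(
  cbind(Total, Legendary) ~ Generation,
  data = pokemon_sample,
  FUN = mean
)
by_generation
```

Taking the mean of a logical column gives the proportion of `TRUE` values, so the `Legendary` column of the result can be read as the share of legendary Pokemon in each generation.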
Glimpse of data
# add code here
pokemon <- read_csv("data/pokemon.csv")
Rows: 800 Columns: 13
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (3): Name, Type 1, Type 2
dbl (9): #, Total, HP, Attack, Defense, Sp. Atk, Sp. Def, Speed, Generation
lgl (1): Legendary
ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
glimpse(pokemon)
Rows: 800
Columns: 13
$ `#` <dbl> 1, 2, 3, 3, 4, 5, 6, 6, 6, 7, 8, 9, 9, 10, 11, 12, 13, 14, …
$ Name <chr> "Bulbasaur", "Ivysaur", "Venusaur", "VenusaurMega Venusaur"…
$ `Type 1` <chr> "Grass", "Grass", "Grass", "Grass", "Fire", "Fire", "Fire",…
$ `Type 2` <chr> "Poison", "Poison", "Poison", "Poison", NA, NA, "Flying", "…
$ Total <dbl> 318, 405, 525, 625, 309, 405, 534, 634, 634, 314, 405, 530,…
$ HP <dbl> 45, 60, 80, 80, 39, 58, 78, 78, 78, 44, 59, 79, 79, 45, 50,…
$ Attack <dbl> 49, 62, 82, 100, 52, 64, 84, 130, 104, 48, 63, 83, 103, 30,…
$ Defense <dbl> 49, 63, 83, 123, 43, 58, 78, 111, 78, 65, 80, 100, 120, 35,…
$ `Sp. Atk` <dbl> 65, 80, 100, 122, 60, 80, 109, 130, 159, 50, 65, 85, 135, 2…
$ `Sp. Def` <dbl> 65, 80, 100, 120, 50, 65, 85, 85, 115, 64, 80, 105, 115, 20…
$ Speed <dbl> 45, 60, 80, 80, 65, 80, 100, 100, 100, 43, 58, 78, 78, 45, …
$ Generation <dbl> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,…
$ Legendary <lgl> FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FAL…
skimr::skim(pokemon)
Name | pokemon |
Number of rows | 800 |
Number of columns | 13 |
_______________________ | |
Column type frequency: | |
character | 3 |
logical | 1 |
numeric | 9 |
________________________ | |
Group variables | None |
Variable type: character
skim_variable | n_missing | complete_rate | min | max | empty | n_unique | whitespace |
---|---|---|---|---|---|---|---|
Name | 0 | 1.00 | 3 | 25 | 0 | 800 | 0 |
Type 1 | 0 | 1.00 | 3 | 8 | 0 | 18 | 0 |
Type 2 | 386 | 0.52 | 3 | 8 | 0 | 18 | 0 |
Variable type: logical
skim_variable | n_missing | complete_rate | mean | count |
---|---|---|---|---|
Legendary | 0 | 1 | 0.08 | FAL: 735, TRU: 65 |
Variable type: numeric
skim_variable | n_missing | complete_rate | mean | sd | p0 | p25 | p50 | p75 | p100 | hist |
---|---|---|---|---|---|---|---|---|---|---|
# | 0 | 1 | 362.81 | 208.34 | 1 | 184.75 | 364.5 | 539.25 | 721 | ▇▇▇▇▇ |
Total | 0 | 1 | 435.10 | 119.96 | 180 | 330.00 | 450.0 | 515.00 | 780 | ▃▆▇▂▁ |
HP | 0 | 1 | 69.26 | 25.53 | 1 | 50.00 | 65.0 | 80.00 | 255 | ▃▇▁▁▁ |
Attack | 0 | 1 | 79.00 | 32.46 | 5 | 55.00 | 75.0 | 100.00 | 190 | ▂▇▆▂▁ |
Defense | 0 | 1 | 73.84 | 31.18 | 5 | 50.00 | 70.0 | 90.00 | 230 | ▃▇▂▁▁ |
Sp. Atk | 0 | 1 | 72.82 | 32.72 | 10 | 49.75 | 65.0 | 95.00 | 194 | ▅▇▅▂▁ |
Sp. Def | 0 | 1 | 71.90 | 27.83 | 20 | 50.00 | 70.0 | 90.00 | 230 | ▇▇▂▁▁ |
Speed | 0 | 1 | 68.28 | 29.06 | 5 | 45.00 | 65.0 | 90.00 | 180 | ▃▇▆▁▁ |
Generation | 0 | 1 | 3.32 | 1.66 | 1 | 2.00 | 3.0 | 5.00 | 6 | ▇▅▃▅▂ |
pokemon
# A tibble: 800 × 13
`#` Name `Type 1` `Type 2` Total HP Attack Defense `Sp. Atk` `Sp. Def`
<dbl> <chr> <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 1 Bulba… Grass Poison 318 45 49 49 65 65
2 2 Ivysa… Grass Poison 405 60 62 63 80 80
3 3 Venus… Grass Poison 525 80 82 83 100 100
4 3 Venus… Grass Poison 625 80 100 123 122 120
5 4 Charm… Fire <NA> 309 39 52 43 60 50
6 5 Charm… Fire <NA> 405 58 64 58 80 65
7 6 Chari… Fire Flying 534 78 84 78 109 85
8 6 Chari… Fire Dragon 634 78 130 111 130 85
9 6 Chari… Fire Flying 634 78 104 78 159 115
10 7 Squir… Water <NA> 314 44 48 65 50 64
# ℹ 790 more rows
# ℹ 3 more variables: Speed <dbl>, Generation <dbl>, Legendary <lgl>
Data 2
Introduction and data
Identify the source of the data.
https://www.kaggle.com/datasets/ibriiee/video-games-sales-dataset-2022-updated-extra-feat?resource=download
State when and how it was originally collected (by the original data curator, not necessarily how you found the data).
The data was most likely collected by scraping a gaming website that lists the number of sales for each game.
Write a brief description of the observations.
The data contains the names of various video games, their platform, genre, year-of-release, and the number of sales per region.
Research question
A well formulated research question. (You may include more than one research question if you want to receive feedback on different ideas for your project. However, one per data set is required.)
- How do the popularity and sales of video games vary across different genres, platforms, and regions? Is there a preference in certain regions for a certain genre or platform?
A description of the research topic along with a concise statement of your hypotheses on this topic.
I believe that players in North America and Europe likely prefer action games and personal computers, while in Japan consoles and role-playing games may be more popular.
Identify the types of variables in your research question. Categorical? Quantitative?
Name - Categorical
Platform - Categorical
Genre - Categorical
NA_Sales - Quantitative
EU_Sales - Quantitative
JP_Sales - Quantitative
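A regional preference could be measured as each region's share of a platform's (or genre's) sales. The sketch below is only one possible approach: it uses base R, and `games_sample` is a hypothetical data frame holding the first eight rows of the data. The share is computed over the three named regions only, which is an assumption of this sketch; `Other_Sales` is left out.

```r
# First eight rows of the video game data (values copied from the glimpse output)
games_sample <- data.frame(
  Platform = c("Wii", "NES", "Wii", "Wii", "GB", "GB", "DS", "Wii"),
  NA_Sales = c(41.36, 29.08, 15.68, 15.61, 11.27, 23.20, 11.28, 13.96),
  EU_Sales = c(28.96, 3.58, 12.76, 10.93, 8.89, 2.26, 9.14, 9.18),
  JP_Sales = c(3.77, 6.81, 3.79, 3.28, 10.22, 4.22, 6.50, 2.93)
)

# Total sales per platform in each region
by_platform <- aggregate(
  cbind(NA_Sales, EU_Sales, JP_Sales) ~ Platform,
  data = games_sample,
  FUN = sum
)

# Japan's share of the three-region total for each platform
region_total <- with(by_platform, NA_Sales + EU_Sales + JP_Sales)
by_platform$JP_share <- by_platform$JP_Sales / region_total

# Platforms ordered from highest to lowest Japanese share
by_platform[order(-by_platform$JP_share), ]
```

Replacing `Platform` with `Genre` in the formula would give the same comparison by genre.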
Glimpse of data
video_games <- read_csv("data/video_games.csv")
Rows: 16719 Columns: 16
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (8): Name, Platform, Year_of_Release, Genre, Publisher, User_Score, Deve...
dbl (8): NA_Sales, EU_Sales, JP_Sales, Other_Sales, Global_Sales, Critic_Sco...
ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
glimpse(video_games)
Rows: 16,719
Columns: 16
$ Name <chr> "Wii Sports", "Super Mario Bros.", "Mario Kart Wii", "…
$ Platform <chr> "Wii", "NES", "Wii", "Wii", "GB", "GB", "DS", "Wii", "…
$ Year_of_Release <chr> "2006", "1985", "2008", "2009", "1996", "1989", "2006"…
$ Genre <chr> "Sports", "Platform", "Racing", "Sports", "Role-Playin…
$ Publisher <chr> "Nintendo", "Nintendo", "Nintendo", "Nintendo", "Ninte…
$ NA_Sales <dbl> 41.36, 29.08, 15.68, 15.61, 11.27, 23.20, 11.28, 13.96…
$ EU_Sales <dbl> 28.96, 3.58, 12.76, 10.93, 8.89, 2.26, 9.14, 9.18, 6.9…
$ JP_Sales <dbl> 3.77, 6.81, 3.79, 3.28, 10.22, 4.22, 6.50, 2.93, 4.70,…
$ Other_Sales <dbl> 8.45, 0.77, 3.29, 2.95, 1.00, 0.58, 2.88, 2.84, 2.24, …
$ Global_Sales <dbl> 82.53, 40.24, 35.52, 32.77, 31.37, 30.26, 29.80, 28.92…
$ Critic_Score <dbl> 76, NA, 82, 80, NA, NA, 89, 58, 87, NA, NA, 91, NA, 80…
$ Critic_Count <dbl> 51, NA, 73, 73, NA, NA, 65, 41, 80, NA, NA, 64, NA, 63…
$ User_Score <chr> "8", NA, "8.3", "8", NA, NA, "8.5", "6.6", "8.4", NA, …
$ User_Count <dbl> 322, NA, 709, 192, NA, NA, 431, 129, 594, NA, NA, 464,…
$ Developer <chr> "Nintendo", NA, "Nintendo", "Nintendo", NA, NA, "Ninte…
$ Rating <chr> "E", NA, "E", "E", NA, NA, "E", "E", "E", NA, NA, "E",…
skimr::skim(video_games)
Name | video_games |
Number of rows | 16719 |
Number of columns | 16 |
_______________________ | |
Column type frequency: | |
character | 8 |
numeric | 8 |
________________________ | |
Group variables | None |
Variable type: character
skim_variable | n_missing | complete_rate | min | max | empty | n_unique | whitespace |
---|---|---|---|---|---|---|---|
Name | 2 | 1.0 | 1 | 132 | 0 | 11562 | 0 |
Platform | 0 | 1.0 | 2 | 4 | 0 | 31 | 0 |
Year_of_Release | 0 | 1.0 | 3 | 4 | 0 | 40 | 0 |
Genre | 2 | 1.0 | 4 | 12 | 0 | 12 | 0 |
Publisher | 0 | 1.0 | 3 | 38 | 0 | 581 | 0 |
User_Score | 6704 | 0.6 | 1 | 3 | 0 | 96 | 0 |
Developer | 6623 | 0.6 | 2 | 80 | 0 | 1696 | 0 |
Rating | 6769 | 0.6 | 1 | 4 | 0 | 8 | 0 |
Variable type: numeric
skim_variable | n_missing | complete_rate | mean | sd | p0 | p25 | p50 | p75 | p100 | hist |
---|---|---|---|---|---|---|---|---|---|---|
NA_Sales | 0 | 1.00 | 0.26 | 0.81 | 0.00 | 0.00 | 0.08 | 0.24 | 41.36 | ▇▁▁▁▁ |
EU_Sales | 0 | 1.00 | 0.15 | 0.50 | 0.00 | 0.00 | 0.02 | 0.11 | 28.96 | ▇▁▁▁▁ |
JP_Sales | 0 | 1.00 | 0.08 | 0.31 | 0.00 | 0.00 | 0.00 | 0.04 | 10.22 | ▇▁▁▁▁ |
Other_Sales | 0 | 1.00 | 0.05 | 0.19 | 0.00 | 0.00 | 0.01 | 0.03 | 10.57 | ▇▁▁▁▁ |
Global_Sales | 0 | 1.00 | 0.53 | 1.55 | 0.01 | 0.06 | 0.17 | 0.47 | 82.53 | ▇▁▁▁▁ |
Critic_Score | 8582 | 0.49 | 68.97 | 13.94 | 13.00 | 60.00 | 71.00 | 79.00 | 98.00 | ▁▁▅▇▃ |
Critic_Count | 8582 | 0.49 | 26.36 | 18.98 | 3.00 | 12.00 | 21.00 | 36.00 | 113.00 | ▇▃▂▁▁ |
User_Count | 9129 | 0.45 | 162.23 | 561.28 | 4.00 | 10.00 | 24.00 | 81.00 | 10665.00 | ▇▁▁▁▁ |
video_games
# A tibble: 16,719 × 16
Name Platform Year_of_Release Genre Publisher NA_Sales EU_Sales JP_Sales
<chr> <chr> <chr> <chr> <chr> <dbl> <dbl> <dbl>
1 Wii Spor… Wii 2006 Spor… Nintendo 41.4 29.0 3.77
2 Super Ma… NES 1985 Plat… Nintendo 29.1 3.58 6.81
3 Mario Ka… Wii 2008 Raci… Nintendo 15.7 12.8 3.79
4 Wii Spor… Wii 2009 Spor… Nintendo 15.6 10.9 3.28
5 Pokemon … GB 1996 Role… Nintendo 11.3 8.89 10.2
6 Tetris GB 1989 Puzz… Nintendo 23.2 2.26 4.22
7 New Supe… DS 2006 Plat… Nintendo 11.3 9.14 6.5
8 Wii Play Wii 2006 Misc Nintendo 14.0 9.18 2.93
9 New Supe… Wii 2009 Plat… Nintendo 14.4 6.94 4.7
10 Duck Hunt NES 1984 Shoo… Nintendo 26.9 0.63 0.28
# ℹ 16,709 more rows
# ℹ 8 more variables: Other_Sales <dbl>, Global_Sales <dbl>,
# Critic_Score <dbl>, Critic_Count <dbl>, User_Score <chr>, User_Count <dbl>,
# Developer <chr>, Rating <chr>
Data 3
Introduction and data
- Identify the source of the data.
https://think.cs.vt.edu/corgis/csv/billionaires/
From the CORGIS Dataset Project
- State when and how it was originally collected (by the original data curator, not necessarily how you found the data).
The data builds on data originally collected by Forbes for the Forbes World’s Billionaires lists from 1996-2014. Scholars at the Peterson Institute for International Economics added additional variables for each billionaire.
- Write a brief description of the observations.
The data contains information about different billionaires. Some variables describe personal details such as name, age, gender, and location of citizenship. Others describe the professional side, such as the company each billionaire owns or works for, their position in the company, the year the company was founded, and the industry it operates in. Finally, there are variables pertaining to wealth, such as total wealth, wealth type, and wealth category.
Research question
- A well formulated research question. (You may include more than one research question if you want to receive feedback on different ideas for your project. However, one per data set is required.)
How does the age of billionaires relate to their wealth and the industry that they are in?
What are the differences in total wealth between male and female billionaires across different industries and demographic age groups?
- A description of the research topic along with a concise statement of your hypotheses on this topic.
For the first proposed research question, we want to examine whether the age of a billionaire can provide insights into what industries they work in as well as their total wealth. We think age is correlated with certain types of industry as well as with total wealth. We hypothesize this because the ways of making a lot of money have changed throughout history; for example, making money through the technology sector was not as prevalent 100 years ago as it is today. Additionally, we hypothesize that older billionaires will most likely have more total wealth since they have had more time to make money.
For the second proposed research question, the focus is how the gender of billionaires relates to their total wealth, the sectors they work in, and their age groups. We hypothesize that female billionaires will most likely have less total wealth than male ones. Additionally, we think that there will be far fewer female billionaires in male-dominated industries such as money management and real estate.
- Identify the types of variables in your research question. Categorical? Quantitative?
Name - Categorical
Rank - Categorical
Company Founded - Quantitative
Company Sector - Categorical
Demographic Age - Quantitative
Gender - Categorical
Location.Citizenship - Categorical
Wealth.Type - Categorical
Wealth.worth in Billions - Quantitative
Wealth.how.industry - Categorical
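Comparing wealth across age groups first requires turning age into groups. The sketch below shows one way to do that with `cut()`. It is illustrative only: `billionaire_sample` is a hypothetical data frame holding the `Age` and `NetWorth` values of the first ten rows of the loaded file, and the age breaks (under 50, 50-69, 70+) are assumptions chosen for the example, not part of the proposal.

```r
# First ten rows of the billionaire data (Age and NetWorth as loaded)
billionaire_sample <- data.frame(
  Age      = c(57, 49, 72, 65, 36, 90, 76, 48, 47, 64),
  NetWorth = c(177, 151, 150, 124, 97, 96, 93, 91.5, 89, 84.5)
)

# Bin age into groups; right = FALSE makes the intervals [0, 50), [50, 70), [70, Inf)
billionaire_sample$age_group <- cut(
  billionaire_sample$Age,
  breaks = c(0, 50, 70, Inf),
  labels = c("under 50", "50-69", "70+"),
  right  = FALSE
)

# Mean net worth within each age group
wealth_by_age <- aggregate(NetWorth ~ age_group, data = billionaire_sample, FUN = mean)
wealth_by_age
```

With only the ten richest people in the sample, these means say nothing about the hypothesis; the point is the grouping step, which would be applied to all rows with a known age.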
Glimpse of data
# add code here
billionaire <- read_csv("data/forbes_billionaires.csv")
Rows: 2755 Columns: 12
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (7): Name, Country, Source, Residence, Citizenship, Status, Education
dbl (4): NetWorth, Rank, Age, Children
lgl (1): Self_made
ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
glimpse(billionaire)
Rows: 2,755
Columns: 12
$ Name <chr> "Jeff Bezos", "Elon Musk", "Bernard Arnault & family", "Bi…
$ NetWorth <dbl> 177.0, 151.0, 150.0, 124.0, 97.0, 96.0, 93.0, 91.5, 89.0, …
$ Country <chr> "United States", "United States", "France", "United States…
$ Source <chr> "Amazon", "Tesla, SpaceX", "LVMH", "Microsoft", "Facebook"…
$ Rank <dbl> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17,…
$ Age <dbl> 57, 49, 72, 65, 36, 90, 76, 48, 47, 64, 85, 67, 66, 65, 49…
$ Residence <chr> "Seattle, Washington", "Austin, Texas", "Paris, France", "…
$ Citizenship <chr> "United States", "United States", "France", "United States…
$ Status <chr> "In Relationship", "In Relationship", "Married", "Divorced…
$ Children <dbl> 4, 7, 5, 3, 2, 3, 4, 1, 3, 3, 3, 2, NA, 3, NA, 6, NA, 4, 3…
$ Education <chr> "Bachelor of Arts/Science, Princeton University", "Bachelo…
$ Self_made <lgl> TRUE, TRUE, FALSE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, FAL…
skimr::skim(billionaire)
Name | billionaire |
Number of rows | 2755 |
Number of columns | 12 |
_______________________ | |
Column type frequency: | |
character | 7 |
logical | 1 |
numeric | 4 |
________________________ | |
Group variables | None |
Variable type: character
skim_variable | n_missing | complete_rate | min | max | empty | n_unique | whitespace |
---|---|---|---|---|---|---|---|
Name | 0 | 1.00 | 5 | 38 | 0 | 2752 | 0 |
Country | 0 | 1.00 | 4 | 20 | 0 | 70 | 0 |
Source | 0 | 1.00 | 1 | 35 | 0 | 924 | 0 |
Residence | 40 | 0.99 | 5 | 50 | 0 | 768 | 0 |
Citizenship | 16 | 0.99 | 4 | 20 | 0 | 70 | 0 |
Status | 665 | 0.76 | 6 | 18 | 0 | 8 | 0 |
Education | 1346 | 0.51 | 11 | 218 | 0 | 1120 | 0 |
Variable type: logical
skim_variable | n_missing | complete_rate | mean | count |
---|---|---|---|---|
Self_made | 18 | 0.99 | 0.72 | TRU: 1960, FAL: 777 |
Variable type: numeric
skim_variable | n_missing | complete_rate | mean | sd | p0 | p25 | p50 | p75 | p100 | hist |
---|---|---|---|---|---|---|---|---|---|---|
NetWorth | 0 | 1.00 | 4.75 | 9.62 | 1 | 1.5 | 2.3 | 4.2 | 177 | ▇▁▁▁▁ |
Rank | 0 | 1.00 | 1345.66 | 772.67 | 1 | 680.0 | 1362.0 | 2035.0 | 2674 | ▇▇▇▆▇ |
Age | 125 | 0.95 | 63.27 | 13.48 | 18 | 54.0 | 63.0 | 73.0 | 99 | ▁▃▇▆▂ |
Children | 1203 | 0.56 | 2.98 | 1.62 | 1 | 2.0 | 3.0 | 4.0 | 23 | ▇▁▁▁▁ |
billionaire
# A tibble: 2,755 × 12
Name NetWorth Country Source Rank Age Residence Citizenship Status
<chr> <dbl> <chr> <chr> <dbl> <dbl> <chr> <chr> <chr>
1 Jeff Bezos 177 United… Amazon 1 57 Seattle,… United Sta… In Re…
2 Elon Musk 151 United… Tesla… 2 49 Austin, … United Sta… In Re…
3 Bernard Arn… 150 France LVMH 3 72 Paris, F… France Marri…
4 Bill Gates 124 United… Micro… 4 65 Medina, … United Sta… Divor…
5 Mark Zucker… 97 United… Faceb… 5 36 Palo Alt… United Sta… Marri…
6 Warren Buff… 96 United… Berks… 6 90 Omaha, N… United Sta… Widow…
7 Larry Ellis… 93 United… softw… 7 76 Lanai, H… United Sta… In Re…
8 Larry Page 91.5 United… Google 8 48 Palo Alt… United Sta… Marri…
9 Sergey Brin 89 United… Google 9 47 Los Alto… United Sta… Marri…
10 Mukesh Amba… 84.5 India diver… 10 64 Mumbai, … India Marri…
# ℹ 2,745 more rows
# ℹ 3 more variables: Children <dbl>, Education <chr>, Self_made <lgl>
Data 4 [Added After initial Feedback]
Introduction and data
Identify the source of the data.
http://ergast.com/mrd/
State when and how it was originally collected (by the original data curator, not necessarily how you found the data).
The information was originally recorded after each race and, for older seasons, taken from historical records. Because the information is readily available, the original curator had to find the sources and compile them into a comprehensive data set.
Write a brief description of the observations.
This dataset contains information about Formula 1 races since 1950. It contains race locations, teams, qualifying results, and championship outcomes.
Research question
A well formulated research question. (You may include more than one research question if you want to receive feedback on different ideas for your project. However, one per data set is required.)
Have any race tracks been predictors of the year-end championship outcome? Was there a sequence of race victories that predicted championship victory?
What is the impact of age on performance?
Is there a correlation between age and track performance?
A description of the research topic along with a concise statement of your hypotheses on this topic.
The topic we are researching is Formula 1: the performance of drivers over time and the predictors of world championship titles. Our team's goal is to find key markers that predict the outcome of the championship early and to analyze young driver performance to predict future world championship titles.
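The age questions need a driver's age on the day of each race, which is not stored directly but can be derived from a date of birth and a race date. The sketch below is a minimal illustration: `age_on()` is a hypothetical helper, and the two dates are the first `dob` in the drivers file and the first `date` in the races file, paired here only as an example. In the real analysis the pairs would come from joining results to drivers and races on `driverId` and `raceId`.

```r
# Hypothetical helper: age in years on a given date
# (365.25 days per year is an approximation that averages over leap years)
age_on <- function(date, dob) {
  as.numeric(difftime(date, dob, units = "days")) / 365.25
}

dob       <- as.Date("1985-01-07")  # first dob in the drivers file
race_date <- as.Date("2009-03-29")  # first date in the races file

age_at_race <- age_on(race_date, dob)
age_at_race
```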
- Identify the types of variables in your research question. Categorical? Quantitative?
Categorical Variables:
Race name: The name of the race (e.g. “Monaco Grand Prix”)
Circuit name: The name of the circuit where the race was held (e.g. “Circuit de Monaco”)
Driver name: The name of the driver (e.g. “Lewis Hamilton”)
Constructor name: The name of the constructor (e.g. “Mercedes”)
Quantitative Variables:
Race number: The number of the race in the season (e.g. “1” for the first race of the season)
Grid position: The starting position of the driver on the grid (e.g. “3” for the third position)
Lap time: The time it took for the driver to complete a lap (e.g. “1:15.456” for 1 minute and 15.456 seconds)
Pit stop time: The time it took for the driver to complete a pit stop (e.g. “23.789” seconds)
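Lap and qualifying times are stored as text such as “1:15.456”, so they have to be converted to a number before they can be treated as quantitative. The sketch below shows one possible conversion to seconds. `lap_time_to_seconds()` is a hypothetical helper, and treating the `\N` marker as missing is an assumption based on how it appears in the files.

```r
# Hypothetical helper: convert "m:ss.sss" (or plain "ss.sss") text to seconds
lap_time_to_seconds <- function(x) {
  x[x %in% "\\N"] <- NA_character_          # assumed missing-value marker
  parts <- strsplit(x, ":", fixed = TRUE)
  vapply(parts, function(p) {
    p <- as.numeric(p)
    if (length(p) == 2) {
      p[1] * 60 + p[2]                      # minutes and seconds
    } else if (length(p) == 1) {
      p[1]                                  # seconds only, or NA
    } else {
      NA_real_                              # unexpected format
    }
  }, numeric(1))
}

times <- lap_time_to_seconds(c("1:15.456", "1:26.572", "\\N", "23.789"))
times
```

The same helper handles pit stop durations such as “23.789”, which have no minutes part.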
Glimpse of data
# add code here
circuits <- read_csv("data/formula1/circuits.csv")
Rows: 77 Columns: 9
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (6): circuitRef, name, location, country, alt, url
dbl (3): circuitId, lat, lng
ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
drivers <- read_csv("data/formula1/drivers.csv")
Rows: 857 Columns: 9
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (7): driverRef, number, code, forename, surname, nationality, url
dbl (1): driverId
date (1): dob
ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
pit_stops <- read_csv("data/formula1/pit_stops.csv")
Rows: 9708 Columns: 7
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (1): duration
dbl (5): raceId, driverId, stop, lap, milliseconds
time (1): time
ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
results <- read_csv("data/formula1/results.csv")
Warning: One or more parsing issues, call `problems()` on your data frame for details,
e.g.:
dat <- vroom(...)
problems(dat)
Rows: 25880 Columns: 18
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (8): position, positionText, time, milliseconds, fastestLap, rank, fast...
dbl (10): resultId, raceId, driverId, constructorId, number, grid, positionO...
ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
qualifying <- read_csv("data/formula1/qualifying.csv")
Rows: 9615 Columns: 9
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (3): q1, q2, q3
dbl (6): qualifyId, raceId, driverId, constructorId, number, position
ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
races <- read_csv("data/formula1/races.csv")
Rows: 1102 Columns: 18
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (13): name, time, url, fp1_date, fp1_time, fp2_date, fp2_time, fp3_date...
dbl (4): raceId, year, round, circuitId
date (1): date
ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
circuits
# A tibble: 77 × 9
circuitId circuitRef name location country lat lng alt url
<dbl> <chr> <chr> <chr> <chr> <dbl> <dbl> <chr> <chr>
1 1 albert_park Albert P… Melbour… Austra… -37.8 145. 10 http…
2 2 sepang Sepang I… Kuala L… Malays… 2.76 102. 18 http…
3 3 bahrain Bahrain … Sakhir Bahrain 26.0 50.5 7 http…
4 4 catalunya Circuit … Montmeló Spain 41.6 2.26 109 http…
5 5 istanbul Istanbul… Istanbul Turkey 41.0 29.4 130 http…
6 6 monaco Circuit … Monte-C… Monaco 43.7 7.42 7 http…
7 7 villeneuve Circuit … Montreal Canada 45.5 -73.5 13 http…
8 8 magny_cours Circuit … Magny C… France 46.9 3.16 228 http…
9 9 silverstone Silverst… Silvers… UK 52.1 -1.02 153 http…
10 10 hockenheimring Hockenhe… Hockenh… Germany 49.3 8.57 103 http…
# ℹ 67 more rows
drivers
# A tibble: 857 × 9
driverId driverRef number code forename surname dob nationality url
<dbl> <chr> <chr> <chr> <chr> <chr> <date> <chr> <chr>
1 1 hamilton "44" HAM Lewis Hamilt… 1985-01-07 British http…
2 2 heidfeld "\\N" HEI Nick Heidfe… 1977-05-10 German http…
3 3 rosberg "6" ROS Nico Rosberg 1985-06-27 German http…
4 4 alonso "14" ALO Fernando Alonso 1981-07-29 Spanish http…
5 5 kovalain… "\\N" KOV Heikki Kovala… 1981-10-19 Finnish http…
6 6 nakajima "\\N" NAK Kazuki Nakaji… 1985-01-11 Japanese http…
7 7 bourdais "\\N" BOU Sébasti… Bourda… 1979-02-28 French http…
8 8 raikkonen "7" RAI Kimi Räikkö… 1979-10-17 Finnish http…
9 9 kubica "88" KUB Robert Kubica 1984-12-07 Polish http…
10 10 glock "\\N" GLO Timo Glock 1982-03-18 German http…
# ℹ 847 more rows
pit_stops
# A tibble: 9,708 × 7
raceId driverId stop lap time duration milliseconds
<dbl> <dbl> <dbl> <dbl> <time> <chr> <dbl>
1 841 153 1 1 17:05:23 26.898 26898
2 841 30 1 1 17:05:52 25.021 25021
3 841 17 1 11 17:20:48 23.426 23426
4 841 4 1 12 17:22:34 23.251 23251
5 841 13 1 13 17:24:10 23.842 23842
6 841 22 1 13 17:24:29 23.643 23643
7 841 20 1 14 17:25:17 22.603 22603
8 841 814 1 14 17:26:03 24.863 24863
9 841 816 1 14 17:26:50 25.259 25259
10 841 67 1 15 17:27:34 25.342 25342
# ℹ 9,698 more rows
results
# A tibble: 25,880 × 18
resultId raceId driverId constructorId number grid position positionText
<dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <chr> <chr>
1 1 18 1 1 22 1 "1" 1
2 2 18 2 2 3 5 "2" 2
3 3 18 3 3 7 7 "3" 3
4 4 18 4 4 5 11 "4" 4
5 5 18 5 1 23 3 "5" 5
6 6 18 6 3 8 13 "6" 6
7 7 18 7 5 14 17 "7" 7
8 8 18 8 6 1 15 "8" 8
9 9 18 9 2 4 2 "\\N" R
10 10 18 10 7 12 18 "\\N" R
# ℹ 25,870 more rows
# ℹ 10 more variables: positionOrder <dbl>, points <dbl>, laps <dbl>,
# time <chr>, milliseconds <chr>, fastestLap <chr>, rank <chr>,
# fastestLapTime <chr>, fastestLapSpeed <chr>, statusId <dbl>
qualifying
# A tibble: 9,615 × 9
qualifyId raceId driverId constructorId number position q1 q2 q3
<dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <chr> <chr> <chr>
1 1 18 1 1 22 1 1:26.572 1:25.… "1:2…
2 2 18 9 2 4 2 1:26.103 1:25.… "1:2…
3 3 18 5 1 23 3 1:25.664 1:25.… "1:2…
4 4 18 13 6 2 4 1:25.994 1:25.… "1:2…
5 5 18 2 2 3 5 1:25.960 1:25.… "1:2…
6 6 18 15 7 11 6 1:26.427 1:26.… "1:2…
7 7 18 3 3 7 7 1:26.295 1:26.… "1:2…
8 8 18 14 9 9 8 1:26.381 1:26.… "1:2…
9 9 18 10 7 12 9 1:26.919 1:26.… "1:2…
10 10 18 20 5 15 10 1:26.702 1:25.… "\\N"
# ℹ 9,605 more rows
races
# A tibble: 1,102 × 18
raceId year round circuitId name date time url fp1_date fp1_time
<dbl> <dbl> <dbl> <dbl> <chr> <date> <chr> <chr> <chr> <chr>
1 1 2009 1 1 Austra… 2009-03-29 06:0… http… "\\N" "\\N"
2 2 2009 2 2 Malays… 2009-04-05 09:0… http… "\\N" "\\N"
3 3 2009 3 17 Chines… 2009-04-19 07:0… http… "\\N" "\\N"
4 4 2009 4 3 Bahrai… 2009-04-26 12:0… http… "\\N" "\\N"
5 5 2009 5 4 Spanis… 2009-05-10 12:0… http… "\\N" "\\N"
6 6 2009 6 6 Monaco… 2009-05-24 12:0… http… "\\N" "\\N"
7 7 2009 7 5 Turkis… 2009-06-07 12:0… http… "\\N" "\\N"
8 8 2009 8 9 Britis… 2009-06-21 12:0… http… "\\N" "\\N"
9 9 2009 9 20 German… 2009-07-12 12:0… http… "\\N" "\\N"
10 10 2009 10 11 Hungar… 2009-07-26 12:0… http… "\\N" "\\N"
# ℹ 1,092 more rows
# ℹ 8 more variables: fp2_date <chr>, fp2_time <chr>, fp3_date <chr>,
# fp3_time <chr>, quali_date <chr>, quali_time <chr>, sprint_date <chr>,
# sprint_time <chr>
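Several Formula 1 columns show `"\\N"` where a value is missing, which is likely why columns such as `number`, `position`, and `milliseconds` were read as text. The sketch below shows one way to declare that marker as missing at read time. It is an illustration under assumptions: it uses base R's `read.csv()` on a small inline sample (three driver rows copied from the drivers table) so that it runs without the data files. The `na` argument of `read_csv()` plays the same role.

```r
# Three rows from the drivers table, written inline as CSV text
csv_text <- "driverId,driverRef,number,code
1,hamilton,44,HAM
2,heidfeld,\\N,HEI
3,rosberg,6,ROS"

# Treat empty strings, "NA", and the \N marker as missing values
drivers_sample <- read.csv(
  text = csv_text,
  na.strings = c("", "NA", "\\N")
)

str(drivers_sample)
```

With the marker declared as missing, `number` is parsed as a numeric column with an `NA` for the second driver instead of as text.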