Project title

Proposal

library(tidyverse)
library(skimr)

Data 1

Introduction and data

Identify the source of the data.

https://gist.github.com/armgilles/194bcff35001e7eb53a2a8b441e8b2c6
State when and how it was originally collected (by the original data curator, not necessarily how you found the data).

The original data curator most likely compiled this data by recording the stats of every known Pokemon in the Pokedex up to Generation 6.
Write a brief description of the observations.

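This CSV file has one row for every Pokemon up to Generation 6 (X & Y), 800 rows in total. It includes Mega Evolutions, which share a Pokedex number (the # column) with their base form. For each Pokemon it records the name, primary and secondary type, base stats (HP, Attack, Defense, Sp. Atk, Sp. Def, Speed, and their Total), generation, and whether it is legendary.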

Research question

A well formulated research question. (You may include more than one research question if you want to receive feedback on different ideas for your project. However, one per data set is required.)

How does a Pokemon's generation relate to its stats and to whether it is legendary?
A description of the research topic along with a concise statement of your hypotheses on this topic.

I hypothesize that later generations contain a higher proportion of legendary Pokemon and, as a result, have higher overall stats, since legendaries typically have higher stats than other Pokemon. My reasoning is that later generations tend to introduce more legendary Pokemon and more three-stage evolution lines, and the final stage of a three-stage line generally has higher stats than Pokemon from two-stage lines or Pokemon that do not evolve.
Identify the types of variables in your research question. Categorical? Quantitative?

Name - Categorical

Legendary or Not - Categorical

Generation - Categorical (ordinal; stored as a number from 1 to 6)

Type - Categorical

Stats (Total, HP, Attack, Defense, Sp. Atk, Sp. Def, Speed) - Quantitative
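The hypothesis compares generations on two outcomes: the share of legendary Pokemon and the level of stats. A minimal sketch of that comparison, assuming the `pokemon` tibble loaded in the glimpse below (columns `Generation`, `Legendary`, `Total`); `summarise_generations` is a hypothetical helper name, not part of any package. Reporting the mean Total of non-legendary Pokemon alongside the overall mean helps separate "later generations have more legendaries" from "later generations have stronger Pokemon in general".

```r
library(dplyr)

# Hypothetical helper: per-generation counts, legendary share, and mean Total.
# Expects the columns shown in the glimpse (Generation, Legendary, Total).
summarise_generations <- function(pokemon) {
  pokemon %>%
    mutate(Generation = factor(Generation)) %>%   # treat generation as categorical
    group_by(Generation) %>%
    summarise(
      n                        = n(),
      prop_legendary           = mean(Legendary),          # mean of a logical = share TRUE
      mean_total               = mean(Total),
      mean_total_non_legendary = mean(Total[!Legendary]),  # removes the legendary effect
      .groups = "drop"
    )
}
```

Usage: `summarise_generations(pokemon)` returns one row per generation.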

Glimpse of data

pokemon <- read_csv("data/pokemon.csv")

Rows: 800 Columns: 13
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (3): Name, Type 1, Type 2
dbl (9): #, Total, HP, Attack, Defense, Sp. Atk, Sp. Def, Speed, Generation
lgl (1): Legendary

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

glimpse(pokemon)

Rows: 800
Columns: 13
$ `#`        <dbl> 1, 2, 3, 3, 4, 5, 6, 6, 6, 7, 8, 9, 9, 10, 11, 12, 13, 14, …
$ Name       <chr> "Bulbasaur", "Ivysaur", "Venusaur", "VenusaurMega Venusaur"…
$ `Type 1`   <chr> "Grass", "Grass", "Grass", "Grass", "Fire", "Fire", "Fire",…
$ `Type 2`   <chr> "Poison", "Poison", "Poison", "Poison", NA, NA, "Flying", "…
$ Total      <dbl> 318, 405, 525, 625, 309, 405, 534, 634, 634, 314, 405, 530,…
$ HP         <dbl> 45, 60, 80, 80, 39, 58, 78, 78, 78, 44, 59, 79, 79, 45, 50,…
$ Attack     <dbl> 49, 62, 82, 100, 52, 64, 84, 130, 104, 48, 63, 83, 103, 30,…
$ Defense    <dbl> 49, 63, 83, 123, 43, 58, 78, 111, 78, 65, 80, 100, 120, 35,…
$ `Sp. Atk`  <dbl> 65, 80, 100, 122, 60, 80, 109, 130, 159, 50, 65, 85, 135, 2…
$ `Sp. Def`  <dbl> 65, 80, 100, 120, 50, 65, 85, 85, 115, 64, 80, 105, 115, 20…
$ Speed      <dbl> 45, 60, 80, 80, 65, 80, 100, 100, 100, 43, 58, 78, 78, 45, …
$ Generation <dbl> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,…
$ Legendary  <lgl> FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FAL…

skimr::skim(pokemon)

Data summary
Name	pokemon
Number of rows	800
Number of columns	13
_______________________
Column type frequency:
character	3
logical	1
numeric	9
________________________
Group variables	None

Variable type: character

skim_variable	n_missing	complete_rate	min	max	n_unique
Name	0	1.00	3	25	800
Type 1	0	1.00	3	8	18
Type 2	386	0.52	3	8	18

Variable type: logical

skim_variable	n_missing	complete_rate	mean	count
Legendary	0	1	0.08	FAL: 735, TRU: 65

Variable type: numeric

skim_variable	complete_rate	mean	sd	p0	p25	p50	p75	p100	hist
#	1	362.81	208.34	1	184.75	364.5	539.25	721	▇▇▇▇▇
Total	1	435.10	119.96	180	330.00	450.0	515.00	780	▃▆▇▂▁
HP	1	69.26	25.53	1	50.00	65.0	80.00	255	▃▇▁▁▁
Attack	1	79.00	32.46	5	55.00	75.0	100.00	190	▂▇▆▂▁
Defense	1	73.84	31.18	5	50.00	70.0	90.00	230	▃▇▂▁▁
Sp. Atk	1	72.82	32.72	10	49.75	65.0	95.00	194	▅▇▅▂▁
Sp. Def	1	71.90	27.83	20	50.00	70.0	90.00	230	▇▇▂▁▁
Speed	1	68.28	29.06	5	45.00	65.0	90.00	180	▃▇▆▁▁
Generation	1	3.32	1.66	1	2.00	3.0	5.00	6	▇▅▃▅▂

pokemon

# A tibble: 800 × 13
     `#` Name   `Type 1` `Type 2` Total    HP Attack Defense `Sp. Atk` `Sp. Def`
   <dbl> <chr>  <chr>    <chr>    <dbl> <dbl>  <dbl>   <dbl>     <dbl>     <dbl>
 1     1 Bulba… Grass    Poison     318    45     49      49        65        65
 2     2 Ivysa… Grass    Poison     405    60     62      63        80        80
 3     3 Venus… Grass    Poison     525    80     82      83       100       100
 4     3 Venus… Grass    Poison     625    80    100     123       122       120
 5     4 Charm… Fire     <NA>       309    39     52      43        60        50
 6     5 Charm… Fire     <NA>       405    58     64      58        80        65
 7     6 Chari… Fire     Flying     534    78     84      78       109        85
 8     6 Chari… Fire     Dragon     634    78    130     111       130        85
 9     6 Chari… Fire     Flying     634    78    104      78       159       115
10     7 Squir… Water    <NA>       314    44     48      65        50        64
# ℹ 790 more rows
# ℹ 3 more variables: Speed <dbl>, Generation <dbl>, Legendary <lgl>
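The glimpse shows that Mega Evolutions share a Pokedex number with their base form (rows 3 and 4 both have `#` 3) and carry the base form's generation: "VenusaurMega Venusaur" is recorded as Generation 1 with a Total of 625, although Mega Evolutions were introduced in X & Y. Including them could therefore inflate the stats of earlier generations. A minimal sketch for flagging them, assuming Mega rows are exactly those whose `Name` contains "Mega " with a trailing space; other alternate forms are not caught by this rule, and `flag_megas` is a hypothetical name.

```r
library(dplyr)
library(stringr)

# Hypothetical helper: add an is_mega column so Mega Evolutions can be
# excluded or analysed separately. Assumes Mega rows contain "Mega " in Name
# (e.g. "VenusaurMega Venusaur"); names like "Meganium" do not match.
flag_megas <- function(pokemon) {
  pokemon %>%
    mutate(is_mega = str_detect(Name, fixed("Mega ")))
}
```

Usage: `pokemon %>% flag_megas() %>% filter(!is_mega)` keeps only non-Mega rows.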

Data 2

Introduction and data

Identify the source of the data.

https://www.kaggle.com/datasets/ibriiee/video-games-sales-dataset-2022-updated-extra-feat?resource=download
State when and how it was originally collected (by the original data curator, not necessarily how you found the data).

The data was most likely collected by scraping a website that tracks video game sales and reports sales figures for each game.
Write a brief description of the observations.

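Each observation is one game released on one platform (16,719 rows); the same title can appear once per platform. The data records each game's name, platform, genre, year of release, and publisher; its sales in North America, Europe, Japan, other regions, and globally; and, for some titles, critic and user scores, developer, and rating.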

Research question

A well formulated research question. (You may include more than one research question if you want to receive feedback on different ideas for your project. However, one per data set is required.)
- How do the popularity and sales of video games vary across genres, platforms, and regions? Do certain regions prefer certain genres or platforms?
A description of the research topic along with a concise statement of your hypotheses on this topic.

I hypothesize that buyers in North America and Europe prefer action games and personal computers, whereas in Japan consoles and role-playing games are more popular.
Identify the types of variables in your research question. Categorical? Quantitative?

Name - Categorical

Platform - Categorical

Genre - Categorical

NA_Sales - Quantitative

EU_Sales - Quantitative

JP_Sales - Quantitative
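Because total sales differ greatly between regions, regional preferences are easier to compare as within-region shares than as raw sales. A minimal sketch, assuming the `video_games` tibble loaded below; `regional_share` is a hypothetical helper, and the `by` argument lets the same code address the platform half of the question.

```r
library(dplyr)
library(tidyr)

# Hypothetical helper: share of each region's sales going to each level of
# `by` ("Genre" by default, or e.g. "Platform").
regional_share <- function(video_games, by = "Genre") {
  region_labels <- c(NA_Sales = "North America", EU_Sales = "Europe", JP_Sales = "Japan")
  video_games %>%
    filter(!is.na(.data[[by]])) %>%                 # skim shows 2 rows with missing Genre
    pivot_longer(cols = all_of(names(region_labels)),
                 names_to = "region", values_to = "sales") %>%
    mutate(region = unname(region_labels[region])) %>%  # avoid the ambiguous "NA" label
    group_by(region, across(all_of(by))) %>%
    summarise(sales = sum(sales), .groups = "drop_last") %>%  # still grouped by region
    mutate(share = sales / sum(sales)) %>%           # share within each region
    ungroup() %>%
    arrange(region, desc(share))
}
```

Usage: `regional_share(video_games)` for genres, `regional_share(video_games, by = "Platform")` for platforms.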

Glimpse of data

video_games <- read_csv("data/video_games.csv")

Rows: 16719 Columns: 16
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (8): Name, Platform, Year_of_Release, Genre, Publisher, User_Score, Deve...
dbl (8): NA_Sales, EU_Sales, JP_Sales, Other_Sales, Global_Sales, Critic_Sco...

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

glimpse(video_games)

Rows: 16,719
Columns: 16
$ Name            <chr> "Wii Sports", "Super Mario Bros.", "Mario Kart Wii", "…
$ Platform        <chr> "Wii", "NES", "Wii", "Wii", "GB", "GB", "DS", "Wii", "…
$ Year_of_Release <chr> "2006", "1985", "2008", "2009", "1996", "1989", "2006"…
$ Genre           <chr> "Sports", "Platform", "Racing", "Sports", "Role-Playin…
$ Publisher       <chr> "Nintendo", "Nintendo", "Nintendo", "Nintendo", "Ninte…
$ NA_Sales        <dbl> 41.36, 29.08, 15.68, 15.61, 11.27, 23.20, 11.28, 13.96…
$ EU_Sales        <dbl> 28.96, 3.58, 12.76, 10.93, 8.89, 2.26, 9.14, 9.18, 6.9…
$ JP_Sales        <dbl> 3.77, 6.81, 3.79, 3.28, 10.22, 4.22, 6.50, 2.93, 4.70,…
$ Other_Sales     <dbl> 8.45, 0.77, 3.29, 2.95, 1.00, 0.58, 2.88, 2.84, 2.24, …
$ Global_Sales    <dbl> 82.53, 40.24, 35.52, 32.77, 31.37, 30.26, 29.80, 28.92…
$ Critic_Score    <dbl> 76, NA, 82, 80, NA, NA, 89, 58, 87, NA, NA, 91, NA, 80…
$ Critic_Count    <dbl> 51, NA, 73, 73, NA, NA, 65, 41, 80, NA, NA, 64, NA, 63…
$ User_Score      <chr> "8", NA, "8.3", "8", NA, NA, "8.5", "6.6", "8.4", NA, …
$ User_Count      <dbl> 322, NA, 709, 192, NA, NA, 431, 129, 594, NA, NA, 464,…
$ Developer       <chr> "Nintendo", NA, "Nintendo", "Nintendo", NA, NA, "Ninte…
$ Rating          <chr> "E", NA, "E", "E", NA, NA, "E", "E", "E", NA, NA, "E",…

skimr::skim(video_games)

Data summary
Name	video_games
Number of rows	16719
Number of columns	16
_______________________
Column type frequency:
character	8
numeric	8
________________________
Group variables	None

Variable type: character

skim_variable	n_missing	complete_rate	min	max	n_unique
Name	2	1.0	1	132	11562
Platform	0	1.0	2	4	31
Year_of_Release	0	1.0	3	4	40
Genre	2	1.0	4	12	12
Publisher	0	1.0	3	38	581
User_Score	6704	0.6	1	3	96
Developer	6623	0.6	2	80	1696
Rating	6769	0.6	1	4	8

Variable type: numeric

skim_variable	n_missing	complete_rate	mean	sd	p0	p25	p50	p75	p100	hist
NA_Sales	0	1.00	0.26	0.81	0.00	0.00	0.08	0.24	41.36	▇▁▁▁▁
EU_Sales	0	1.00	0.15	0.50	0.00	0.00	0.02	0.11	28.96	▇▁▁▁▁
JP_Sales	0	1.00	0.08	0.31	0.00	0.00	0.00	0.04	10.22	▇▁▁▁▁
Other_Sales	0	1.00	0.05	0.19	0.00	0.00	0.01	0.03	10.57	▇▁▁▁▁
Global_Sales	0	1.00	0.53	1.55	0.01	0.06	0.17	0.47	82.53	▇▁▁▁▁
Critic_Score	8582	0.49	68.97	13.94	13.00	60.00	71.00	79.00	98.00	▁▁▅▇▃
Critic_Count	8582	0.49	26.36	18.98	3.00	12.00	21.00	36.00	113.00	▇▃▂▁▁
User_Count	9129	0.45	162.23	561.28	4.00	10.00	24.00	81.00	10665.00	▇▁▁▁▁

video_games

# A tibble: 16,719 × 16
   Name      Platform Year_of_Release Genre Publisher NA_Sales EU_Sales JP_Sales
   <chr>     <chr>    <chr>           <chr> <chr>        <dbl>    <dbl>    <dbl>
 1 Wii Spor… Wii      2006            Spor… Nintendo      41.4    29.0      3.77
 2 Super Ma… NES      1985            Plat… Nintendo      29.1     3.58     6.81
 3 Mario Ka… Wii      2008            Raci… Nintendo      15.7    12.8      3.79
 4 Wii Spor… Wii      2009            Spor… Nintendo      15.6    10.9      3.28
 5 Pokemon … GB       1996            Role… Nintendo      11.3     8.89    10.2 
 6 Tetris    GB       1989            Puzz… Nintendo      23.2     2.26     4.22
 7 New Supe… DS       2006            Plat… Nintendo      11.3     9.14     6.5 
 8 Wii Play  Wii      2006            Misc  Nintendo      14.0     9.18     2.93
 9 New Supe… Wii      2009            Plat… Nintendo      14.4     6.94     4.7 
10 Duck Hunt NES      1984            Shoo… Nintendo      26.9     0.63     0.28
# ℹ 16,709 more rows
# ℹ 8 more variables: Other_Sales <dbl>, Global_Sales <dbl>,
#   Critic_Score <dbl>, Critic_Count <dbl>, User_Score <chr>, User_Count <dbl>,
#   Developer <chr>, Rating <chr>
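The column specification shows that `Year_of_Release` and `User_Score` were read as character, so they cannot be used numerically as-is. The skim reports a minimum length of 3 for `Year_of_Release`, which suggests a non-year placeholder string; we have not checked which placeholders the file uses. A minimal sketch that converts any non-numeric entry to NA; `clean_video_games` is a hypothetical name.

```r
library(dplyr)

# Parse text to numbers; any non-numeric placeholder becomes NA.
to_number <- function(x) suppressWarnings(as.numeric(x))

# Hypothetical helper: numeric versions of the two character columns.
clean_video_games <- function(video_games) {
  video_games %>%
    mutate(
      Year_of_Release = as.integer(to_number(Year_of_Release)),
      User_Score      = to_number(User_Score)
    )
}
```

In the glimpse, user scores look like "8.3" while `Critic_Score` reaches 98, so the two scores appear to be on different scales; multiplying `User_Score` by 10 would make them comparable, under that assumption.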

Data 3

Introduction and data

Identify the source of the data.

https://think.cs.vt.edu/corgis/csv/billionaires/

From the CORGIS Dataset Project

State when and how it was originally collected (by the original data curator, not necessarily how you found the data).

The data builds on the Forbes World’s Billionaires lists from 1996 to 2014, originally collected by Forbes. Scholars at the Peterson Institute for International Economics then added further variables describing each billionaire.

Write a brief description of the observations.

The data contains information about individual billionaires. Some variables are personal, such as name, age, gender, and country of citizenship. Others describe their professional side, such as the company each one owns or works for, their position in the company, the year the company was founded, and the industry it operates in. Finally, some variables describe their wealth, such as total wealth, wealth type, and wealth category.

Research question

A well formulated research question. (You may include more than one research question if you want to receive feedback on different ideas for your project. However, one per data set is required.)
- How does the age of billionaires relate to their wealth and the industry that they are in?
- What are the differences in total wealth between male and female billionaires across different industries and demographic age groups?

A description of the research topic along with a concise statement of your hypotheses on this topic.

For the first research question, we want to examine whether a billionaire's age provides insight into the industry they work in and their total wealth. We expect age to be associated with both. Industry should vary with age because the ways of making a fortune have changed over time; for example, the technology sector was a far less common source of great wealth 100 years ago than it is today. We also hypothesize that older billionaires tend to have more total wealth, since they have had more time to accumulate it.

For the second research question, the focus is on how billionaires' gender relates to their total wealth, the sectors they work in, and their age group. We hypothesize that female billionaires tend to have less total wealth than male billionaires, and that there are far fewer female billionaires in male-dominated industries such as money management and real estate.

Identify the types of variables in your research question. Categorical? Quantitative?

Name - Categorical

Rank - Categorical (ordinal)

Company Founded - Quantitative

Company Sector - Categorical

Demographic Age - Quantitative

Gender - Categorical

Location.Citizenship - Categorical

Wealth.Type - Categorical

Wealth.worth in Billions - Quantitative

Wealth.how.industry - Categorical
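The variable names listed above follow the CORGIS documentation, but the file loaded below (forbes_billionaires.csv) has columns such as `NetWorth`, `Age`, and `Self_made` and no gender or industry column, so this sketch uses only `Age` and `NetWorth`. The skim shows `NetWorth` is strongly right-skewed (mean 4.75 versus median 2.3), so a rank-based Spearman correlation and group medians are likely more informative than a Pearson correlation and group means. The age bands and the name `age_wealth_summary` are assumptions for illustration.

```r
library(dplyr)

# Hypothetical helper: Spearman correlation of age with net worth, plus the
# median net worth within assumed age bands. Rows with missing Age are dropped.
age_wealth_summary <- function(billionaire,
                               breaks = c(0, 40, 50, 60, 70, 80, Inf)) {
  complete <- billionaire %>% filter(!is.na(Age), !is.na(NetWorth))
  rho <- cor(complete$Age, complete$NetWorth, method = "spearman")
  by_group <- complete %>%
    mutate(age_group = cut(Age, breaks = breaks, right = FALSE)) %>%  # e.g. [60,70)
    group_by(age_group) %>%
    summarise(n = n(), median_net_worth = median(NetWorth), .groups = "drop")
  list(spearman_rho = rho, by_age_group = by_group)
}
```

Usage: `age_wealth_summary(billionaire)`; the skim shows 125 rows without an age, which this drops.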

Glimpse of data

billionaire <- read_csv("data/forbes_billionaires.csv")

Rows: 2755 Columns: 12
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (7): Name, Country, Source, Residence, Citizenship, Status, Education
dbl (4): NetWorth, Rank, Age, Children
lgl (1): Self_made

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

glimpse(billionaire)

Rows: 2,755
Columns: 12
$ Name        <chr> "Jeff Bezos", "Elon Musk", "Bernard Arnault & family", "Bi…
$ NetWorth    <dbl> 177.0, 151.0, 150.0, 124.0, 97.0, 96.0, 93.0, 91.5, 89.0, …
$ Country     <chr> "United States", "United States", "France", "United States…
$ Source      <chr> "Amazon", "Tesla, SpaceX", "LVMH", "Microsoft", "Facebook"…
$ Rank        <dbl> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17,…
$ Age         <dbl> 57, 49, 72, 65, 36, 90, 76, 48, 47, 64, 85, 67, 66, 65, 49…
$ Residence   <chr> "Seattle, Washington", "Austin, Texas", "Paris, France", "…
$ Citizenship <chr> "United States", "United States", "France", "United States…
$ Status      <chr> "In Relationship", "In Relationship", "Married", "Divorced…
$ Children    <dbl> 4, 7, 5, 3, 2, 3, 4, 1, 3, 3, 3, 2, NA, 3, NA, 6, NA, 4, 3…
$ Education   <chr> "Bachelor of Arts/Science, Princeton University", "Bachelo…
$ Self_made   <lgl> TRUE, TRUE, FALSE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, FAL…

skimr::skim(billionaire)

Data summary
Name	billionaire
Number of rows	2755
Number of columns	12
_______________________
Column type frequency:
character	7
logical	1
numeric	4
________________________
Group variables	None

Variable type: character

skim_variable	n_missing	complete_rate	min	max	n_unique
Name	0	1.00	5	38	2752
Country	0	1.00	4	20	70
Source	0	1.00	1	35	924
Residence	40	0.99	5	50	768
Citizenship	16	0.99	4	20	70
Status	665	0.76	6	18	8
Education	1346	0.51	11	218	1120

Variable type: logical

skim_variable	n_missing	complete_rate	mean	count
Self_made	18	0.99	0.72	TRU: 1960, FAL: 777

Variable type: numeric

skim_variable	n_missing	complete_rate	mean	sd	p0	p25	p50	p75	p100	hist
NetWorth	0	1.00	4.75	9.62	1	1.5	2.3	4.2	177	▇▁▁▁▁
Rank	0	1.00	1345.66	772.67	1	680.0	1362.0	2035.0	2674	▇▇▇▆▇
Age	125	0.95	63.27	13.48	18	54.0	63.0	73.0	99	▁▃▇▆▂
Children	1203	0.56	2.98	1.62	1	2.0	3.0	4.0	23	▇▁▁▁▁

billionaire

# A tibble: 2,755 × 12
   Name         NetWorth Country Source  Rank   Age Residence Citizenship Status
   <chr>           <dbl> <chr>   <chr>  <dbl> <dbl> <chr>     <chr>       <chr> 
 1 Jeff Bezos      177   United… Amazon     1    57 Seattle,… United Sta… In Re…
 2 Elon Musk       151   United… Tesla…     2    49 Austin, … United Sta… In Re…
 3 Bernard Arn…    150   France  LVMH       3    72 Paris, F… France      Marri…
 4 Bill Gates      124   United… Micro…     4    65 Medina, … United Sta… Divor…
 5 Mark Zucker…     97   United… Faceb…     5    36 Palo Alt… United Sta… Marri…
 6 Warren Buff…     96   United… Berks…     6    90 Omaha, N… United Sta… Widow…
 7 Larry Ellis…     93   United… softw…     7    76 Lanai, H… United Sta… In Re…
 8 Larry Page       91.5 United… Google     8    48 Palo Alt… United Sta… Marri…
 9 Sergey Brin      89   United… Google     9    47 Los Alto… United Sta… Marri…
10 Mukesh Amba…     84.5 India   diver…    10    64 Mumbai, … India       Marri…
# ℹ 2,745 more rows
# ℹ 3 more variables: Children <dbl>, Education <chr>, Self_made <lgl>

Data 4 [Added after initial feedback]

Introduction and data

Identify the source of the data.

http://ergast.com/mrd/
State when and how it was originally collected (by the original data curator, not necessarily how you found the data).

The data were recorded after each race, and results from earlier seasons were compiled from historical records. Because this information is publicly available, the original curator's main task was to locate these sources and compile them into a single comprehensive dataset.
Write a brief description of the observations.

This dataset contains information about Formula 1 races since 1950, including race locations (circuits), drivers, teams (constructors), qualifying and race results, pit stops, and championship outcomes.

Research question

A well formulated research question. (You may include more than one research question if you want to receive feedback on different ideas for your project. However, one per data set is required.)

Have there been any race tracks that have been predictors of year end championship outcome? Was there a sequence of race victories that predicted championship victory?

What is the impact of age on performance?

Is there a correlation between a driver's age and their performance at particular tracks?
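For the first question, one simple measure is, for each circuit, how often the race winner went on to lead the season. None of the six files loaded below contains official championship standings, so this sketch approximates the champion as the driver with the most points summed from results.csv. That can differ from the official champion in seasons with dropped-score rules; a standings table from the same Ergast data, if loaded, would be more reliable. `winner_predicts_champion` is a hypothetical name, and ties for the points lead count every tied driver as the leader.

```r
library(dplyr)

# Hypothetical helper: per circuit, the share of races whose winner also led
# that season's summed points (an approximation of the champion).
winner_predicts_champion <- function(results, races) {
  res <- results %>%
    inner_join(races %>% select(raceId, year, circuitId), by = "raceId")

  leaders <- res %>%
    group_by(year, driverId) %>%
    summarise(season_points = sum(points, na.rm = TRUE), .groups = "drop_last") %>%
    filter(season_points == max(season_points)) %>%   # max within each year
    ungroup() %>%
    transmute(year, driverId, is_leader = TRUE)

  res %>%
    filter(positionOrder == 1) %>%                     # race winners only
    left_join(leaders, by = c("year", "driverId")) %>%
    mutate(is_leader = coalesce(is_leader, FALSE)) %>%
    group_by(circuitId) %>%
    summarise(races = n(), winner_led_season = mean(is_leader), .groups = "drop") %>%
    arrange(desc(winner_led_season), desc(races))
}
```

Joining the output to `circuits` by `circuitId` adds readable circuit names; circuits with few races will give noisy shares.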
A description of the research topic along with a concise statement of your hypotheses on this topic.

Our research topic is Formula 1: how driver performance changes over time, and which factors predict World Championship titles. Our goal is to find markers that predict the championship outcome early in the season, and to analyze the performance of young drivers to predict future World Championship winners.
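The age questions need each driver's age on race day, which is not stored directly but can be derived from `dob` in drivers.csv and `date` in races.csv (both parsed as dates in the column specifications below). A minimal sketch; `add_driver_age` is a hypothetical name, and dividing days by 365.25 is an approximation of age in years.

```r
library(dplyr)

# Hypothetical helper: attach race year, race date, date of birth, and
# approximate age in years on race day to each row of results.
add_driver_age <- function(results, races, drivers) {
  results %>%
    inner_join(races %>% select(raceId, year, race_date = date), by = "raceId") %>%
    inner_join(drivers %>% select(driverId, dob), by = "driverId") %>%
    mutate(age_at_race = as.numeric(race_date - dob) / 365.25)  # days -> years
}
```

Usage: `add_driver_age(results, races, drivers)`; performance measures such as `positionOrder` or `points` can then be compared across ages, and `floor(age_at_race)` gives whole years.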

Identify the types of variables in your research question. Categorical? Quantitative?

Categorical Variables:

Race name: The name of the race (e.g. “Monaco Grand Prix”)
Circuit name: The name of the circuit where the race was held (e.g. “Circuit de Monaco”)
Driver name: The name of the driver (e.g. “Lewis Hamilton”)
Constructor name: The name of the constructor (e.g. “Mercedes”)

Quantitative Variables:

Race number: The round of the race within the season, stored as round in races.csv (e.g. “1” for the first race of the season)
Grid position: The starting position of the driver on the grid (e.g. “3” for the third position)
Lap time: The time it took for the driver to complete a lap (e.g. “1:15.456” for 1 minute and 15.456 seconds)
Pit stop time: The time it took for the driver to complete a pit stop (e.g. “23.789” seconds)
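Times such as “1:15.456” are stored as text (for example `q1` to `q3` in qualifying.csv and `duration` in pit_stops.csv), and missing values appear as "\N". A minimal sketch for converting them to seconds, assuming the formats are seconds, minutes:seconds, or hours:minutes:seconds; `time_to_seconds` is a hypothetical name.

```r
# Hypothetical helper: "1:15.456" -> 75.456, "23.789" -> 23.789,
# "1:34:50.616" -> 5690.616; "\N", NA, or anything unparseable -> NA.
time_to_seconds <- function(x) {
  x <- as.character(x)
  vapply(strsplit(x, ":", fixed = TRUE), function(parts) {
    if (length(parts) == 0 || anyNA(parts)) return(NA_real_)
    nums <- suppressWarnings(as.numeric(parts))   # "\N" becomes NA here
    if (anyNA(nums) || length(nums) > 3) return(NA_real_)
    # weight each field: seconds x1, minutes x60, hours x3600
    sum(nums * 60^(rev(seq_along(nums)) - 1))
  }, numeric(1))
}
```

For pit stops, the numeric `milliseconds` column divided by 1000 gives the same information without parsing.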

Glimpse of data

circuits <- read_csv("data/formula1/circuits.csv")

Rows: 77 Columns: 9
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (6): circuitRef, name, location, country, alt, url
dbl (3): circuitId, lat, lng

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

drivers <- read_csv("data/formula1/drivers.csv")

Rows: 857 Columns: 9
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr  (7): driverRef, number, code, forename, surname, nationality, url
dbl  (1): driverId
date (1): dob

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

pit_stops <- read_csv("data/formula1/pit_stops.csv")

Rows: 9708 Columns: 7
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr  (1): duration
dbl  (5): raceId, driverId, stop, lap, milliseconds
time (1): time

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

results <- read_csv("data/formula1/results.csv")

Warning: One or more parsing issues, call `problems()` on your data frame for details,
e.g.:
  dat <- vroom(...)
  problems(dat)

Rows: 25880 Columns: 18
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr  (8): position, positionText, time, milliseconds, fastestLap, rank, fast...
dbl (10): resultId, raceId, driverId, constructorId, number, grid, positionO...

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
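The printed tibbles show that the Ergast files mark missing values with "\N", which is why columns such as `position` and `milliseconds` in results.csv were read as character; the parsing warning above may have the same cause. A minimal sketch of a reader that treats "\N" as missing, using readr's `na` argument; `read_ergast` is a hypothetical name.

```r
library(readr)

# Hypothetical helper: read an Ergast CSV, treating "", "NA", and "\N" as missing.
read_ergast <- function(path) {
  read_csv(path, na = c("", "NA", "\\N"), show_col_types = FALSE)
}
```

Usage: `results <- read_ergast("data/formula1/results.csv")`; running `problems(results)` afterwards shows whether any parsing issues remain.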

qualifying <- read_csv("data/formula1/qualifying.csv")

Rows: 9615 Columns: 9
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (3): q1, q2, q3
dbl (6): qualifyId, raceId, driverId, constructorId, number, position

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

races <- read_csv("data/formula1/races.csv")

Rows: 1102 Columns: 18
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr  (13): name, time, url, fp1_date, fp1_time, fp2_date, fp2_time, fp3_date...
dbl   (4): raceId, year, round, circuitId
date  (1): date

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

circuits

# A tibble: 77 × 9
   circuitId circuitRef     name      location country    lat    lng alt   url  
       <dbl> <chr>          <chr>     <chr>    <chr>    <dbl>  <dbl> <chr> <chr>
 1         1 albert_park    Albert P… Melbour… Austra… -37.8  145.   10    http…
 2         2 sepang         Sepang I… Kuala L… Malays…   2.76 102.   18    http…
 3         3 bahrain        Bahrain … Sakhir   Bahrain  26.0   50.5  7     http…
 4         4 catalunya      Circuit … Montmeló Spain    41.6    2.26 109   http…
 5         5 istanbul       Istanbul… Istanbul Turkey   41.0   29.4  130   http…
 6         6 monaco         Circuit … Monte-C… Monaco   43.7    7.42 7     http…
 7         7 villeneuve     Circuit … Montreal Canada   45.5  -73.5  13    http…
 8         8 magny_cours    Circuit … Magny C… France   46.9    3.16 228   http…
 9         9 silverstone    Silverst… Silvers… UK       52.1   -1.02 153   http…
10        10 hockenheimring Hockenhe… Hockenh… Germany  49.3    8.57 103   http…
# ℹ 67 more rows

drivers

# A tibble: 857 × 9
   driverId driverRef number code  forename surname dob        nationality url  
      <dbl> <chr>     <chr>  <chr> <chr>    <chr>   <date>     <chr>       <chr>
 1        1 hamilton  "44"   HAM   Lewis    Hamilt… 1985-01-07 British     http…
 2        2 heidfeld  "\\N"  HEI   Nick     Heidfe… 1977-05-10 German      http…
 3        3 rosberg   "6"    ROS   Nico     Rosberg 1985-06-27 German      http…
 4        4 alonso    "14"   ALO   Fernando Alonso  1981-07-29 Spanish     http…
 5        5 kovalain… "\\N"  KOV   Heikki   Kovala… 1981-10-19 Finnish     http…
 6        6 nakajima  "\\N"  NAK   Kazuki   Nakaji… 1985-01-11 Japanese    http…
 7        7 bourdais  "\\N"  BOU   Sébasti… Bourda… 1979-02-28 French      http…
 8        8 raikkonen "7"    RAI   Kimi     Räikkö… 1979-10-17 Finnish     http…
 9        9 kubica    "88"   KUB   Robert   Kubica  1984-12-07 Polish      http…
10       10 glock     "\\N"  GLO   Timo     Glock   1982-03-18 German      http…
# ℹ 847 more rows

pit_stops

# A tibble: 9,708 × 7
   raceId driverId  stop   lap time     duration milliseconds
    <dbl>    <dbl> <dbl> <dbl> <time>   <chr>           <dbl>
 1    841      153     1     1 17:05:23 26.898          26898
 2    841       30     1     1 17:05:52 25.021          25021
 3    841       17     1    11 17:20:48 23.426          23426
 4    841        4     1    12 17:22:34 23.251          23251
 5    841       13     1    13 17:24:10 23.842          23842
 6    841       22     1    13 17:24:29 23.643          23643
 7    841       20     1    14 17:25:17 22.603          22603
 8    841      814     1    14 17:26:03 24.863          24863
 9    841      816     1    14 17:26:50 25.259          25259
10    841       67     1    15 17:27:34 25.342          25342
# ℹ 9,698 more rows

results

# A tibble: 25,880 × 18
   resultId raceId driverId constructorId number  grid position positionText
      <dbl>  <dbl>    <dbl>         <dbl>  <dbl> <dbl> <chr>    <chr>       
 1        1     18        1             1     22     1 "1"      1           
 2        2     18        2             2      3     5 "2"      2           
 3        3     18        3             3      7     7 "3"      3           
 4        4     18        4             4      5    11 "4"      4           
 5        5     18        5             1     23     3 "5"      5           
 6        6     18        6             3      8    13 "6"      6           
 7        7     18        7             5     14    17 "7"      7           
 8        8     18        8             6      1    15 "8"      8           
 9        9     18        9             2      4     2 "\\N"    R           
10       10     18       10             7     12    18 "\\N"    R           
# ℹ 25,870 more rows
# ℹ 10 more variables: positionOrder <dbl>, points <dbl>, laps <dbl>,
#   time <chr>, milliseconds <chr>, fastestLap <chr>, rank <chr>,
#   fastestLapTime <chr>, fastestLapSpeed <chr>, statusId <dbl>

qualifying

# A tibble: 9,615 × 9
   qualifyId raceId driverId constructorId number position q1       q2     q3   
       <dbl>  <dbl>    <dbl>         <dbl>  <dbl>    <dbl> <chr>    <chr>  <chr>
 1         1     18        1             1     22        1 1:26.572 1:25.… "1:2…
 2         2     18        9             2      4        2 1:26.103 1:25.… "1:2…
 3         3     18        5             1     23        3 1:25.664 1:25.… "1:2…
 4         4     18       13             6      2        4 1:25.994 1:25.… "1:2…
 5         5     18        2             2      3        5 1:25.960 1:25.… "1:2…
 6         6     18       15             7     11        6 1:26.427 1:26.… "1:2…
 7         7     18        3             3      7        7 1:26.295 1:26.… "1:2…
 8         8     18       14             9      9        8 1:26.381 1:26.… "1:2…
 9         9     18       10             7     12        9 1:26.919 1:26.… "1:2…
10        10     18       20             5     15       10 1:26.702 1:25.… "\\N"
# ℹ 9,605 more rows

races

# A tibble: 1,102 × 18
   raceId  year round circuitId name    date       time  url   fp1_date fp1_time
    <dbl> <dbl> <dbl>     <dbl> <chr>   <date>     <chr> <chr> <chr>    <chr>   
 1      1  2009     1         1 Austra… 2009-03-29 06:0… http… "\\N"    "\\N"   
 2      2  2009     2         2 Malays… 2009-04-05 09:0… http… "\\N"    "\\N"   
 3      3  2009     3        17 Chines… 2009-04-19 07:0… http… "\\N"    "\\N"   
 4      4  2009     4         3 Bahrai… 2009-04-26 12:0… http… "\\N"    "\\N"   
 5      5  2009     5         4 Spanis… 2009-05-10 12:0… http… "\\N"    "\\N"   
 6      6  2009     6         6 Monaco… 2009-05-24 12:0… http… "\\N"    "\\N"   
 7      7  2009     7         5 Turkis… 2009-06-07 12:0… http… "\\N"    "\\N"   
 8      8  2009     8         9 Britis… 2009-06-21 12:0… http… "\\N"    "\\N"   
 9      9  2009     9        20 German… 2009-07-12 12:0… http… "\\N"    "\\N"   
10     10  2009    10        11 Hungar… 2009-07-26 12:0… http… "\\N"    "\\N"   
# ℹ 1,092 more rows
# ℹ 8 more variables: fp2_date <chr>, fp2_time <chr>, fp3_date <chr>,
#   fp3_time <chr>, quali_date <chr>, quali_time <chr>, sprint_date <chr>,
#   sprint_time <chr>