Wondrous Raichu

Project Proposal

library(tidyverse)
library(skimr)
library(jsonlite)
library(dplyr)
library(tibble)

Data 1

Introduction and data

Identify the source of the data.

The dataset comes from Inside Airbnb (http://insideairbnb.com/get-the-data/), an open platform that provides data on Airbnb listings in different locations around the world.

State when and how it was originally collected (by the original data curator, not necessarily how you found the data).

The data cover New York City listings from the last quarter of 2022. Inside Airbnb compiles them from the information publicly available on Airbnb’s website, so the dataset is a snapshot of all listings at the time it was scraped (for this particular dataset, December 4th and 5th, 2022).

Write a brief description of the observations.

Each observation in this dataset is one Airbnb listing and contains the pieces of information posted on a typical listing page: details about the host, ratings of the host and of the place of stay, location, price, amenities, etc.

Research question(s)

How do Airbnb listing prices depend on rating of listing, location of listing, amenities included in the listing, and host response time?

In this research question, we aim to understand what factors affect Airbnb listing prices. There are five variables: four independent variables (i.e., rating of listing, location of listing, amenities included in the listing, and host response time) and one dependent variable (i.e., listing price). Listing price and rating of listing are quantitative variables, while location of listing, amenities included in the listing, and host response time are categorical variables (in the dataset, host response time is stored as text with five distinct values).
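The data summary later in this section lists price, amenities, and host_response_time as character columns, so they would need some cleaning before any modeling. The following is a minimal sketch of one way to do that, not a final cleaning pipeline. The toy rows are made up for illustration, and the assumptions that prices look like "$1,200.00", that amenities are stored as a bracketed list of quoted names, and that missing response times appear as "N/A" should be checked against the real file.

```r
library(dplyr)
library(readr)
library(stringr)
library(tibble)

# Toy rows standing in for airbnb_data; all values are hypothetical.
toy_listings <- tibble(
  price = c("$150.00", "$1,200.00", "$75.00"),
  amenities = c('["Wifi", "Kitchen"]', "[]", '["Wifi"]'),
  host_response_time = c("within an hour", "N/A", "within a day")
)

toy_clean <- toy_listings |>
  mutate(
    # parse_number() drops the dollar sign and the thousands separator
    price_num = parse_number(price),
    # Count quoted items in the amenities string; "[]" gives zero.
    # Assumption: amenity names contain no escaped quotes.
    n_amenities = str_count(amenities, '"[^"]+"'),
    # Treat response time as categorical, with "N/A" recorded as missing
    host_response_time = factor(na_if(host_response_time, "N/A"))
  )

toy_clean
```

Counting amenities is only one possible summary of that column; indicator variables for specific amenities would be an alternative.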

This research question is important because it provides us with a better understanding of the Airbnb price mechanism, which can shed light on the broader dynamics of the sharing economy and the way in which online platforms like Airbnb are changing the traditional hospitality industry.

Our hypothesis is that a high listing rating, a more comprehensive list of included amenities, a quick host response time, and a listing located in a safe and convenient location will all lead to higher listing prices.

How do characteristics (i.e., ratings and descriptions) of listings and of hosts affect the prices of listings?

In this research question, we aim to understand how the characteristics of listings and hosts affect the prices of listings on Airbnb. There are five variables: four independent variables (i.e., rating of host, description of host, rating of listing, description of listing) and one dependent variable (i.e., listing price). Ratings and listing prices are quantitative variables, while descriptions are categorical variables.

This research question is important because it highlights the strategies hosts take to make a good impression on potential guests and remain competitive in the crowded Airbnb marketplace. Moreover, for potential guests, understanding the strategies hosts use can help them make more informed decisions when selecting a listing that meets their needs and budget.

Our hypothesis is that higher listing ratings and higher host ratings will lead to higher listing prices. Furthermore, listings and hosts with more detailed and attractive descriptions will have higher listing prices.

What are the most popular neighborhoods for Airbnb listings and how does this popularity vary by listing type and price?

In this research question, we aim to understand how the popularity of Airbnb listings depends on listing type and listing price. Here, we associate popularity with high ratings and a high number of reviews. There are four variables: two independent variables (i.e., listing type and listing price) and two dependent variables (i.e., ratings and number of reviews). Listing type is a categorical variable, while listing price, ratings, and number of reviews are quantitative variables.
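As a rough sketch of how this popularity measure could be summarized, the example below groups listings by area and listing type and computes the number of listings, the median number of reviews, and the mean rating. The column names match those in the Airbnb data summary, but the toy rows are made up, and using the median of reviews and the mean of ratings is an assumption rather than a settled choice.

```r
library(dplyr)
library(tibble)

# Toy rows standing in for airbnb_data; all values are hypothetical.
toy_listings <- tibble(
  neighbourhood_group_cleansed = c("Manhattan", "Manhattan", "Brooklyn",
                                   "Brooklyn", "Queens"),
  room_type = c("Entire home/apt", "Private room", "Entire home/apt",
                "Entire home/apt", "Private room"),
  number_of_reviews = c(120, 15, 40, 60, 5),
  review_scores_rating = c(4.8, 4.5, 4.9, NA, 4.2)
)

popularity <- toy_listings |>
  group_by(neighbourhood_group_cleansed, room_type) |>
  summarize(
    n_listings = n(),
    median_reviews = median(number_of_reviews),
    # Listings without reviews have a missing rating, so drop NAs here
    mean_rating = mean(review_scores_rating, na.rm = TRUE),
    .groups = "drop"
  ) |>
  # Most-reviewed groups first
  arrange(desc(median_reviews))

popularity
```

The same grouping could use neighbourhood_cleansed instead for a finer, neighborhood-level view.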

This research question is important as it can provide valuable insights into the preferences of travelers and market trends for Airbnb hosts and guests. It helps potential hosts to identify profitable areas for their listings and to understand the preferences of Airbnb customers. Identifying the most popular neighborhoods for Airbnb listings and analyzing how this popularity varies by listing type and price can help hosts adjust their pricing strategies and better target their listings, while guests can make more informed decisions on where to stay. Additionally, this research can shed light on the factors that influence the popularity of neighborhoods for Airbnb listings, providing valuable information for city planners, policymakers, and tourism agencies in shaping their urban development strategies and promoting tourism in specific areas.

Our hypothesis is that the most popular neighborhood will be Manhattan because it is the center of business and entertainment. Moreover, we hypothesize that entire home/apt listings will be more popular compared to private or shared rooms and that lower-priced listings will be more popular than higher-priced ones.

Glimpse of data

airbnb_data <- read_csv("data/airbnb_data/airbnb_data.csv")

# Preview some rows
head(airbnb_data)

# A tibble: 6 × 75
     id listing_url              scrape_id last_scraped source name  description
  <dbl> <chr>                        <dbl> <date>       <chr>  <chr> <chr>      
1  2595 https://www.airbnb.com/…   2.02e13 2022-12-05   city … Skyl… "Beautiful…
2  5203 https://www.airbnb.com/…   2.02e13 2022-12-05   previ… Cozy… "Our best …
3  5136 https://www.airbnb.com/…   2.02e13 2022-12-04   city … Spac… "We welcom…
4  5121 https://www.airbnb.com/…   2.02e13 2022-12-05   city … Blis… "One room …
5  6848 https://www.airbnb.com/…   2.02e13 2022-12-05   city … Only… "Comfortab…
6  5178 https://www.airbnb.com/…   2.02e13 2022-12-05   city … Larg… "Please do…
# ℹ 68 more variables: neighborhood_overview <chr>, picture_url <chr>,
#   host_id <dbl>, host_url <chr>, host_name <chr>, host_since <date>,
#   host_location <chr>, host_about <chr>, host_response_time <chr>,
#   host_response_rate <chr>, host_acceptance_rate <chr>,
#   host_is_superhost <lgl>, host_thumbnail_url <chr>, host_picture_url <chr>,
#   host_neighbourhood <chr>, host_listings_count <dbl>,
#   host_total_listings_count <dbl>, host_verifications <chr>, …

# Skim through data
skim(airbnb_data)

Data summary
Name	airbnb_data
Number of rows	41533
Number of columns	75
_______________________
Column type frequency:
character	25
Date	5
logical	8
numeric	37
________________________
Group variables	None

Variable type: character

skim_variable	n_missing	complete_rate	min	max	n_unique	whitespace
listing_url	0	1.00	33	47	41533	0
source	0	1.00	11	15	2	0
name	11	1.00	1	248	40242	0
description	785	0.98	1	1000	37137	0
neighborhood_overview	17443	0.58	1	1000	19392	0
picture_url	0	1.00	60	126	40390	0
host_url	0	1.00	38	43	26832	0
host_name	5	1.00	1	35	9628	0
host_location	7745	0.81	5	40	1104	0
host_about	18316	0.56	1	7309	14151	26
host_response_time	5	1.00	3	18	5	0
host_response_rate	5	1.00	2	4	68	0
host_acceptance_rate	5	1.00	2	4	99	0
host_thumbnail_url	5	1.00	55	106	26362	0
host_picture_url	5	1.00	57	109	26362	0
host_neighbourhood	8189	0.80	3	50	539	0
host_verifications	0	1.00	2	32	8	0
neighbourhood	17443	0.58	13	55	191	0
neighbourhood_cleansed	0	1.00	4	25	223	0
neighbourhood_group_cleansed	0	1.00	5	13	5	0
property_type	0	1.00	4	34	80	0
room_type	0	1.00	10	15	4	0
bathrooms_text	77	1.00	6	17	30	0
amenities	0	1.00	2	2028	35522	0
price	0	1.00	5	10	1287	0

Variable type: Date

skim_variable	n_missing	complete_rate	min	max	median	n_unique
last_scraped	0	1.00	2022-12-04	2022-12-05	2022-12-05	2
host_since	5	1.00	2008-08-22	2022-12-02	2016-04-04	4649
calendar_last_scraped	0	1.00	2022-12-04	2022-12-05	2022-12-05	2
first_review	9393	0.77	2009-04-23	2022-12-04	2019-12-15	3772
last_review	9393	0.77	2011-05-12	2022-12-04	2022-09-15	2715

Variable type: logical

skim_variable	n_missing	complete_rate	mean	count
host_is_superhost	29	1	0.21	FAL: 32635, TRU: 8869
host_has_profile_pic	5	1	0.98	TRU: 40904, FAL: 624
host_identity_verified	5	1	0.85	TRU: 35262, FAL: 6266
bathrooms	41533	0	NaN	:
calendar_updated	41533	0	NaN	:
has_availability	0	1	0.85	TRU: 35281, FAL: 6252
license	41533	0	NaN	:
instant_bookable	0	1	0.20	FAL: 33123, TRU: 8410

Variable type: numeric

skim_variable	n_missing	complete_rate	mean	sd	p0	p25	p50	p75	p100	hist
id	0	1.00	1.728318e+17	2.974371e+17	2.59500e+03	1.835861e+07	4.117861e+07	5.477978e+17	7.741268e+17	▇▁▁▁▂
scrape_id	0	1.00	2.022120e+13	0.000000e+00	2.02212e+13	2.022120e+13	2.022120e+13	2.022120e+13	2.022120e+13	▁▁▇▁▁
host_id	0	1.00	1.400636e+08	1.526932e+08	2.43800e+03	1.491162e+07	6.561181e+07	2.418897e+08	4.899967e+08	▇▂▂▁▂
host_listings_count	5	1.00	8.662000e+01	5.183500e+02	1.00000e+00	1.000000e+00	2.000000e+00	5.000000e+00	4.559000e+03	▇▁▁▁▁
host_total_listings_count	5	1.00	1.362600e+02	7.735100e+02	1.00000e+00	1.000000e+00	3.000000e+00	7.000000e+00	1.201700e+04	▇▁▁▁▁
latitude	0	1.00	4.073000e+01	6.000000e-02	4.05000e+01	4.069000e+01	4.072000e+01	4.076000e+01	4.091000e+01	▁▂▇▅▁
longitude	0	1.00	-7.394000e+01	6.000000e-02	-7.42500e+01	-7.398000e+01	-7.395000e+01	-7.392000e+01	-7.371000e+01	▁▁▇▂▁
accommodates	0	1.00	2.960000e+00	2.080000e+00	0.00000e+00	2.000000e+00	2.000000e+00	4.000000e+00	1.600000e+01	▇▃▁▁▁
bedrooms	3822	0.91	1.380000e+00	7.600000e-01	1.00000e+00	1.000000e+00	1.000000e+00	2.000000e+00	1.400000e+01	▇▁▁▁▁
beds	941	0.98	1.650000e+00	1.160000e+00	1.00000e+00	1.000000e+00	1.000000e+00	2.000000e+00	4.200000e+01	▇▁▁▁▁
minimum_nights	0	1.00	1.859000e+01	3.070000e+01	1.00000e+00	2.000000e+00	1.000000e+01	3.000000e+01	1.250000e+03	▇▁▁▁▁
maximum_nights	0	1.00	5.324173e+04	1.053830e+07	1.00000e+00	6.000000e+01	3.650000e+02	1.125000e+03	2.147484e+09	▇▁▁▁▁
minimum_minimum_nights	14	1.00	1.864000e+01	3.239000e+01	1.00000e+00	2.000000e+00	7.000000e+00	3.000000e+01	1.250000e+03	▇▁▁▁▁
maximum_minimum_nights	14	1.00	2.297000e+01	4.853000e+01	1.00000e+00	2.000000e+00	1.400000e+01	3.000000e+01	1.250000e+03	▇▁▁▁▁
minimum_maximum_nights	14	1.00	1.243053e+06	5.161702e+07	1.00000e+00	2.700000e+02	1.125000e+03	1.125000e+03	2.147484e+09	▇▁▁▁▁
maximum_maximum_nights	14	1.00	2.122356e+06	6.745119e+07	1.00000e+00	3.650000e+02	1.125000e+03	1.125000e+03	2.147484e+09	▇▁▁▁▁
minimum_nights_avg_ntm	14	1.00	2.251000e+01	4.738000e+01	1.00000e+00	2.000000e+00	1.000000e+01	3.000000e+01	1.250000e+03	▇▁▁▁▁
maximum_nights_avg_ntm	14	1.00	1.398113e+06	5.294218e+07	1.00000e+00	3.650000e+02	1.125000e+03	1.125000e+03	2.147484e+09	▇▁▁▁▁
availability_30	0	1.00	7.900000e+00	1.014000e+01	0.00000e+00	0.000000e+00	2.000000e+00	1.500000e+01	3.000000e+01	▇▂▁▁▂
availability_60	0	1.00	2.192000e+01	2.210000e+01	0.00000e+00	0.000000e+00	1.800000e+01	4.200000e+01	6.000000e+01	▇▁▂▂▃
availability_90	0	1.00	3.681000e+01	3.487000e+01	0.00000e+00	0.000000e+00	3.300000e+01	7.000000e+01	9.000000e+01	▇▁▁▃▅
availability_365	0	1.00	1.432900e+02	1.442800e+02	0.00000e+00	0.000000e+00	8.700000e+01	3.120000e+02	3.650000e+02	▇▂▂▁▅
number_of_reviews	0	1.00	2.620000e+01	5.618000e+01	0.00000e+00	1.000000e+00	5.000000e+00	2.500000e+01	1.666000e+03	▇▁▁▁▁
number_of_reviews_ltm	0	1.00	7.980000e+00	1.856000e+01	0.00000e+00	0.000000e+00	1.000000e+00	8.000000e+00	9.920000e+02	▇▁▁▁▁
number_of_reviews_l30d	0	1.00	6.700000e-01	1.550000e+00	0.00000e+00	0.000000e+00	0.000000e+00	1.000000e+00	7.400000e+01	▇▁▁▁▁
review_scores_rating	9393	0.77	4.630000e+00	7.200000e-01	0.00000e+00	4.600000e+00	4.830000e+00	5.000000e+00	5.000000e+00	▁▁▁▁▇
review_scores_accuracy	9841	0.76	4.750000e+00	4.600000e-01	0.00000e+00	4.710000e+00	4.890000e+00	5.000000e+00	5.000000e+00	▁▁▁▁▇
review_scores_cleanliness	9831	0.76	4.630000e+00	5.400000e-01	0.00000e+00	4.500000e+00	4.800000e+00	5.000000e+00	5.000000e+00	▁▁▁▁▇
review_scores_checkin	9845	0.76	4.820000e+00	4.100000e-01	0.00000e+00	4.800000e+00	4.950000e+00	5.000000e+00	5.000000e+00	▁▁▁▁▇
review_scores_communication	9836	0.76	4.810000e+00	4.300000e-01	0.00000e+00	4.800000e+00	4.960000e+00	5.000000e+00	5.000000e+00	▁▁▁▁▇
review_scores_location	9848	0.76	4.740000e+00	4.100000e-01	0.00000e+00	4.640000e+00	4.860000e+00	5.000000e+00	5.000000e+00	▁▁▁▁▇
review_scores_value	9848	0.76	4.650000e+00	4.900000e-01	0.00000e+00	4.550000e+00	4.770000e+00	4.950000e+00	5.000000e+00	▁▁▁▁▇
calculated_host_listings_count	0	1.00	2.063000e+01	6.887000e+01	1.00000e+00	1.000000e+00	1.000000e+00	4.000000e+00	4.870000e+02	▇▁▁▁▁
calculated_host_listings_count_entire_homes	0	1.00	1.131000e+01	5.645000e+01	0.00000e+00	0.000000e+00	1.000000e+00	1.000000e+00	4.870000e+02	▇▁▁▁▁
calculated_host_listings_count_private_rooms	0	1.00	9.200000e+00	4.009000e+01	0.00000e+00	0.000000e+00	0.000000e+00	2.000000e+00	3.450000e+02	▇▁▁▁▁
calculated_host_listings_count_shared_rooms	0	1.00	5.000000e-02	5.900000e-01	0.00000e+00	0.000000e+00	0.000000e+00	0.000000e+00	1.500000e+01	▇▁▁▁▁
reviews_per_month	9393	0.77	1.280000e+00	1.940000e+00	1.00000e-02	1.400000e-01	5.800000e-01	1.880000e+00	1.029800e+02	▇▁▁▁▁

Data 2

Introduction and data

Identify the source of the data.

The dataset comes from Yelp Dataset (https://www.yelp.com/dataset), which provides a subset of the businesses, reviews, and user data on Yelp.

State when and how it was originally collected (by the original data curator, not necessarily how you found the data).

The Yelp dataset was originally collected by Yelp’s data team and is publicly available for research and educational purposes. The dataset includes information on businesses located in 10 metropolitan areas in four countries: the United States, Canada, the United Kingdom, and Germany.
Write a brief description of the observations.

The dataset is divided into several JSON files:
- business.json contains business data including location data, attributes, and categories
- review.json contains full review text data
- user.json contains user data and all the metadata associated with users
- checkin.json contains data on checkins on a business
- tip.json contains data on tips written by a user on a business
In this proposal, we will focus on the data outlined in business.json. The business.json file contains location details (address, city, state, etc.), number of stars, as well as number of reviews among other things.

Research question

Are there differences in the way that customers review chain vs. independent restaurants, and do these differences vary depending on the type of cuisine or location?

In this research question, we aim to understand how customers view chain and independent restaurants. Customer perceptions can be analyzed through ratings and reviews. There are five variables: three independent variables (i.e., type of restaurant, type of cuisine, and location) and two dependent variables (i.e., ratings and reviews). Ratings are quantitative variables, while reviews, type of restaurant, type of cuisine, and location are categorical variables.
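The business data has no column that marks a business as a chain, so the type of restaurant would have to be derived. One possible approach, sketched below under clearly stated assumptions, is to keep businesses whose categories include "Restaurants" and label a name as a chain when it appears at least a chosen number of times. The toy rows and the cutoff `chain_threshold` are hypothetical, and matching on exact names is a simplification.

```r
library(dplyr)
library(stringr)
library(tibble)

# Toy rows standing in for yelp_data; all values are hypothetical.
toy_businesses <- tibble(
  name = c("Burger Barn", "Burger Barn", "Burger Barn",
           "Luigi's Kitchen", "Nail Studio"),
  categories = c("Restaurants, Burgers", "Burgers, Restaurants",
                 "Restaurants, Fast Food", "Italian, Restaurants",
                 "Nail Salons, Beauty & Spas"),
  stars = c(3.0, 2.5, 3.5, 4.5, 4.0)
)

# Hypothetical cutoff: names with this many locations count as chains
chain_threshold <- 3

toy_restaurants <- toy_businesses |>
  # Rows with missing categories give NA here and are dropped by filter()
  filter(str_detect(categories, "Restaurants")) |>
  # Number of restaurant locations that share the same name
  add_count(name, name = "n_locations") |>
  mutate(
    restaurant_type = if_else(n_locations >= chain_threshold,
                              "chain", "independent")
  )

star_summary <- toy_restaurants |>
  group_by(restaurant_type) |>
  summarize(mean_stars = mean(stars), n = n())

star_summary
```

With the real data, the cutoff would need some thought, since common names shared by unrelated businesses could be mislabeled as chains.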

This research question is important because it can help restaurant owners and managers to better understand customer perceptions and preferences towards chain and independent restaurants. By examining whether differences in customer reviews vary based on cuisine type or location, insights can be gained into the factors that influence customer satisfaction and provide guidance on how to improve customer experiences.

Our hypothesis is that customers may perceive chain and independent restaurants differently, with chain restaurants being perceived as more consistent and reliable in terms of quality and service, while independent restaurants may be seen as more unique and offering more personalized experiences. The differences in customer perceptions may also vary depending on the type of cuisine or location, with certain cuisines or cities having a stronger preference for chain or independent restaurants.

Glimpse of data

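# Consulted Professor Soltoff on how to rectangle dataset
# Read the file as one string per line; each line is one JSON record
yelp_raw <- read_lines(file = "data/yelp_data/yelp_academic_dataset_business.json")

# Parse each JSON string into an R list
yelp_list <- map(.x = yelp_raw, .f = fromJSON)

# Store the parsed records in a list-column, then spread their fields into columns
yelp_data <- tibble(yelp = yelp_list) |>
  unnest_wider(col = yelp)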

# Preview some rows
head(yelp_data)

# A tibble: 6 × 14
  business_id     name  address city  state postal_code latitude longitude stars
  <chr>           <chr> <chr>   <chr> <chr> <chr>          <dbl>     <dbl> <dbl>
1 Pns2l4eNsfO8kk… Abby… 1616 C… Sant… CA    93101           34.4    -120.    5  
2 mpf3x-BjTdTEA3… The … 87 Gra… Afft… MO    63123           38.6     -90.3   3  
3 tUFrWirKiKi_TA… Targ… 5255 E… Tucs… AZ    85711           32.2    -111.    3.5
4 MTSW4McQd7CbVt… St H… 935 Ra… Phil… PA    19107           40.0     -75.2   4  
5 mWMc6_wTdE0EUB… Perk… 101 Wa… Gree… PA    18054           40.3     -75.5   4.5
6 CF33F8-E6oudUQ… Soni… 615 S … Ashl… TN    37015           36.3     -87.1   2  
# ℹ 5 more variables: review_count <int>, is_open <int>, attributes <list>,
#   categories <chr>, hours <list>

# Skim through data
skim(yelp_data)

Data summary
Name	yelp_data
Number of rows	150346
Number of columns	14
_______________________
Column type frequency:
character	7
list	2
numeric	5
________________________
Group variables	None

Variable type: character

skim_variable	n_missing	complete_rate	min	max	empty	n_unique
business_id	0	1	22	22	0	150346
name	0	1	2	64	0	114117
address	0	1	0	110	5127	122844
city	0	1	3	52	0	1416
state	0	1	2	3	0	27
postal_code	0	1	0	7	73	3362
categories	103	1	4	503	0	83160

Variable type: list

skim_variable	n_missing	complete_rate	n_unique	min_length	max_length
attributes	0	1	87662	0	33
hours	0	1	49823	0	7

Variable type: numeric

skim_variable	complete_rate	mean	sd	p0	p25	p50	p75	p100	hist
latitude	1	36.67	5.87	27.56	32.19	38.78	39.95	53.68	▅▂▇▁▁
longitude	1	-89.36	14.92	-120.10	-90.36	-86.12	-75.42	-73.20	▅▁▁▇▇
stars	1	3.60	0.97	1.00	3.00	3.50	4.50	5.00	▁▃▂▇▆
review_count	1	44.87	121.12	5.00	8.00	15.00	37.00	7568.00	▇▁▁▁▁
is_open	1	0.80	0.40	0.00	1.00	1.00	1.00	1.00	▂▁▁▁▇

Data 3

Introduction and data

Identify the source of the data.

The dataset was downloaded from Kaggle (https://www.kaggle.com/code/ahmetburabua/drive-to-survive/input?select=final.csv).

State when and how it was originally collected (by the original data curator, not necessarily how you found the data).

The dataset was created by Ahmet Buğra Buğa, a data analyst at Mathrics (based on the Kaggle account). The account did not say much about the data curation process other than the fact that “the dataset was collected from public places and combined.” We presume that the data was scraped from sources like Wikipedia and the Ergast Developer API (http://ergast.com/mrd/). The Kaggle page also provides yearly race data from 1983 to 2021, which the author combined into a single “final.csv” file.

Write a brief description of the observations.

Each observation in the “final.csv” file describes one driver in one race. The dataset tells you which circuit the race is on, whether the weather during the race is warm, cold, dry, wet, and/or cloudy, and what grid position a driver started the race in, among other things.

Research question

What factors are most strongly associated with drivers’ success in Formula 1 racing, and how have these factors changed over time?

In this research question, we aim to understand the factors that contribute to success in Formula 1 racing, including the race circuit, weather, starting grid position, and drivers’ ages. Here, we equate drivers’ success with finishing on the podium (i.e., in first, second, or third place). There are five variables: four independent variables (i.e., the race circuit, weather, starting grid position, and drivers’ ages) and one dependent variable (i.e., finishing position). Starting grid position, finishing position, and drivers’ ages are quantitative variables, while race circuit and weather are categorical variables. Other quantitative variables we could include in our analysis of success are driver points and qualifying times.
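To illustrate this definition of success, the sketch below derives a podium indicator and compares podium rates for front-row starters against everyone else. It assumes that the `podium` column holds the finishing position (the data summary shows it ranging from 1 to 27), which should be confirmed against the source. The toy rows and the `front_row` grouping are made up for illustration.

```r
library(dplyr)
library(tibble)

# Toy rows standing in for f1_data; all values are hypothetical.
toy_results <- tibble(
  grid = c(1, 2, 3, 10, 1, 5),
  podium = c(1, 4, 3, 8, 2, 12)
)

toy_results <- toy_results |>
  mutate(
    # Assumption: podium stores the finishing position, so top three = success
    on_podium = podium <= 3,
    # Hypothetical grouping: started from first or second on the grid
    front_row = grid <= 2
  )

podium_rate <- toy_results |>
  group_by(front_row) |>
  # The mean of a logical column is the share of TRUE values
  summarize(podium_rate = mean(on_podium), n = n())

podium_rate
```

A binary outcome like `on_podium` would also suit a logistic regression on the weather, circuit, grid, and age variables.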

Formula 1 racing is one of the most popular and competitive sports in the world, and understanding the factors that contribute to success in this field is crucial for teams, drivers, and fans. By identifying the most important factors associated with success in Formula 1 racing, teams can optimize their strategies and improve their chances of winning, while fans can gain a deeper appreciation for the skills and abilities required to excel in this sport. Moreover, studying how these factors have changed over time can provide insights into the evolution of Formula 1 racing and shed light on the impact of technological advancements, changes in regulations, and other factors on the sport.

Our hypothesis is that warm and dry weather conditions are more conducive to better driver performance compared to cold, cloudy, and/or wet weather. Dry roads provide better traction for tires, enabling drivers to control the car more effectively. On the other hand, wet weather can cause the tires to slip, potentially resulting in loss of control. In cold weather, the tires and mechanical components may not reach optimal operating temperatures, leading to reduced grip and responsiveness of the car.

Furthermore, we hypothesize that there is an optimal age for drivers in Formula 1, as being too young may lead to lack of experience and being too old may result in slower reaction time.

Glimpse of data

f1_data <- read_csv("data/f1_data/f1_data.csv")

# Preview some rows
head(f1_data)

# A tibble: 6 × 22
   ...1 season round circuit_id  weather_warm weather_cold weather_dry
  <dbl>  <dbl> <dbl> <chr>       <lgl>        <lgl>        <lgl>      
1    14   1983     1 jacarepagua FALSE        FALSE        TRUE       
2     5   1983     1 jacarepagua FALSE        FALSE        TRUE       
3     3   1983     1 jacarepagua FALSE        FALSE        TRUE       
4     0   1983     1 jacarepagua FALSE        FALSE        TRUE       
5     6   1983     1 jacarepagua FALSE        FALSE        TRUE       
6     8   1983     1 jacarepagua FALSE        FALSE        TRUE       
# ℹ 15 more variables: weather_wet <lgl>, weather_cloudy <lgl>, driver <chr>,
#   nationality <chr>, constructor <chr>, grid <dbl>, podium <dbl>,
#   driver_points <dbl>, driver_wins <dbl>, driver_standings_pos <dbl>,
#   constructor_points <dbl>, constructor_wins <dbl>,
#   constructor_standings_pos <dbl>, qualifying_time <dbl>, driver_age <dbl>

# Skim through data
skim(f1_data)

Data summary
Name	f1_data
Number of rows	14794
Number of columns	22
_______________________
Column type frequency:
character	4
logical	5
numeric	13
________________________
Group variables	None

Variable type: character

skim_variable	complete_rate	min	max	n_unique
circuit_id	1	3	14	50
driver	1	3	18	232
nationality	1	4	13	34
constructor	1	3	12	66

Variable type: logical

skim_variable	complete_rate	mean	count
weather_warm	1	0.39	FAL: 9063, TRU: 5731
weather_cold	1	0.02	FAL: 14473, TRU: 321
weather_dry	1	0.22	FAL: 11525, TRU: 3269
weather_wet	1	0.10	FAL: 13306, TRU: 1488
weather_cloudy	1	0.12	FAL: 12999, TRU: 1795

Variable type: numeric

skim_variable	complete_rate	mean	sd	p0	p25	p50	p75	p100	hist
...1	1	7465.31	4350.15	0	3698.25	7403.5	11232.75	15085.0	▇▇▇▇▇
season	1	2001.59	11.24	1983	1992.00	2001.0	2012.00	2021.0	▇▇▆▇▇
round	1	9.19	5.12	1	5.00	9.0	13.00	21.0	▇▆▆▆▁
grid	1	11.76	6.70	1	6.00	12.0	17.00	27.0	▇▆▆▆▂
podium	1	11.90	6.77	1	6.00	12.0	17.00	27.0	▇▆▆▆▂
driver_points	1	19.94	42.08	0	0.00	3.0	19.00	387.0	▇▁▁▁▁
driver_wins	1	0.36	1.18	0	0.00	0.0	0.00	13.0	▇▁▁▁▁
driver_standings_pos	1	10.66	7.67	0	4.00	10.0	17.00	30.0	▇▅▅▃▁
constructor_points	1	40.06	81.62	0	0.00	8.0	41.00	722.0	▇▁▁▁▁
constructor_wins	1	0.74	1.95	0	0.00	0.0	0.00	18.0	▇▁▁▁▁
constructor_standings_pos	1	5.86	3.83	0	3.00	6.0	9.00	20.0	▇▆▅▁▁
qualifying_time	1	2.55	8.00	-77	1.00	2.1	3.50	904.6	▇▁▁▁▁
driver_age	1	28.59	4.73	17	25.00	28.0	32.00	43.0	▂▇▇▅▁