Wondrous Raichu

Project Proposal

library(tidyverse)  # attaches dplyr, tibble, readr, purrr, and tidyr
library(skimr)      # skim() data summaries
library(jsonlite)   # fromJSON() for the Yelp JSON file

Data 1

Introduction and data

  • Identify the source of the data.

    The dataset comes from Inside Airbnb (http://insideairbnb.com/get-the-data/), an open platform that provides data on Airbnb listings in different locations around the world.

  • State when and how it was originally collected (by the original data curator, not necessarily how you found the data).

    The data cover New York City listings for the last quarter of 2022. Inside Airbnb compiles the dataset from information publicly available on Airbnb’s website, and each release reflects all listings visible at the time of scraping (for this dataset, December 4 and 5, 2022).

  • Write a brief description of the observations.

    Each observation in this dataset is one Airbnb listing and contains the information posted on a typical listing page. It describes the host and who they are, the ratings of the host and the place of stay, etc.

Research question(s)

How do Airbnb listing prices depend on rating of listing, location of listing, amenities included in the listing, and host response time?

In this research question, we aim to understand what factors affect Airbnb listing prices. There are five variables: four independent variables (i.e., rating of listing, location of listing, amenities included in the listing, and host response time) and one dependent variable (i.e., listing price). Listing price and rating of listing are quantitative variables, while location of listing, amenities included in the listing, and host response time (recorded in the data as one of a small set of categories) are categorical variables.

This research question is important because it provides us with a better understanding of the Airbnb price mechanism, which can shed light on the broader dynamics of the sharing economy and the way in which online platforms like Airbnb are changing the traditional hospitality industry.

Our hypothesis is that a high listing rating, a more comprehensive list of included amenities, a quick host response time, and a listing located in a safe and convenient location will all lead to higher listing prices.
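One way to make “a more comprehensive list of included amenities” measurable is to count the amenities per listing. The sketch below is illustrative only: it uses a toy tibble rather than the real data, and it assumes the amenities column stores a JSON-style array of strings (for example '["Wifi", "Kitchen"]'), which we have not verified for every row.

```r
library(dplyr)
library(purrr)
library(jsonlite)
library(tibble)

# Toy rows standing in for the real listings; the JSON-array format is an assumption
toy_listings <- tibble(
  amenities = c('["Wifi", "Kitchen", "Heating"]', '["Wifi"]', '[]')
)

amenity_counts <- toy_listings |>
  mutate(
    # Parse each string as JSON and count the parsed elements
    n_amenities = map_int(amenities, \(x) length(fromJSON(x)))
  )

amenity_counts
```

A count treats every amenity as equally valuable, so it is only a first approximation; indicator variables for specific amenities would be a natural refinement.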

How do characteristics (i.e., ratings and descriptions) of listings and of hosts affect the prices of listings?

In this research question, we aim to understand how the characteristics of listings and hosts affect the prices of listings on Airbnb. There are five variables: four independent variables (i.e., rating of host, description of host, rating of listing, description of listing) and one dependent variable (i.e., listing price). Ratings and listing prices are quantitative variables, while descriptions are categorical variables.

This research question is important because it highlights the strategies hosts take to make a good impression on potential guests and remain competitive in the crowded Airbnb marketplace. Moreover, for potential guests, understanding the strategies hosts use can help them make more informed decisions when selecting a listing that meets their needs and budget.

Our hypothesis is that higher listing ratings and higher host ratings will lead to higher listing prices. Furthermore, we expect listings whose listing and host descriptions are more detailed and attractive to have higher prices.

Glimpse of data

airbnb_data <- read_csv("data/airbnb_data/airbnb_data.csv")

# Preview some rows
head(airbnb_data)
# A tibble: 6 × 75
     id listing_url              scrape_id last_scraped source name  description
  <dbl> <chr>                        <dbl> <date>       <chr>  <chr> <chr>      
1  2595 https://www.airbnb.com/…   2.02e13 2022-12-05   city … Skyl… "Beautiful…
2  5203 https://www.airbnb.com/…   2.02e13 2022-12-05   previ… Cozy… "Our best …
3  5136 https://www.airbnb.com/…   2.02e13 2022-12-04   city … Spac… "We welcom…
4  5121 https://www.airbnb.com/…   2.02e13 2022-12-05   city … Blis… "One room …
5  6848 https://www.airbnb.com/…   2.02e13 2022-12-05   city … Only… "Comfortab…
6  5178 https://www.airbnb.com/…   2.02e13 2022-12-05   city … Larg… "Please do…
# ℹ 68 more variables: neighborhood_overview <chr>, picture_url <chr>,
#   host_id <dbl>, host_url <chr>, host_name <chr>, host_since <date>,
#   host_location <chr>, host_about <chr>, host_response_time <chr>,
#   host_response_rate <chr>, host_acceptance_rate <chr>,
#   host_is_superhost <lgl>, host_thumbnail_url <chr>, host_picture_url <chr>,
#   host_neighbourhood <chr>, host_listings_count <dbl>,
#   host_total_listings_count <dbl>, host_verifications <chr>, …
# Skim through data
skim(airbnb_data)
Data summary
Name airbnb_data
Number of rows 41533
Number of columns 75
_______________________
Column type frequency:
character 25
Date 5
logical 8
numeric 37
________________________
Group variables None

Variable type: character

skim_variable n_missing complete_rate min max empty n_unique whitespace
listing_url 0 1.00 33 47 0 41533 0
source 0 1.00 11 15 0 2 0
name 11 1.00 1 248 0 40242 0
description 785 0.98 1 1000 0 37137 0
neighborhood_overview 17443 0.58 1 1000 0 19392 0
picture_url 0 1.00 60 126 0 40390 0
host_url 0 1.00 38 43 0 26832 0
host_name 5 1.00 1 35 0 9628 0
host_location 7745 0.81 5 40 0 1104 0
host_about 18316 0.56 1 7309 0 14151 26
host_response_time 5 1.00 3 18 0 5 0
host_response_rate 5 1.00 2 4 0 68 0
host_acceptance_rate 5 1.00 2 4 0 99 0
host_thumbnail_url 5 1.00 55 106 0 26362 0
host_picture_url 5 1.00 57 109 0 26362 0
host_neighbourhood 8189 0.80 3 50 0 539 0
host_verifications 0 1.00 2 32 0 8 0
neighbourhood 17443 0.58 13 55 0 191 0
neighbourhood_cleansed 0 1.00 4 25 0 223 0
neighbourhood_group_cleansed 0 1.00 5 13 0 5 0
property_type 0 1.00 4 34 0 80 0
room_type 0 1.00 10 15 0 4 0
bathrooms_text 77 1.00 6 17 0 30 0
amenities 0 1.00 2 2028 0 35522 0
price 0 1.00 5 10 0 1287 0

Variable type: Date

skim_variable n_missing complete_rate min max median n_unique
last_scraped 0 1.00 2022-12-04 2022-12-05 2022-12-05 2
host_since 5 1.00 2008-08-22 2022-12-02 2016-04-04 4649
calendar_last_scraped 0 1.00 2022-12-04 2022-12-05 2022-12-05 2
first_review 9393 0.77 2009-04-23 2022-12-04 2019-12-15 3772
last_review 9393 0.77 2011-05-12 2022-12-04 2022-09-15 2715

Variable type: logical

skim_variable n_missing complete_rate mean count
host_is_superhost 29 1 0.21 FAL: 32635, TRU: 8869
host_has_profile_pic 5 1 0.98 TRU: 40904, FAL: 624
host_identity_verified 5 1 0.85 TRU: 35262, FAL: 6266
bathrooms 41533 0 NaN :
calendar_updated 41533 0 NaN :
has_availability 0 1 0.85 TRU: 35281, FAL: 6252
license 41533 0 NaN :
instant_bookable 0 1 0.20 FAL: 33123, TRU: 8410

Variable type: numeric

skim_variable n_missing complete_rate mean sd p0 p25 p50 p75 p100 hist
id 0 1.00 1.728318e+17 2.974371e+17 2.59500e+03 1.835861e+07 4.117861e+07 5.477978e+17 7.741268e+17 ▇▁▁▁▂
scrape_id 0 1.00 2.022120e+13 0.000000e+00 2.02212e+13 2.022120e+13 2.022120e+13 2.022120e+13 2.022120e+13 ▁▁▇▁▁
host_id 0 1.00 1.400636e+08 1.526932e+08 2.43800e+03 1.491162e+07 6.561181e+07 2.418897e+08 4.899967e+08 ▇▂▂▁▂
host_listings_count 5 1.00 8.662000e+01 5.183500e+02 1.00000e+00 1.000000e+00 2.000000e+00 5.000000e+00 4.559000e+03 ▇▁▁▁▁
host_total_listings_count 5 1.00 1.362600e+02 7.735100e+02 1.00000e+00 1.000000e+00 3.000000e+00 7.000000e+00 1.201700e+04 ▇▁▁▁▁
latitude 0 1.00 4.073000e+01 6.000000e-02 4.05000e+01 4.069000e+01 4.072000e+01 4.076000e+01 4.091000e+01 ▁▂▇▅▁
longitude 0 1.00 -7.394000e+01 6.000000e-02 -7.42500e+01 -7.398000e+01 -7.395000e+01 -7.392000e+01 -7.371000e+01 ▁▁▇▂▁
accommodates 0 1.00 2.960000e+00 2.080000e+00 0.00000e+00 2.000000e+00 2.000000e+00 4.000000e+00 1.600000e+01 ▇▃▁▁▁
bedrooms 3822 0.91 1.380000e+00 7.600000e-01 1.00000e+00 1.000000e+00 1.000000e+00 2.000000e+00 1.400000e+01 ▇▁▁▁▁
beds 941 0.98 1.650000e+00 1.160000e+00 1.00000e+00 1.000000e+00 1.000000e+00 2.000000e+00 4.200000e+01 ▇▁▁▁▁
minimum_nights 0 1.00 1.859000e+01 3.070000e+01 1.00000e+00 2.000000e+00 1.000000e+01 3.000000e+01 1.250000e+03 ▇▁▁▁▁
maximum_nights 0 1.00 5.324173e+04 1.053830e+07 1.00000e+00 6.000000e+01 3.650000e+02 1.125000e+03 2.147484e+09 ▇▁▁▁▁
minimum_minimum_nights 14 1.00 1.864000e+01 3.239000e+01 1.00000e+00 2.000000e+00 7.000000e+00 3.000000e+01 1.250000e+03 ▇▁▁▁▁
maximum_minimum_nights 14 1.00 2.297000e+01 4.853000e+01 1.00000e+00 2.000000e+00 1.400000e+01 3.000000e+01 1.250000e+03 ▇▁▁▁▁
minimum_maximum_nights 14 1.00 1.243053e+06 5.161702e+07 1.00000e+00 2.700000e+02 1.125000e+03 1.125000e+03 2.147484e+09 ▇▁▁▁▁
maximum_maximum_nights 14 1.00 2.122356e+06 6.745119e+07 1.00000e+00 3.650000e+02 1.125000e+03 1.125000e+03 2.147484e+09 ▇▁▁▁▁
minimum_nights_avg_ntm 14 1.00 2.251000e+01 4.738000e+01 1.00000e+00 2.000000e+00 1.000000e+01 3.000000e+01 1.250000e+03 ▇▁▁▁▁
maximum_nights_avg_ntm 14 1.00 1.398113e+06 5.294218e+07 1.00000e+00 3.650000e+02 1.125000e+03 1.125000e+03 2.147484e+09 ▇▁▁▁▁
availability_30 0 1.00 7.900000e+00 1.014000e+01 0.00000e+00 0.000000e+00 2.000000e+00 1.500000e+01 3.000000e+01 ▇▂▁▁▂
availability_60 0 1.00 2.192000e+01 2.210000e+01 0.00000e+00 0.000000e+00 1.800000e+01 4.200000e+01 6.000000e+01 ▇▁▂▂▃
availability_90 0 1.00 3.681000e+01 3.487000e+01 0.00000e+00 0.000000e+00 3.300000e+01 7.000000e+01 9.000000e+01 ▇▁▁▃▅
availability_365 0 1.00 1.432900e+02 1.442800e+02 0.00000e+00 0.000000e+00 8.700000e+01 3.120000e+02 3.650000e+02 ▇▂▂▁▅
number_of_reviews 0 1.00 2.620000e+01 5.618000e+01 0.00000e+00 1.000000e+00 5.000000e+00 2.500000e+01 1.666000e+03 ▇▁▁▁▁
number_of_reviews_ltm 0 1.00 7.980000e+00 1.856000e+01 0.00000e+00 0.000000e+00 1.000000e+00 8.000000e+00 9.920000e+02 ▇▁▁▁▁
number_of_reviews_l30d 0 1.00 6.700000e-01 1.550000e+00 0.00000e+00 0.000000e+00 0.000000e+00 1.000000e+00 7.400000e+01 ▇▁▁▁▁
review_scores_rating 9393 0.77 4.630000e+00 7.200000e-01 0.00000e+00 4.600000e+00 4.830000e+00 5.000000e+00 5.000000e+00 ▁▁▁▁▇
review_scores_accuracy 9841 0.76 4.750000e+00 4.600000e-01 0.00000e+00 4.710000e+00 4.890000e+00 5.000000e+00 5.000000e+00 ▁▁▁▁▇
review_scores_cleanliness 9831 0.76 4.630000e+00 5.400000e-01 0.00000e+00 4.500000e+00 4.800000e+00 5.000000e+00 5.000000e+00 ▁▁▁▁▇
review_scores_checkin 9845 0.76 4.820000e+00 4.100000e-01 0.00000e+00 4.800000e+00 4.950000e+00 5.000000e+00 5.000000e+00 ▁▁▁▁▇
review_scores_communication 9836 0.76 4.810000e+00 4.300000e-01 0.00000e+00 4.800000e+00 4.960000e+00 5.000000e+00 5.000000e+00 ▁▁▁▁▇
review_scores_location 9848 0.76 4.740000e+00 4.100000e-01 0.00000e+00 4.640000e+00 4.860000e+00 5.000000e+00 5.000000e+00 ▁▁▁▁▇
review_scores_value 9848 0.76 4.650000e+00 4.900000e-01 0.00000e+00 4.550000e+00 4.770000e+00 4.950000e+00 5.000000e+00 ▁▁▁▁▇
calculated_host_listings_count 0 1.00 2.063000e+01 6.887000e+01 1.00000e+00 1.000000e+00 1.000000e+00 4.000000e+00 4.870000e+02 ▇▁▁▁▁
calculated_host_listings_count_entire_homes 0 1.00 1.131000e+01 5.645000e+01 0.00000e+00 0.000000e+00 1.000000e+00 1.000000e+00 4.870000e+02 ▇▁▁▁▁
calculated_host_listings_count_private_rooms 0 1.00 9.200000e+00 4.009000e+01 0.00000e+00 0.000000e+00 0.000000e+00 2.000000e+00 3.450000e+02 ▇▁▁▁▁
calculated_host_listings_count_shared_rooms 0 1.00 5.000000e-02 5.900000e-01 0.00000e+00 0.000000e+00 0.000000e+00 0.000000e+00 1.500000e+01 ▇▁▁▁▁
reviews_per_month 9393 0.77 1.280000e+00 1.940000e+00 1.00000e-02 1.400000e-01 5.800000e-01 1.880000e+00 1.029800e+02 ▇▁▁▁▁
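
The skim output above lists price as a character column and host_response_time as a character column with five distinct values, so both would need cleaning before any modeling. A minimal sketch of that step follows, using a toy tibble in place of airbnb_data. The "$1,250.00" price format and the "N/A" placeholder are assumptions about the raw values, not something the output above confirms.

```r
library(dplyr)
library(readr)
library(tibble)

# Toy rows standing in for airbnb_data; the value formats are assumptions
toy_listings <- tibble(
  price = c("$150.00", "$1,250.00", "$75.00"),
  host_response_time = c("within an hour", "N/A", "within a day")
)

listings_clean <- toy_listings |>
  mutate(
    # parse_number() drops the currency symbol and the grouping comma
    price_num = parse_number(price),
    # Treat the literal string "N/A" as missing, then store the rest as a factor
    host_response_time = factor(na_if(host_response_time, "N/A"))
  )

listings_clean
```

Keeping the original price column alongside price_num makes it easy to spot-check rows where parsing may have gone wrong.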

Data 2

Introduction and data

  • Identify the source of the data.

    The dataset comes from Yelp Dataset (https://www.yelp.com/dataset), which provides a subset of the businesses, reviews, and user data on Yelp.

  • State when and how it was originally collected (by the original data curator, not necessarily how you found the data).

    The Yelp dataset was originally collected by Yelp’s data team and is publicly available for research and educational purposes. The dataset includes information on businesses located in 10 metropolitan areas in four countries: the United States, Canada, the United Kingdom, and Germany.

  • Write a brief description of the observations.

    The dataset is divided into several JSON files:

    • business.json contains business data including location data, attributes, and categories

    • review.json contains full review text data

    • user.json contains user data and all the metadata associated with users

    • checkin.json contains data on checkins on a business

    • tip.json contains data on tips written by a user on a business

    In this proposal, we will focus on the data outlined in business.json. The business.json file contains location details (address, city, state, etc.), number of stars, as well as number of reviews among other things.

Research question

Are there differences in the way that customers review chain vs. independent restaurants, and do these differences vary depending on the type of cuisine or location?

In this research question, we aim to understand how customers view chain and independent restaurants. Customer perceptions can be analyzed through ratings and reviews. There are five variables: three independent variables (i.e., type of restaurant, type of cuisine, and location) and two dependent variables (i.e., ratings and reviews). Ratings are quantitative variables, while reviews, type of restaurant, type of cuisine, and location are categorical variables.

This research question is important because it can help restaurant owners and managers to better understand customer perceptions and preferences towards chain and independent restaurants. By examining whether differences in customer reviews vary based on cuisine type or location, insights can be gained into the factors that influence customer satisfaction and provide guidance on how to improve customer experiences.

Our hypothesis is that customers may perceive chain and independent restaurants differently, with chain restaurants being perceived as more consistent and reliable in terms of quality and service, while independent restaurants may be seen as more unique and offering more personalized experiences. The differences in customer perceptions may also vary depending on the type of cuisine or location, with certain cuisines or cities having a stronger preference for chain or independent restaurants.

Glimpse of data

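# Consulted Professor Soltoff on how to rectangle dataset
# Read the file as one string per line (each line holds one business record)
yelp_raw <- read_lines(file = "data/yelp_data/yelp_academic_dataset_business.json")

# Parse each line into an R list
yelp_list <- map(.x = yelp_raw, .f = fromJSON)

# Put the lists in a list-column, then spread each field into its own column
yelp_data <- tibble(yelp = yelp_list) |>
  unnest_wider(col = yelp)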

# Preview some rows
head(yelp_data)
# A tibble: 6 × 14
  business_id     name  address city  state postal_code latitude longitude stars
  <chr>           <chr> <chr>   <chr> <chr> <chr>          <dbl>     <dbl> <dbl>
1 Pns2l4eNsfO8kk… Abby… 1616 C… Sant… CA    93101           34.4    -120.    5  
2 mpf3x-BjTdTEA3… The … 87 Gra… Afft… MO    63123           38.6     -90.3   3  
3 tUFrWirKiKi_TA… Targ… 5255 E… Tucs… AZ    85711           32.2    -111.    3.5
4 MTSW4McQd7CbVt… St H… 935 Ra… Phil… PA    19107           40.0     -75.2   4  
5 mWMc6_wTdE0EUB… Perk… 101 Wa… Gree… PA    18054           40.3     -75.5   4.5
6 CF33F8-E6oudUQ… Soni… 615 S … Ashl… TN    37015           36.3     -87.1   2  
# ℹ 5 more variables: review_count <int>, is_open <int>, attributes <list>,
#   categories <chr>, hours <list>
# Skim through data
skim(yelp_data)
Data summary
Name yelp_data
Number of rows 150346
Number of columns 14
_______________________
Column type frequency:
character 7
list 2
numeric 5
________________________
Group variables None

Variable type: character

skim_variable n_missing complete_rate min max empty n_unique whitespace
business_id 0 1 22 22 0 150346 0
name 0 1 2 64 0 114117 0
address 0 1 0 110 5127 122844 0
city 0 1 3 52 0 1416 0
state 0 1 2 3 0 27 0
postal_code 0 1 0 7 73 3362 0
categories 103 1 4 503 0 83160 0

Variable type: list

skim_variable n_missing complete_rate n_unique min_length max_length
attributes 0 1 87662 0 33
hours 0 1 49823 0 7

Variable type: numeric

skim_variable n_missing complete_rate mean sd p0 p25 p50 p75 p100 hist
latitude 0 1 36.67 5.87 27.56 32.19 38.78 39.95 53.68 ▅▂▇▁▁
longitude 0 1 -89.36 14.92 -120.10 -90.36 -86.12 -75.42 -73.20 ▅▁▁▇▇
stars 0 1 3.60 0.97 1.00 3.00 3.50 4.50 5.00 ▁▃▂▇▆
review_count 0 1 44.87 121.12 5.00 8.00 15.00 37.00 7568.00 ▇▁▁▁▁
is_open 0 1 0.80 0.40 0.00 1.00 1.00 1.00 1.00 ▂▁▁▁▇
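
The dataset has no column that marks a business as a chain, so the research question needs a working definition. One possible sketch, shown on toy data with made-up business names, flags a restaurant as a chain when its name appears at least chain_threshold times. Both the name-matching rule and the cutoff of 2 are hypothetical choices for illustration, not a definition taken from Yelp.

```r
library(dplyr)
library(stringr)
library(tibble)

# Toy rows standing in for yelp_data; names and values are invented
toy_yelp <- tibble(
  name = c("Chain Burger", "Chain Burger", "Local Bistro", "Big Box Store"),
  categories = c("Fast Food, Restaurants", "Restaurants, Burgers",
                 "Restaurants, Diners", "Department Stores, Shopping"),
  stars = c(2, 2.5, 4.5, 3.5)
)

chain_threshold <- 2  # hypothetical cutoff for calling a name a chain

restaurants <- toy_yelp |>
  # Keep businesses tagged as restaurants (rows with missing categories drop out)
  filter(str_detect(categories, "Restaurants")) |>
  # Count how many restaurant rows share each name
  add_count(name, name = "n_locations") |>
  mutate(type = if_else(n_locations >= chain_threshold, "chain", "independent"))

# Compare average star ratings between the two groups
restaurants |>
  group_by(type) |>
  summarize(mean_stars = mean(stars), n = n())
```

Exact name matching would miss spelling variants of the same chain, so the real analysis may need some name normalization first.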

Data 3

Introduction and data

  • Identify the source of the data.

    The dataset was downloaded from Kaggle (https://www.kaggle.com/code/ahmetburabua/drive-to-survive/input?select=final.csv).

  • State when and how it was originally collected (by the original data curator, not necessarily how you found the data).

    The dataset was created by Ahmet Buğra Buğa, a data analyst at Mathrics (according to the Kaggle account). The account says little about the data curation process other than that “the dataset was collected from public places and combined.” We presume that the data was scraped from sources like Wikipedia and the Ergast Developer API (http://ergast.com/mrd/). The Kaggle page also provides yearly race data from 1983-2021, which the author combined into a single “final.csv” file.

  • Write a brief description of the observations.

    The observations included in the “final.csv” file contain all the pieces of information relevant to a particular race. The dataset tells you which circuit the race is on, whether the weather during the race is warm, cold, dry, wet, and/or cloudy, and what grid position a driver started the race in among other things.

Research question

What factors are most strongly associated with drivers’ success in Formula 1 racing, and how have these factors changed over time?

In this research question, we aim to understand the factors that contribute to success in Formula 1 racing, including the race circuit, weather, starting grid position, and drivers’ ages. Here, we equate drivers’ success with finishing on the podium (i.e., in first, second, or third place). There are five variables: four independent variables (i.e., the race circuit, weather, starting grid position, and drivers’ ages) and one dependent variable (i.e., finishing position). Starting grid position, finishing position, and drivers’ ages are quantitative variables, while race circuit and weather are categorical variables. Other quantitative variables we could include in our analysis of success are driver points and qualifying times.
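
The podium definition above can be turned into an indicator variable. The sketch below uses toy values rather than the real data and assumes that the podium column records finishing position, with 1 meaning first place; it then computes the share of podium finishes for each starting grid position.

```r
library(dplyr)
library(tibble)

# Toy rows; `podium` is assumed to hold the finishing position
toy_f1 <- tibble(
  grid   = c(1, 1, 2, 5, 12, 12),
  podium = c(1, 4, 2, 3, 15, 9)
)

podium_by_grid <- toy_f1 |>
  # TRUE when the driver finished first, second, or third
  mutate(on_podium = podium <= 3) |>
  group_by(grid) |>
  summarize(podium_rate = mean(on_podium), races = n())

podium_by_grid
```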

Formula 1 racing is one of the most popular and competitive sports in the world, and understanding the factors that contribute to success in this field is crucial for teams, drivers, and fans. By identifying the most important factors associated with success in Formula 1 racing, teams can optimize their strategies and improve their chances of winning, while fans can gain a deeper appreciation for the skills and abilities required to excel in this sport. Moreover, studying how these factors have changed over time can provide insights into the evolution of Formula 1 racing and shed light on the impact of technological advancements, changes in regulations, and other factors on the sport.

Our hypothesis is that warm and dry weather conditions are more conducive to better driver performance compared to cold, cloudy, and/or wet weather. Dry roads provide better traction for tires, enabling drivers to control the car more effectively. On the other hand, wet weather can cause the tires to slip, potentially resulting in loss of control. In cold weather, the tires and mechanical components may not reach optimal operating temperatures, leading to reduced grip and responsiveness of the car.

Furthermore, we hypothesize that there is an optimal age for drivers in Formula 1, as being too young may lead to lack of experience and being too old may result in slower reaction time.
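
One possible way to examine the optimal-age hypothesis is a logistic regression with a squared age term, which lets the fitted relationship rise and then fall. The sketch below runs on randomly generated toy data, so its coefficients carry no meaning; the choice of model and of predictors is an assumption for illustration rather than a settled analysis plan.

```r
library(dplyr)
library(tibble)

set.seed(1)
n <- 200

# Randomly generated stand-in for f1_data; values are not real results
toy_f1 <- tibble(
  grid = sample(1:20, n, replace = TRUE),
  podium = sample(1:20, n, replace = TRUE),  # assumed finishing position
  driver_age = sample(18:42, n, replace = TRUE),
  weather_wet = sample(c(TRUE, FALSE), n, replace = TRUE)
) |>
  # 1 when the driver finished in the top three, 0 otherwise
  mutate(on_podium = as.integer(podium <= 3))

# The I(driver_age^2) term allows a peak at an intermediate age
fit <- glm(
  on_podium ~ grid + weather_wet + driver_age + I(driver_age^2),
  data = toy_f1,
  family = binomial()
)

coef(fit)
```

On the real data, a positive coefficient on driver_age combined with a negative coefficient on the squared term would be consistent with an intermediate optimal age.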

Glimpse of data

f1_data <- read_csv("data/f1_data/f1_data.csv")

# Preview some rows
head(f1_data)
# A tibble: 6 × 22
   ...1 season round circuit_id  weather_warm weather_cold weather_dry
  <dbl>  <dbl> <dbl> <chr>       <lgl>        <lgl>        <lgl>      
1    14   1983     1 jacarepagua FALSE        FALSE        TRUE       
2     5   1983     1 jacarepagua FALSE        FALSE        TRUE       
3     3   1983     1 jacarepagua FALSE        FALSE        TRUE       
4     0   1983     1 jacarepagua FALSE        FALSE        TRUE       
5     6   1983     1 jacarepagua FALSE        FALSE        TRUE       
6     8   1983     1 jacarepagua FALSE        FALSE        TRUE       
# ℹ 15 more variables: weather_wet <lgl>, weather_cloudy <lgl>, driver <chr>,
#   nationality <chr>, constructor <chr>, grid <dbl>, podium <dbl>,
#   driver_points <dbl>, driver_wins <dbl>, driver_standings_pos <dbl>,
#   constructor_points <dbl>, constructor_wins <dbl>,
#   constructor_standings_pos <dbl>, qualifying_time <dbl>, driver_age <dbl>
# Skim through data
skim(f1_data)
Data summary
Name f1_data
Number of rows 14794
Number of columns 22
_______________________
Column type frequency:
character 4
logical 5
numeric 13
________________________
Group variables None

Variable type: character

skim_variable n_missing complete_rate min max empty n_unique whitespace
circuit_id 0 1 3 14 0 50 0
driver 0 1 3 18 0 232 0
nationality 0 1 4 13 0 34 0
constructor 0 1 3 12 0 66 0

Variable type: logical

skim_variable n_missing complete_rate mean count
weather_warm 0 1 0.39 FAL: 9063, TRU: 5731
weather_cold 0 1 0.02 FAL: 14473, TRU: 321
weather_dry 0 1 0.22 FAL: 11525, TRU: 3269
weather_wet 0 1 0.10 FAL: 13306, TRU: 1488
weather_cloudy 0 1 0.12 FAL: 12999, TRU: 1795

Variable type: numeric

skim_variable n_missing complete_rate mean sd p0 p25 p50 p75 p100 hist
...1 0 1 7465.31 4350.15 0 3698.25 7403.5 11232.75 15085.0 ▇▇▇▇▇
season 0 1 2001.59 11.24 1983 1992.00 2001.0 2012.00 2021.0 ▇▇▆▇▇
round 0 1 9.19 5.12 1 5.00 9.0 13.00 21.0 ▇▆▆▆▁
grid 0 1 11.76 6.70 1 6.00 12.0 17.00 27.0 ▇▆▆▆▂
podium 0 1 11.90 6.77 1 6.00 12.0 17.00 27.0 ▇▆▆▆▂
driver_points 0 1 19.94 42.08 0 0.00 3.0 19.00 387.0 ▇▁▁▁▁
driver_wins 0 1 0.36 1.18 0 0.00 0.0 0.00 13.0 ▇▁▁▁▁
driver_standings_pos 0 1 10.66 7.67 0 4.00 10.0 17.00 30.0 ▇▅▅▃▁
constructor_points 0 1 40.06 81.62 0 0.00 8.0 41.00 722.0 ▇▁▁▁▁
constructor_wins 0 1 0.74 1.95 0 0.00 0.0 0.00 18.0 ▇▁▁▁▁
constructor_standings_pos 0 1 5.86 3.83 0 3.00 6.0 9.00 20.0 ▇▆▅▁▁
qualifying_time 0 1 2.55 8.00 -77 1.00 2.1 3.50 904.6 ▇▁▁▁▁
driver_age 0 1 28.59 4.73 17 25.00 28.0 32.00 43.0 ▂▇▇▅▁