Our Motivation
Because of its uniquely dense layout, New York sees a high volume of traffic accidents despite its urban planning. Our project focuses on understanding the factors behind this trend. We chose New York traffic accidents as our topic because identifying which factors are associated with more severe accidents and with significant travel delays could provide insights that make our streets safer. Transportation systems and navigation applications could also use these insights to map out more efficient and reliable routes for their customers, changing how people travel through one of the busiest cities in the world.
Our Research Question:
For accidents in New York State specifically, how do weather conditions and time of day affect the severity of an accident?
Introducing the data
- This dataset was created as an up-to-date, large-scale dataset with wide coverage of traffic accidents in the US. We can analyze it to find the factors associated with traffic accidents and to inform changes that reduce them.
- The observations (rows) are individual traffic accidents that have occurred in New York since February 2016.
- The attributes (columns) describe when and where each accident occurred and the circumstances surrounding it (e.g., traffic signs, crossings, weather conditions, and time of day).
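A minimal sketch of how the rows and columns described above might be narrowed to New York and given a binary severity label. The toy table, the column names (`State`, `Severity`, `Weather_Condition`, `Sunrise_Sunset`), and the cutoff of 3 for a "severe" accident are all assumptions made for illustration, not details confirmed by the dataset description.

```r
library(dplyr)

# Toy stand-in for the accident table; every column name is hypothetical.
accidents_us <- tibble::tibble(
  State             = c("NY", "NY", "CA", "NY", "NJ"),
  Severity          = c(2, 4, 3, 3, 2),
  Weather_Condition = c("Clear", "Heavy Snow", "Clear", "Clear", "Rain"),
  Sunrise_Sunset    = c("Day", "Night", "Day", "Night", "Day")
)

accidents_ny <- accidents_us %>%
  # Keep only accidents recorded in New York State.
  filter(State == "NY") %>%
  # Assumed definition: severity of 3 or higher counts as "severe".
  mutate(severe = if_else(Severity >= 3, "yes", "no"))

# Number of severe vs. non-severe accidents in the New York subset.
accidents_ny %>% count(severe)
```

Recoding severity into two levels is what lets the later questions be framed as a comparison of proportions; a different cutoff would change those proportions.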
Highlights from EDA
[Figures: two scatter plots (not shown). Rows with missing values were dropped before plotting: 19,761 in the first plot and 107,948 in the second.]
Does the weather condition of heavy snow affect the severity of an accident?
Null hypothesis: The proportion of severe car accidents is the same for days that have heavy snow and days that are clear.
\[
H_0 : p_{heavysnow} - p_{clear} = 0
\]
Alternative hypothesis: The proportion of severe car accidents is not the same for days that have heavy snow and days that are clear.
\[
H_A : p_{heavysnow} - p_{clear} \neq 0
\]
Conclusion: Our p-value is approximately 0, which is below the 0.05 significance level, so we reject the null hypothesis. The data provide convincing evidence that the proportion of severe accidents differs between heavy-snow and clear conditions.
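The test above could be run as a permutation test with the `infer` package, roughly as sketched here. This is an illustration under stated assumptions rather than the analysis actually used: the data frame is synthetic, the column names `weather` and `severe` are hypothetical, and the choice of a simulation-based test is itself an assumption.

```r
library(dplyr)
library(infer)

set.seed(1)

# Synthetic data for illustration only: 200 heavy-snow and 800 clear accidents.
accidents <- tibble::tibble(
  weather = factor(rep(c("Heavy Snow", "Clear"), times = c(200, 800))),
  severe  = factor(c(
    rep(c("yes", "no"), times = c(90, 110)),   # heavy snow: 90 of 200 severe
    rep(c("yes", "no"), times = c(200, 600))   # clear: 200 of 800 severe
  ))
)

# Observed difference in proportions: p_heavysnow - p_clear.
obs_diff <- accidents %>%
  specify(severe ~ weather, success = "yes") %>%
  calculate(stat = "diff in props", order = c("Heavy Snow", "Clear"))

# Null distribution: shuffle the weather labels so they carry no information.
null_dist <- accidents %>%
  specify(severe ~ weather, success = "yes") %>%
  hypothesize(null = "independence") %>%
  generate(reps = 1000, type = "permute") %>%
  calculate(stat = "diff in props", order = c("Heavy Snow", "Clear"))

# Two-sided p-value, matching the "not equal" alternative hypothesis.
p_value <- null_dist %>%
  get_p_value(obs_stat = obs_diff, direction = "two-sided")
```

With a simulation-based test like this one, a reported p-value of 0 means that none of the simulated differences were as extreme as the observed one, so it is better read as "less than 1 / reps" than as exactly zero.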
Does the time of day affect the severity of an accident?
Null hypothesis: The proportion of severe car accidents is the same for accidents that have occurred during the day vs. the night.
\[
H_0 : p_{day} - p_{night} = 0
\]
Alternative hypothesis: The proportion of severe car accidents is not the same for accidents that occurred during the day and accidents that occurred at night.
\[
H_A : p_{day} - p_{night} \neq 0
\]
Conclusion: Our p-value is approximately 0, which is below the 0.05 significance level, so we reject the null hypothesis. The data provide convincing evidence that the proportion of severe accidents differs between day and night.
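For reference, if a theory-based two-proportion z-test were used for this comparison (an assumption, since the testing method is not stated here), the test statistic under the null hypothesis would take the standard pooled form:
\[
z = \frac{\hat{p}_{day} - \hat{p}_{night}}{\sqrt{\hat{p}\,(1 - \hat{p})\left(\frac{1}{n_{day}} + \frac{1}{n_{night}}\right)}}, \qquad \hat{p} = \frac{x_{day} + x_{night}}{n_{day} + n_{night}}
\]
Here \(x_{day}\) and \(x_{night}\) are the counts of severe accidents in each group, \(n_{day}\) and \(n_{night}\) are the total numbers of accidents in each group, and \(\hat{p}_{day} = x_{day} / n_{day}\), \(\hat{p}_{night} = x_{night} / n_{night}\) are the sample proportions. With very large samples, even a small difference in proportions can produce a large \(|z|\) and a p-value that rounds to 0.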
Conclusions + future work
Our analysis revealed the following patterns in the severity of car crashes.
Does the time of day affect the severity of an accident? Yes. We rejected the null hypothesis with a p-value of approximately 0.
Does the weather condition affect the severity of an accident? Yes. For heavy snow versus clear conditions, we also rejected the null hypothesis with a p-value of approximately 0.
Moreover, the vast majority of accidents cause a traffic backup of less than half a mile.
In the future, we would love to explore the relationship between these variables in other parts of the world, where infrastructure and layout are quite different. We also did not account for the effect of specific locations within New York, which would be fascinating to analyze further.