Accident severity in New York State

Preregistration of analyses

Analysis #1

To analyze which factors affect accident severity in New York State, we will fit a logistic regression model with severity as the outcome variable. Severity levels 1 and 2, the less severe accidents, will be coded as 0, and severity levels 3 and 4 will be coded as 1. Because severity level 2 makes up a large share of the accidents in our data set, we will counterbalance it by drawing a random sample of level-2 accidents and combining it with the few level-1 accidents to form the outcome-0 group. This group will contain roughly 14,000 accidents, so the outcome-0 and outcome-1 groups will be about the same size. The model will use three numerical predictors: precipitation, visibility, and wind speed. With it, we aim to model the probability of a severe accident in New York State based on how much precipitation there is, how far one can see, and how fast the wind is blowing. See a mathematical representation of the model below, where p is the probability that an accident is severe (outcome 1):
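The recoding and class-balancing step described above can be sketched in plain Python. This is a minimal sketch under stated assumptions: each accident is a dict with a hypothetical `severity` key, the function names are illustrative, and the level-2 sample size is chosen so the outcome-0 group exactly matches the outcome-1 group (the text only says the two sizes will be about the same).

```python
import random


def recode_severity(level: int) -> int:
    """Map severity levels 1-2 to 0 (less severe) and 3-4 to 1 (severe)."""
    if level in (1, 2):
        return 0
    if level in (3, 4):
        return 1
    raise ValueError(f"unexpected severity level: {level}")


def balanced_sample(records, seed=0):
    """Keep every level-1 and every severe (level 3-4) accident, and randomly
    sample level-2 accidents so the outcome-0 group matches the outcome-1 group.

    Assumption: `records` is a list of dicts with a hypothetical "severity" key.
    Returns new dicts with an added binary "outcome" key.
    """
    severe = [r for r in records if recode_severity(r["severity"]) == 1]
    level1 = [r for r in records if r["severity"] == 1]
    level2 = [r for r in records if r["severity"] == 2]
    # Number of level-2 accidents needed to fill out the outcome-0 group.
    n_needed = max(len(severe) - len(level1), 0)
    rng = random.Random(seed)  # fixed seed so the sample is reproducible
    sampled2 = rng.sample(level2, min(n_needed, len(level2)))
    return [{**r, "outcome": recode_severity(r["severity"])}
            for r in severe + level1 + sampled2]
```

Keeping every level-1 accident and subsampling only level 2 mirrors the text; fixing the random seed makes the undersampled data set reproducible.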

\[ \log\left(\frac{p}{1-p}\right) = \beta_0 + \beta_1 \cdot \text{precipitation} + \beta_2 \cdot \text{visibility} + \beta_3 \cdot \text{wind\_speed} \]
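The model above could be estimated with any standard logistic regression routine (for example, R's glm() with family = binomial, or the Logit class in Python's statsmodels). To make the estimation step explicit, the sketch below maximizes the log-likelihood by Newton-Raphson in plain Python. It is an illustration under stated assumptions, not necessarily the software this analysis will use: each row of `X` is assumed to be `[1, precipitation, visibility, wind_speed]`, and all function names are hypothetical.

```python
import math


def sigmoid(z):
    """Numerically stable inverse logit: p = 1 / (1 + exp(-z))."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def solve(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def fit_logit(X, y, iters=25, tol=1e-10):
    """Maximum-likelihood logistic regression via Newton-Raphson.

    Each row of X must already include a leading 1 for the intercept.
    Returns (coefficients, maximized log-likelihood).
    """
    k = len(X[0])
    beta = [0.0] * k
    for _ in range(iters):
        p = [sigmoid(sum(b * x for b, x in zip(beta, row))) for row in X]
        # Score (gradient of the log-likelihood): X^T (y - p)
        grad = [sum(row[j] * (yi - pi) for row, yi, pi in zip(X, y, p))
                for j in range(k)]
        # Information matrix: X^T W X with W = diag(p * (1 - p))
        info = [[sum(row[i] * row[j] * pi * (1 - pi) for row, pi in zip(X, p))
                 for j in range(k)] for i in range(k)]
        step = solve(info, grad)
        beta = [b + s for b, s in zip(beta, step)]
        if max(abs(s) for s in step) < tol:
            break
    p = [sigmoid(sum(b * x for b, x in zip(beta, row))) for row in X]
    loglik = sum(yi * math.log(pi) + (1 - yi) * math.log(1 - pi)
                 for yi, pi in zip(y, p))
    return beta, loglik
```

Exponentiating a fitted slope gives an odds ratio: exp(β1), for instance, is the multiplicative change in the odds of a severe accident per one-unit increase in precipitation, holding visibility and wind speed fixed.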

Hypothesis Testing:

\[ \begin{split} H_0 &: \beta_1 = \beta_2 = \beta_3 = 0 \\ H_a &: \beta_1 \neq 0 \text{ or } \beta_2 \neq 0 \text{ or } \beta_3 \neq 0 \end{split} \]
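The preregistration does not name the test statistic, so the following is an assumption: one common way to test this joint null is a likelihood ratio test comparing the fitted model with an intercept-only model. Under H0, the statistic G = 2(ℓ_full − ℓ_null) is approximately chi-square with 3 degrees of freedom (one per restricted slope). The sketch below computes G and its p-value in plain Python; the function names are hypothetical, and the chi-square tail probability is built from the standard recurrence for integer degrees of freedom.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability P(X > x) for a chi-square variable with integer df.

    Starts from Q(1) = erfc(sqrt(x/2)) or Q(2) = exp(-x/2) and applies
    Q(k + 2) = Q(k) + (x/2)^(k/2) * exp(-x/2) / Gamma(k/2 + 1).
    """
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 1:
        q, k = math.erfc(math.sqrt(half)), 1
    else:
        q, k = math.exp(-half), 2
    while k < df:
        # Computed on the log scale to avoid overflow for large x or df.
        q += math.exp((k / 2.0) * math.log(half) - half - math.lgamma(k / 2.0 + 1.0))
        k += 2
    return q


def likelihood_ratio_test(loglik_full, loglik_null, df):
    """Return (G, p-value) with G = 2 * (loglik_full - loglik_null)."""
    g = 2.0 * (loglik_full - loglik_null)
    return g, chi2_sf(g, df)
```

For Analysis #1, df = 3; the two log-likelihoods would come from the fitted model and the intercept-only model, and any numbers plugged in before fitting are placeholders, not results.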

Analysis #2

The second analysis will also fit a logistic regression model with the same outcome variable as above, but with different predictors. Instead of an additive model with only numerical variables, this model will incorporate a categorical variable and interaction terms. Along with the numerical variables above (precipitation, visibility, and wind speed), we will add a binary indicator for whether the accident occurred at a junction. We will also interact this indicator with each of the other predictors, because we believe the effect of precipitation, visibility, and wind speed on the likelihood of a severe accident may differ at junctions. With this model, we aim to see how being at a junction changes the relationship between the probability of a severe accident in New York State and how much precipitation there is, how far one can see, and how fast the wind is blowing. See a mathematical representation of the model below:

\[ \begin{split} \log\left(\frac{p}{1-p}\right) = {} & \beta_0 + \beta_1 \cdot \text{precipitation} + \beta_2 \cdot \text{visibility} + \beta_3 \cdot \text{wind\_speed} + \beta_4 \cdot \text{junction} \\ & + \beta_5 \cdot \text{precipitation} \cdot \text{junction} + \beta_6 \cdot \text{visibility} \cdot \text{junction} + \beta_7 \cdot \text{wind\_speed} \cdot \text{junction} \end{split} \]
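To make the interaction terms concrete, the sketch below builds the covariate vector for one accident in the order β0 through β7 and shows how the coefficients combine. Away from a junction, the log-odds slope of precipitation is β1; at a junction it is β1 + β5 (likewise β2 + β6 for visibility and β3 + β7 for wind speed), while β4 shifts the intercept. The function names are hypothetical, and the junction indicator is assumed to be a boolean.

```python
import math


def design_row(precipitation, visibility, wind_speed, junction):
    """Covariate vector ordered to match beta_0 .. beta_7 in the Analysis #2 model."""
    j = 1.0 if junction else 0.0
    return [1.0, precipitation, visibility, wind_speed, j,
            precipitation * j, visibility * j, wind_speed * j]


def slopes_by_junction(beta):
    """Log-odds slope of each weather variable away from and at a junction."""
    names = ["precipitation", "visibility", "wind_speed"]
    away = {n: beta[1 + i] for i, n in enumerate(names)}
    at = {n: beta[1 + i] + beta[5 + i] for i, n in enumerate(names)}
    return away, at


def prob_severe(beta, row):
    """Inverse logit of the linear predictor: estimated P(severe accident)."""
    eta = sum(b * x for b, x in zip(beta, row))
    return 1.0 / (1.0 + math.exp(-eta))
```

Because the interaction columns are zero whenever junction is 0, non-junction accidents depend only on β0 through β3, which is why the junction-specific slopes are sums of two coefficients.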

Hypothesis Testing:

\[ \begin{split} H_0 &: \beta_1 = \beta_2 = \beta_3 = \beta_4 = \beta_5 = \beta_6 = \beta_7 = 0 \\ H_a &: \beta_1 \neq 0 \text{ or } \beta_2 \neq 0 \text{ or } \beta_3 \neq 0 \text{ or } \beta_4 \neq 0 \text{ or } \beta_5 \neq 0 \text{ or } \beta_6 \neq 0 \text{ or } \beta_7 \neq 0 \end{split} \]
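If this joint null is tested with a likelihood ratio test against an intercept-only model (an assumption, since the test is not named above), the statistic has one degree of freedom per restricted coefficient, here seven. With ℓ denoting a maximized log-likelihood and an assumed significance level of 0.05:

\[ G = 2\left(\ell_{\text{full}} - \ell_{\text{null}}\right) \overset{\text{approx.}}{\sim} \chi^2_{7} \text{ under } H_0, \qquad \text{reject } H_0 \text{ if } G > \chi^2_{7,\,0.95} \approx 14.07 \]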