Data Analysis of Severe Crimes Committed in New York City

Preregistration of analyses

Analysis #1

We will perform a spatial analysis to examine where crimes are concentrated within the boroughs of New York City. By plotting each crime at its coordinates, using the longitude and latitude columns, and color coding the points by type of crime, we can check whether specific types of dangerous crimes follow distinct geographic patterns within each borough. This will help answer the last part of our research question, which concerns the locations with high crime and the specific types of dangerous crimes in each borough.
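A minimal sketch of how this plot could be produced is given below. It is only an illustration under stated assumptions: the column names "borough" and "offense_type" are hypothetical (only the longitude and latitude columns are named above), the sample rows are made up, and matplotlib is assumed to be available for the plotting step.

```python
from collections import defaultdict

# Made-up rows standing in for the real dataset. The column names
# "borough" and "offense_type" are assumptions, as are all values.
SAMPLE_ROWS = [
    {"borough": "BRONX", "offense_type": "ROBBERY", "latitude": 40.85, "longitude": -73.88},
    {"borough": "BRONX", "offense_type": "ROBBERY", "latitude": 40.84, "longitude": -73.90},
    {"borough": "BRONX", "offense_type": "FELONY ASSAULT", "latitude": 40.86, "longitude": -73.87},
    {"borough": "QUEENS", "offense_type": "ROBBERY", "latitude": 40.73, "longitude": -73.79},
    {"borough": "QUEENS", "offense_type": "FELONY ASSAULT", "latitude": 40.70, "longitude": -73.81},
]


def group_by_borough_and_type(rows):
    """Return {borough: {offense_type: [(longitude, latitude), ...]}}."""
    grouped = defaultdict(lambda: defaultdict(list))
    for row in rows:
        point = (row["longitude"], row["latitude"])
        grouped[row["borough"]][row["offense_type"]].append(point)
    return grouped


def plot_crime_maps(rows, out_path="crime_maps.png"):
    """Draw one scatter panel per borough, with one color per offense type."""
    # Imported here so the grouping helper works without matplotlib installed.
    import matplotlib
    matplotlib.use("Agg")  # render to a file; no display needed
    import matplotlib.pyplot as plt

    grouped = group_by_borough_and_type(rows)
    # Fix one color per offense type so colors mean the same in every panel.
    offense_types = sorted({row["offense_type"] for row in rows})
    cmap = plt.get_cmap("tab10")
    colors = {t: cmap(i % 10) for i, t in enumerate(offense_types)}

    fig, axes = plt.subplots(1, len(grouped), figsize=(5 * len(grouped), 5), squeeze=False)
    for ax, borough in zip(axes[0], sorted(grouped)):
        for offense_type, points in sorted(grouped[borough].items()):
            xs, ys = zip(*points)
            ax.scatter(xs, ys, s=10, alpha=0.6, color=colors[offense_type], label=offense_type)
        ax.set_title(borough)
        ax.set_xlabel("longitude")
        ax.set_ylabel("latitude")
        ax.legend(fontsize="small")
    fig.savefig(out_path, dpi=150, bbox_inches="tight")
    plt.close(fig)
    return out_path
```

Calling plot_crime_maps(SAMPLE_ROWS) would write a single image with one panel per borough. Using the same color for a given offense type in every panel is a design choice that makes the boroughs directly comparable.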

Analysis #2

We will estimate the probability that a certain demographic is identified as “most dangerous” by performing a logistic regression of the level of offense of a crime on the demographics of the criminal. The dependent variable will be either the level_of_offense column itself or a new column called severe_crime, in which each code of the level_of_offense column is labeled as either severe or not severe. The binary severe_crime column is the natural choice, since a standard logistic regression needs an outcome with two values. The independent variables will describe the demographics of the criminal: age group, race, and gender. Since the independent variables are all categorical, the fitted model gives a predicted probability for every combination of their levels, so we can identify which combination has the highest predicted probability of a severe crime. This will help answer our research question.
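The sketch below illustrates the modeling steps on a tiny example. It is not the final implementation: the counts are made up, the category labels (such as "group_a") are placeholders, and the model is fit with a small hand-written gradient descent rather than a statistics library such as statsmodels or scikit-learn, which we would more likely use in the actual analysis. The helper names encode, predict, and fit are hypothetical.

```python
import math
from itertools import product

COLUMNS = ("age_group", "race", "gender")

# Made-up counts per demographic cell, for illustration only:
# (age_group, race, gender, number severe, number not severe).
CELL_COUNTS = [
    ("18-24", "group_a", "F", 0, 4),
    ("18-24", "group_a", "M", 1, 3),
    ("18-24", "group_b", "F", 1, 3),
    ("18-24", "group_b", "M", 2, 2),
    ("25-44", "group_a", "F", 1, 3),
    ("25-44", "group_a", "M", 2, 2),
    ("25-44", "group_b", "F", 2, 2),
    ("25-44", "group_b", "M", 3, 1),
]

# Expand the counts into one row per crime with a binary severe_crime label.
rows = []
for age, race, gender, severe, not_severe in CELL_COUNTS:
    cell = {"age_group": age, "race": race, "gender": gender}
    rows += [dict(cell, severe_crime=1) for _ in range(severe)]
    rows += [dict(cell, severe_crime=0) for _ in range(not_severe)]

levels = {col: sorted({r[col] for r in rows}) for col in COLUMNS}


def encode(row):
    """Intercept plus one dummy per non-baseline level (first level is baseline)."""
    x = [1.0]
    for col in COLUMNS:
        x += [1.0 if row[col] == lvl else 0.0 for lvl in levels[col][1:]]
    return x


def predict(weights, x):
    """Logistic (sigmoid) function of the linear predictor."""
    z = sum(w * xi for w, xi in zip(weights, x))
    return 1.0 / (1.0 + math.exp(-z))


def fit(X, y, lr=0.5, steps=5000):
    """Fit the weights by batch gradient descent on the mean log loss."""
    weights = [0.0] * len(X[0])
    for _ in range(steps):
        errors = [predict(weights, xi) - yi for xi, yi in zip(X, y)]
        for j in range(len(weights)):
            grad_j = sum(e * xi[j] for e, xi in zip(errors, X)) / len(X)
            weights[j] -= lr * grad_j
    return weights


weights = fit([encode(r) for r in rows], [r["severe_crime"] for r in rows])

# Predicted probability of a severe crime for every combination of levels.
probabilities = {
    combo: predict(weights, encode(dict(zip(COLUMNS, combo))))
    for combo in product(*(levels[col] for col in COLUMNS))
}
best_combo = max(probabilities, key=probabilities.get)
print(best_combo, round(probabilities[best_combo], 3))
```

This sketch includes main effects only, so the predicted probability for a combination is built from the separate effects of age group, race, and gender. If we expect particular combinations to behave differently from the sum of their parts, interaction terms would need to be added.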