An Analysis of NYPD Arrest Data

Appendix to report

Data cleaning

Data Collection

Download the data
Place the data in the data folder
Read the data with the read_csv function and store it in a data frame

library(tidyverse)  # provides read_csv(), drop_na(), select(), and the forcats helpers used later

nypd_arrest_data_raw <- 
  read_csv("data/NYPD_Arrest_Data__Year_to_Date_.csv")

Rows: 189774 Columns: 19
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (10): ARREST_DATE, PD_DESC, OFNS_DESC, LAW_CODE, LAW_CAT_CD, ARREST_BORO...
dbl  (9): ARREST_KEY, PD_CD, KY_CD, ARREST_PRECINCT, JURISDICTION_CODE, X_CO...

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

Data Cleaning

Drop rows containing NA values
Keep only the relevant columns with the select function: ARREST_KEY, ARREST_DATE, PD_CD, PD_DESC, KY_CD, OFNS_DESC, LAW_CODE, LAW_CAT_CD, ARREST_BORO, AGE_GROUP, PERP_SEX, and PERP_RACE. We kept these columns because they were relevant to our research questions.
For some of the analysis we collapsed the LAW_CAT_CD column into three levels instead of four, because infractions and violations were essentially the same level and appeared very infrequently in the dataset. This change was not universal, so it was applied only in the relevant analyses and not during data cleaning.
For analysis involving the arrest date, we separated the single date column into month, day, and year columns to make time series analysis easier.
For many of the visualizations we expanded coded values into their full names. For the PERP_SEX variable, M became Male and F became Female. For the LAW_CAT_CD variable, F became Felony, M became Misdemeanor, and I and V both became Violation.

nypd_arrest_data <- nypd_arrest_data_raw |> 
  drop_na() |>
  select(1:14) |>
  select(!JURISDICTION_CODE) |>
  select(!ARREST_PRECINCT)

nypd_arrest_data

# A tibble: 187,457 × 12
   ARREST_KEY ARREST_DATE PD_CD PD_DESC      KY_CD OFNS_DESC LAW_CODE LAW_CAT_CD
        <dbl> <chr>       <dbl> <chr>        <dbl> <chr>     <chr>    <chr>     
 1  239553009 01/23/2022    464 JOSTLING       230 JOSTLING  PL 1652… M         
 2  239922214 01/31/2022    397 ROBBERY,OPE…   105 ROBBERY   PL 1601… F         
 3  239939130 02/01/2022    105 STRANGULATI…   106 FELONY A… PL 1211… F         
 4  240521791 02/13/2022    101 ASSAULT 3      344 ASSAULT … PL 1200… M         
 5  241022365 02/21/2022    397 ROBBERY,OPE…   105 ROBBERY   PL 1600… F         
 6  242064428 03/14/2022    105 STRANGULATI…   106 FELONY A… PL 1211… F         
 7  242456937 03/22/2022    105 STRANGULATI…   106 FELONY A… PL 1211… F         
 8  242818613 03/29/2022    705 FORGERY,ETC…   358 OFFENSES… PL 1702… M         
 9  243132247 04/05/2022    157 RAPE 1         104 RAPE      PL 1303… F         
10  244567670 05/04/2022    109 ASSAULT 2,1…   106 FELONY A… PL 1200… F         
# ℹ 187,447 more rows
# ℹ 4 more variables: ARREST_BORO <chr>, AGE_GROUP <chr>, PERP_SEX <chr>,
#   PERP_RACE <chr>
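The analysis-time transformations described in the cleaning steps (splitting the date, collapsing LAW_CAT_CD, expanding codes to full names) can be sketched in base R as follows. This is only an illustration on made-up rows, not the code used in the report; the `toy` data frame and the new column names `month`, `day`, `year`, `law_cat`, and `sex` are assumptions introduced for the example.

```r
# Made-up rows in the same shape as the cleaned data (not real arrests)
toy <- data.frame(
  ARREST_DATE = c("01/23/2022", "02/13/2022", "03/29/2022", "05/04/2022"),
  LAW_CAT_CD  = c("M", "F", "V", "I"),
  PERP_SEX    = c("M", "F", "M", "F"),
  stringsAsFactors = FALSE
)

# ARREST_DATE is read as a character column, so parse MM/DD/YYYY first,
# then split it into month, day, and year columns
parsed <- as.Date(toy$ARREST_DATE, format = "%m/%d/%Y")
toy$month <- as.integer(format(parsed, "%m"))
toy$day   <- as.integer(format(parsed, "%d"))
toy$year  <- as.integer(format(parsed, "%Y"))

# Lookup tables from codes to full names; I and V both map to "Violation",
# which collapses four offense levels into three
law_labels <- c(F = "Felony", M = "Misdemeanor", V = "Violation", I = "Violation")
sex_labels <- c(M = "Male", F = "Female")

toy$law_cat <- factor(unname(law_labels[toy$LAW_CAT_CD]),
                      levels = c("Felony", "Misdemeanor", "Violation"))
toy$sex <- unname(sex_labels[toy$PERP_SEX])

toy
```

Writing the recoded values to new columns leaves the original codes untouched, which matches the choice to apply these changes per analysis rather than during cleaning.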

Other Appendices

One analysis that we did not include in the report, but thought was of interest for the appendices, compares the proportion of black perpetrators among felony and misdemeanor arrests for dangerous drugs. We found the difference to be statistically significant: black perpetrators make up a larger share of felony arrests for dangerous drugs than of misdemeanor arrests. Racial bias is likely at play here, with black people being charged more harshly because of skin color, although the dataset records arrest charges rather than convictions or sentences.

Null hypothesis offense: Among arrests for dangerous drugs, the true proportion of black perpetrators is the same for felony arrests (\(p_f\)) as for misdemeanor arrests (\(p_m\)).

\[H_0: p_f = p_m\]

Alternative hypothesis offense: Among arrests for dangerous drugs, the true proportion of black perpetrators is not the same for felony arrests (\(p_f\)) as for misdemeanor arrests (\(p_m\)).

\[H_A: p_f \neq p_m\]

For this analysis we compared two groups, black perpetrators and not black perpetrators, and kept only arrests whose offense description was ‘dangerous drugs’.

The point estimate, 0.0674148, is the observed difference in proportions: the proportion of black perpetrators among felony arrests for dangerous drugs minus the proportion among misdemeanor arrests for dangerous drugs.
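The appendix does not show how the point estimate was computed, so the following is only a sketch of the arithmetic behind a difference in proportions taken in the order Felony minus Misdemeanor. The data are made up, and the "NOT BLACK" label and the name `point_estimate_toy` are assumptions for illustration.

```r
# Made-up toy data: one row per dangerous-drugs arrest (not the NYPD data)
toy <- data.frame(
  LAW_CAT_CD = rep(c("Felony", "Misdemeanor"), times = c(10, 10)),
  PERP_RACE  = c(rep(c("BLACK", "NOT BLACK"), times = c(6, 4)),   # felony rows
                 rep(c("BLACK", "NOT BLACK"), times = c(4, 6))),  # misdemeanor rows
  stringsAsFactors = FALSE
)

# Proportion of arrests with a black perpetrator within each offense level
p_f <- mean(toy$PERP_RACE[toy$LAW_CAT_CD == "Felony"] == "BLACK")
p_m <- mean(toy$PERP_RACE[toy$LAW_CAT_CD == "Misdemeanor"] == "BLACK")

# Difference in proportions, felony minus misdemeanor
point_estimate_toy <- p_f - p_m
point_estimate_toy
```

A positive value means the share of black perpetrators is higher among felony arrests than among misdemeanor arrests.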

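library(infer)  # specify(), hypothesize(), generate(), calculate()

set.seed(123)   # make the permutation results reproducible

null_dist_ot <- offense_type_data |>
  # Expand offense-level codes to full names; I and V both become Violation
  mutate(
    LAW_CAT_CD = fct_collapse(LAW_CAT_CD,
                              Felony = "F",
                              Misdemeanor = "M",
                              Violation = c("V", "I")),
    LAW_CAT_CD = fct_relevel(LAW_CAT_CD,
                             c("Felony", "Misdemeanor", "Violation"))
  ) |>
  # Keep only felony and misdemeanor arrests for dangerous drugs
  filter(LAW_CAT_CD != "Violation") |>
  filter(OFNS_DESC == "DANGEROUS DRUGS") |>
  droplevels() |>
  # Permute under independence and compute felony minus misdemeanor each time
  specify(PERP_RACE ~ LAW_CAT_CD, success = "BLACK") |>
  hypothesize(null = "independence") |>
  generate(1000, type = "permute") |>
  calculate(stat = "diff in props", order = c("Felony", "Misdemeanor"))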

We generated the null distribution of the difference in the proportion of black perpetrators between felony and misdemeanor arrests for ‘dangerous drugs’ using 1000 permutations.
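To make the permutation idea concrete, here is a minimal base R sketch on made-up data. It is not the infer pipeline used in the report: the data, the helper `diff_in_props`, and the absolute-value definition of the two-sided p-value are assumptions chosen for simplicity, and infer may compute its two-sided p-value differently.

```r
set.seed(123)

# Made-up toy data (not the NYPD data): offense level and a black / not black flag
level <- rep(c("Felony", "Misdemeanor"), times = c(50, 50))
black <- c(rep(c(TRUE, FALSE), times = c(30, 20)),   # felony arrests
           rep(c(TRUE, FALSE), times = c(20, 30)))   # misdemeanor arrests

# Hypothetical helper: proportion black among felonies minus among misdemeanors
diff_in_props <- function(is_black, level) {
  mean(is_black[level == "Felony"]) - mean(is_black[level == "Misdemeanor"])
}

observed <- diff_in_props(black, level)

# Under independence the race labels are exchangeable across offense levels,
# so shuffle them and recompute the statistic for each replicate
null_dist <- replicate(1000, diff_in_props(sample(black), level))

# Simple two-sided p-value: share of permuted differences at least as extreme
p_value <- mean(abs(null_dist) >= abs(observed))
p_value
```

The permuted differences cluster around zero because shuffling breaks any association between race and offense level, which is exactly what the null hypothesis assumes.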

visualize(null_dist_ot) +
 shade_p_value(obs_stat = point_estimate_ot, direction = "two-sided")

null_dist_ot |>
  get_p_value(obs_stat = point_estimate_ot, direction = "two-sided")

Warning: Please be cautious in reporting a p-value of 0. This result is an
approximation based on the number of `reps` chosen in the `generate()` step. See
`?get_p_value()` for more information.

# A tibble: 1 × 1
  p_value
    <dbl>
1       0

With a p-value less than 0.05, we reject the null hypothesis. Because the reported p-value of 0 is an approximation based on 1000 permutations, it is better read as p < 0.001. There is significant evidence that, among arrests for dangerous drugs, the true proportion of black perpetrators is not the same for felony arrests as for misdemeanor arrests.
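The warning about reporting a p-value of 0 can be handled with a little arithmetic. The sketch below shows one common convention, adding one to both the count of extreme replicates and the number of replicates; the variable names are assumptions, and this adjustment is not something the report itself applied.

```r
reps <- 1000      # number of permutations used for the null distribution
n_extreme <- 0    # permuted statistics at least as extreme as the observed one

# Smallest nonzero p-value that 1000 permutations could have produced
p_resolution <- 1 / reps

# Add-one convention: treat the observed statistic as one more replicate,
# so the estimate can never be exactly zero
p_adjusted <- (n_extreme + 1) / (reps + 1)

c(p_resolution = p_resolution, p_adjusted = p_adjusted)
```

Either way, the conclusion at the 0.05 level is unchanged; the adjustment only affects how the p-value is reported.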