maxpiasevoli/StopAndFriskData

Pre-Processing Data for Analysis of the NYPD "Stop-and-Frisk" Policy

The code in this repository handles all pre-processing of the data needed to reproduce the analysis conducted by Gelman, Fagan and Kiss in An Analysis of the New York City Police Department's "Stop-and-Frisk" Policy in the Context of Claims of Racial Bias. The data can also be used to produce new analyses. All data sources used here are public-domain.


Data Processing.ipynb

This Jupyter notebook contains all code for computing, by precinct and race, the number of stops in a given 15-month period and the number of arrests in the previous year. Both counts are calculated from the stop records in the NYPD's Stop, Question, and Frisk Database. Here we calculate the number of stops from January 2015 through March 2016 and the number of arrests in 2014.

The input .csv files can be replaced with .csv files from other years, but with this code the three selected years must share the same feature names. The feature names of the Stop, Question, and Frisk Database changed between 2016 and 2017, so if .csv files from 2017 onward are used, change the corresponding feature variables in the second cell of the Jupyter notebook.

As in the Gelman paper, we consider three racial groups (White, Hispanic, and Black) when counting stops and arrests. The Stop, Question, and Frisk Database classifies stopped Hispanic individuals as either White-Hispanic or Black-Hispanic, so we count White-Hispanic individuals as "Hispanic" and Black-Hispanic individuals as "Black" for this data. The paper considers stops and arrests separately for violent, property, drug, and weapon crimes, so we process stops and arrests separately for each type of crime.
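The counting step described above can be sketched roughly as follows. This is a minimal illustration using only the standard library, not the notebook's actual code: the record keys (`precinct`, `race`, `date`), the race labels in `RACE_GROUPS`, and the function name `count_stops` are all assumptions made for the example, and the real database uses its own feature names and race codes.

```python
from collections import Counter
from datetime import date

# Assumed labels for illustration only; the real database uses its own codes.
# White-Hispanic is counted as "Hispanic" and Black-Hispanic as "Black".
RACE_GROUPS = {
    "WHITE": "White",
    "WHITE-HISPANIC": "Hispanic",
    "BLACK": "Black",
    "BLACK-HISPANIC": "Black",
}


def count_stops(records, start, end):
    """Count records dated within [start, end] by (precinct, race group)."""
    counts = Counter()
    for rec in records:
        group = RACE_GROUPS.get(rec["race"])
        # Skip races outside the three groups and dates outside the window.
        if group is None or not (start <= rec["date"] <= end):
            continue
        counts[(rec["precinct"], group)] += 1
    return counts


# Synthetic records, not real data.
sample_stops = [
    {"precinct": 1, "race": "WHITE-HISPANIC", "date": date(2015, 1, 5)},
    {"precinct": 1, "race": "BLACK-HISPANIC", "date": date(2015, 6, 1)},
    {"precinct": 1, "race": "BLACK", "date": date(2016, 3, 31)},
    {"precinct": 2, "race": "WHITE", "date": date(2016, 4, 1)},  # after the window
    {"precinct": 2, "race": "ASIAN", "date": date(2015, 2, 1)},  # not one of the three groups
]

# The 15-month window: January 2015 through March 2016.
stop_counts = count_stops(sample_stops, date(2015, 1, 1), date(2016, 3, 31))
```

The same function could be reused for the previous-year counts by passing a 2014 window over records filtered to stops that ended in an arrest.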

The analysis in the Gelman paper also requires the White, Hispanic, and Black populations of each precinct for some models. To produce this data, we first downloaded the NYPD police precinct boundaries from the NYC Open Data Portal. We then took the census block group boundaries from Simply Analytics and used a spatial join to create a relationship between the NYC police precinct boundaries and the census block groups. Finally, we joined the Hispanic 2017 block-group-level data for NYC from the American Community Survey 2013-2017 (5-year estimates), accessed through Social Explorer, to the spatial join output to calculate the populations by precinct and ethnic group. This data is stored in NYC_BlockGroups_Police_Precincts_Hispanic_Pop.csv.
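Once each block group has been matched to a precinct, the final aggregation is a per-precinct sum. The sketch below illustrates only that last step, under the assumption that every block group is assigned to exactly one precinct. The column names (`block_group`, `precinct`, `white`, `hispanic`, `black`) and the function name are hypothetical and are not taken from the actual .csv file.

```python
import csv
import io
from collections import defaultdict

# Hypothetical population columns, one per group considered in the analysis.
GROUPS = ("white", "hispanic", "black")


def population_by_precinct(rows):
    """Sum block-group populations into per-precinct totals for each group."""
    totals = defaultdict(lambda: dict.fromkeys(GROUPS, 0))
    for row in rows:
        precinct = int(row["precinct"])
        for group in GROUPS:
            totals[precinct][group] += int(row[group])
    return dict(totals)


# Synthetic rows standing in for the spatially joined block-group table.
sample_csv = io.StringIO(
    "block_group,precinct,white,hispanic,black\n"
    "A,1,100,50,25\n"
    "B,1,10,5,5\n"
    "C,2,200,80,40\n"
)
precinct_pop = population_by_precinct(csv.DictReader(sample_csv))
```

If a block group straddles a precinct boundary, a plain sum like this would need a rule for splitting or assigning its population; the description above does not say how such cases were handled.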

This Jupyter notebook produces multiple output files. First, it produces stop_and_frisk_data_by_precinct.csv, which contains, for each precinct, all calculated stops and arrests for the four types of crime as well as the populations by demographic group. Then, for each type of crime, it produces one .csv file with the number of arrests in the previous year and one with the number of stops in the 15-month period, both by precinct and race. These files are named 2014_arrests_*crime type*.csv and 20152016_stops_*crime type*.csv, respectively.
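As a small illustration of the naming pattern, the per-crime file names could be generated like this. The exact spelling of each crime type inside the file names is not stated here, so the lowercase strings in `CRIME_TYPES` and the helper `output_names` are assumptions made for the example.

```python
# Assumed spellings; check the actual output files for the exact crime-type strings.
CRIME_TYPES = ("violent", "property", "drug", "weapon")


def output_names(crime_type, arrest_year="2014", stop_period="20152016"):
    """Return the (arrests file, stops file) names for one crime type."""
    return (
        f"{arrest_year}_arrests_{crime_type}.csv",
        f"{stop_period}_stops_{crime_type}.csv",
    )


all_outputs = {crime: output_names(crime) for crime in CRIME_TYPES}
```

When the notebook is run on other years, the two prefixes would change with the selected arrest year and stop period.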
