The Stanford Open Policing Project Summary Video can be seen below.
The following is quoted from the Stanford Open Policing Project website:
"On a typical day in the United States, police officers make more than 50,000 traffic stops. Our team is gathering, analyzing, and releasing records from millions of traffic stops by law enforcement agencies across the country. Our goal is to help researchers, journalists, and policymakers investigate and improve interactions between police and the public.
Currently, a comprehensive, national repository detailing interactions between police and the public doesn’t exist. That’s why the Stanford Open Policing Project is collecting and standardizing data on vehicle and pedestrian stops from law enforcement departments across the country — and we’re making that information freely available. We’ve already gathered over 200 million records from dozens of state and local police departments across the country.
We, the Stanford Open Policing Project, are an interdisciplinary team of researchers and journalists at Stanford University. We are committed to combining the academic rigor of statistical analysis with the explanatory power of data journalism."
NBC News recently covered this dataset (March 13, 2019) here.
H/T to both DJ Patil and Alex Chohlas-Wood (@LX_CW) for making us aware of the dataset, and credit to the 15 people who helped collect, clean, and prepare this data.
We will not duplicate all of the Stanford Open Policing Project data on our GitHub: there are a great many datasets, and many of them are larger than GitHub's 100 MB file size limit. Combined, the datasets cover over 200 MILLION stops. The data is presented as is; some datasets are missing large chunks of data, while many are close to complete. Datasets are separated by state and/or city and are available in both .csv and .rds format.
There are a LOT of datasets there and each one has a corresponding data dictionary here.
If that is a bit too much data to dig into, consider checking out the summary-level datasets here and the included figures from the group's recent arXiv paper. If you do use the summary data, please cite their working paper (arXiv:1706.05678). They have been kind enough to include all the code, data, figures, and even a tutorial!
These are datasets from the working paper mentioned above; the parent folder with the full details can be found here. Go here to skip straight to the results folder and get all the specific .csv files.
There are additional data files on their GitHub, but one file was "created for convenience which combines data from all the main analyses in the paper".
```r
# Read the combined summary-level data directly from the openpolicing GitHub repo
combined_data <- readr::read_csv("https://raw.githubusercontent.com/5harad/openpolicing/master/results/data_for_figures/combined_data.csv")
```
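Each row of `combined_data` appears to be one location and driver race, so a single overall figure per race needs some form of aggregation. Below is a minimal sketch of one option, a stop-weighted average in base R. The helper name `weighted_rate_by_race` is hypothetical and not part of the openpolicing repo.

```r
# Hypothetical helper (not from the openpolicing repo): collapse per-location
# rates into one stop-weighted rate per driver_race.
weighted_rate_by_race <- function(df, rate_col) {
  # Keep only rows where both the chosen rate and the weight are present
  ok <- !is.na(df[[rate_col]]) & !is.na(df$stops_per_year)
  df <- df[ok, ]
  # Split rows by driver race, then weight each location by its yearly stops
  groups <- split(df, df$driver_race)
  rates <- vapply(
    groups,
    function(g) weighted.mean(g[[rate_col]], w = g$stops_per_year),
    numeric(1)
  )
  data.frame(driver_race = names(rates), rate = unname(rates),
             row.names = NULL, stringsAsFactors = FALSE)
}
```

Usage: `weighted_rate_by_race(combined_data, "search_rate")`. Weighting by `stops_per_year` suits rates measured per stop, such as `search_rate`. `hit_rate` is measured per search, so it would instead need to be weighted by the number of searches (for example `stops_per_year * search_rate`).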
No cleaning scripts this week: the summary-level data is in great shape!
There are a few data dictionaries for the summary-level datasets; you can find them here. They can help convert county or district codes into more meaningful data.
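Converting codes usually comes down to a left join against a lookup table. The sketch below assumes, hypothetically, that a data dictionary has been read into a data frame `lookup` that shares `state` and `location` columns with `combined_data`. The real dictionaries may use different column names, so adjust `by` accordingly. The helper name `add_location_names` is also hypothetical.

```r
# Hypothetical helper: attach descriptive columns from a lookup table.
# Assumes `lookup` shares the `by` columns with `df`; the real data
# dictionaries may name their columns differently.
add_location_names <- function(df, lookup, by = c("state", "location")) {
  # Left join: keep every row of df, even codes missing from the lookup
  merge(df, lookup, by = by, all.x = TRUE, sort = FALSE)
}
```

Joining on both `state` and `location` guards against the same county or district code appearing in more than one state. Rows whose code is not in the lookup are kept with `NA` in the new columns, which makes gaps in the dictionary easy to spot.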
variable | class | description |
---|---|---|
location | character | County or district of the stops |
state | character | State of the stops |
driver_race | character | Driver's race |
stops_per_year | double | Number of stops per year |
stop_rate | double | Stop rate (stop = police stop of a vehicle) (%) |
search_rate | double | Search rate (%) |
consent_search_rate | double | Consent to search rate (%) |
arrest_rate | double | Arrest rate (%) |
citation_rate_speeding_stops | double | Citation rate for speeding stops (%) |
hit_rate | double | Hit rate (%): the proportion of searches that successfully turn up contraband |
inferred_threshold | double | Inferred threshold, based on the threshold test (see Section 4.2 of the paper) |
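To make the `search_rate` and `hit_rate` definitions concrete, here is a small sketch of how both could be computed for one group of stops from stop-level data. It assumes, as labeled in the comments, that the stop-level files provide logical columns along the lines of `search_conducted` and `contraband_found`; check the relevant data dictionary for the actual names. The function name is hypothetical.

```r
# Hypothetical sketch: derive search rate and hit rate for one group of stops.
# Assumes logical per-stop vectors like `search_conducted` and
# `contraband_found`; the real column names may differ.
search_and_hit_rate <- function(search_conducted, contraband_found) {
  # search_rate: share of stops that led to a search
  search_rate <- mean(search_conducted, na.rm = TRUE)
  # hit_rate: share of *searches* that turned up contraband
  searched <- search_conducted %in% TRUE  # treats NA as "not searched"
  hit_rate <- mean(contraband_found[searched], na.rm = TRUE)
  c(search_rate = search_rate, hit_rate = hit_rate)
}
```

Both values come back as proportions between 0 and 1; multiply by 100 to express them as percentages. If a group has no searches, `hit_rate` is `NaN`, which flags that the hit rate is undefined rather than zero.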