Washington Collisions is a tool for cleaning, visualizing, and analyzing Seattle collision and neighborhood data together with weather data. Our example files show visualizations built with the Folium Python package, which illustrate how collisions vary by neighborhood under different weather and road surface conditions. We also investigate whether the speed limit changes in central Seattle in October 2016 had a significant impact on the number of collisions, the number of speeding-related collisions, or the number of collision-related injuries.
The project has the following structure:
wa_collisions/
  |- README.md
  |- wa_collisions/
     |- __init__.py
     |- neighborhood_reader.py
     |- read_clean_integrate_data.py
     |- render_stats.py
     |- visualizer.py
     |- data/
        |- Collisions_test.csv
        |- Collisions_With_Neighborhoods_test.csv
        |- Weather_test.csv
        |- Neighborhoods
           |- Neighborhoods.json
           |- WGS84
              |- ...
     |- tests/
        |- __init__.py
        |- test_neighborhood_reader.py
        |- test_read_clean_integrate.py
        |- test_render_stats.py
        |- test_visualizer.py
  |- examples/
     |- Example - CausalImpact SpeedLimits.ipynb
     |- Example - Prepare Data.ipynb
     |- Example - Visualize Data.ipynb
  |- doc/
     |- Feature_Design_V1.md
     |- Feature_Design_V2.md
     |- Technology_Review_Presentation.pptx
     |- datasets_V1.md
     |- technology_review_outline.md
     |- wa_collisions_final_presentation.pptx
     |- _static/
        |- ...
     |- _images/
        |- ...
  |- .coverage
  |- .coveragerc
  |- .gitignore
  |- .pylintrc
  |- .travis.yml
  |- LICENSE
  |- requirements.txt
  |- setup.py
- To install the package, run:
    python setup.py install
- Then install the required dependencies:
    pip install -r requirements.txt
The collision and neighborhood data were sourced from the Seattle Open Data portal. The weather data comes from the Iowa State University database of hourly airport AWOS/ASOS reports. Download instructions are available in the Example - Prepare Data notebook. The read_clean_integrate_data module prepares the data, and the testing scripts use the example data provided in wa_collisions/data. Users can also use the read_clean_integrate_data module to process data from their own city; however, this functionality has not been tested.
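The functions inside read_clean_integrate_data are not described here, so the following is only a hypothetical sketch of one integration step: attaching the most recent hourly weather report to each collision with pandas `merge_asof`. The helper name `attach_hourly_weather`, the collision timestamp column `INCDTTM`, and the weather columns `valid` (report time) and `skyc1` (sky cover) are assumptions chosen for illustration, not the package's actual API or schema.

```python
import pandas as pd


def attach_hourly_weather(collisions: pd.DataFrame,
                          weather: pd.DataFrame,
                          tolerance: str = "1h") -> pd.DataFrame:
    """Hypothetical helper: join each collision to the latest prior weather report.

    Assumed columns: 'INCDTTM' in collisions, 'valid' in weather.
    """
    collisions = collisions.copy()
    weather = weather.copy()
    collisions["INCDTTM"] = pd.to_datetime(collisions["INCDTTM"])
    weather["valid"] = pd.to_datetime(weather["valid"])
    # merge_asof requires both frames to be sorted on their join keys.
    collisions = collisions.sort_values("INCDTTM")
    weather = weather.sort_values("valid")
    # 'backward' picks the most recent report at or before the collision time;
    # reports older than the tolerance are ignored (weather columns become NaN).
    return pd.merge_asof(
        collisions,
        weather,
        left_on="INCDTTM",
        right_on="valid",
        direction="backward",
        tolerance=pd.Timedelta(tolerance),
    )


# Toy data (hypothetical values) to show the join.
collisions = pd.DataFrame({"INCDTTM": ["2016-01-01 08:30", "2016-01-01 10:05"]})
weather = pd.DataFrame({
    "valid": ["2016-01-01 07:53", "2016-01-01 09:53"],
    "skyc1": ["OVC", "CLR"],
})
merged = attach_hourly_weather(collisions, weather)
print(merged[["INCDTTM", "valid", "skyc1"]])
```

A backward as-of join with a one-hour tolerance matches each collision to the report issued just before it, which suits hourly reports; a collision with no report in the preceding hour gets missing weather values instead of a stale match.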
Because the example notebooks are interactive, Jupyter needs to be run with an additional parameter that raises the IOPub data rate limit:
jupyter notebook --NotebookApp.iopub_data_rate_limit=10000000000
Additionally, the interactive map visualizations only work in Mozilla Firefox.
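If you prefer not to pass the flag each time, the same classic Notebook setting can likely be made persistent in a Jupyter config file. This is a config fragment, not a standalone script: the file is created by `jupyter notebook --generate-config`, and `get_config()` is supplied by Jupyter when it loads the file. Newer Jupyter Server based releases use `ServerApp` instead of `NotebookApp`, so check which one your installation reads.

```python
# ~/.jupyter/jupyter_notebook_config.py (classic Notebook)
c = get_config()  # noqa: F821; injected by Jupyter when this file is loaded
# Same value as the command-line flag above.
c.NotebookApp.iopub_data_rate_limit = 10000000000
```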
Visualize the incidence of collisions around Seattle by neighborhood. Collisions can be visualized by categorical variables (e.g., road condition) or by indicator variables (pedestrian involved, cyclist involved, fatality, etc.). Road conditions can also be compared with weather data to answer questions like: "What is the incidence of ice-related collisions when the weather is overcast?"
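As a rough sketch of the aggregation behind such maps (not the visualizer module's actual code), the per-neighborhood counts can be computed with pandas before they are drawn, for example as a Folium choropleth. The column names `neighborhood`, `ROADCOND`, `PEDCOUNT`, and `sky` and the toy rows are hypothetical.

```python
import pandas as pd

# Hypothetical, already-integrated collision records.
collisions = pd.DataFrame({
    "neighborhood": ["Ballard", "Ballard", "Fremont", "Fremont", "Fremont"],
    "ROADCOND": ["Dry", "Ice", "Wet", "Ice", "Ice"],
    "PEDCOUNT": [0, 1, 0, 0, 2],
    "sky": ["CLR", "OVC", "OVC", "OVC", "CLR"],
})

# Categorical variable: collision counts per neighborhood and road condition.
by_condition = pd.crosstab(collisions["neighborhood"], collisions["ROADCOND"])

# Indicator variable: number of collisions involving at least one pedestrian.
ped_counts = (collisions["PEDCOUNT"] > 0).groupby(collisions["neighborhood"]).sum()

# Road condition combined with weather: ice-related collisions under overcast skies.
ice_overcast = collisions[(collisions["ROADCOND"] == "Ice") & (collisions["sky"] == "OVC")]
ice_overcast_by_hood = ice_overcast.groupby("neighborhood").size()

print(by_condition)
print(ped_counts)
print(ice_overcast_by_hood)
```

Each result is indexed by neighborhood, so any one of them can be joined to neighborhood boundaries and used as the color variable of a map layer.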
The number of collisions across Seattle can also be visualized over time, which lets the user see how collision counts in each neighborhood have changed.
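A minimal sketch of the time dimension, assuming a hypothetical frame with a collision timestamp column `INCDTTM` and a `neighborhood` column: monthly counts per neighborhood, ready to plot as one line per neighborhood or to step through on a map.

```python
import pandas as pd

# Hypothetical collision records with timestamps and neighborhoods.
collisions = pd.DataFrame({
    "INCDTTM": pd.to_datetime(
        ["2016-01-05", "2016-01-20", "2016-02-03", "2016-02-10", "2016-02-28"]
    ),
    "neighborhood": ["Ballard", "Fremont", "Ballard", "Ballard", "Fremont"],
})

# Monthly collision counts (month-start bins), one column per neighborhood.
monthly = (
    collisions
    .groupby([pd.Grouper(key="INCDTTM", freq="MS"), "neighborhood"])
    .size()
    .unstack(fill_value=0)
)
print(monthly)
```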
We investigated whether changing speed limits had an effect on the collision rate. The example notebook groups neighborhoods into those where speed limits changed in 2016 and those where they did not. We used the CausalImpact package, which fits Bayesian structural time series models, to compare collisions and injuries between the “control” and “treatment” groups.
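The grouping step can be sketched as follows. This is an illustration under stated assumptions, not the notebook's code: the treated neighborhood names, the `2016-10-01` cutoff, the helper `build_causal_frame`, and the column names are all hypothetical. It builds a frame with the treatment-group series as the first column and the control-group series as a covariate, which is the layout the Python pycausalimpact port expects for `CausalImpact(data, pre_period, post_period)`; the model call itself is left out so the sketch runs with pandas alone.

```python
import pandas as pd

# Assumption: treat the first day of October 2016 as the change date.
SPEED_LIMIT_CHANGE = pd.Timestamp("2016-10-01")
# Hypothetical set of neighborhoods where speed limits changed.
TREATED = {"Downtown", "Belltown"}

collisions = pd.DataFrame({
    "INCDTTM": pd.to_datetime([
        "2016-08-10", "2016-08-15", "2016-09-02",
        "2016-09-20", "2016-10-05", "2016-11-11",
    ]),
    "neighborhood": ["Downtown", "Ballard", "Belltown",
                     "Downtown", "Ballard", "Downtown"],
})


def build_causal_frame(collisions, treated, freq="W"):
    """Hypothetical helper: counts per period, 'y' = treatment, 'x' = control."""
    group = collisions["neighborhood"].isin(treated).map({True: "y", False: "x"})
    return (
        collisions.assign(group=group)
        .groupby([pd.Grouper(key="INCDTTM", freq=freq), "group"])
        .size()
        .unstack(fill_value=0)
        # Response first, covariate second; add a column if a group is absent.
        .reindex(columns=["y", "x"], fill_value=0)
    )


frame = build_causal_frame(collisions, TREATED, freq="MS")
idx = frame.index
pre_period = [idx.min(), idx[idx < SPEED_LIMIT_CHANGE].max()]
post_period = [idx[idx >= SPEED_LIMIT_CHANGE].min(), idx.max()]
print(frame)
print(pre_period, post_period)
```

Injury counts can be prepared the same way by summing an injury column per period instead of counting rows.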
This project was developed during DATA 515 Software Design at the University of Washington. In addition to the project goals outlined below, the project aims to create a cohesive Python module. During and after development, we welcome feedback through GitHub issues.
- Clean and prepare Seattle area collision data for regression and other machine learning projects.
- Create visualizations to help users explore the Seattle area collision data.
- Perform advanced analysis of Seattle area collision data in relation to the change in speed limits.
- The project can be installed through setup.py, but it is not published on PyPI, so it cannot be downloaded via pip.
- The functionality has been tested extensively on Seattle data, but it has not been tested on data from any other city.
Washington Collisions uses only open source software and is available for use and distribution under an MIT license.
Special thanks to Joe Hellerstein, Dave Beck and Dimitrios Gklezakos of the University of Washington for instructing us on effective software engineering for data science and research projects.
Also, thanks to the shablona team for providing us with a great template from which we built this repo.

