- Fetches real-time NYC traffic data from the NYC Open Data API (https://data.cityofnewyork.us/resource/7ym2-wayt.json) and processes it with Apache Spark.
- Performs data preprocessing, including handling missing data, renaming columns for consistency, and creating new time-related features:
  - Day of the Week: helps identify patterns such as weekday vs. weekend traffic.
  - Is Weekend: a binary feature indicating whether a given day falls on a weekend.
  - Week of the Year: captures traffic patterns across different weeks.
- Handles spatial data, extracting coordinates from geometry fields where available.
- Generates exploratory visualizations:
  - Correlation Heatmap
  - Borough-Wise Traffic Volume Analysis
  - Directional Traffic Volume Analysis
  - Street-Wise Traffic Analysis
  - Traffic Volume Over Time
  - Peak Hours Analysis
  - Top 10 Busiest Dates
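The time features described above are computed in Spark in the original pipeline. As a language-neutral illustration, the sketch below derives them for a single record with Python's standard `datetime` module, assuming each record carries a `datetime` value (the field name is an assumption). Day numbering here follows Python's `weekday()` (0 = Monday, 6 = Sunday); Spark's `dayofweek` instead returns 1 = Sunday through 7 = Saturday, so a Spark weekend test would check for 1 and 7.

```python
from datetime import datetime


def add_time_features(record: dict) -> dict:
    """Return a copy of `record` with day-of-week, weekend flag and week number added."""
    ts = record["datetime"]  # assumed field holding a datetime.datetime
    enriched = dict(record)
    enriched["day_of_week"] = ts.weekday()           # 0 = Monday ... 6 = Sunday
    enriched["is_weekend"] = int(ts.weekday() >= 5)  # Saturday or Sunday -> 1
    enriched["week_of_year"] = ts.isocalendar()[1]   # ISO 8601 week number
    return enriched


row = add_time_features({"datetime": datetime(2024, 3, 16, 8), "volume": 120})
print(row["day_of_week"], row["is_weekend"], row["week_of_year"])  # 5 1 11
```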
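Several of the analyses listed above, such as peak hours and the top 10 busiest dates, reduce to grouping by a key and summing volume. A minimal plain-Python sketch, assuming records with `date`, `hour` and `volume` fields (hypothetical names), might look like this; in Spark the same step would be a `groupBy(...).agg(...)`.

```python
from collections import Counter
from datetime import date


def hourly_totals(records):
    """Sum traffic volume per hour of day (0-23)."""
    totals = Counter()
    for r in records:
        totals[r["hour"]] += r["volume"]
    return totals


def busiest_dates(records, n=10):
    """Return the n dates with the highest total volume, busiest first."""
    totals = Counter()
    for r in records:
        totals[r["date"]] += r["volume"]
    return totals.most_common(n)


sample = [
    {"date": date(2024, 1, 2), "hour": 8, "volume": 300},
    {"date": date(2024, 1, 2), "hour": 17, "volume": 450},
    {"date": date(2024, 1, 3), "hour": 8, "volume": 200},
    {"date": date(2024, 1, 4), "hour": 17, "volume": 500},
]
peak_hour, peak_volume = hourly_totals(sample).most_common(1)[0]
print(peak_hour, peak_volume)  # 17 950
print(busiest_dates(sample, n=2))
```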
Random Forest Regression for Traffic Volume Prediction
- Purpose: Predicts traffic volume based on features like hour, segment ID, day of the week, and month.
- Steps:
  - Splits data into training and testing sets.
  - Fits a Random Forest Regressor to the training data.
  - Evaluates model performance using metrics like Mean Squared Error (MSE) and R-squared (R²).
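The README does not say which library fits the Random Forest (Spark MLlib and scikit-learn both provide one). Independent of that choice, the split-and-evaluate steps can be sketched in plain Python: a seeded shuffle split, then MSE and R² computed from their definitions (MSE is the mean of the squared residuals; R² = 1 - SS_res / SS_tot). scikit-learn exposes the same operations as `train_test_split`, `mean_squared_error` and `r2_score`; the functions below are illustrative re-implementations, not those APIs.

```python
import random


def train_test_split(rows, test_fraction=0.2, seed=42):
    """Shuffle a copy of `rows` with a fixed seed and split it into (train, test)."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    cut = len(shuffled) - n_test
    return shuffled[:cut], shuffled[cut:]


def mean_squared_error(y_true, y_pred):
    """Average of squared residuals."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def r2_score(y_true, y_pred):
    """1 - (residual sum of squares / total sum of squares around the mean)."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot


y_true = [100, 200, 300, 400]
y_pred = [110, 190, 310, 380]  # stand-in for model predictions
print(mean_squared_error(y_true, y_pred))  # 175.0
print(r2_score(y_true, y_pred))
```

Here the residuals are -10, 10, -10 and 20, so MSE = 700 / 4 = 175 and R² = 1 - 700 / 50000 = 0.986.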
Traffic Volume Classification
- Purpose: Classifies traffic volume into categories (Low, Medium, High).
- Steps:
  - Bins traffic volume into predefined ranges.
  - Encodes categorical features (e.g., borough, direction).
  - Trains a Random Forest Classifier to predict traffic categories.
- Evaluation: Classification report with precision, recall, and F1 score.
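The binning and evaluation steps can be sketched as follows. The bin edges (`LOW_MAX`, `MEDIUM_MAX`) are hypothetical, since the README does not give the predefined ranges, and `y_pred` is made-up data standing in for classifier output. The report computes per-class precision, recall and F1 from true positives, false positives and false negatives, which is what scikit-learn's `classification_report` summarizes.

```python
# Hypothetical bin edges: the README does not state the actual ranges.
LOW_MAX = 100
MEDIUM_MAX = 500


def volume_category(volume):
    """Bin a raw volume into Low / Medium / High."""
    if volume <= LOW_MAX:
        return "Low"
    if volume <= MEDIUM_MAX:
        return "Medium"
    return "High"


def classification_report(y_true, y_pred):
    """Per-class precision, recall and F1 as a dict of dicts."""
    report = {}
    for label in sorted(set(y_true) | set(y_pred)):
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        report[label] = {"precision": precision, "recall": recall, "f1": f1}
    return report


volumes = [40, 250, 900, 120, 700]
y_true = [volume_category(v) for v in volumes]
y_pred = ["Low", "Medium", "High", "Low", "High"]  # stand-in predictions
for label, scores in classification_report(y_true, y_pred).items():
    print(label, scores)
```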
Peak Hour Classification
- Purpose: Identifies whether a given hour qualifies as a "peak hour."
- Steps:
  - Labels hours as peak or non-peak based on traffic volume thresholds.
  - Trains a classifier to predict peak hours based on traffic patterns.
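One way to derive the peak/non-peak labels is to compare each hour's total volume against a cut-off. The sketch below uses the upper quartile of the hourly totals as that cut-off; this rule is an assumption for illustration, since the README only mentions "traffic volume thresholds". The resulting 0/1 labels would then serve as the target for the classifier.

```python
import statistics


def label_peak_hours(hourly_volume):
    """Map each hour to 1 (peak) or 0 (non-peak).

    Assumption: an hour is 'peak' when its total volume is strictly above the
    upper quartile (75th percentile) of all hourly totals.
    """
    _, _, upper_quartile = statistics.quantiles(hourly_volume.values(), n=4)
    return {hour: int(vol > upper_quartile) for hour, vol in hourly_volume.items()}


# Illustrative totals: heavy morning (7-9) and evening (16-18) rush hours.
hourly = {h: (1000 if h in (7, 8, 9, 16, 17, 18) else 100) for h in range(24)}
labels = label_peak_hours(hourly)
print(sorted(h for h, is_peak in labels.items() if is_peak))  # [7, 8, 9, 16, 17, 18]
```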
Abnormal Traffic Detection
- Real-Time Updates: Automatically fetches and updates traffic data every 10 seconds.
- Interactive Visualizations:
  - Traffic volume trends by street.
  - Top streets by traffic volume.
  - Hourly traffic volume distribution.
  - Borough-wise traffic distribution (pie chart and bar chart).
  - Geographical traffic map with markers.
- Street Selector: Filter visualizations by selecting a specific street.
Data Source:
The app fetches traffic data from the NYC Traffic Data API on NYC Open Data.
Data Fetching:
- Data is fetched in chunks using pagination (`$limit`, `$offset`).
- Filters are applied to fetch only traffic data from the year 2024.
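The pagination loop can be sketched with the standard library. `$limit`, `$offset`, `$where` and `$order` are standard Socrata (SoQL) query parameters; the `yr` column name in the 2024 filter is an assumption to verify against the dataset's column list, and ordering by the system field `:id` is added so page boundaries stay stable between requests. The page fetcher is injectable so the loop can be exercised without network access.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

BASE_URL = "https://data.cityofnewyork.us/resource/7ym2-wayt.json"
# Assumption: the year column's API field name is "yr"; check the dataset's
# column list before relying on this filter.
YEAR_FILTER = "yr = '2024'"


def build_url(offset, limit, where=YEAR_FILTER):
    """Build the SoQL query URL for one page of results."""
    params = {
        "$limit": limit,
        "$offset": offset,
        "$where": where,
        "$order": ":id",  # stable ordering so pages neither overlap nor skip rows
    }
    return f"{BASE_URL}?{urlencode(params)}"


def http_get_json(url):
    """Download one page and decode the JSON array the API returns."""
    with urlopen(url, timeout=30) as resp:
        return json.load(resp)


def fetch_all(limit=50_000, get_page=http_get_json):
    """Yield records page by page, stopping after the first short page."""
    offset = 0
    while True:
        page = get_page(build_url(offset, limit))
        yield from page
        if len(page) < limit:
            break
        offset += limit
```

Against the live API, `list(fetch_all())` would request successive pages until one comes back with fewer than `limit` rows; the 50,000-row page size is an arbitrary choice here.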
Data Processing:
- Converts fields such as `year`, `month`, `day`, `hour`, and `volume` into numeric types.
- Extracts latitude and longitude from geographic data (`wktgeom`).
- Combines date and time components into a `datetime` column for time-series analysis.
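The processing steps above might look like the following for one record. The field names mirror the README's wording but are assumptions about the renamed columns, and the parser only handles WKT `POINT (x y)` strings. WKT lists x before y, so for geographic coordinates the pair is (longitude, latitude); if `wktgeom` turns out to use a projected coordinate system instead, the values would need reprojecting before they can be plotted on a map.

```python
import re
from datetime import datetime

POINT_RE = re.compile(r"POINT\s*\(\s*(-?[\d.]+)\s+(-?[\d.]+)\s*\)")


def parse_point(wkt):
    """Return (x, y) from a WKT 'POINT (x y)' string, or None if it does not match."""
    if not wkt:
        return None
    m = POINT_RE.match(wkt.strip())
    if m is None:
        return None
    return float(m.group(1)), float(m.group(2))


def process_record(raw):
    """Convert numeric fields, extract coordinates and build a datetime."""
    rec = dict(raw)
    for field in ("year", "month", "day", "hour", "volume"):
        try:
            rec[field] = int(float(rec[field]))
        except (KeyError, TypeError, ValueError):
            rec[field] = None  # missing or non-numeric value
    point = parse_point(raw.get("wktgeom"))
    # WKT order is x then y, i.e. (longitude, latitude) for geographic data.
    rec["longitude"], rec["latitude"] = point if point else (None, None)
    try:
        rec["datetime"] = datetime(rec["year"], rec["month"], rec["day"], rec["hour"])
    except (TypeError, ValueError):
        rec["datetime"] = None  # a component was missing or out of range
    return rec


print(process_record({"year": "2024", "month": "3", "day": "16", "hour": "8",
                      "volume": "120", "wktgeom": "POINT (-73.98 40.75)"}))
```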
Real-Time Updates:
A background thread continuously fetches and processes new data.
- Traffic Volume Trend Line Chart: Displays traffic volume trends for a selected street over time.
- Street Selector: A dropdown to select specific streets and filter the visualizations.
- Top 5 Streets by Traffic Volume: A bar chart showing the streets with the highest traffic volume.
- Hourly Traffic Volume: A bar chart showing traffic volume distribution across hours for the current day.
- Borough-Wise Traffic Volume (Pie and Bar Charts): Visualizes traffic distribution across boroughs using pie and bar charts.
- Traffic Volume Map: An interactive map showing traffic volumes geographically with markers.
Data Fetching:
`fetch_and_process_data()` fetches, processes, and structures the data.
Background Thread:
- A thread continuously updates the global dataset (`global_data`).
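A background refresher like the one described can be sketched with `threading`. Instead of a bare module-level `global_data`, this version wraps the dataset in a small class (`DataRefresher`, a hypothetical name) guarded by a lock, and uses a `threading.Event` so the loop can be stopped cleanly; `fetch` stands in for `fetch_and_process_data()`. The 10-second default matches the refresh interval the README mentions.

```python
import threading
import time


class DataRefresher:
    """Periodically replace a shared dataset from a background thread."""

    def __init__(self, fetch, interval_s=10.0):
        self._fetch = fetch            # zero-argument callable returning fresh data
        self._interval = interval_s
        self._lock = threading.Lock()  # guards _data between threads
        self._data = []
        self._stop = threading.Event()
        self._thread = threading.Thread(target=self._run, daemon=True)

    def _run(self):
        while not self._stop.is_set():
            try:
                new_data = self._fetch()
            except Exception as exc:  # keep serving the last good snapshot
                print(f"refresh failed: {exc}")
            else:
                with self._lock:
                    self._data = new_data
            self._stop.wait(self._interval)  # sleeps, but wakes early on stop()

    def start(self):
        self._thread.start()

    def stop(self):
        self._stop.set()
        self._thread.join()

    def snapshot(self):
        with self._lock:
            return self._data
```

In the Dash app, the callback fired by `dcc.Interval` would read `snapshot()` to get the latest data rather than triggering a fetch itself.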
Visualization Updates:
`update_graphs()` generates visualizations dynamically based on the selected street and updated data.
The app layout consists of the following Dash components:
- `dcc.Dropdown`: For street selection.
- `dcc.Graph`: To render charts and maps.
- `dcc.Interval`: For automatic updates.
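The data preparation behind `update_graphs()` can be kept separate from the plotting. The sketch below (function names and the `street`, `datetime` and `volume` fields are hypothetical) computes the series for the Top 5 Streets bar chart and the per-street trend line. A Dash callback would take the `dcc.Dropdown` value as input, call helpers like these on the latest data, and turn the results into figures for `dcc.Graph`.

```python
from collections import Counter, defaultdict
from datetime import datetime


def top_streets(records, n=5):
    """(street, total volume) pairs for the n busiest streets, busiest first."""
    totals = Counter()
    for r in records:
        totals[r["street"]] += r["volume"]
    return totals.most_common(n)


def street_trend(records, street):
    """Time-sorted (datetime, total volume) pairs for one street: line-chart data."""
    series = defaultdict(int)
    for r in records:
        if r["street"] == street:
            series[r["datetime"]] += r["volume"]
    return sorted(series.items())


sample = [
    {"street": "BROADWAY", "datetime": datetime(2024, 1, 1, 8), "volume": 100},
    {"street": "BROADWAY", "datetime": datetime(2024, 1, 1, 7), "volume": 50},
    {"street": "5 AVENUE", "datetime": datetime(2024, 1, 1, 8), "volume": 120},
    {"street": "BROADWAY", "datetime": datetime(2024, 1, 1, 8), "volume": 30},
]
print(top_streets(sample))  # [('BROADWAY', 180), ('5 AVENUE', 120)]
```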
Missing Data:
Rows without valid latitude or longitude are dropped.
Performance:
Large datasets may slow down the app. Future improvements could include data sampling or caching.
Error Handling:
The app falls back gracefully when the fetched data is empty or malformed.
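Dropping rows without usable coordinates, together with the empty-data fallback, can be sketched as follows, assuming processed records carry `latitude` and `longitude` keys that may be `None` or NaN (the key names are assumptions). Returning an empty list rather than raising lets a map callback render an empty figure when nothing survives.

```python
import math


def has_valid_coordinates(record):
    """True if both latitude and longitude are finite numbers."""
    for key in ("latitude", "longitude"):
        value = record.get(key)
        if not isinstance(value, (int, float)) or not math.isfinite(value):
            return False
    return True


def map_points(records):
    """(lat, lon, volume) tuples for the map; an empty list if nothing is usable."""
    return [
        (r["latitude"], r["longitude"], r.get("volume", 0))
        for r in records
        if has_valid_coordinates(r)
    ]


rows = [
    {"latitude": 40.75, "longitude": -73.98, "volume": 120},
    {"latitude": None, "longitude": -73.95, "volume": 80},
    {"latitude": float("nan"), "longitude": -73.90, "volume": 60},
]
print(map_points(rows))  # [(40.75, -73.98, 120)]
```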
Additional Visualizations:
- Traffic heatmaps for hourly trends.
- Weekly or monthly trend comparisons.
Predictive Modeling:
- Implement traffic volume predictions using machine learning.
Performance Optimization:
- Introduce caching mechanisms for API calls.
User Features:
- Allow users to filter by borough, date range, and traffic direction.
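For the API-caching improvement, note that `functools.lru_cache` never expires entries, which does not suit data that keeps changing. A time-to-live cache is one alternative; the decorator below is a minimal sketch not tied to any library, with an injectable clock so expiry can be demonstrated without waiting. It caches by positional arguments only, and `fetch_page` is a hypothetical stand-in for an API call.

```python
import time
from functools import wraps


def ttl_cache(seconds, clock=time.monotonic):
    """Cache results per positional-argument tuple, expiring after `seconds`."""
    def decorator(func):
        store = {}  # args -> (timestamp, value)

        @wraps(func)
        def wrapper(*args):
            now = clock()
            hit = store.get(args)
            if hit is not None and now - hit[0] < seconds:
                return hit[1]  # still fresh: skip the expensive call
            value = func(*args)
            store[args] = (now, value)
            return value

        return wrapper

    return decorator


fake_now = [0.0]  # mutable fake clock for the demo
calls = []


@ttl_cache(10, clock=lambda: fake_now[0])
def fetch_page(offset):
    calls.append(offset)  # stands in for an API request
    return [offset]


fetch_page(0)
fetch_page(0)       # served from the cache
fake_now[0] = 11.0  # move past the 10-second TTL
fetch_page(0)       # entry expired, so the function runs again
print(calls)  # [0, 0]
```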