Video - https://www.youtube.com/watch?v=GsYtVGVTPWM
GRIME is a multi-parameter optimization system that identifies optimal locations for deploying trash interception barriers ("nets") on urban waterways. It evaluates candidate sites across 28 geospatial parameters organized into 6 parameter families, producing 4 sub-scores that combine into a single composite ranking per candidate location.
Built for: 2026 SmathHacks hackathon
What makes it technically interesting:
- A two-level weighted scoring architecture (parameters → sub-scores → composite) that is both interpretable and tunable
- Hydrological pipeline built on real DEM data: pit-filling → depression-filling → flat resolution → D8 flow direction → flow accumulation → stream extraction
- Manning's equation applied to estimate flow velocity from DEM slope and channel geometry, with feasibility gates that eliminate sites where deployment is physically impossible
- Monte Carlo sensitivity analysis via Dirichlet-perturbed weight vectors to assess ranking robustness
- Three-phase candidate placement algorithm: spatial constraint satisfaction → full-parameter scoring → population-scaled risk-percentile filtering
- On-demand real waterway geometry from OpenStreetMap's Overpass API for 108,772 cities across 240 countries
Scope: Site selection modeling and scoring. GRIME does not design the physical trap, predict trash composition, or model individual debris trajectories.
Non-goals: Real-time sensor integration, computer vision trash classification, trap mechanical design, economic cost optimization.
All surfaces were generated in the Wolfram Language from real USGS elevation data and GRIME's scoring functions. See the full documentation for derivations and code.
3D elevation surface from USGS 3DEP at 10m resolution. Stream valleys visible as low-elevation grooves define where GRIME extracts candidate net sites.
The final composite score as a function of Generation (trash input) and Impact (downstream consequence), with Flow and Feasibility held constant. The diagonal ridge shows that high scores require both trash presence and downstream consequence — neither alone is sufficient.
Deployment feasibility as a function of channel width and flow velocity. The green plateau marks the sweet spot: narrow, moderate-velocity channels where nets can be spanned and anchored. Red zones are eliminated by hard gates.
Flow velocity estimated from Manning's equation across slope and roughness parameter space. Steep, smooth channels (high slope, low roughness) produce dangerous velocities; flat, rough channels produce stagnant conditions.
Environmental justice burden across a synthetic metro region. Peaks identify overburdened communities where trash interception has the highest equity value — GRIME weights these areas higher in the Impact sub-score.
500 weight vectors sampled from a Dirichlet distribution (κ=10) projected onto a ternary diagram. The red dot is the baseline. GRIME recomputes rankings under each perturbation to test whether top-ranked sites are robust to weight assumptions.
Urban waterways accumulate trash from stormwater runoff, illegal dumping, combined sewer overflows, and bridge crossings. Deploying interception devices (nets, booms, trash traps) requires choosing locations that maximize debris captured per device while remaining physically feasible to install and maintain.
A naive approach — placing traps at the largest rivers — fails because:
- Large rivers are too wide (>30m) for stationary nets; debris passes around or damages the device
- High-traffic waterways (navigable canals, shipping channels) cannot be obstructed
- Upstream tributaries with high impervious surface and population density generate more trash per unit area than rural mainstems
- Downstream impact varies: a trap upstream of a drinking water intake has orders of magnitude more public health value than one upstream of an industrial canal
- Physical feasibility (road access, bank slope, flow velocity, land ownership) eliminates many otherwise-optimal locations
- All data sources must be free and require no API keys (census, EPA, USGS, OSM)
- The model assumes stationary barrier-style traps deployable in channels ≤30m wide (~100 ft)
- Scoring weights are set by informed heuristic and literature, not supervised learning (no ground-truth dataset of "correct" trap placements exists at scale)
- The system must produce results in <15 seconds per city for interactive demo use
graph TD
A[DEM Raster - USGS 3DEP] --> B[Hydrological Conditioning - pysheds]
B --> C[Flow Direction - D8 Algorithm]
C --> D[Flow Accumulation]
D --> E[Stream Network Extraction]
E --> F[Candidate Site Generation]
G[EPA APIs - TRI ECHO EJSCREEN SDWIS FRS] --> H[Parameter Computation - 28 params x 6 families]
I[USGS APIs - NWIS StreamStats PAD-US] --> H
J[Census APIs - ACS TIGER] --> H
K[OSM - Overpass API] --> H
F --> H
H --> L[Hard Gate Filtering]
L --> M[MinMax Normalization]
M --> N[Weighted Sub-scores x4]
N --> O[Composite Score]
O --> P[Risk-Percentile Filtering]
P --> Q[Ranked Deployment Sites]
Q --> R[FastAPI Backend]
R --> S[Mapbox GL JS Dashboard]
K --> S
| Subsystem | Location | Purpose |
|---|---|---|
| DEM Pipeline | core/pipeline.py | Fetch elevation data, extract stream network, generate candidate points |
| Generation Params | core/generation.py | Compute trash generation indicators (population, land use, industrial sources) |
| Flow Params | core/flow.py | Compute hydraulic transport parameters (discharge, velocity, flood frequency) |
| Impact Params | core/impact.py | Compute downstream consequence indicators (drinking water, EJ, protected areas) |
| Feasibility Params | core/feasibility.py | Compute deployment constraint parameters (road access, channel width, slope) |
| Scoring Engine | core/scoring.py | Normalize, weight, composite, sensitivity analysis |
| API Server | api/main.py | REST + WebSocket endpoints serving scored GeoJSON |
| Dashboard | dashboard/index.html | Interactive map with on-demand OSM waterway fetching and client-side scoring |
| Places Database | mock_data/places.json | 108,772 city/town records across 240 countries (7MB compact JSON) |
Two execution modes exist:
Mode 1 — Full Python pipeline (research/validation): DEM fetch → pysheds hydrology → stream extraction → candidate generation → API-based parameter computation → composite scoring → GeoJSON output
Mode 2 — Dashboard on-demand (demo/interactive): User clicks city → Overpass API returns real waterway geometry → client-side JS generates candidate positions with spatial constraints → client-side scoring using simplified parameter model → Mapbox GL renders results
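Mode 2 exists because Mode 1 takes 3–5 minutes per watershed and requires installing pysheds (which has C dependencies that fail on some Windows machines). Mode 2 needs only a browser and typically finishes within a few seconds: the Overpass query (2–8 s) dominates, while client-side placement and scoring take under 100 ms.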
The 28 raw parameters are not directly comparable (population density in persons/km² vs flow velocity in m/s vs a binary land ownership flag). The system handles this through two-level aggregation:
- Parameter level: Each raw parameter is MinMax-normalized to [0, 1] independently within the candidate set, then multiplied by its within-family weight. The weighted sum produces a sub-score in [0, 100].
- Sub-score level: The four sub-scores are combined via a second set of weights into the composite score in [0, 100].
This two-level structure has a specific advantage: it makes the model interpretable at the sub-score level. A judge or engineer can look at a candidate and immediately see "high generation, low feasibility" without needing to parse 28 individual numbers.
Some parameters act as binary disqualifiers rather than continuous scores. A channel wider than 50m cannot hold a net regardless of how much trash flows through it. These are implemented as hard gates that remove candidates before scoring, separate from the soft scoring that ranks survivors:
| Gate | Condition | Rationale |
|---|---|---|
| Velocity | V > 3.0 m/s | Trap will be damaged or torn loose |
| Channel width | W > 50m or W < 0.5m | Too wide to span or too narrow for meaningful accumulation |
| Land ownership | Confirmed private, no permission | Legal barrier to deployment |
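The gate table above can be read as a simple predicate applied before any scoring. The following is a minimal sketch, not the code in core/scoring.py: the `Candidate` dataclass and its field names are hypothetical, and only the three thresholds come from the table.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Candidate:
    # Hypothetical field names, chosen for illustration only.
    velocity_ms: float
    channel_width_m: float
    private_no_permission: bool


def passes_hard_gates(c: Candidate) -> bool:
    """Return False if any hard gate from the table disqualifies the site."""
    if c.velocity_ms > 3.0:                                   # trap damaged or torn loose
        return False
    if c.channel_width_m > 50.0 or c.channel_width_m < 0.5:   # cannot span / no accumulation
        return False
    if c.private_no_permission:                               # legal barrier
        return False
    return True


def apply_gates(candidates: List[Candidate]) -> List[Candidate]:
    """Keep only the candidates that survive every gate; survivors go on to soft scoring."""
    return [c for c in candidates if passes_hard_gates(c)]
```

Because gates run before normalization, a disqualified site never influences the MinMax range of the survivors.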
Candidate placement is not random scatter. It is a three-phase algorithm:
- Constraint satisfaction: Generate all positions that pass spatial, width, and traffic constraints
- Full scoring: Evaluate every surviving position on the composite model
- Risk-percentile selection: Keep only the top N% by score, where N scales with city population
This separates "can we physically put a net here?" (phase 1) from "should we?" (phases 2–3).
A city of 20 million people needs more nets than a town of 10,000, but not linearly more. The risk percentile threshold scales in steps:
| Population | Percentile kept | Rationale |
|---|---|---|
| >10M | Top 35% | Mega-cities have extensive waterway networks; more sites are genuinely high-risk |
| >1M | Top 30% | Large cities still have substantial catchments |
| >100K | Top 25% | Mid-size cities, moderate network complexity |
| ≤100K | Top 20% | Small towns, fewer waterways, tighter selection |
A minimum floor of 5 deployed sites ensures the model always produces enough output to demonstrate ranking behavior.
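As a rough illustration of the step function and the floor, the sketch below maps a population to a kept fraction and then to a site count. The function names are illustrative, and capping the result at the number of scored candidates is an assumption added here so the count never exceeds what exists.

```python
import math


def population_scaled_percentile(pop: int) -> float:
    """Fraction of scored candidates to keep, following the step table."""
    if pop > 10_000_000:
        return 0.35
    if pop > 1_000_000:
        return 0.30
    if pop > 100_000:
        return 0.25
    return 0.20


def n_sites_to_keep(n_scored: int, pop: int, floor: int = 5) -> int:
    """Apply the percentile, enforce the minimum floor, never exceed the pool size."""
    wanted = max(floor, math.ceil(n_scored * population_scaled_percentile(pop)))
    return min(n_scored, wanted)
```

For example, a city the size of Durham (population 278,993) with 40 scored candidates keeps the top 25%, which is 10 sites.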
Let x ∈ ℝ²⁸ be the raw parameter vector for a candidate site. The composite score S(x) is:
S(x) = Σ(k=1..4) ωk · Gk(x)
where ω = [0.30, 0.25, 0.30, 0.15] are the sub-score weights and each sub-score Gk is:
Gk(x) = 100 · Σ(j ∈ Fk) wj · x̂j
where Fk is the set of parameter indices belonging to family k, wj is the within-family weight for parameter j (renormalized to sum to 1 after filtering unavailable parameters), and x̂j is the MinMax-normalized value:
x̂j = (xj - min(xj)) / (max(xj) - min(xj))
For distance-based parameters where lower is better (estuary distance, beach distance), the normalized value is inverted before weighting: x̂j is replaced by 1 − x̂j.
Important implementation detail: Normalization is computed across the candidate set, not against a global reference. This means scores are relative rankings, not absolute measures. A score of 80 means "top of this candidate pool," not "80% of some theoretical maximum."
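The two-level formula can be sketched in a few lines of plain Python. This is an illustrative reading of the equations above, not the contents of core/scoring.py: the function names and the dictionary layout are assumptions, while the sub-score weights, the renormalization over available parameters, the inversion rule, and the 0.5 fallback for identical values come from this document.

```python
from typing import Dict, FrozenSet, List

# Sub-score weights from the text: Generation, Flow, Impact, Feasibility.
SUBSCORE_WEIGHTS = {"generation": 0.30, "flow": 0.25, "impact": 0.30, "feasibility": 0.15}


def minmax(values: List[float], invert: bool = False) -> List[float]:
    """Scale to [0, 1] across the candidate set; 0.5 everywhere if all values are equal."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.5] * len(values)
    scaled = [(v - lo) / (hi - lo) for v in values]
    return [1.0 - s for s in scaled] if invert else scaled


def subscore(columns: Dict[str, List[float]], weights: Dict[str, float],
             inverted: FrozenSet[str] = frozenset()) -> List[float]:
    """G_k: weighted sum of normalized parameters in one family, scaled to [0, 100]."""
    # Drop weights for unavailable parameters and renormalize the rest to sum to 1.
    used = {name: w for name, w in weights.items() if name in columns}
    total = sum(used.values())
    n = len(next(iter(columns.values())))
    scores = [0.0] * n
    for name, w in used.items():
        x_hat = minmax(columns[name], invert=name in inverted)
        for i, x in enumerate(x_hat):
            scores[i] += 100.0 * (w / total) * x
    return scores


def composite(subscores: Dict[str, List[float]]) -> List[float]:
    """S: weighted sum of the four sub-scores, one value per candidate."""
    n = len(next(iter(subscores.values())))
    return [sum(SUBSCORE_WEIGHTS[k] * subscores[k][i] for k in SUBSCORE_WEIGHTS)
            for i in range(n)]
```

Because `minmax` only sees the current candidate pool, adding or removing a single extreme candidate changes every other candidate's score, which is the relative-ranking behavior described above.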
Flow velocity at a candidate site is estimated via Manning's equation:
V = (1/n) · R^(2/3) · S^(1/2)
where:
- n is Manning's roughness coefficient (units of s/m^(1/3) in SI, usually tabulated as a bare number), selected by channel type:
  - Clean straight: 0.030
  - Winding with pools (typical urban creek): 0.040
  - Sluggish, weedy: 0.070
  - Urban concrete-lined: 0.015
  - (Source: Chow, V.T., 1959, Open-Channel Hydraulics)
- R is the hydraulic radius (m) = A_cross / P_wetted, approximated as rectangular channel: R = (W × D) / (W + 2D), where depth D ≈ 0.3W (bankfull approximation)
- S is the channel slope (dimensionless), computed from DEM as elevation difference over a 100m reach: S = (Z_here − Z_downstream) / 100, clamped to minimum 0.0001
A continuity cross-check is performed: V_continuity = Q / A_cross, where Q is the USGS-measured discharge converted to m³/s. The final velocity estimate is the geometric mean of Manning's and continuity estimates:
V_final = sqrt(V_Manning · V_continuity)
This hedges against errors in either the DEM slope (which can be noisy at 10m resolution) or the channel geometry assumption (rectangular approximation).
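A minimal sketch of the velocity estimate, assuming the rectangular channel and D ≈ 0.3W approximation stated above. The function names are illustrative and may not match compute_flow_velocity() in core/flow.py; the cfs to m³/s factor (0.0283168) is the standard unit conversion.

```python
import math

CFS_TO_CMS = 0.0283168  # 1 cubic foot per second expressed in cubic metres per second


def manning_velocity(width_m: float, slope: float, n: float = 0.040) -> float:
    """V = (1/n) * R^(2/3) * S^(1/2) for a rectangular channel with depth 0.3 * width."""
    depth = 0.3 * width_m
    area = width_m * depth
    wetted_perimeter = width_m + 2.0 * depth
    hydraulic_radius = area / wetted_perimeter
    s = max(slope, 0.0001)  # clamp so flat or noisy DEM reaches do not give V = 0
    return (1.0 / n) * hydraulic_radius ** (2.0 / 3.0) * math.sqrt(s)


def continuity_velocity(q_cfs: float, width_m: float) -> float:
    """V = Q / A using the same assumed cross-section."""
    area = width_m * (0.3 * width_m)
    return q_cfs * CFS_TO_CMS / area


def final_velocity(width_m: float, slope: float, q_cfs: float, n: float = 0.040) -> float:
    """Geometric mean of the Manning and continuity estimates."""
    return math.sqrt(manning_velocity(width_m, slope, n) * continuity_velocity(q_cfs, width_m))
```

For a 10 m wide channel at slope 0.001 with n = 0.040, the Manning term alone gives roughly 1.2 m/s; the geometric mean then pulls that toward the discharge-based estimate.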
The velocity feasibility score is a piecewise function mapping velocity to a deployment viability multiplier:
f(V) =
0.3 if V < 0.05 m/s (stagnant — debris doesn't concentrate)
0.7 if 0.05 ≤ V < 0.30 (slow but workable)
1.0 if 0.30 ≤ V ≤ 1.50 (optimal interception range)
0.5 if 1.50 < V ≤ 2.50 (fast — heavy anchoring needed)
0.1 if V > 2.50 (too fast — trap damage likely)
Sites with V > 3.0 m/s are removed entirely by the hard gate before scoring.
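The piecewise multiplier translates directly into a chain of comparisons. This sketch uses an illustrative function name and assumes sites above 3.0 m/s have already been removed by the hard gate, so anything above 2.50 m/s simply receives 0.1.

```python
def velocity_feasibility(v_ms: float) -> float:
    """Deployment viability multiplier f(V) for a velocity in m/s."""
    if v_ms < 0.05:
        return 0.3   # stagnant: debris does not concentrate
    if v_ms < 0.30:
        return 0.7   # slow but workable
    if v_ms <= 1.50:
        return 1.0   # optimal interception range
    if v_ms <= 2.50:
        return 0.5   # fast: heavy anchoring needed
    return 0.1       # too fast: trap damage likely
```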
The rational method runoff coefficient C is estimated from impervious surface percentage via a linear model:
C = 0.05 + 0.009 · I
where I is the NLCD impervious surface percentage [0, 100]. This yields C ∈ [0.05, 0.95], ranging from forest (≈5% runoff) to fully paved (≈95% runoff). This is the same linearization used in the WaterGate methodology.
Several parameters (CSO proximity, Superfund proximity) use an inverse-distance kernel to compute influence from point sources:
score = Σ(i=1..N) 1 / (1 + (di / h)²)
where di is the Euclidean distance (in UTM meters) from the candidate to source i, and h is the half-decay distance (500m default). This is a Cauchy kernel that gives full weight at distance 0 and half weight at distance h.
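A short sketch of the kernel, assuming candidate and source positions are already projected to UTM metres as (x, y) tuples. The function name is illustrative.

```python
import math
from typing import Iterable, Tuple

XY = Tuple[float, float]  # projected coordinates in metres


def cauchy_proximity(candidate: XY, sources: Iterable[XY], h: float = 500.0) -> float:
    """Sum of 1 / (1 + (d/h)^2) over all point sources; h is the half-decay distance."""
    cx, cy = candidate
    return sum(1.0 / (1.0 + (math.hypot(sx - cx, sy - cy) / h) ** 2) for sx, sy in sources)
```

A source co-located with the candidate contributes 1.0 and a source 500 m away contributes 0.5, so the score is an unbounded sum rather than a value in [0, 1]; MinMax normalization later rescales it.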
Proximity to downstream drinking water intakes uses an exponential decay:
score = Σ(i=1..N) exp(-di / 10)
where di is the distance in km. An intake at 10 km contributes weight ≈0.37, at 5 km ≈0.61, and at 1 km ≈0.90. Intakes beyond 50 km are ignored.
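The exponential decay and the 50 km cutoff can be sketched as follows. The function and parameter names are illustrative; the 10 km scale and 50 km cutoff are the values given above.

```python
import math
from typing import Iterable


def intake_proximity(distances_km: Iterable[float],
                     scale_km: float = 10.0, cutoff_km: float = 50.0) -> float:
    """Sum of exp(-d / scale) over downstream intakes within the cutoff distance."""
    return sum(math.exp(-d / scale_km) for d in distances_km if d <= cutoff_km)
```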
To assess whether the top-ranked sites are robust to weight uncertainty, the system performs Monte Carlo sensitivity analysis:
- Sample N = 50 perturbed weight vectors from a Dirichlet distribution: ω' ~ Dir(α), where α = 10 × [0.30, 0.25, 0.30, 0.15]
- The α scaling factor (×10) controls perturbation magnitude — higher α concentrates samples closer to the baseline weights
- For each perturbed weight vector, recompute composite scores and record which sites appear in the top 5
- The robustness percentage for each site is: (count of times in top 5) / N × 100%
A site with robustness > 80% is ranked highly regardless of reasonable weight changes. A site at 30% is sensitive to weight assumptions.
The minimum-spacing constraint uses the haversine formula for geodesic distance:
a = sin²(Δφ/2) + cos(φ₁) · cos(φ₂) · sin²(Δλ/2)
d = 2R · atan2(√a, √(1−a))
where R = 6,371,000 m. Haversine is used instead of planar Euclidean distance on raw latitude/longitude because the candidate set can span several kilometers and a degree of longitude shrinks with latitude, so the flat-earth approximation introduces meaningful error at high latitudes.
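The two formulas above map one-to-one onto the standard library's math functions. This sketch assumes inputs in decimal degrees; the function name is illustrative.

```python
import math

EARTH_RADIUS_M = 6_371_000.0


def haversine_m(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in metres between two points given in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lon2 - lon1)
    a = math.sin(d_phi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.atan2(math.sqrt(a), math.sqrt(1 - a))
```

One degree of latitude comes out at about 111.2 km, while one degree of longitude at 60° latitude is about half that, which is the distortion a planar calculation on raw degrees would miss.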
The EJ priority score combines three EPA EJSCREEN percentiles:
EJ = (0.4 · P_discharge + 0.3 · P_minority + 0.3 · P_income) / 100
where P_discharge is the wastewater discharge EJ percentile, P_minority is the minority population percentile, and P_income is the low-income percentile. All are [0, 100] percentiles from EJSCREEN. The result is [0, 1].
Purpose: Convert raw elevation data into a hydrologically consistent surface from which flow direction and stream networks can be extracted.
Input: DEM raster from USGS 3DEP (10m resolution)
Output: Flow direction grid, flow accumulation grid, stream network GeoJSON
Steps:
FUNCTION condition_dem(dem):
    pit_filled ← fill_pits(dem)                  // remove single-cell sinks
    flooded ← fill_depressions(pit_filled)       // fill multi-cell depressions
    inflated ← resolve_flats(flooded)            // assign gradient to flat areas
    flow_dir ← D8_flowdir(inflated)              // each cell → 1 of 8 neighbors
    accumulation ← flow_accumulation(flow_dir)   // count upstream cells per cell
    RETURN flow_dir, accumulation
Stream extraction: Cells where accumulation exceeds a threshold (default 500 cells = 500 × 10m × 10m = 0.05 km²) are classified as stream cells. Connected stream cells are vectorized into LineString geometries.
Complexity: O(n) for each step where n is the number of DEM cells. For the Ellerbe Creek bbox at 10m resolution: approximately 3000 × 1500 = 4.5M cells. Total conditioning time: ~30–60 seconds.
Why pysheds: It operates entirely in-memory on NumPy arrays without requiring ArcGIS or GRASS GIS. The D8 algorithm assigns each cell exactly one of 8 cardinal/diagonal flow directions based on steepest descent, which is the standard approach for stream extraction in computational hydrology.
Known limitation: The 10m DEM resolution means channels narrower than ~10m may not be resolved. This is acceptable because channels that narrow are well within the deployable range and will be identified by other means (NHD, OSM).
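To make the D8 and accumulation steps concrete, here is a toy pure-Python version that runs on a small nested-list DEM. It is a teaching sketch only: it is not pysheds, it skips pit filling and flat resolution (the toy grid is assumed to be already conditioned), and all names are illustrative.

```python
from typing import List, Optional, Tuple

Cell = Tuple[int, int]

# Offsets (row, col) to the 8 neighbours of a cell.
NEIGHBORS = [(-1, -1), (-1, 0), (-1, 1), (0, -1), (0, 1), (1, -1), (1, 0), (1, 1)]


def d8_flow_direction(dem: List[List[float]]) -> List[List[Optional[Cell]]]:
    """For each cell, the neighbour with the steepest downhill slope (None if no lower neighbour)."""
    rows, cols = len(dem), len(dem[0])
    target: List[List[Optional[Cell]]] = [[None] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            best_drop = 0.0
            for dr, dc in NEIGHBORS:
                nr, nc = r + dr, c + dc
                if 0 <= nr < rows and 0 <= nc < cols:
                    dist = (dr * dr + dc * dc) ** 0.5  # diagonals are sqrt(2) cells away
                    drop = (dem[r][c] - dem[nr][nc]) / dist
                    if drop > best_drop:
                        best_drop = drop
                        target[r][c] = (nr, nc)
    return target


def flow_accumulation(dem: List[List[float]],
                      target: List[List[Optional[Cell]]]) -> List[List[int]]:
    """Number of upstream cells draining through each cell (the cell itself is not counted)."""
    rows, cols = len(dem), len(dem[0])
    acc = [[0] * cols for _ in range(rows)]
    # Flow is strictly downhill, so visiting cells from high to low finishes donors first.
    order = sorted(((r, c) for r in range(rows) for c in range(cols)),
                   key=lambda rc: dem[rc[0]][rc[1]], reverse=True)
    for r, c in order:
        t = target[r][c]
        if t is not None:
            acc[t[0]][t[1]] += acc[r][c] + 1
    return acc


def stream_cells(acc: List[List[int]], threshold: int) -> List[Cell]:
    """Cells whose accumulation exceeds the threshold, in row-major order."""
    return [(r, c) for r, row in enumerate(acc) for c, a in enumerate(row) if a > threshold]
```

On the 3 × 3 valley `[[9, 8, 9], [7, 6, 7], [5, 4, 5]]`, every cell drains toward the centre column and the outlet at the bottom centre collects all 8 other cells; with a threshold of 2 the extracted stream is the two lower cells of that column.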
Purpose: Given a set of waterway geometries and city metadata, produce a set of spatially valid, risk-ranked candidate sites for trap deployment.
Input: Array of stream geometries (from Overpass API), city population, country code
Output: Ranked array of candidate objects with scores and parameters
FUNCTION generate_candidates(streams, pop, country):
    // ── Phase 1: Constraint satisfaction ──
    MIN_SPACE ← 120m
    MAX_WIDTH ← 30m
    placed ← []
    valid_positions ← []
    FOR EACH stream IN streams:
        IF stream.width > MAX_WIDTH: CONTINUE
        cap ← traffic_capacity(stream)
        count_on_stream ← 0
        dist_since_last ← MIN_SPACE          // lets the first vertex qualify
        previous_point ← NONE
        FOR EACH point IN stream.coords:
            IF previous_point ≠ NONE:
                dist_since_last += haversine(previous_point, point)
            previous_point ← point           // advance even when the point is skipped
            IF dist_since_last < MIN_SPACE: CONTINUE
            IF any p in placed where haversine(p, point) < MIN_SPACE: CONTINUE
            IF count_on_stream >= cap: CONTINUE
            placed.add(point)
            count_on_stream++
            dist_since_last ← 0
            valid_positions.add(point with metadata)
    // ── Phase 2: Score every valid position ──
    scored ← []
    FOR EACH pos IN valid_positions:
        compute 28 parameters (simplified model)
        compute 4 sub-scores
        compute composite
        scored.add(pos with scores)
    // ── Phase 3: Risk-percentile selection ──
    scored.sort_by(composite, descending)
    pctile ← population_scaled_percentile(pop)
    cutoff ← max(5, ceil(len(scored) * pctile))
    RETURN scored[0:cutoff]
Complexity: Phase 1 is O(n × m) where n is total coordinate points and m is placed candidates. Phase 2 is O(k) where k is valid positions. Phase 3 is O(k log k) for sorting. In practice k < 200 and the entire function runs in <100ms.
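Phases 1 and 3 of the pseudocode can be sketched as runnable Python. This is not a port of generateCandidates() from dashboard/index.html: the stream dictionary layout (`width_m`, `coords`, optional `cap`) and the function names are assumptions for illustration, and Phase 2 scoring is left to the caller.

```python
import math
from typing import Dict, List, Tuple

Point = Tuple[float, float]  # (latitude, longitude) in decimal degrees


def haversine_m(p: Point, q: Point) -> float:
    """Great-circle distance in metres."""
    phi1, phi2 = math.radians(p[0]), math.radians(q[0])
    a = (math.sin((phi2 - phi1) / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(math.radians(q[1] - p[1]) / 2) ** 2)
    return 2 * 6_371_000.0 * math.atan2(math.sqrt(a), math.sqrt(1 - a))


def place_candidates(streams: List[Dict], min_space_m: float = 120.0,
                     max_width_m: float = 30.0) -> List[Point]:
    """Phase 1: walk each stream and keep vertices that satisfy width, spacing, and cap."""
    placed: List[Point] = []
    for stream in streams:
        if stream["width_m"] > max_width_m:
            continue
        cap = stream.get("cap", math.inf)   # hypothetical per-stream traffic capacity
        count = 0
        dist_since_last = min_space_m       # lets the first vertex qualify
        prev = None
        for pt in stream["coords"]:
            if prev is not None:
                dist_since_last += haversine_m(prev, pt)
            prev = pt
            if count >= cap:
                break
            if dist_since_last < min_space_m:
                continue
            if any(haversine_m(p, pt) < min_space_m for p in placed):
                continue                    # too close to a net on any stream
            placed.append(pt)
            count += 1
            dist_since_last = 0.0
    return placed


def select_top(scored: List[Dict], pctile: float, floor: int = 5) -> List[Dict]:
    """Phase 3: keep the top fraction by composite score, with a minimum floor."""
    ranked = sorted(scored, key=lambda s: s["composite"], reverse=True)
    cutoff = max(floor, math.ceil(len(ranked) * pctile))
    return ranked[:cutoff]
```

With vertices about 56 m apart along a single narrow stream, this places a candidate on every third vertex, since two steps (about 111 m) fall short of the 120 m minimum.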
FUNCTION sensitivity_analysis(candidates, n_perturbations=50):
    baseline ← compute_composite_score(candidates)
    top5_counts ← zeros(len(candidates))
    REPEAT n_perturbations TIMES:
        α ← [3.0, 2.5, 3.0, 1.5]
        ω' ← sample_dirichlet(α)
        composite' ← ω'[0]·gen + ω'[1]·flow + ω'[2]·impact + ω'[3]·feas
        top5 ← indices of 5 highest composite'
        top5_counts[top5] += 1
    robustness ← top5_counts / n_perturbations × 100
    RETURN baseline with robustness column
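The pseudocode above can be exercised with only the standard library by drawing Dirichlet samples as normalized gamma variates. This is a sketch under that assumption, not the sensitivity_analysis() in core/scoring.py (which may use NumPy); the `seed`, `top_k`, and `concentration` parameters are added here for reproducibility and illustration.

```python
import random
from typing import List, Sequence

BASE_WEIGHTS = [0.30, 0.25, 0.30, 0.15]  # Generation, Flow, Impact, Feasibility


def sample_dirichlet(alpha: Sequence[float], rng: random.Random) -> List[float]:
    """Dirichlet(alpha) sample: independent Gamma(alpha_i, 1) draws divided by their sum."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]


def sensitivity_analysis(subscores: List[Sequence[float]], n_perturbations: int = 50,
                         top_k: int = 5, concentration: float = 10.0,
                         seed: int = 0) -> List[float]:
    """Percentage of perturbed weightings under which each site lands in the top_k.

    `subscores` holds one (gen, flow, impact, feas) row per candidate.
    """
    rng = random.Random(seed)
    alpha = [concentration * w for w in BASE_WEIGHTS]  # [3.0, 2.5, 3.0, 1.5]
    counts = [0] * len(subscores)
    for _ in range(n_perturbations):
        w = sample_dirichlet(alpha, rng)
        perturbed = [sum(wi * si for wi, si in zip(w, row)) for row in subscores]
        ranked = sorted(range(len(subscores)), key=lambda i: perturbed[i], reverse=True)
        for i in ranked[:top_k]:
            counts[i] += 1
    return [100.0 * c / n_perturbations for c in counts]
```

A site that dominates on every sub-score stays at 100% regardless of the weights, while a site that leads only on one sub-score will drop whenever that sub-score's sampled weight is small.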
When ground-truth trap locations are available, weights can be optimized via scikit-optimize:
FUNCTION optimize_weights(candidates, known_good_sites):
    FUNCTION objective(weights):
        w ← normalize(weights)
        scored ← recompute with w
        penalty ← sum of ranks of known_good_sites in scored
        RETURN penalty
    search_space ← [Real(0.05, 0.60)] × 4
    result ← gp_minimize(objective, search_space, n_calls=50)
    RETURN normalize(result.x)
Status: Implemented in core/scoring.py but not yet run against real ground-truth data.
Decision: Aggregate 28 parameters into 4 sub-scores, then combine sub-scores into a composite.
Context: A flat 28-weight sum is opaque — changing one weight has a non-obvious effect.
Alternatives considered: (1) Flat weighted sum. (2) PCA dimensionality reduction. (3) Random forest classifier.
Chosen approach: Two-level weighted sum. Sub-scores map to real questions ("how much trash?", "how does it move?", "does it matter?", "can we deploy?").
Consequences: Interpretable and tunable, but assumes linear parameter contributions within each family. Non-linear interactions are not captured.
Decision: Use MinMax scaling to [0, 1] per parameter across the candidate set.
Alternatives considered: Z-score, percentile rank, log-transform + MinMax.
Chosen approach: MinMax. Simple, bounded, interpretable.
Consequences: Outliers dominate. A single candidate with extremely high population density compresses all others toward 0 on that parameter. Acknowledged limitation.
Decision: The dashboard computes scores in JavaScript, not by calling the Python backend.
Context: pysheds/rasterio have C dependencies that fail on Windows. Dashboard must work by opening one HTML file.
Chosen approach: Simplified JS scoring model that parallels the Python implementation but uses OSM-estimated widths and seeded random parameter generation rather than real API data.
Consequences: Dashboard scores are approximate. Python pipeline is the authoritative scoring implementation.
Decision: Fetch real waterway geometry from the Overpass API on each city click.
Alternatives considered: Pre-generated GeoJSON per city (storage), procedural random-walk rivers (alignment), NHD (US-only).
Chosen approach: Overpass API with 12s timeout and procedural fallback.
Consequences: Requires internet. Coverage varies globally. Fallback doesn't align with terrain.
{
"type": "Feature",
"geometry": {"type": "Point", "coordinates": [-78.898, 35.994]},
"properties": {
"id": 0,
"city": "durham",
"city_name": "Durham, NC",
"stream_name": "Ellerbe Creek",
"composite_score": 47.08,
"generation_score": 42.5,
"flow_score": 38.1,
"impact_score": 35.2,
"feasibility_score": 82.0,
"population_density": 1424.3,
"impervious_pct": 51.8,
"usgs_mean_q_cfs": 37.0,
"flow_velocity_ms": 1.377,
"strahler_order": 5,
"catchment_area_km2": 63.5,
"channel_width_m": 13.1,
"ej_index": 0.595,
"road_access_m": 366.4,
"bank_slope_deg": 19.6,
"robustness_pct": 72.6,
"rank": 1
}
}
{"n":"Durham","c":"US","p":278993,"la":35.994,"lo":-78.8986}
Fields: n=name, c=ISO country code, p=population, la=latitude, lo=longitude.
| # | Parameter | Unit | Family | Default Weight | Data Source |
|---|---|---|---|---|---|
| 1 | Population density | persons/km² | Generation | 0.18 | US Census ACS |
| 2 | Impervious surface % | % | Generation | 0.20 | NLCD 2021 |
| 3 | Road density | km/km² | Generation | 0.10 | Census TIGER / OSMnx |
| 4 | EPA TRI facility count | facilities/km² | Generation | 0.18 | EPA TRI API |
| 5 | NPDES discharge points | count | Generation | 0.12 | EPA ECHO API |
| 6 | CSO/storm outfall density | points/km² | Generation | 0.12 | EPA ECHO |
| 7 | Litter complaint density | reports/km² | Generation | 0.10 | Durham 311 / local GIS |
| 8 | USGS mean discharge Q | cfs | Flow | 0.22 | USGS NWIS |
| 9 | Flow velocity | m/s | Flow | 0.16 | Manning's eq from DEM |
| 10 | Strahler stream order | ordinal | Flow | 0.14 | Computed from topology |
| 11 | Catchment area A | km² | Flow | 0.18 | pysheds DEM analysis |
| 12 | Flood return period Q10 | cfs | Flow | 0.14 | USGS StreamStats |
| 13 | Seasonal flow variability | CV | Flow | 0.10 | USGS NWIS annual stats |
| 14 | Runoff coefficient C | dimensionless | Flow | 0.06 | NLCD k-means |
| 15 | Drinking water intake proximity | exp(-d/10) | Impact | 0.22 | EPA SDWIS / ECHO |
| 16 | Protected area proximity | score | Impact | 0.16 | USGS PAD-US |
| 17 | Environmental justice index | [0,1] | Impact | 0.18 | EPA EJSCREEN |
| 18 | Ocean/estuary proximity | km (inverted) | Impact | 0.14 | NHD terminus |
| 19 | Recreational beach proximity | km (inverted) | Impact | 0.12 | EPA BEACH Program |
| 20 | Tourism/recreation value | amenity count | Impact | 0.10 | OSM amenity density |
| 21 | Superfund site proximity | score | Impact | 0.08 | EPA FRS/CERCLIS |
| 22 | Road access distance | m | Feasibility | 0.25 | OSMnx routing |
| 23 | Channel width | m | Feasibility | 0.20 | NHD VAA + NBI span |
| 24 | Flow velocity (penalty) | m/s | Feasibility | 0.20 | Manning's eq |
| 25 | Land ownership | binary | Feasibility | 0.15 | USGS PAD-US |
| 26 | Bank slope stability | degrees | Feasibility | 0.10 | DEM gradient |
| 27 | Bridge/structure proximity | bonus | Feasibility | 0.10 | FHWA NBI |
grime/
├── core/ # Python scoring pipeline (core deliverable)
│ ├── __init__.py # Constants, safe_call(), helpers
│ ├── pipeline.py # DEM → pysheds → stream extraction → candidates
│ ├── generation.py # 7 trash generation parameters + API integrations
│ ├── flow.py # 7 flow parameters, Manning's equation, USGS data
│ ├── impact.py # 7 downstream impact parameters, EJ scoring
│ ├── feasibility.py # 6 deployment feasibility parameters, hard gates
│ └── scoring.py # Normalization, weighting, composite, sensitivity
├── api/
│ └── main.py # FastAPI: REST + WebSocket + static serving
├── dashboard/
│ └── index.html # Mapbox map, Overpass integration, client-side scoring
├── mock_data/
│ └── places.json # 108,772 cities, 240 countries (7MB)
├── scripts/
│ └── generate_mock.py # Builds places.json from geonamescache
├── notebooks/
│ └── validate_pipeline.ipynb # Pipeline validation notebook
├── requirements.txt
├── start.sh
└── README.md
Where critical logic lives:
| Logic | File | Function |
|---|---|---|
| Composite scoring formula | core/scoring.py | compute_composite_score() |
| Manning's velocity | core/flow.py | compute_flow_velocity() |
| Hard gate filtering | core/scoring.py | apply_hard_gates() |
| Sensitivity analysis | core/scoring.py | sensitivity_analysis() |
| Client-side placement | dashboard/index.html | generateCandidates() |
| Overpass waterway fetch | dashboard/index.html | fetchRealStreams() |
| Sub-score normalization | core/scoring.py | compute_subscore() |
- User runs: python -m core.pipeline --bbox "-79.05,35.90,-78.75,36.05"
- py3dep.get_map('DEM', bbox, resolution=10) fetches 3DEP raster
- pysheds: fill_pits → fill_depressions → resolve_flats → flowdir → accumulation
- extract_river_network(threshold=500) → stream GeoJSON
- generate_candidates(spacing=200m) → candidate points along streams
- For each candidate: compute pixel coords, elevation, catchment area from DEM
- Output: mock_data/candidates.geojson
- Browser opens dashboard/index.html
- Fetches places.json (7MB) → parses 108,772 cities
- Mapbox GL JS renders clustered city markers
- User clicks city → openCity(idx) fires
- POST to Overpass API → returns waterway geometry as JSON
- fetchRealStreams() filters: tidal=no, width≤30m, top 15–25 by length
- generateCandidates() runs 3-phase placement + scoring
- Mapbox renders: stream lines (cyan) + candidate dots (color-coded by score)
- User clicks candidate → sidebar shows score breakdown + parameters
| Method | Path | Parameters | Response |
|---|---|---|---|
| GET | /api/candidates | ?min_score=N ?top_n=N ?subscore=field | GeoJSON FeatureCollection |
| GET | /api/candidates/{id} | (none) | Score breakdown with 4 sub-score parameter trees |
| GET | /api/weights | (none) | All parameter and sub-score weights |
| GET | /api/stats | (none) | Count, score range, mean, top 5 |
| GET | /map | (none) | Serves dashboard HTML |
| WS | /ws | (none) | Real-time candidate updates |
| API | Auth | Rate Limit | Timeout | Fallback |
|---|---|---|---|---|
| Overpass API | None | Informal | 12s | Procedural generation |
| USGS NWIS | None | None published | 30s | Hardcoded Ellerbe Creek stats |
| EPA ECHO | None | None published | 30s | Empty GeoDataFrame |
| EPA EJSCREEN | None | None published | 30s | 0.5 (neutral) |
| Census ACS | None | None published | 30s | Durham average (500/km²) |
| USGS 3DEP | None | None published | 60s | Fatal — no fallback |
Every external API call is wrapped in safe_call() with a default fallback value.
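The pattern described here is a try/except wrapper that swaps in the fallback from the table whenever a call fails. The sketch below shows one plausible shape for it; the real safe_call() in core/__init__.py is not reproduced in this document, so the signature, the logging, and the logger name are assumptions.

```python
import logging
from typing import Any, Callable, TypeVar

T = TypeVar("T")
logger = logging.getLogger("grime")  # hypothetical logger name


def safe_call(fn: Callable[..., T], *args: Any, default: T, **kwargs: Any) -> T:
    """Run fn(*args, **kwargs); on any exception, log it and return the fallback value."""
    try:
        return fn(*args, **kwargs)
    except Exception as exc:  # network errors, timeouts, malformed responses
        logger.warning("%s failed (%s); using fallback %r",
                       getattr(fn, "__name__", repr(fn)), exc, default)
        return default
```

A caller would write something like `ej = safe_call(fetch_ejscreen, lat, lon, default=0.5)`, where `fetch_ejscreen` is a hypothetical fetcher, so an EJSCREEN timeout yields the neutral 0.5 from the table instead of aborting the run. The trade-off is that a silent fallback can mask an outage, which is why the sketch logs every substitution.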
| Setting | Location | Default | Notes |
|---|---|---|---|
| MAPBOX_TOKEN | dashboard/config.js (gitignored) and .env | Placeholder | Must replace (free at mapbox.com). Copy dashboard/config.example.js → dashboard/config.js, and .env.example → .env |
| ELLERBE_BBOX | core/__init__.py | (-79.05, 35.90, -78.75, 36.05) | Ellerbe Creek watershed |
| ELLERBE_GAUGE | core/__init__.py | "02086849" | USGS gauge site number |
| UTM_CRS | core/__init__.py | "EPSG:32617" | UTM zone 17N (Durham, NC) |
| DEM resolution | core/pipeline.py | 10m | Passed to py3dep |
| Accumulation threshold | core/pipeline.py | 500 cells | Stream extraction sensitivity |
| Candidate spacing | core/pipeline.py | 200m | Along-stream distance |
| Composite weights | core/scoring.py | [0.30, 0.25, 0.30, 0.15] | Gen, Flow, Impact, Feas |
| Min spacing (dashboard) | dashboard/index.html | 120m | Haversine between nets |
| Max width (dashboard) | dashboard/index.html | 30m | Skip channels wider |
| Risk percentile | dashboard/index.html | 20–35% | Population-scaled |
| Min deploy floor | dashboard/index.html | 5 | Always at least this many |
macOS only ships python3 (no bare python). The cleanest setup is a venv, which gives you a local python + pip that won't collide with system Python or other projects.
cd ~/Downloads/GRIME
# 1. Create + activate the venv (you'll see (.venv) in your prompt after activation)
python3 -m venv .venv
source .venv/bin/activate
# 2. Install web-server deps (FastAPI, uvicorn, websockets, etc.)
python -m pip install --upgrade pip
python -m pip install -r requirements.txt
# 3. Configure secrets (both files are gitignored)
cp dashboard/config.example.js dashboard/config.js # then paste your Mapbox pk.* token
cp .env.example .env # then set MAPBOX_TOKEN=...
# 4. Generate the places database (one-time)
python scripts/generate_mock.py
# 5. Start the API (this also serves the dashboard)
python -m uvicorn api.main:app --reload --port 8000
Open http://localhost:8000/ for the landing page, /explore for the map app, and /api/swagger for API docs.
Once setup is done, every subsequent run is just two commands:
cd ~/Downloads/GRIME
source .venv/bin/activate
python -m uvicorn api.main:app --reload --port 8000
If you'd rather skip the venv, substitute python3 and python3 -m pip everywhere:
python3 -m pip install -r requirements.txt
python3 -m uvicorn api.main:app --reload --port 8000
For the full Python pipeline (rasterio, fiona, pysheds), use conda; pip wheels for these often fail on Windows:
conda install -c conda-forge rasterio fiona geopandas pysheds
pip install -r requirements-full.txt
| Error | Cause | Fix |
|---|---|---|
| command not found: python | macOS only has python3 | Activate the venv, or use python3 |
| No module named uvicorn | Deps not installed in the active Python | python -m pip install -r requirements.txt (note python -m pip, not bare pip) |
| Mapbox token not configured (500 from /api/config) | .env missing or MAPBOX_TOKEN= empty | cp .env.example .env and fill in the token |
| Port 8000 in use | Old uvicorn still running | lsof -i :8000 to find PID, or pass --port 8001 |
| Empty candidates / blank map | Mock data not generated | python scripts/generate_mock.py |
- Open map → 108,772 cities visible as clustered dots
- Search "Durham" → click → fly to Durham, NC
- Wait 3s → real Ellerbe Creek geometry appears with scored candidate dots
- Click top-ranked site → score breakdown panel shows 4 sub-scores
- Toggle "Waterways" → OSM overlay confirms alignment
- Toggle "Light theme" → clean beige presentation mode
- Explain the 28-parameter model and show weight sensitivity
python -m core.pipeline --bbox "-79.05,35.90,-78.75,36.05" --resolution 10 --threshold 500
curl "http://localhost:8000/api/candidates?top_n=10"
curl http://localhost:8000/api/candidates/3
curl http://localhost:8000/api/weights
| Operation | Time | Bottleneck |
|---|---|---|
| Dashboard initial load | ~3s | Parsing 7MB places.json |
| Overpass API query | 2–8s | Network + OSM server |
| Client-side placement + scoring | <100ms | Haversine collision checks |
| Python DEM pipeline (full) | 30–90s | py3dep HTTP fetch + pysheds |
| Python parameter computation | 3–5 min | Sequential EPA/USGS API calls |
No formal benchmarks have been run. Times above are observed during development.
| Failure | Impact | Mitigation |
|---|---|---|
| Overpass API down/slow | No real waterway data | Procedural fallback auto-activates |
| EPA/USGS API timeout | Missing parameter values | safe_call() returns fallback defaults |
| Mapbox token missing | Map blank | Fatal for dashboard; replace token |
| pysheds DEM fetch fails | No stream network | Fatal for Python pipeline; mock data works |
| MinMax with identical values | Division by zero | Returns 0.5 for all candidates |
- No authentication on the API — local development only
- Mapbox token is a publishable (pk.*) client-side token. It lives in dashboard/config.js (gitignored) and in .env (also gitignored). Use dashboard/config.example.js and .env.example as templates. Restrict the token by URL in the Mapbox dashboard before deploying anywhere public.
- Overpass queries use numeric interpolation only, so there is no injection risk
- places.json contains only public geographic data, no PII
- All external APIs are read-only and keyless
Current state: No automated tests. Validation is:
- notebooks/validate_pipeline.ipynb: manual pipeline verification
- Visual inspection of stream networks in geojson.io
- Dashboard visual QA — verify rivers align with satellite
Recommended if continuing: Unit tests for compute_subscore(), Manning's velocity, hard gates. Property test: composite always in [0, 100].
- Scores are relative, not absolute. MinMax normalization means scores across cities are not comparable.
- Client-side scoring is approximate. Dashboard uses heuristics; Python pipeline uses real API data.
- Weight values are heuristic. Not optimized against ground truth. Bayesian scaffold exists but hasn't been run.
- OSM coverage varies. Excellent in US/Europe, variable in developing nations.
- Channel width is estimated. <5% of OSM waterways have width tags. Estimated from type heuristic.
- No temporal modeling. Scores are static snapshots, not seasonal.
- US-centric data sources for the Python pipeline. Dashboard bypasses this with population-scaled heuristics globally.
- 120m spacing threshold is arbitrary — tuned by inspection, not engineering spec.
- Ground-truth validation against actual trap deployment locations (Durham Stormwater Services)
- Bayesian weight optimization with real data
- Temporal scoring with USGS real-time discharge
- Computer vision trash detection from satellite/drone imagery
- Multi-city normalized scoring for global prioritization
- Cost modeling (deployment cost, maintenance frequency)
- Research publication: extend WaterGate paper, contact Wolfram Research authors
- University collaboration: Duke Environmental Engineering
- Chow, V.T. (1959). Open-Channel Hydraulics. McGraw-Hill.
- Leopold, L.B. (1964). Fluvial Processes in Geomorphology.
- Manning, R. (1891). On the flow of water in open channels and pipes.
| Term | Definition |
|---|---|
| Candidate site | A point on a waterway evaluated for trap deployment |
| Composite score | Final weighted combination of 4 sub-scores (0–100) |
| CSO | Combined Sewer Overflow |
| D8 | Deterministic 8-direction flow algorithm |
| DEM | Digital Elevation Model |
| EJ | Environmental Justice |
| Hard gate | Binary feasibility check that eliminates a candidate |
| NPDES | National Pollutant Discharge Elimination System |
| NBI | National Bridge Inventory |
| Strahler order | Stream classification: order 1 = headwater, higher = larger |
| Sub-score | Intermediate score (0–100) for one parameter family |
| TRI | Toxic Release Inventory |
MIT.
GRIME · 28 parameters · 6 families · 108,772 cities · 240 countries





