Reservoir Alignment: Multi-Context Reconciliation

Authors: Mike Johnson + Lynker Spatial Team

View the dataset here

Narrative

Accurate reservoir locations are essential for hydrologic modeling because reservoirs alter the natural flow regime by storing, releasing, and redistributing water across space and time. These operations directly influence downstream streamflow, flood peaks, drought severity, water availability, and ecosystem conditions. Today’s National Water Model (NWM) accounts for only ~500 reservoirs across CONUS, too few for many forecasting and planning applications. Extending that coverage requires data from other sources.

The National Inventory of Dams (NID) provides broad coverage but variable location quality (on-reservoir, on-flowline, generalized, sometimes wrong). Even small positional errors can connect a dam/reservoir to the wrong flowline or waterbody, degrading the routing of inflows/outflows and reducing model skill for discharge, storage, and evapotranspiration. This in turn undermines flood forecasting, drought planning, and environmental flow assessments.

Other datasets often have better locations but are incomplete or inconsistent in other ways, particularly in spatial coverage. Critically, each dataset also opens doors for data assimilation, parameterization, and ML training on historic time series. By grounding our reference reservoirs in precise geographic contexts and aligning them to a shared hydrographic fabric, we get a regulated-flow representation that better reflects the coupled human–natural water cycle and is an asset for community efforts like those at geoconnex and as NOAA/NWS POIs in the NWM.

Our goal is to build a harmonized set of reference reservoirs (proxied by dams) that are geospatially consistent with the hydrofabric used in USGS and NOAA/NWS modeling. We treat NID as the global set to validate and enrich, assign stable synthetic IDs (dam_id = "ls-*"), and use multiple contexts to correct locations and enhance attributes.

Strategy (evidence aggregation):

build candidate pairs via spatial proximity within tuned per-context radii,
compute name similarity (Jaro–Winkler) from cleaned strings,
rank contexts by reliability and derived evidence, and
select a best realization per dam, with diagnostics.

Per-dam output: A chosen realization (context + ID), snap distance (m), name similarity, number of supporting contexts, and offset from the original NID point.
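The name-similarity step can be sketched in base R as follows. This is a minimal illustration, not the repository's implementation: `clean_name()`, its stop-word list, and `jaro_winkler()` are hypothetical helpers written for this example. The function returns a similarity in [0, 1]; the ranking policy later prefers a smaller JW, which would correspond to a distance (1 minus this value), and that reading is an assumption.

```r
# Hypothetical cleaner: upper-case, strip punctuation, drop generic words.
clean_name <- function(x, drop = c("DAM", "RESERVOIR", "LAKE")) {
  x <- toupper(x)
  x <- gsub("[^A-Z0-9 ]", " ", x)
  pattern <- paste0("\\b(", paste(drop, collapse = "|"), ")\\b")
  x <- gsub(pattern, " ", x, perl = TRUE)
  trimws(gsub(" +", " ", x))
}

# Jaro-Winkler similarity for two single strings (p = prefix scaling factor).
jaro_winkler <- function(a, b, p = 0.1) {
  if (identical(a, b)) return(1)
  s1 <- strsplit(a, "")[[1]]
  s2 <- strsplit(b, "")[[1]]
  n1 <- length(s1); n2 <- length(s2)
  if (n1 == 0 || n2 == 0) return(0)
  window <- max(0, floor(max(n1, n2) / 2) - 1)
  m1 <- logical(n1); m2 <- logical(n2)
  for (i in seq_len(n1)) {
    lo <- max(1, i - window); hi <- min(n2, i + window)
    if (lo > hi) next
    for (j in lo:hi) {
      if (!m2[j] && s1[i] == s2[j]) {
        m1[i] <- TRUE; m2[j] <- TRUE
        break
      }
    }
  }
  m <- sum(m1)
  if (m == 0) return(0)
  # Half the number of matched characters that are out of order.
  t <- sum(s1[m1] != s2[m2]) / 2
  jaro <- (m / n1 + m / n2 + (m - t) / m) / 3
  # Winkler bonus for a shared prefix of up to four characters.
  prefix <- 0
  for (i in seq_len(min(4, n1, n2))) {
    if (s1[i] == s2[i]) prefix <- prefix + 1 else break
  }
  jaro + prefix * p * (1 - jaro)
}

sim <- jaro_winkler(clean_name("Hoover Dam"), clean_name("HOOVER"))
```

Cleaning before scoring keeps generic words such as "Dam" from inflating or deflating the score between sources that name the same structure differently.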

Inputs

NID (cleaned, EPSG:5070, synthetic IDs dam_id = "ls-*"). Baseline catalog (USACE). High inclusivity; variable positional accuracy. Synthetic IDs provide stable tracking.
Lynker Spatial hydrofabric flowlines (ref_fab_fp) + waterbodies (ref_fab_wb). National hydrographic backbone (v2.3). Consistent topology for flowlines and waterbodies aligned to modeling needs.
OpenStreetMap (OSM): water polygons, water lines, dam lines. Volunteer geographic data adding local detail; quality and coverage vary regionally.
GNIS. USGS naming authority for natural/cultural features (dams, lakes, reservoirs), used for robust naming comparisons.
ResOpsUS. Reservoir operations and attributes useful for modeling and water management.
HILARRI. Curated links among NID (2024), GRanD (v1.3), and EHA (2024), connecting dams, reservoirs, and hydropower plants (ORNL/DOE).
GOODD. Global dam compilation (>38k) with attributes supporting large-scale analyses.
NWM (optionally re-linked to WB IDs). NOAA’s hydrologic modeling system. Reservoir POIs can be re-indexed to hydrofabric WBs to improve geometric alignment.
Bring Your Own. The method is extensible: anyone can add a dataset by specifying a unique ID, search radius, and rank weight, and it will be harmonized with the principal data resources.
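As a rough sketch of what registering an additional dataset might look like, the example below keeps contexts in a small lookup table. The `contexts` table layout, the `add_context()` helper, the `id_col` values, and the `state_dams` dataset are all hypothetical; only the three listed radii and ranks come from the context table later in this document.

```r
# Hypothetical registry: one row per context with its ID column,
# search radius (m), and rank tier.
contexts <- data.frame(
  context  = c("ref_int", "gnis", "osm_ww_lines"),
  id_col   = c("id", "id", "id"),          # assumed column names
  radius_m = c(2000, 2000, 1500),
  rank     = c(0, 1, 2),
  stringsAsFactors = FALSE
)

# Hypothetical helper: validate and append a user-supplied context.
add_context <- function(registry, context, id_col, radius_m, rank) {
  stopifnot(!context %in% registry$context, radius_m > 0, rank >= 0)
  rbind(
    registry,
    data.frame(context = context, id_col = id_col,
               radius_m = radius_m, rank = rank,
               stringsAsFactors = FALSE)
  )
}

# Example: a made-up state inventory, treated as a curated (rank 1) source.
contexts2 <- add_context(contexts, "state_dams", "state_dam_id",
                         radius_m = 1000, rank = 1)
```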

# stitched outputs (written by the runner)
res_rds  <- "output/reference-reservoirs.rds"

res <- readRDS(res_rds) |>
  dplyr::filter(!is.na(X)) |> 
  sf::st_as_sf(coords = c("X","Y"), crs = 5070, remove = FALSE)

Process Overview

Tiling

CONUS is divided into ~100 km cells. We process only tiles that intersect dams. Each tile runs independently (bounded memory; smaller candidate pools). Per-tile results are written to RDS; a final pass stitches tiles, resolving overlaps by preferring more supporting contexts (n) then closer snaps.

source("R/utils_fin.R")
#> Warning in fun(libname, pkgname): GEOS versions differ: lwgeom has 3.11.0 sf
#> has 3.14.0
#> Warning in fun(libname, pkgname): PROJ versions differ: lwgeom has 9.1.0 sf has
#> 9.6.2
#> Spherical geometry (s2) switched off
conus <- AOI::aoi_get(state = "conus") |> st_transform(5070)
tiles <- make_conus_grid(st_union(conus), cell_km = 100)  

if (!is.null(res)) {
  ggplot2::ggplot() +
    ggplot2::geom_sf(data = res, alpha = 0.15, size = 0.25) +
    # linewidth (not size) controls polygon outlines in ggplot2 >= 3.4
    ggplot2::geom_sf(data = tiles, fill = NA, color = "brown", linewidth = 0.2) +
    ggplot2::labs(title = "Reservoirs", subtitle = "EPSG:5070",
         x = NULL, y = NULL) +
    ggplot2::theme_minimal()
} else {
  plot.new(); title("Dam points plot skipped (no X/Y)")
}
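The stitch rule from the tiling description (prefer more supporting contexts `n`, then the closer snap) can be sketched in base R as below. `stitch_tiles()` and the toy tile tables are hypothetical; the column names `dam_id`, `n`, and `realization_snap_m` follow those used elsewhere in this README, and the actual stitching script may differ.

```r
# Toy per-tile results; "ls-2" appears in both tiles.
tile_a <- data.frame(dam_id = c("ls-1", "ls-2"), n = c(3, 2),
                     realization_snap_m = c(120, 40), tile = "a",
                     stringsAsFactors = FALSE)
tile_b <- data.frame(dam_id = c("ls-2", "ls-3"), n = c(4, 1),
                     realization_snap_m = c(300, 15), tile = "b",
                     stringsAsFactors = FALSE)

# Hypothetical stitcher: keep one row per dam, preferring a larger n,
# then a smaller snap distance.
stitch_tiles <- function(tiles) {
  all <- do.call(rbind, tiles)
  all <- all[order(all$dam_id, -all$n, all$realization_snap_m), ]
  out <- all[!duplicated(all$dam_id), ]
  rownames(out) <- NULL
  out
}

stitched <- stitch_tiles(list(tile_a, tile_b))
```

Here the tile "b" record for `ls-2` is kept because it has more supporting contexts, even though its snap distance is larger.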

NID as Core Context

The NID defines the global set we validate, supplement, and standardize. Because NID IDs can be duplicated and locations imprecise, we assign stable synthetic IDs (dam_id = ls-*) and treat NID like any other context in scoring—but privileged as the anchor. Outputs retain NID identifiers while updated coordinates, names, and attributes can be adopted from the best realization across contexts. This preserves continuity with the most complete inventory while systematically improving accuracy via GNIS names, GOODD’s footprint, hydrofabric topology, and OSM detail—producing features that are geoconnex-ready and compatible with NWS POIs.

Context Definition

A context is an external dataset/layer (e.g., gnis, goodd, ref_fab_fp, osm_ww_poly) against which NID dams are compared. For each dam and context, we:

generate candidate pairs within a tuned search radius,
compute snap distance and name similarity (JW), and
filter/rank to a single best match per (dam, context).

Two derived contexts are also created by intersecting waterbodies and flowlines in each data family:

ref_int: intersections of ref_fab_wb × ref_fab_fp
osm_int: intersections of osm_ww_poly × osm_ww_lines

These provide strong geometry/topology anchors.

Ranking Policy

0 – Intersection evidence: ref_int, osm_int (geometry + topology; strongest).
1 – Curated/named: gnis, resops, goodd, osm_dam_lines, hillari.
2 – Direct/core geometries: osm_ww_poly, osm_ww_lines, ref_fab_fp, ref_fab_wb, nwm (re-linked), nid.
Tributary penalty: if the river attribute implies TR/OS/TRIB, add +5 to the rank.
Tie-breaks: within any tier, the smaller snap distance wins, then the smaller JW value.
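A minimal sketch of this policy follows, assuming each candidate row carries a `river` string, a snap distance `snap_m`, and a `jw` value where smaller is better. The `is_tributary()` pattern, `pick_best()`, and the toy rows are hypothetical; the rank values are those listed above.

```r
# Rank tiers as listed in the ranking policy.
rank_map <- c(ref_int = 0, osm_int = 0,
              gnis = 1, resops = 1, goodd = 1, osm_dam_lines = 1, hillari = 1,
              osm_ww_poly = 2, osm_ww_lines = 2, ref_fab_fp = 2,
              ref_fab_wb = 2, nwm = 2, nid = 2)

# Assumed test: TR, OS, or TRIB appearing as a whole word in the river name.
is_tributary <- function(river) {
  grepl("\\b(TR|OS|TRIB)\\b", river, ignore.case = TRUE, perl = TRUE)
}

# Toy candidates: one best match per (dam, context) already selected.
matches <- data.frame(
  dam_id  = c("ls-1", "ls-1", "ls-1", "ls-2", "ls-2"),
  context = c("gnis", "ref_int", "osm_ww_poly", "ref_int", "gnis"),
  river   = c("Clear Creek", "Clear Creek", "Clear Creek",
              "TRIB TO MILL CREEK", "Mill Creek"),
  snap_m  = c(50, 400, 10, 30, 200),
  jw      = c(0.05, 0.20, 0.10, 0.02, 0.08),
  stringsAsFactors = FALSE
)

# Effective rank = tier + 5 if tributary; ties go to smaller snap, then JW.
pick_best <- function(m) {
  m$rank_eff <- unname(rank_map[m$context]) + ifelse(is_tributary(m$river), 5, 0)
  m <- m[order(m$dam_id, m$rank_eff, m$snap_m, m$jw), ]
  out <- m[!duplicated(m$dam_id), c("dam_id", "context", "rank_eff", "snap_m", "jw")]
  rownames(out) <- NULL
  out
}

best <- pick_best(matches)
```

In this toy case `ls-1` resolves to `ref_int` (tier 0 beats closer but lower-tier matches), while the tributary penalty pushes the `ref_int` candidate for `ls-2` to an effective rank of 5, so `gnis` is chosen.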

Process

Per tile
1. Load dams (NID) and clip contexts.
2. Build representative points per context: points (identity), lines (midpoints/endpoints), polygons (point-on-surface).
3. Generate candidates via st_is_within_distance (per-context radius) with a KNN fallback gated by the same radius.
4. Score (snap distance, JW), apply tributary penalty; reduce to best per (dam, context).
5. Build a wide table of IDs (one column per context), select best realization per dam, compute QA (offset from NID), and distance to flowpath.
6. Write tile RDS and append a manifest row.
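Steps 3 and 4 can be illustrated without any spatial packages by working on projected coordinates (EPSG:5070 is in meters). This is only a stand-in for the `st_is_within_distance` call named above: `candidates_within()`, `best_per_dam()`, and the toy coordinates are hypothetical, and the KNN fallback is omitted.

```r
# Toy dams and one context's representative points, in projected meters.
dams <- data.frame(dam_id = c("ls-1", "ls-2", "ls-3"),
                   X = c(0, 5000, 20000), Y = c(0, 0, 0),
                   stringsAsFactors = FALSE)
ctx <- data.frame(id = c("g1", "g2", "g3"),
                  X = c(300, 900, 5100), Y = c(400, 0, 1200),
                  stringsAsFactors = FALSE)

# All (dam, context feature) pairs within the context's search radius.
candidates_within <- function(dams, ctx, radius_m) {
  dx <- outer(dams$X, ctx$X, "-")
  dy <- outer(dams$Y, ctx$Y, "-")
  d  <- sqrt(dx^2 + dy^2)                      # dam-by-feature distance matrix
  hit <- which(d <= radius_m, arr.ind = TRUE)
  data.frame(dam_id = dams$dam_id[hit[, "row"]],
             ctx_id = ctx$id[hit[, "col"]],
             snap_m = d[hit],
             stringsAsFactors = FALSE)
}

# Reduce to the closest candidate per dam for this context.
best_per_dam <- function(cand) {
  cand <- cand[order(cand$dam_id, cand$snap_m), ]
  out <- cand[!duplicated(cand$dam_id), ]
  rownames(out) <- NULL
  out
}

cand <- candidates_within(dams, ctx, radius_m = 1500)
best <- best_per_dam(cand)
```

A full distance matrix is only reasonable because each tile bounds the number of dams and features; a spatial index would be the usual choice at larger scales.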

Contexts: Search Distance & Rank Priority

Context	Search Distance (m)	Rank	Group	Notes
ref_int	2000	0	Anchors / Derived	Intersections of ref_fab_wb × ref_fab_fp; highest-confidence geometry.
osm_int	2000	0	Anchors / Derived	Intersections of osm_ww_poly × osm_ww_lines; strong topology signal.
gnis	2000	1	Curated / Named	USGS names; authoritative nomenclature, variable location quality.
resops	2000	1	Curated / Named	Reservoir ops/attributes useful for modeling.
osm_dam_lines	1500	1	Curated / Named	OSM dam features; coverage varies.
hillari	2000	1	Curated / Named	Links dams–reservoirs–plants (ORNL/DOE).
goodd	2000	1	Curated / Named	Global dam footprint/attributes.
osm_ww_lines	1500	2	Direct / Network	Dense/noisy; short radius reduces false hits.
osm_ww_poly	1500	2	Direct / Network	Strong geometric anchors for reservoirs.
ref_fab_fp	1500	2	Direct / Fabric	Topologically consistent flowlines.
ref_fab_wb	2000	2	Direct / Fabric	Waterbodies as spatial anchors.
nwm	2000	2	Direct / POIs	Often mislocated; improved when re-indexed to WBs.
nid	2000	2	Core Dataset	Baseline set for validation & enrichment; stable synthetic IDs.

Risks & Mitigations

Risk / Complexity	Why it matters	Mitigation in this workflow
Mis-snap to wrong flowline/waterbody	Broken routing; bad inflow/outflow accounting	Per-context radii; intersections (`ref_int`/`osm_int`); rank 0
Duplicate/ambiguous IDs & names	Double-counting or missed joins	Synthetic `dam_id`, string prep + JW, cross-context tallies `n`
Noisy/shifted geometries (esp. NWM, NID)	High false positives; unstable matches	Rep points, short per-context radii (1500–2000 m), KNN fallback within same gate
Seasonal shoreline changes	Point-on-surface drift vs. dam location	Prefer dam-aligned contexts; intersections; multi-context voting
Tile edge effects	Missed candidates near boundaries	Buffered tile search; global stitch preferring `n` then distance
Nonstationarity / updates over time	Drift between versions; reproducibility	Tile manifests, context IDs, rank map documented
Licensing & attribution (OSM)	Compliance and redistribution	Keep source IDs/contexts; document license provenance

RFC-DA

A separate but related task in developing the reference reservoir set is to (1) identify reservoirs suitable for the RFC-DA system and (2) provide a consistent set of reservoir parameters for use in the National Water Model (NWM) under this scheme. In the current NWM, all reservoir parameters except WeirE, LkArea, and LkMxE are populated with a single default value each. Our goal is to provide a more defensible and consistent set of parameters using a combination of NID attributes and DEM-based surrogates, anchored by the reservoir surface and dam toe elevations. When this proves impossible, we default to the primary NWM values (WeirC = 0.4, WeirL = 10 m, OrficeC = 0.1, OrficeA = 1 m², ifd = 0.899).

For this first version (v1), we included only reservoirs within 1 km of a reference flowpath that had an associated OSM or reference waterbody with an area greater than 0.2 km². Once identified, two key DEM-based measures were derived: (1) the mean elevation of the OSM and/or reference waterbody and (2) the elevation of the dam toe (dam_elev). These were extracted from the 1/3 arc-second (10 m) 3DEP DEM. From these anchors, we computed a suite of reservoir parameters required for RFC-DA in the NWM, using NID attributes wherever possible. When NID values were missing or inconsistent, we applied a transparent set of heuristics and, as a last resort, the fixed defaults currently used in the NWM. An optional flag (use_hazard = TRUE) was enabled in this v1 release to modestly increase weir length (WeirL) and orifice area (OrficeA) or bias orifice coefficients upward for significant and high-hazard dams, ensuring more conservative estimates. By default this flag is off (FALSE) to avoid introducing policy-driven noise when using the function elsewhere.
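The v1 selection rule (within 1 km of a reference flowpath, waterbody area greater than 0.2 km²) amounts to a simple filter. In the sketch below, `select_rfcda()` and the column names `dist_to_fp_m` and `wb_area_km2` are assumptions made for illustration; the stitched dataset may name these fields differently.

```r
# Toy records with assumed column names for flowpath distance and area.
res_tbl <- data.frame(
  dam_id       = c("ls-1", "ls-2", "ls-3", "ls-4"),
  dist_to_fp_m = c(250, 1800, 900, NA),
  wb_area_km2  = c(1.5, 3.0, 0.1, 2.0),
  stringsAsFactors = FALSE
)

# Keep dams within max_dist_m of a flowpath whose waterbody exceeds
# min_area_km2; rows with missing values are dropped.
select_rfcda <- function(x, max_dist_m = 1000, min_area_km2 = 0.2) {
  keep <- !is.na(x$dist_to_fp_m) & !is.na(x$wb_area_km2) &
    x$dist_to_fp_m <= max_dist_m & x$wb_area_km2 > min_area_km2
  x[keep, , drop = FALSE]
}

rfcda <- select_rfcda(res_tbl)
```

Exposing both thresholds as arguments matches the stated future-work idea of relaxing the proximity and size limits.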

The derived variables include: H_m (hydraulic height), LkArea (reservoir area, m²), WeirE (crest elevation), LkMxE (maximum pool elevation), OrficeE (invert elevation), WeirC (weir coefficient), WeirL/Dam_Length (weir length), OrficeC (orifice coefficient), OrficeA (orifice area), and ifd (fixed constant).

Hydraulic height (H_m) is selected in priority order from structural_height, dam_height, hydraulic_height, and finally nid_height. When direct measurements were absent, elevation fractions were applied: crest ≈ dam_elev + 0.90 * H_m, invert ≈ dam_elev + 0.15 * H_m, and max pool ≈ wb_elev + 0.10 * H_m or dam_elev + 1.00 * H_m.
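The priority order and elevation fractions above can be sketched as follows. Several points are assumptions for illustration: the height columns are taken to be already in meters, crest, invert, and max pool are mapped to WeirE, OrficeE, and LkMxE, and the `dam_elev + 1.00 * H_m` form of max pool is used only when `wb_elev` is missing. `first_non_na()` and `derive_elevations()` are hypothetical helpers.

```r
# Element-wise first non-missing value across several vectors.
first_non_na <- function(...) {
  vals <- list(...)
  out <- vals[[1]]
  for (v in vals[-1]) out <- ifelse(is.na(out), v, out)
  out
}

# Apply the height priority order and the elevation fractions.
derive_elevations <- function(d) {
  H_m <- first_non_na(d$structural_height, d$dam_height,
                      d$hydraulic_height, d$nid_height)
  data.frame(
    dam_id  = d$dam_id,
    H_m     = H_m,
    WeirE   = d$dam_elev + 0.90 * H_m,           # crest
    OrficeE = d$dam_elev + 0.15 * H_m,           # invert
    LkMxE   = ifelse(is.na(d$wb_elev),           # max pool
                     d$dam_elev + 1.00 * H_m,
                     d$wb_elev + 0.10 * H_m),
    stringsAsFactors = FALSE
  )
}

# Toy input: heights and elevations in meters.
d <- data.frame(dam_id = c("ls-1", "ls-2"),
                structural_height = c(NA, 10), dam_height = c(20, 12),
                hydraulic_height = c(18, NA), nid_height = c(20, 12),
                dam_elev = c(100, 50), wb_elev = c(115, NA),
                stringsAsFactors = FALSE)
elev <- derive_elevations(d)
```

For `ls-1` this gives H_m = 20, a crest of 118 m, an invert of 103 m, and a max pool of 117 m.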

Storage attributes (nid_storage, normal_storage, max_storage) were converted from acre-feet to cubic meters (1 ac-ft = 1233.48 m³) and, when paired with LkArea, used to approximate mean depth.
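A short sketch of the unit conversion and mean-depth approximation is given below. `mean_depth_m()` is a hypothetical helper, and which storage attribute to pass in (for example `normal_storage`) is left to the caller.

```r
ACFT_TO_M3 <- 1233.48  # 1 acre-foot in cubic meters

# Mean depth (m) = storage volume (m^3) / reservoir surface area (m^2).
# Returns NA where the area is missing or not positive.
mean_depth_m <- function(storage_acft, LkArea_m2) {
  vol_m3 <- storage_acft * ACFT_TO_M3
  ifelse(!is.na(LkArea_m2) & LkArea_m2 > 0, vol_m3 / LkArea_m2, NA_real_)
}

# Example: 1000 ac-ft over a 0.5 km^2 (500,000 m^2) reservoir.
depth <- mean_depth_m(1000, 5e5)
```

In this example the volume is 1,233,480 m³, giving a mean depth of about 2.47 m.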

Coefficients and areas were inferred from categorical descriptors and dam height:

WeirC was set to 1.6 for broad-crested, 1.7 for ogee, 1.84 for sharp-crested, and 1.6 for earthen dams when unspecified (Chow, 1959).

OrficeC was set to 0.62 for sharp-edged/sluice/pipe outlets, 0.80 for gated or rounded entries, and defaulted to 0.1 otherwise (Chow, 1959).

Orifice areas (OrficeA) were assigned by dam height (<10 m → 0.5 m²; 10–30 m → 0.9 m²; ≥30 m → 1.5 m²), with a 1.2 m² override for concrete or ogee dams.

The coefficient ranges align with established values in Open-Channel Hydraulics (Chow, 1959). The use of fractional height surrogates for crest and invert levels is consistent with screening-level approaches employed by FEMA and USACE when design drawings are unavailable (FEMA, 2004). Storage-to-area ratios are a standard method for approximating mean depth in reservoir studies (USACE, 1995). Importantly, all surrogates are intended for national-scale screening and modeling, not for site-specific engineering or safety determinations. DEM-based anchors (dam_elev, wb_elev) may vary with DEM quality, so regional refinements are encouraged where higher-resolution data are available.
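The categorical rules above can be expressed as small lookup functions. This is a sketch under stated assumptions: the keyword patterns, the function names, the `earthen` and `concrete_or_ogee` flags, the choice to fall back to the NWM defaults (WeirC = 0.4, OrficeA = 1 m²) when nothing else applies, and the assignment of exactly 30 m to the upper height class are all illustrative choices, not the repository's code.

```r
# Weir coefficient from a free-text crest/spillway description.
weir_c <- function(crest_type, earthen = FALSE) {
  ifelse(grepl("broad", crest_type, ignore.case = TRUE), 1.6,
  ifelse(grepl("ogee",  crest_type, ignore.case = TRUE), 1.7,
  ifelse(grepl("sharp", crest_type, ignore.case = TRUE), 1.84,
  ifelse(earthen, 1.6, 0.4))))          # 0.4 = NWM default (assumed fallback)
}

# Orifice coefficient from a free-text outlet description.
orfice_c <- function(outlet_type) {
  ifelse(grepl("sharp|sluice|pipe", outlet_type, ignore.case = TRUE), 0.62,
  ifelse(grepl("gate|round", outlet_type, ignore.case = TRUE), 0.80, 0.1))
}

# Orifice area (m^2) by dam height class, with the concrete/ogee override.
orfice_a <- function(H_m, concrete_or_ogee = FALSE) {
  a <- ifelse(is.na(H_m), 1.0,          # 1 m^2 = NWM default (assumed fallback)
       ifelse(H_m < 10, 0.5,
       ifelse(H_m < 30, 0.9, 1.5)))
  ifelse(concrete_or_ogee, 1.2, a)
}

params <- data.frame(
  WeirC   = weir_c(c("Ogee spillway", NA), earthen = c(FALSE, TRUE)),
  OrficeC = orfice_c(c("Gated conduit", NA)),
  OrficeA = orfice_a(c(45, 8), concrete_or_ogee = c(TRUE, FALSE))
)
```

Missing descriptors fall through to the defaults because `grepl()` returns `FALSE` for `NA` input.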

The end result is a traceable, reproducible, and tunable framework where each dam–waterbody record is enriched with consistent hydraulic variables needed for RFC-DA in the NWM. The process was able to extend the scope of candidate reservoirs ~7x and offers a more refined set of attributes beyond global defaults.

Future Work

A significant part of the reference-reservoirs workflow is heuristic (rank order, search radius, tributary penalty, etc.). These heuristics were developed through trial and error and expert judgement, and there is significant opportunity to refine them with regional calibrations or more manual investigation. As with any reference system, a source of truth was needed: in this first pass, NID was considered the truth, and external datasets were used to refine locations and to populate more attributes and outlinks. In the future, creating a more complete “truth” dataset from the multiple sources could be considered, in particular from the OSM dam lines.

With respect to the hydraulic estimation, there are significant areas for enhancement now that this version 1 dataset is defined. Future work could include: (1) expanding the reservoir set by relaxing proximity and size thresholds, (2) incorporating additional data sources for reservoir surface and dam toe elevations (from all linked resources), (3) refining heuristics with regional calibrations or machine learning, and (4) integrating dynamic reservoir operation rules where available.

Use:

All data needed to use this repo is stored in it, with the exception of OSM. All data, including OSM, can be downloaded by following the directions in data/data_prep.R.
Run workflow/01_process_tiles_nid.R. If new resources are added, be sure to include them in the ingest and to provide a rank and radius.
workflow/02_stich.R stitches the tiles together and adds preliminary info to define the reference-reservoir set (data/reference-reservoirs-v1.gpkg). This includes distance to flowpath and waterbody area.
workflow/03_hydraulics.R selects the candidate reservoirs and adds the parameters needed for RFC-DA in the NWM.
If you want to recreate the webmap, run the make file in scripts/tiles using the latest gpkg. Output can be viewed with pnpm dev --strictPort --port 8000

⸻

References

•   Chow, V. T. (1959). Open-Channel Hydraulics. McGraw-Hill, New York.
•   FEMA (2004). Federal Guidelines for Dam Safety: Selecting and Accommodating Inflow Design Floods for Dams. FEMA 94. Federal Emergency Management Agency, Washington, D.C.
•   USACE (1995). Hydrologic Engineering Requirements for Reservoirs. Engineer Regulation ER 1110-2-240. U.S. Army Corps of Engineers, Washington, D.C.

⸻

Appendix: Version 1.0 plots

if (exists("res") && nrow(res)) {
  p1 <- ggplot2::ggplot(res, ggplot2::aes(x = realization_snap_m)) +
    ggplot2::geom_histogram(bins = 50) +
    ggplot2::labs(title = "Snap distance (m)") + ggplot2::theme_minimal()

  p2 <- ggplot2::ggplot(res, ggplot2::aes(x = realization_jw)) +
    ggplot2::geom_histogram(bins = 50) +
    ggplot2::labs(title = "Name similarity (JW)") + ggplot2::theme_minimal()

  p3 <- ggplot2::ggplot(res, ggplot2::aes(x = n)) +
    ggplot2::geom_histogram(binwidth = 1) +
    ggplot2::scale_x_continuous(breaks = 0:10) +
    ggplot2::labs(title = "Supporting contexts per dam (n)") + ggplot2::theme_minimal()

  print(p1); print(p2); print(p3)
}

#> Warning: Removed 54654 rows containing non-finite outside the scale range
#> (`stat_bin()`).

if (exists("res") && nrow(res)) {
  ctx_cols <- c("gnis","resops","goodd","nwm","osm_ww_poly","osm_ww_lines",
                "osm_dam_lines","ref_fab_fp","ref_fab_wb","ref_int","osm_int","nid")
  have <- intersect(ctx_cols, names(res))
  if (length(have)) {
    long <- tidyr::pivot_longer(as.data.frame(res), dplyr::all_of(have), names_to = "context", values_to = "id")
    long$has <- !is.na(long$id)
    ggplot2::ggplot(long, ggplot2::aes(x = context, fill = has)) +
      ggplot2::geom_bar() +
      ggplot2::coord_flip() +
      ggplot2::labs(title = "Context coverage (count of dams with a match)", y = "count", x = NULL) +
      ggplot2::theme_minimal()
  }
}
