Performance and Generalizability Impacts of Incorporating Location Encoders into Deep Learning for Dynamic PM2.5 Estimation
- Paper (GIScience & Remote Sensing): Performance and generalizability impacts of incorporating location encoders into deep learning for dynamic PM2.5 estimation
- DOI: https://doi.org/10.1080/15481603.2025.2594797
- Preprint: https://arxiv.org/abs/2505.18461
- Code base: Fork/extension of geohai/PM2.5_CONUS_LSTM
- Authors: Morteza Karimzadeh · Zhongying Wang · James L. Crooks
- Status: Published (2025)
This repository contains code to reproduce the experiments in our paper on how Earth embeddings affect deep learning-based estimation for a temporally dynamic, spatially heterogeneous geospatial task: daily surface PM2.5 estimation over CONUS using a strong Bi-LSTM + Attention baseline (remote sensing + meteorology + ancillary variables). Unlike many prior evaluations of Earth embeddings on relatively static or long-term averaged targets, this work focuses on a real-world estimation problem where the signal is highly dynamic and accurately estimating extreme values matters for downstream tasks.
A central contribution is a systematic evaluation of three ways to incorporate location, showing that:
- Naive raw coordinates can help within-region interpolation, but often hurt out-of-region generalization.
- Pretrained location encoders (e.g., GeoCLIP) can improve both accuracy and geographic generalizability, especially under rigorous spatial disjoint evaluation.
- Embedding fusion effectively conditions inference on geographic priors (“what is typical about this place?”), while the dynamic inputs (AOD, meteorology, smoke, etc.) capture day-to-day variation.
Recent “geospatial foundation models” increasingly aim to learn reusable representations of the Earth. Location encoders like GeoCLIP and SatCLIP can be seen as producing Earth embeddings: a compact vector representation of a place derived from large-scale pretraining.
Most existing evaluations of such Earth embeddings emphasize comparatively static or temporally smoothed prediction tasks (e.g., air temperature, biomass, land cover). It remains less clear how these static representations behave in dynamic estimation problems, where targets vary rapidly over time and rare but important extremes account for the bulk of the error.
In our setting, those embeddings behave like a static geographic prior:
- The time-series branch learns relationships from dynamic observations (aerosols, meteorology, smoke).
- The Earth embedding injects stable contextual information (built environment / land use / infrastructure signals, etc., depending on the pretraining data).
- Fusion (especially the Hadamard product) lets the model gate/modulate temporal representations by place-specific context, i.e., conditioning inference on priors rather than memorizing raw lat/lon → PM2.5 mappings.
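A minimal, dependency-free sketch of Hadamard fusion follows. It assumes the frozen location embedding passes through a learned linear projection (here hand-set toy weights `W`, `b`) so that it matches the temporal hidden size before the element-wise product. The function names and dimensions are hypothetical, not taken from the repository, and whether the actual model applies an activation after the projection is not specified here.

```python
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def project(x: Vector, weight: Matrix, bias: Vector) -> Vector:
    """Dense layer y = W x + b, with W of shape (out_dim, in_dim)."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weight, bias)]


def hadamard_fuse(h_temporal: Vector, e_location: Vector,
                  weight: Matrix, bias: Vector) -> Vector:
    """Modulate the temporal representation by place-specific context.

    The (frozen) location embedding is projected to the temporal hidden size
    (assumed linear here), then multiplied element-wise with the temporal
    vector, so each hidden unit is scaled according to where the sample is.
    """
    g = project(e_location, weight, bias)
    if len(g) != len(h_temporal):
        raise ValueError("projected embedding must match temporal hidden size")
    return [h * gi for h, gi in zip(h_temporal, g)]


# Toy example: 3-D temporal state, 2-D location embedding.
# (Real GeoCLIP embeddings are 512-D; these numbers are illustrative only.)
h = [0.5, -1.0, 2.0]
e = [1.0, 0.0]
W = [[1.0, 0.0], [0.0, 1.0], [2.0, 0.0]]
b = [0.0, 1.0, 0.0]
print(hadamard_fuse(h, e, W, b))  # [0.5, -1.0, 4.0]
```

Because the interaction is multiplicative, the same dynamic state `h` is amplified, damped, or zeroed differently at each location (try `e = [0.0, 1.0]`), whereas concatenation would simply append the embedding as extra, independent inputs.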
We compare four geolocation integration strategies in a Bi-LSTM + Attention PM2.5 estimation pipeline:
- No geolocation (baseline): the model must learn a single global mapping from dynamic predictors → PM2.5
- Raw lat/lon appended as features
- Sinusoidal lat/lon (sin/cos) appended as features
- Pretrained location encoder embeddings (GeoCLIP; plus ablations with SatCLIP), fused with the temporal representation
  - Fusion methods: Hadamard product vs. concatenation (ablation)
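The two coordinate-based options can be sketched as follows. The sin/cos form shown (sine and cosine of latitude and longitude in radians) is one common variant and is an assumption here; the repository's exact encoding may differ. The predictor values and function names are hypothetical.

```python
import math
from typing import List


def raw_latlon(lat: float, lon: float) -> List[float]:
    """Raw option: coordinates appended as-is, in degrees."""
    return [lat, lon]


def sincos_latlon(lat: float, lon: float) -> List[float]:
    """Sinusoidal option: sin/cos of each coordinate in radians.

    Longitude wraps around, so -180 and +180 degrees map to (numerically)
    the same features, which raw degrees do not.
    """
    phi, lam = math.radians(lat), math.radians(lon)
    return [math.sin(phi), math.cos(phi), math.sin(lam), math.cos(lam)]


# Hypothetical daily dynamic predictors for one station (e.g. AOD, temperature, smoke flag).
dynamic = [0.31, 12.4, 0.0]
x = dynamic + sincos_latlon(39.74, -104.99)  # Denver, CO
print(len(x))  # 7
```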
In within-region (WR) evaluation, where test data are spatially in-distribution (random split; spatial holdout split), adding location generally helps:
- Raw lat/lon and sin/cos improve performance vs. no-location in WR settings.
- GeoCLIP embeddings + Hadamard fusion yield the strongest WR improvements, including improved stability across splits.
In out-of-region (OoR) evaluation, using checkerboard spatial partitions (disjoint train/test regions), we find a clear pattern:
- Raw lat/lon (and sin/cos) often degrade OoR generalization, consistent with models overfitting to region-specific location–target associations.
- GeoCLIP embeddings (Earth embeddings) are consistently competitive and often best, because they provide transferable geographic context without letting the downstream model “cheat” via direct coordinates, even in highly dynamic settings.
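A checkerboard partition can be sketched as below, assuming each station is assigned to a fold by the color of the regular lat/lon grid cell it falls in. The 2° cell size, the station IDs, and the function names are all hypothetical; the paper's actual grid resolution and partitioning details may differ.

```python
import math
from typing import Dict, List, Tuple


def checkerboard_color(lat: float, lon: float, cell_deg: float = 2.0) -> int:
    """Color (0 or 1) of the lat/lon grid cell containing the point.

    Edge-adjacent cells always have different colors, so stations assigned
    to color 0 and color 1 occupy spatially disjoint regions.
    """
    i = math.floor(lat / cell_deg)
    j = math.floor(lon / cell_deg)
    return (i + j) % 2  # Python's % stays non-negative even for negative longitudes


def checkerboard_split(
    stations: Dict[str, Tuple[float, float]],
    test_color: int = 1,
    cell_deg: float = 2.0,
) -> Tuple[List[str], List[str]]:
    """Return (train_ids, test_ids); swap test_color for the complementary fold."""
    train: List[str] = []
    test: List[str] = []
    for sid, (lat, lon) in stations.items():
        if checkerboard_color(lat, lon, cell_deg) == test_color:
            test.append(sid)
        else:
            train.append(sid)
    return train, test


# Hypothetical monitor locations (lat, lon).
stations = {"A": (39.5, -105.5), "B": (39.5, -103.5), "C": (41.5, -103.5)}
train_ids, test_ids = checkerboard_split(stations)
print(train_ids, test_ids)  # ['A', 'C'] ['B']
```

Station B sits in a cell whose coordinates never appear in training, so a model that has memorized raw lat/lon → PM2.5 associations has nothing reliable to interpolate from there.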
Ablations show:
- Hadamard fusion outperforms concatenation for location encoders in this task.
- This supports interpreting Hadamard fusion as a mechanism for conditioning dynamic Earth-observation models on geographic priors, rather than treating embeddings as independent predictors.
- GeoCLIP > SatCLIP in our experiments, plausibly reflecting differences in what each encoder’s pretraining data captures (human-centric Flickr imagery vs. Sentinel-2 imagery).
GeoCLIP-enhanced maps show:
- Better spatial coherence in some sparsely monitored regions, and better estimation of known urban hotspots and wildfire-related extremes, particularly at the high-concentration tail where deep learning models typically underperform.
- At the same time, potential artifact/noise patterns in some under-sampled regions, consistent with uneven upstream pretraining coverage and high-frequency basis functions in positional encoders.
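A toy one-dimensional illustration of the high-frequency point, assuming a single sinusoidal basis function of longitude; it is not the GeoCLIP or SatCLIP encoder, and the frequencies are arbitrary. Low-frequency features barely change between nearby stations, while high-frequency ones can swing across nearly their full range over a few kilometers. Where pretraining data are sparse, such components are weakly constrained, which is one plausible route to small-scale map noise.

```python
import math


def sin_basis(lon_deg: float, freq: float) -> float:
    """One sinusoidal basis function of longitude; freq is in cycles per degree."""
    return math.sin(2 * math.pi * freq * lon_deg)


def feature_jump(lon_a: float, lon_b: float, freq: float) -> float:
    """Absolute change in the basis value between two locations."""
    return abs(sin_basis(lon_a, freq) - sin_basis(lon_b, freq))


# Two points 0.1 degrees of longitude apart (about 8.5 km at 40 N).
lon_a, lon_b = -104.0, -104.1
for freq in (0.01, 1.0, 2.5):
    print(f"freq={freq:>5}: jump={feature_jump(lon_a, lon_b, freq):.3f}")
```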
| Encoder | What it is | Code |
|---|---|---|
| GeoCLIP | CLIP-style location encoder aligned with geotagged Flickr imagery; outputs 512-D Earth embeddings | https://github.com/VicenteVivan/geo-clip |
| SatCLIP | CLIP-style location encoder aligned with Sentinel-2 imagery; outputs 256-D Earth embeddings | https://github.com/microsoft/satclip |
- Code is organized to support training/evaluation across multiple spatial partitioning schemes (WR and OoR).
- We keep location encoders frozen during downstream training to evaluate them as general-purpose priors rather than task-adapted encoders.
- Deterministic seeds and environment management are included for repeatability.
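Keeping the encoder frozen also means each monitor's embedding is a per-station constant, so it can be computed once and looked up for every day of that station's time series. A minimal sketch under that assumption; `encode` is a hypothetical stand-in for a frozen GeoCLIP/SatCLIP call (the real encoders are invoked through their own packages), and the helper names are illustrative only.

```python
from typing import Callable, Dict, Hashable, List, Sequence, Tuple

Embedding = List[float]


def build_embedding_table(
    station_coords: Dict[Hashable, Tuple[float, float]],
    encode: Callable[[float, float], Embedding],
) -> Dict[Hashable, Embedding]:
    """Encode each station exactly once (valid only because the encoder is frozen)."""
    return {sid: list(encode(lat, lon)) for sid, (lat, lon) in station_coords.items()}


def lookup(table: Dict[Hashable, Embedding], station_ids: Sequence[Hashable]) -> List[Embedding]:
    """Fetch cached embeddings for a batch of (station, day) samples."""
    return [table[sid] for sid in station_ids]


calls = []


def dummy_encoder(lat: float, lon: float) -> Embedding:
    """Hypothetical stand-in for a frozen location encoder."""
    calls.append((lat, lon))
    return [lat / 90.0, lon / 180.0]


table = build_embedding_table({"A": (45.0, -90.0), "B": (30.0, -120.0)}, dummy_encoder)
batch = lookup(table, ["A", "B", "A", "A"])  # e.g. four daily samples from two stations
print(len(calls), batch[0])  # 2 [0.5, -0.5]
```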
Karimzadeh, M., Wang, Z., & Crooks, J. L. (2025). Performance and generalizability impacts of incorporating location encoders into deep learning for dynamic PM2.5 estimation. GIScience & Remote Sensing, 62(1). https://doi.org/10.1080/15481603.2025.2594797
@article{karimzadeh2025locationencoderspm25,
title = {Performance and generalizability impacts of incorporating location encoders into deep learning for dynamic PM2.5 estimation},
author = {Karimzadeh, Morteza and Wang, Zhongying and Crooks, James L.},
journal = {GIScience \& Remote Sensing},
volume = {62},
number = {1},
year = {2025},
doi = {10.1080/15481603.2025.2594797}
}