# Deep Learning for Clouds and Cloud Shadow Segmentation in Methane Satellite and Airborne Imaging Spectroscopy

This repository contains the main code of our paper: *Deep Learning for Clouds and Cloud Shadow Segmentation in Methane Satellite and Airborne Imaging Spectroscopy*.
The cloud and shadow detection system processes MethaneAIR and MethaneSAT L1B hyperspectral data to generate accurate per-pixel masks for:

- Clouds
- Cloud shadows
- Dark surfaces
- Background/clear areas
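As a minimal sketch of what a four-class per-pixel mask looks like, the snippet below assigns an integer code to each class and summarizes class frequencies. The integer codes and helper here are illustrative assumptions, not the dataset's actual convention:

```python
import numpy as np

# Hypothetical integer encoding for the four mask classes.
CLASS_NAMES = {0: "clear", 1: "cloud", 2: "cloud_shadow", 3: "dark_surface"}

def class_fractions(mask: np.ndarray) -> dict:
    """Return the fraction of pixels assigned to each class."""
    total = mask.size
    return {name: float((mask == code).sum()) / total
            for code, name in CLASS_NAMES.items()}

# Toy 2x4 mask: half clear, a quarter cloud, a quarter cloud shadow.
mask = np.array([[0, 0, 1, 1],
                 [0, 0, 2, 2]])
fractions = class_fractions(mask)
```

Class fractions like these are also what motivates the `--weighted` (class-weighted loss) option described below, since clouds and shadows are typically minority classes.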
## Repository Structure

```
├── README.md
├── requirements.txt              # Python dependencies
├── Dockerfile                    # Container setup
├── build_container.sh            # Docker build script
├── run_container.sh              # Docker run script
├── run_experiment.py             # Batch experiment orchestrator
├── config/                       # Experiment configurations
│   ├── mair_cs_*.yaml            # MethaneAIR configs
│   └── msat_cs_*.yaml            # MethaneSAT configs
├── cloud_shadows_detection/      # Main package
│   ├── train.py                  # Training script
│   ├── utils.py                  # Training utilities
│   ├── models/                   # Model implementations
│   │   ├── build_model.py        # Model factory
│   │   ├── hyperspectral_logreg.py  # Logistic regression
│   │   ├── mlp_utils.py          # MLP utilities
│   │   ├── unet.py               # U-Net architecture
│   │   ├── scan.py               # SCAN attention network
│   │   ├── combined_cnn.py       # Combined CNN
│   │   ├── combined_mlp.py       # Combined MLP
│   │   └── ViT_Segformer.py      # Vision Transformer
│   └── datasets/                 # Data handling
│       ├── dataset.py            # Dataset classes
│       └── dataset_utils.py      # Data utilities
├── checkpoints/                  # Saved model results
│   ├── mair_cs/                  # MethaneAIR results
│   └── msat_cs/                  # MethaneSAT results
└── data/                         # L1B data
    ├── mair_cs/                  # MethaneAIR data
    └── msat_cs/                  # MethaneSAT data
```
## Dataset

All datasets (MethaneAIR and MethaneSAT hyperspectral imagery with ground-truth labels) are available through Harvard Dataverse:

- Dataset size: ~508 MethaneAIR hyperspectral cubes, ~262 MethaneSAT samples
- Format: L1B calibrated hyperspectral data with corresponding cloud/shadow masks
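A minimal sketch of loading one sample pair is shown below. The `.npy` file layout, array shapes, and function name are assumptions for illustration; the actual Dataverse files may use a different container format (e.g. NetCDF/HDF5), so adapt the reader accordingly:

```python
import numpy as np

def load_sample(cube_path: str, mask_path: str):
    """Load a hyperspectral radiance cube and its per-pixel label mask.

    Assumed layout (illustrative): cube is (bands, H, W) calibrated
    radiances, mask is (H, W) integer class labels on the same grid.
    """
    cube = np.load(cube_path)
    mask = np.load(mask_path)
    assert cube.shape[1:] == mask.shape, "cube and mask grids must match"
    return cube.astype(np.float32), mask.astype(np.int64)
```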
## Results

Our comprehensive evaluation demonstrates state-of-the-art performance across multiple model architectures.

**Best model per dataset** (mean ± std over folds, %):

| Dataset | Best Model | Accuracy | F1-Score | Precision | Recall |
|---|---|---|---|---|---|
| MethaneAIR | Combined CNN | 89.42 ± 1.20 | 78.50 ± 3.08 | 74.44 ± 1.89 | 88.97 ± 2.77 |
| MethaneSAT | Combined CNN | 81.96 ± 1.45 | 78.80 ± 1.28 | 78.85 ± 0.86 | 81.09 ± 1.23 |
**Model comparison on MethaneAIR** (mean ± std over folds, %):

| Model | Accuracy | F1 | Precision | Recall |
|---|---|---|---|---|
| ILR | 73.81 ± 4.05 | 62.07 ± 0.86 | 61.33 ± 0.67 | 72.59 ± 1.46 |
| MLP | 82.49 ± 2.24 | 71.29 ± 1.02 | 68.24 ± 1.04 | 81.42 ± 0.85 |
| U-Net | 88.26 ± 0.45 | 76.24 ± 1.90 | 72.59 ± 2.13 | 83.65 ± 1.03 |
| SCAN | 86.51 ± 2.90 | 74.96 ± 0.96 | 72.17 ± 1.60 | 83.46 ± 3.13 |
| Combined CNN | 89.42 ± 1.20 | 78.50 ± 3.08 | 74.44 ± 1.89 | 88.97 ± 2.77 |
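As context for the metrics in these tables, macro-averaged precision, recall, and F1 over the segmentation classes can be computed from a confusion matrix as below. This is a generic sketch of the standard definitions, not the paper's evaluation code:

```python
import numpy as np

def macro_metrics(cm: np.ndarray):
    """Macro precision/recall/F1 from a confusion matrix.

    cm[i, j] counts pixels of true class i predicted as class j.
    """
    tp = np.diag(cm).astype(float)
    precision = tp / np.maximum(cm.sum(axis=0), 1)  # per predicted class
    recall = tp / np.maximum(cm.sum(axis=1), 1)     # per true class
    f1 = 2 * precision * recall / np.maximum(precision + recall, 1e-12)
    return precision.mean(), recall.mean(), f1.mean()

# Toy 2-class example: perfect on class 0, half right on class 1.
cm = np.array([[4, 0],
               [1, 1]])
p, r, f = macro_metrics(cm)
```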
## Installation

**Option 1: Local installation**

We strongly recommend using a virtual environment. Set one up with:

```bash
python3 -m venv hsr
source hsr/bin/activate
pip install -r requirements.txt
```
**Option 2: Docker container**

Alternatively, a Docker image is defined in `Dockerfile`. For a containerized setup, use the provided scripts:

```bash
bash build_container.sh
bash run_container.sh
```
## Reproducing the Paper Results

The results from our published paper can be fully reproduced using the provided configuration files. Each config file specifies the exact hyperparameters, model architectures, and experimental settings used.
Available model names:

- `ilr`: Iterative Logistic Regression
- `mlp`: Multi-Layer Perceptron
- `unet` / `unetv1`: U-Net convolutional architecture
- `scan`: Spectral Channel Attention Network
- `combined_cnn`: Combined CNN (best performing)
- `combined_mlp`: Combined MLP ensemble
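The dispatch behind a model factory like `build_model.py` can be sketched roughly as follows. The registry, stub class, and constructor names here are illustrative assumptions, not the repository's actual API:

```python
# Illustrative registry keyed by the model-name strings above.
class _Stub:
    """Placeholder standing in for a real model class."""
    def __init__(self, name: str):
        self.name = name

MODEL_REGISTRY = {
    "ilr": lambda: _Stub("ilr"),
    "mlp": lambda: _Stub("mlp"),
    "unet": lambda: _Stub("unet"),
    "scan": lambda: _Stub("scan"),
    "combined_cnn": lambda: _Stub("combined_cnn"),
    "combined_mlp": lambda: _Stub("combined_mlp"),
}

def build_model(model_name: str):
    """Look up and construct the requested architecture."""
    try:
        return MODEL_REGISTRY[model_name]()
    except KeyError:
        raise ValueError(f"unknown model_name: {model_name!r}") from None
```

A registry like this keeps `train.py` agnostic of individual architectures: adding a model means registering one constructor rather than editing the training loop.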
Single model training:

```bash
python cloud_shadows_detection/train.py \
    --data_dir data/mair_cs \
    --model_name combined_cnn \
    --run_dir experiments \
    --lr 5e-4 \
    --norm_type std_full \
    --weighted
```

Reproduce paper results:
```bash
# MethaneAIR experiments
python run_experiment.py --config config/mair_cs_scan.yaml
python run_experiment.py --config config/mair_cs_unet.yaml
python run_experiment.py --config config/mair_cs_mlp.yaml

# MethaneSAT experiments
python run_experiment.py --config config/msat_cs_scan.yaml
python run_experiment.py --config config/msat_cs_unet.yaml
python run_experiment.py --config config/msat_cs_mlp.yaml
```

The `run_experiment.py` script orchestrates batch experiments with parallel execution, automatically handling:
- 3-fold cross-validation
- Multiple learning rates and hyperparameter grids
- Model checkpointing and resumption
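The cross-validation and grid-search loop the orchestrator runs can be sketched as below. The fold splitter and learning-rate grid are illustrative; the real script reads its grid and settings from the YAML configs:

```python
import numpy as np

def kfold_indices(n_samples: int, k: int = 3, seed: int = 0):
    """Yield (train_idx, val_idx) pairs for k-fold cross-validation."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(n_samples)
    folds = np.array_split(order, k)
    for i in range(k):
        val_idx = folds[i]
        train_idx = np.concatenate([folds[j] for j in range(k) if j != i])
        yield train_idx, val_idx

# Hypothetical hyperparameter grid, standing in for the YAML configs.
learning_rates = [1e-3, 5e-4, 1e-4]
results = []
for lr in learning_rates:
    for fold, (tr, va) in enumerate(kfold_indices(12, k=3)):
        # Placeholder for: train on tr with this lr, evaluate on va,
        # checkpoint the model, and record the fold's metrics.
        results.append((lr, fold, len(tr), len(va)))
```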
Key training options:

- `--model_name`: Model architecture to use
- `--data_dir`: Path to dataset (`mair_cs` or `msat_cs`)
- `--norm_type`: Normalization strategy (`std_full` or `none`)
- `--weighted`: Use class-weighted loss for imbalanced data
- `--lr`: Learning rate (optimized per model in the configs)
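A minimal parser accepting these flags might look like the sketch below. The defaults and help strings are illustrative assumptions; the real `train.py` defines its own:

```python
import argparse

def make_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(
        description="Cloud/shadow segmentation training (illustrative)")
    parser.add_argument("--model_name", required=True,
                        help="ilr, mlp, unet, scan, combined_cnn, combined_mlp")
    parser.add_argument("--data_dir", required=True,
                        help="Path to dataset (mair_cs or msat_cs)")
    parser.add_argument("--run_dir", default="experiments",
                        help="Output directory for checkpoints and logs")
    parser.add_argument("--norm_type", choices=["std_full", "none"],
                        default="none", help="Normalization strategy")
    parser.add_argument("--weighted", action="store_true",
                        help="Use class-weighted loss for imbalanced data")
    parser.add_argument("--lr", type=float, default=5e-4,
                        help="Learning rate")
    return parser

# Parse the example command line from the usage section above.
args = make_parser().parse_args(
    ["--model_name", "combined_cnn", "--data_dir", "data/mair_cs",
     "--norm_type", "std_full", "--weighted"])
```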
## Citation

```bibtex
@article{PrezCarrasco2025DeepLF,
  title={Deep Learning for Clouds and Cloud Shadow Segmentation in Methane Satellite and Airborne Imaging Spectroscopy},
  author={Manuel P{\'e}rez-Carrasco and Maya Nasr and Sebastien Roche and Christopher Chan Miller and Zhan Zhang and Core Francisco Park and Eleanor Walker and Cecilia Garraffo and Douglas Finkbeiner and Ritesh Gautam and Steve Wofsy},
  journal={ArXiv},
  volume={abs/2509.19665},
  year={2025},
  url={https://api.semanticscholar.org/CorpusID:281505215},
  doi={10.7910/DVN/IKLZOJ}
}
```
For questions or feedback, please open an issue on this repository or contact maperezc@udec.cl.