TL;DR: Introducing Loci-Segmented, an extension of Loci with a dynamic background module. It demonstrates a relative IoU improvement of more than 32% over the state of the art on the MOVi dataset.
Demo video: loci-seg-03.mp4
A suitable conda environment named loci-s can be created and activated with:
conda env create -f environment.yml
conda activate loci-s
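To check that the environment works before training, a quick sanity check can help. This sketch assumes the environment provides PyTorch with CUDA support, which is an assumption about environment.yml rather than something stated above:
# Optional sanity check: confirm PyTorch is importable and whether a GPU is visible
python -c "import torch; print(torch.__version__, torch.cuda.is_available())"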
Preprocessed datasets together with model checkpoints can be found here
Make sure you download all necessary datasets and model checkpoints. To reproduce the MOVi results run:
run-movi-evalulation.sh
python eval-movi.py
To reproduce the evaluation on the datasets presented in the review paper on "Compositional scene representation learning via reconstruction: A survey" run:
run-review.sh
process-review.sh
python eval-review.py
We provide an example dataset creation script that you can adjust to your needs.
You can also inspect any compatible dataset using our Dataset Viewer:
data/plot_hdf5.py <dataset>.hdf5
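For example, to skim through all downloaded datasets in one go (the data/*.hdf5 location is an assumption; point the loop at wherever you stored the HDF5 files):
# Open every downloaded dataset in the viewer; adjust the glob to your storage location
for f in data/*.hdf5; do
    data/plot_hdf5.py "$f"
done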
Our training pipeline employs multi-GPU configurations and extensive pretraining to accelerate model convergence. Specifically, we use a single node with 8 x GTX1080 GPUs for the pretraining phase, and a single node with 8 x A100 GPUs for the final Loci-s training. Below are the details for each stage of the training pipeline.
Note: The following examples use a single GPU setup, which is suboptimal for performance. Multi-GPU configurations are highly recommended.
- Decoder Pretraining
Pretrain individual decoders for mask, depth, and RGB using the following commands:
python -m model.main -cfg configs/pretrain-mask-decoder.json --pretrain-objects --single-gpu
python -m model.main -cfg configs/pretrain-depth-decoder.json --pretrain-objects --single-gpu
python -m model.main -cfg configs/pretrain-rgb-decoder.json --pretrain-objects --single-gpu
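To run all three decoder pretrainings back to back, the commands above can be wrapped in a small shell script; nothing new is added beyond set -e, which aborts the run if a stage fails:
#!/usr/bin/env bash
set -e  # stop if any pretraining stage fails

# Pretrain the mask, depth, and RGB decoders sequentially (single-GPU, as above)
for cfg in pretrain-mask-decoder pretrain-depth-decoder pretrain-rgb-decoder; do
    python -m model.main -cfg "configs/${cfg}.json" --pretrain-objects --single-gpu
done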
- Encoder-Decoder Pretraining
Pretrain the Loci encoder with the already pretrained mask, depth, and RGB decoders:
python -m model.main -cfg configs/pretrain-encoder-decoder-stage1.json --pretrain-objects --single-gpu --load-mask <mask-decoder>.ckpt --load-depth <depth-decoder>.ckpt --load-rgb <rgb-decoder>.ckpt
For a version that utilizes depth as an input feature, append -depth to the config name.
- Hyper-Network Pretraining
Execute three passes through the encoder-decoder architecture to train the internal hyper-networks:
python -m model.main -cfg configs/pretrain-encoder-decoder-stage2.json --pretrain-objects --single-gpu --load-stage1 <encoder-decoder>.ckpt
- Background Module Pretraining
Train the background module:
python -m model.main -cfg configs/pretrain-background.json --pretrain-bg --single-gpu
Execute full-scale training for Loci-s:
python -m model.main -cfg configs/loci-s.json --train --single-gpu --load-objects <encoder-decoder>.ckpt --load-bg <background>.ckpt
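Putting all stages together, a complete single-GPU run could look like the following sketch. The checkpoint paths are placeholders taken from the commands above; substitute the files actually written by each stage (their location depends on your configuration):
#!/usr/bin/env bash
set -e

# 1. Decoder pretraining (mask, depth, RGB)
python -m model.main -cfg configs/pretrain-mask-decoder.json --pretrain-objects --single-gpu
python -m model.main -cfg configs/pretrain-depth-decoder.json --pretrain-objects --single-gpu
python -m model.main -cfg configs/pretrain-rgb-decoder.json --pretrain-objects --single-gpu

# 2. Encoder-decoder pretraining (stage 1), loading the pretrained decoders
python -m model.main -cfg configs/pretrain-encoder-decoder-stage1.json --pretrain-objects --single-gpu \
    --load-mask <mask-decoder>.ckpt --load-depth <depth-decoder>.ckpt --load-rgb <rgb-decoder>.ckpt

# 3. Hyper-network pretraining (stage 2), loading the stage-1 checkpoint
python -m model.main -cfg configs/pretrain-encoder-decoder-stage2.json --pretrain-objects --single-gpu \
    --load-stage1 <encoder-decoder>.ckpt

# 4. Background module pretraining
python -m model.main -cfg configs/pretrain-background.json --pretrain-bg --single-gpu

# 5. Full Loci-s training with the pretrained object and background modules
python -m model.main -cfg configs/loci-s.json --train --single-gpu \
    --load-objects <encoder-decoder>.ckpt --load-bg <background>.ckpt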
Generate visualizations to inspect the model at various stages of pretraining and during the final phase.
To visualize individual components like mask, depth, RGB, objects, or background during pretraining:
python -m model.main -cfg <config> --save-<mask|depth|rgb|objects|bg> --single-gpu --add-text --load <checkpoint>.ckpt
For visualizing the fully trained Loci-s model:
python -m model.main -cfg <config> --save --single-gpu --add-text --load <checkpoint>.ckpt
Note: To visualize using the segmentation pretraining network, append the --load-proposal flag followed by the corresponding checkpoint:
--load-proposal <proposal>.ckpt
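Two concrete examples (the pairing of config, save flag, and checkpoint below is illustrative; use whichever config and checkpoint belong to the component you want to inspect):
# Inspect the pretrained background module
python -m model.main -cfg configs/pretrain-background.json --save-bg --single-gpu --add-text --load <background>.ckpt

# Inspect the fully trained Loci-s model together with the segmentation proposal network
python -m model.main -cfg configs/loci-s.json --save --single-gpu --add-text --load <loci-s>.ckpt --load-proposal <proposal>.ckpt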