🍳 PanSplat

PanSplat: 4K Panorama Synthesis with Feed-Forward Gaussian Splatting

Cheng Zhang, Haofei Xu, Qianyi Wu, Camilo Cruz Gambardella, Dinh Phung, Jianfei Cai

teaser

🚀 Introduction

🌍 Our method takes two wide-baseline 360° panoramas as input and generates a Spherical 3D Gaussian Pyramid that can be rendered into novel views. This repo contains the training, testing, and evaluation code for our CVPR 2025 paper.

🛠️ Installation

We use Anaconda to manage the environment. You can create it by running the following commands:

conda create -n pansplat python=3.10
conda activate pansplat
pip install torch==2.4.0+cu118 torchvision==0.19.0+cu118 torchaudio==2.4.0+cu118 --index-url https://download.pytorch.org/whl/cu118
pip install -r requirements.txt
pip install git+https://github.com/dcharatan/diff-gaussian-rasterization-modified
pip3 install -U xformers==0.0.27.post2+cu118 --index-url https://download.pytorch.org/whl/cu118
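
Optionally, as a quick sanity check (a suggestion, not part of the original setup), you can confirm that the CUDA build of PyTorch was installed and a GPU is visible:

python -c "import torch; print(torch.__version__, torch.cuda.is_available())"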

We use wandb to log and visualize the training process. You can create an account and then log in to wandb by running the following command:

wandb login
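
If you are running on a headless server where interactive login is inconvenient, wandb can also read your API key from the WANDB_API_KEY environment variable (the value below is a placeholder):

export WANDB_API_KEY=your_api_key_here   # placeholder; copy the key from your wandb account settings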

⚡ Quick Demo on Synthetic Data

You can download the pretrained checkpoint last.ckpt (trained on the Matterport3D dataset at 512 × 1024 resolution) and put it in the logs/nvpl49ge/checkpoints folder. Then run the following command to test the model:

python -m src.paper.demo +experiment=pansplat-512 ++model.weights_path=logs/nvpl49ge/checkpoints/last.ckpt mode=predict

The code will use the sample images in the datasets/pano_grf folder:

demo_input_image1 demo_input_image2

The output will be saved in a timestamped folder of the form outputs/2025-01-13/16-56-04: demo_output_video (Video file here)

Additionally, we provide a checkpoint last.ckpt fine-tuned on the Matterport3D dataset at 2048 × 4096 resolution for 4K panorama synthesis. You can put it in the logs/hxlad5nq/checkpoints folder and run the following command to test the model:

python -m src.paper.demo +experiment=pansplat-2048 ++model.weights_path=logs/hxlad5nq/checkpoints/last.ckpt mode=predict

This requires a GPU with at least 24 GB of memory, e.g., an NVIDIA RTX 3090.
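
To summarize the checkpoint placement for both demos, here is a minimal sketch; the source paths are placeholders for wherever you saved the downloads:

mkdir -p logs/nvpl49ge/checkpoints logs/hxlad5nq/checkpoints
cp /path/to/512x1024/last.ckpt logs/nvpl49ge/checkpoints/last.ckpt    # 512 × 1024 Matterport3D checkpoint
cp /path/to/2048x4096/last.ckpt logs/hxlad5nq/checkpoints/last.ckpt   # 2048 × 4096 fine-tuned checkpoint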

📂 Data Preparation

PanoGRF Data

We use the data preparation code from the PanoGRF repo to render the Matterport3D dataset and to generate the Replica and Residential datasets. Please download pano_grf_lr.tar from the link and unzip it to the datasets folder. We also rendered a smaller, higher-resolution Matterport3D dataset for fine-tuning; if you plan to fine-tune the model at higher resolution, please download pano_grf_hr.tar and unzip it to the datasets folder as well.
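
For example, assuming the archives were downloaded to the repository root (and that they unpack into the expected datasets/pano_grf layout), extraction could look like this:

mkdir -p datasets
tar -xf pano_grf_lr.tar -C datasets
tar -xf pano_grf_hr.tar -C datasets   # only needed for high-resolution fine-tuning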

360Loc Data

We use the 360Loc dataset for fine-tuning on real-world data. Please download the data from the official link and unzip the separate parts to the datasets/360Loc folder.
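
A hedged sketch of the extraction step, assuming the downloaded parts are zip archives stored under a placeholder path (adjust the path and archive format to match the actual download):

mkdir -p datasets/360Loc
for f in /path/to/360Loc_parts/*.zip; do unzip "$f" -d datasets/360Loc; done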

Our Video Data

We provide two sample videos for testing cross-dataset generalization. Please download insta360.tar from the link and unzip it to the datasets folder.
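
Assuming insta360.tar was downloaded to the repository root, it can be unpacked in the same way (expected to produce datasets/insta360):

mkdir -p datasets
tar -xf insta360.tar -C datasets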

🎥 Use your own video...

We use stella_vslam, a community fork of xdspacelab/openvslam, to extract camera poses from self-captured videos. You can follow the official guide to install stella_vslam. Before building stella_vslam_examples, please replace the following code in the src/run_video_slam.cc file:

slam->save_frame_trajectory(eval_log_dir + "/frame_trajectory.txt", "TUM");
slam->save_keyframe_trajectory(eval_log_dir + "/keyframe_trajectory.txt", "TUM");

with:

slam->save_frame_trajectory(eval_log_dir + "/frame_trajectory.txt", "KITTI");
slam->save_keyframe_trajectory(eval_log_dir + "/keyframe_trajectory.txt", "KITTI");

so that the camera poses are saved in the KITTI format. We recommend installing with SocketViewer and setting up the SocketViewer to visualize the SLAM process on a remote server. After building stella_vslam, please change to the build directory following this link and download the ORB vocabulary:

curl -sL "https://github.com/stella-cv/FBoW_orb_vocab/raw/main/orb_vocab.fbow" -o orb_vocab.fbow

After that, please put your video in its own folder under the datasets/insta360 folder and rename it to video.mp4. You can then run the following command from the video's folder to perform SLAM mapping:

~/lib/stella_vslam_examples/build/run_video_slam -v ~/lib/stella_vslam_examples/build/orb_vocab.fbow -m video.mp4 -c ../equirectangular.yaml --frame-skip 1 --no-sleep --map-db-out map.msg --viewer socket_publisher --eval-log-dir ./ --auto-term

Finally, you can run the following command to extract the camera poses by running localization only:

~/lib/stella_vslam_examples/build/run_video_slam --disable-mapping -v ~/lib/stella_vslam_examples/build/orb_vocab.fbow -m video.mp4 -c ../equirectangular.yaml --frame-skip 1 --no-sleep --map-db-in map.msg --viewer socket_publisher --eval-log-dir ./ --auto-term

The camera poses will be saved in the frame_trajectory.txt file. You can then follow the Demo on Real-World Data section, using the insta360 dataset commands, to test the model on your own video.
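
Taken together, the relative paths in the commands above imply a folder layout along the following lines (a sketch; my_video is a placeholder for your own folder name):

cd datasets/insta360/my_video   # run both SLAM commands from here
# expected files, inferred from the commands above:
#   ../equirectangular.yaml     camera config referenced by the -c flag
#   ./video.mp4                 your renamed video
#   ./map.msg                   map written by the mapping run (--map-db-out)
#   ./frame_trajectory.txt      camera poses written via --eval-log-dir ./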


🎯 Training and Testing

Pretrained Models

We use part of the pretrained UniMatch weights from MVSplat and the pretrained panoramic monocular depth estimation model from PanoGRF. Please download both sets of weights and put them in the checkpoints folder.
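
A minimal placement sketch; the file names below are placeholders, so keep whatever names the downloaded weights actually use:

mkdir -p checkpoints
cp /path/to/unimatch_weights.pth checkpoints/       # UniMatch weights from MVSplat
cp /path/to/panogrf_mono_depth.ckpt checkpoints/    # panoramic monocular depth model from PanoGRF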

Train on Matterport3D

We train the model on the Matterport3D dataset starting from a low resolution and fine-tune it at progressively higher resolutions. If you are looking to fine-tune the model on the 360Loc dataset, you can stop at the 512 × 1024 resolution. Alternatively, you can skip this part by downloading the pretrained checkpoint last.ckpt and putting it in the logs/nvpl49ge/checkpoints folder.

Please first run the following command to train the model at 256 × 512 resolution:

python -m src.main +experiment=pansplat-256 mode=train

ℹ️ Hint: The training takes about 1 day on a single NVIDIA A100 GPU. Experiments are logged and visualized to wandb under the pansplat project. You'll get a WANDB_RUN_ID (e.g., ek6ab466) after running the command, or you can find it in the wandb dashboard. At the end of training, the model is tested and the evaluation results are logged to wandb as a table. The checkpoints are saved in the logs/<WANDB_RUN_ID>/checkpoints folder. The same applies to the following experiments.

Then set the model.weights_path parameter in config/pansplat-512.yaml to the path of the last checkpoint from the 256 × 512 training and run the following command to fine-tune the model at 512 × 1024 resolution:

python -m src.main +experiment=pansplat-512 mode=train
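
If you prefer not to edit the config file, the same path can most likely be passed as a Hydra-style command-line override, mirroring the ++model.weights_path syntax used by the demo and test commands elsewhere in this README (ek6ab466 stands in for your own 256 × 512 run id):

python -m src.main +experiment=pansplat-512 mode=train ++model.weights_path=logs/ek6ab466/checkpoints/last.ckpt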
🏞 If you want to fine-tune on high-resolution Matterport3D data...

Similarly, update the model.weights_path setting in config/pansplat-1024.yaml and fine-tune the model at 1024 × 2048 resolution:

python -m src.main +experiment=pansplat-1024 mode=train

Finally, update the model.weights_path setting in config/pansplat-2048.yaml and fine-tune the model at 2048 × 4096 resolution:

python -m src.main +experiment=pansplat-2048 mode=train

Fine-tune on 360Loc

We fine-tune the model on the 360Loc dataset starting from the weights trained on the Matterport3D dataset at 512 × 1024 resolution. If you want to skip this part, you can find the checkpoints here; we provide checkpoints for the 512 × 1024 (ls933m5x) and 2048 × 4096 (115k3hnu) resolutions.

Please update the model.weights_path parameter of config/pansplat-512-360loc.yaml to the path of the last checkpoint of the Matterport3D training at 512 × 1024 resolution, then run the following command:

python -m src.main +experiment=pansplat-512-360loc mode=train

We then gradually increase the resolution to 1024 × 2048 and 2048 × 4096, fine-tuning from the lower-resolution weights each time:

python -m src.main +experiment=pansplat-1024-360loc mode=train
python -m src.main +experiment=pansplat-2048-360loc mode=train

Remember to update the model.weights_path parameter in the corresponding config files before running the commands.

🌍 Demo on Real-World Data

First, please make sure you have followed the steps in the Fine-tune on 360Loc section so that the checkpoints are ready. You can then test the model on the 360Loc or Insta360 dataset by running one of the following commands:

python -m src.paper.demo +experiment=pansplat-512-360loc ++model.weights_path=logs/ls933m5x/checkpoints/last.ckpt mode=predict
python -m src.paper.demo +experiment=pansplat-512-360loc ++model.weights_path=logs/ls933m5x/checkpoints/last.ckpt mode=predict dataset=insta360

ℹ️ Hint: You can replace the model.weights_path parameter with the path to your own fine-tuned checkpoint.

The output will be saved in a timestamped folder of the form outputs/2025-01-13/16-56-04: atrium-daytime_360_1-50_53 (Video file here) VID_20240914_103257_00_005-9930_9946 (Video file here)

For the 2048 × 4096 resolution model, you can run the following commands:

python -m src.paper.demo +experiment=pansplat-2048-360loc ++model.weights_path=logs/115k3hnu/checkpoints/last.ckpt mode=predict
python -m src.paper.demo +experiment=pansplat-2048-360loc ++model.weights_path=logs/115k3hnu/checkpoints/last.ckpt mode=predict dataset=insta360

Additionally, we provide commands for longer image sequence inputs:

python -m src.paper.demo +experiment=pansplat-512-360loc ++model.weights_path=logs/ls933m5x/checkpoints/last.ckpt
python -m src.paper.demo +experiment=pansplat-512-360loc ++model.weights_path=logs/ls933m5x/checkpoints/last.ckpt dataset=insta360
python -m src.paper.demo +experiment=pansplat-2048-360loc ++model.weights_path=logs/115k3hnu/checkpoints/last.ckpt
python -m src.paper.demo +experiment=pansplat-2048-360loc ++model.weights_path=logs/115k3hnu/checkpoints/last.ckpt dataset=insta360

Example output: VID_20240922_102141_00_006-21456-21616 (Video file here)

For more example outputs, please view the Short Video | Full Video | Interactive Demo.

📈 Reproducing the Results

MVSplat Baseline

We also provide the MVSplat baseline for comparison. You can put the weights trained on the Matterport3D dataset in logs/qbn2ltku/checkpoints and run the following command to test the model:

python -m src.main +experiment=mvsplat-512 ++model.weights_path=logs/qbn2ltku/checkpoints/last.ckpt mode=test test.compute_scores=true wandb.name=test_mvsplat

Or you can train the model yourself by running the following commands:

python -m src.main +experiment=mvsplat-256 mode=train
python -m src.main +experiment=mvsplat-512 mode=train
python -m src.main +experiment=mvsplat-512-360loc mode=train

Similarly, don't forget to update the model.weights_path parameter in the corresponding config files before fine-tuning. The last command fine-tunes the model on the 360Loc dataset for the comparison on real-world data in the Image Quality vs. Frame Distance section.

GPU Consumption

We provide a script to reproduce Figure G.1 for GPU consumption comparison. An NVIDIA A100 GPU is required to run the script:

python -m src.paper.benchmark_exp
python -m src.paper.benchmark_fig

The experiment is configured in the config/benchmark.yaml file, with arguments for each ablation study experiment. You can use the training commands in Train on Matterport3D with these arguments to reproduce the ablation study results.

After automated benchmarking, the intermediate results and figures will be saved in the outputs/benchmark folder.

An example is shown below:

benchmark_ab

Image Quality vs. Frame Distance

We provide a script to reproduce Figure F.1 for the image quality vs. frame distance comparison. Before drawing the figure, please first prepare the evaluation results on the Insta360 dataset:

python -m src.main +experiment=casuni-512-360loc ++model.weights_path=logs/ls933m5x/checkpoints/last.ckpt mode=test test.compute_scores=true wandb.name=test_pansplat dataset=insta360
python -m src.main +experiment=pansplat-512-360loc ++model.weights_path=logs/l8l2j6pb/checkpoints/last.ckpt mode=test test.compute_scores=true wandb.name=test_pansplat-wo_defbl model.encoder.gaussian_head.deferred_blend=false dataset=insta360
python -m src.main +experiment=mvsplat-512-360loc ++model.weights_path=logs/3q5jp96j/checkpoints/last.ckpt mode=test test.compute_scores=true wandb.name=test_mvsplat dataset=insta360

You can download the checkpoints or use your own checkpoints from the Fine-tune on 360Loc and MVSplat Baseline sections. For PanSplat without deferred blending, please follow Fine-tune on 360Loc with the model.encoder.gaussian_head.deferred_blend=false parameter to train the model. Don't forget to update the experiment id in src/paper/frame_vs_metric.py if you are using your own checkpoints.

Finally, you can run the following command to draw the figure:

python -m src.paper.frame_vs_metric

The figure will be saved in the outputs/frame_vs_metric folder.

An example is shown below:

frame_vs_metric

📖 Citation

If you find our work helpful, please consider citing:

@inproceedings{zhang2024pansplat,
      title={PanSplat: 4K Panorama Synthesis with Feed-Forward Gaussian Splatting}, 
      author={Cheng Zhang and Haofei Xu and Qianyi Wu and Camilo Cruz Gambardella and Dinh Phung and Jianfei Cai},
      booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
      year={2025}
}

💡 Acknowledgements

Our paper could not have been completed without the amazing open-source projects PanoGRF, MVSplat, stella_vslam, pixelSplat, UniMatch...

Also check out Zheng's concurrent work Splatter360 for more panorama synthesis methods.
