# The More You See in 2D, the More You Perceive in 3D

Xinyang Han*1, Zelin Gao*2, Angjoo Kanazawa1, Shubham Goel†3, Yossi Gandelsman†1

1 UC Berkeley, 2 Zhejiang University, 3 Avataar

CVPR 2024 (Highlight)

project page | arxiv | bibtex

We present SAP3D, which reconstructs the 3D shape and texture of an object from a variable number of real input images. The quality of the reconstructed shape and texture improves as more views are provided.
See installation instructions.
See Preparing Datasets for SAP3D.
Overview of SAP3D. We first compute coarse relative camera poses using an off-the-shelf model. We fine-tune a view-conditioned 2D diffusion model on the input images and simultaneously refine the camera poses via optimization. The resulting instance-specific diffusion model and camera poses enable 3D reconstruction and novel view synthesis from an arbitrary number of input images.
The pipeline consists of three stages for pose estimation and reconstruction:
- Pose Estimation Initialization: We initialize camera poses for the input images with a scaled-up RelposePP.
- Pose Refinement and Diffusion Model TTT: We refine the estimated poses while test-time training (TTT) the view-conditioned diffusion model on the input images.
- 3D Reconstruction: We reconstruct the 3D object from the refined poses and the fine-tuned diffusion model.
- Memory Considerations: For smooth operation, your system should have at least 38GB of available memory.
- Configuring the Working Directory: Set `ROOT_DIR` as an environment variable before launching the pipeline, e.g. `echo 'export ROOT_DIR=Your_ROOT_DIR' >> ~/.bashrc`.
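As a quick sanity check that `ROOT_DIR` is visible to the pipeline's processes, a minimal helper can be used (the variable name comes from the setup step above; the check itself is just a suggestion):

```python
import os

def check_root_dir(env=os.environ):
    # Fail early if ROOT_DIR was not exported (see the setup step above).
    root = env.get("ROOT_DIR")
    if root is None:
        raise SystemExit("ROOT_DIR is not set; export it before running the pipeline")
    return root
```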
Reconstructing Individual Objects: To process a specific object, run:

```
sh run_pipeline.sh GSO_demo OBJECT_NAME INPUT_VIEWS GPU_INDEX
```

For example:

```
sh run_pipeline.sh GSO_demo Crosley_Alarm_Clock_Vintage_Metal 5 0
```
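Since reconstruction quality scales with the number of views, it can be instructive to run the same object with different `INPUT_VIEWS`. A small sketch that builds the corresponding commands (the argument order follows the call above; actually launching them is left commented out):

```python
def make_commands(dataset, obj, view_counts, gpu=0):
    # One run_pipeline.sh invocation per view count, matching the
    # DATASET OBJECT_NAME INPUT_VIEWS GPU_INDEX order shown above.
    return [["sh", "run_pipeline.sh", dataset, obj, str(v), str(gpu)]
            for v in view_counts]

commands = make_commands("GSO_demo", "Crosley_Alarm_Clock_Vintage_Metal", [3, 5])
for cmd in commands:
    print(" ".join(cmd))
    # import subprocess; subprocess.run(cmd, check=True)  # uncomment to launch
```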
Batch Processing: To execute the pipeline for all examples in the `dataset/data/train/GSO_demo` directory, run:

```
python run_pipeline.py --object_type GSO_demo
```
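If multiple GPUs are available, objects can be spread across them before batch processing. A hypothetical round-robin scheduler as an illustration (`run_pipeline.py` may distribute work differently):

```python
from itertools import cycle

def schedule(objects, gpu_indices):
    # Round-robin: assign each object to the next GPU index in turn.
    return [(obj, gpu) for obj, gpu in zip(objects, cycle(gpu_indices))]

# e.g. object names discovered under dataset/data/train/GSO_demo
jobs = schedule(["clock", "shoe", "mug"], [0, 1])
for obj, gpu in jobs:
    print(f"object={obj} -> GPU {gpu}")
```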
The pipeline stores its outputs as follows:
- 2D NVS Outputs: in the `camerabooth/experiments_nvs/GSO_demo` directory.
- 3D NVS Outputs: in folders named like `3D_Recon/threestudio/experiments_GSO_demo_view_5_nerf`.
- Evaluation Metrics: quantitative results are stored in the `results` folder.
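For scripting over the 3D outputs, the experiment folder name can be derived from the dataset name and view count. A small helper, assuming the naming pattern generalizes from the `view_5` example above:

```python
def recon_dir(dataset: str, views: int) -> str:
    # Assumption: 3D NVS folders follow experiments_<DATASET>_view_<N>_nerf,
    # as seen above for GSO_demo with 5 views.
    return f"3D_Recon/threestudio/experiments_{dataset}_view_{views}_nerf"

print(recon_dir("GSO_demo", 5))
```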
For reproducibility, we provide reference results for all test objects in `results_standard/GSO_demo`. Since reproducing them is computationally demanding (8 A100 GPUs for 1-2 days), we suggest processing only a subset of the data, which is enough to verify your system's configuration and to compare against our results.
To generate tables that summarize the numbers for the different settings, run:
python results_standard/run/summarize.py
To reconstruct in-the-wild objects through a Gradio interface, run `gradio demo/sap3d/app.py`. (Getting results can take up to an hour.)
If you find our work inspiring or use our codebase in your research, please consider giving a star ⭐ and a citation.
@inproceedings{han2024more,
title={The More You See in 2D the More You Perceive in 3D},
author={Han, Xinyang and Gao, Zelin and Kanazawa, Angjoo and Goel, Shubham and Gandelsman, Yossi},
booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
pages={20912--20922},
year={2024}
}