Progressive Rendering Distillation: Adapting Stable Diffusion for Instant Text-to-Mesh Generation without 3D Data
- 2025-04-01: Presentation slides are now available for download.
- 2025-03-27: The paper is now available on arXiv.
- 2025-03-03: Gradio and HuggingFace Demos are available.
- 2025-02-27: TriplaneTurbo is accepted to CVPR 2025.
- Fast Inference 🚀: Our method is highly efficient at inference, producing a textured mesh in around 1 second.
- Text Comprehension 🆙: It shows strong comprehension of complex text prompts, generating results that faithfully follow the input.
- 3D-Data-Free Training 🙅‍♂️: The entire training process does not rely on any 3D datasets, making it more resource-friendly and adaptable.
If you only want to run the demo locally, use the following commands for inference. For training and evaluation, follow the environment setup instructions in the next section.
```bash
python -m venv venv
source venv/bin/activate
bash setup.sh
python gradio_app.py
```
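Once the Gradio app is running, it can also be queried programmatically. The snippet below is a minimal sketch using the `gradio_client` package (installed separately with `pip install gradio_client` if needed); the URL assumes Gradio's default local port, and the actual endpoint names depend on how `gradio_app.py` defines its interface, so they should be discovered with `view_api()` first.

```python
# Minimal sketch: connect to the locally running Gradio demo.
# Assumes the demo was started with `python gradio_app.py` on the default port.
from gradio_client import Client

client = Client("http://127.0.0.1:7860")

# Print the available endpoints and their input/output signatures;
# the concrete API names are defined by gradio_app.py.
client.view_api()
```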
For training and evaluation, set up the environment as follows. Create a virtual environment and install PyTorch:

```bash
conda create -n triplaneturbo python=3.10
conda activate triplaneturbo
conda install pytorch==2.2.0 torchvision==0.17.0 torchaudio==2.2.0 pytorch-cuda=12.1 -c pytorch -c nvidia
```

(Optional, Recommended) Install xFormers for attention acceleration:
```bash
conda install xformers -c xformers
```

(Optional, Recommended) Install ninja to speed up the compilation of CUDA extensions:
```bash
pip install ninja
```

Install the major dependencies:
```bash
pip install -r requirements.txt
```

Install iNGP:
```bash
export PATH="/usr/local/cuda/bin:$PATH"
export LD_LIBRARY_PATH="/usr/local/cuda/lib64:$LD_LIBRARY_PATH"
pip install git+https://github.com/NVlabs/tiny-cuda-nn/#subdirectory=bindings/torch
```

If you encounter errors while installing iNGP, check your gcc version. Follow these steps to change the gcc version within your conda environment, then return to the project directory and reinstall iNGP and NerfAcc:
```bash
conda install -c conda-forge gxx=9.5.0
cd $CONDA_PREFIX/lib
ln -s /usr/lib/x86_64-linux-gnu/libcuda.so ./
cd <your project directory>
```

If you only want to run the evaluation without training, follow these steps:
```bash
# Download the model from HuggingFace
huggingface-cli download --resume-download ZhiyuanthePony/TriplaneTurbo \
    --include "triplane_turbo_sd_v1.pth" \
    --local-dir ./pretrained \
    --local-dir-use-symlinks False

# Download evaluation assets
python scripts/prepare/download_eval_only.py

# Run evaluation script
bash scripts/eval/dreamfusion.sh --gpu 0,1 # You can use more GPUs (e.g. 0,1,2,3,4,5,6,7). For single-GPU usage, please check the script for the required modifications.
```

Our evaluation metrics include:
- CLIP Similarity Score
- CLIP Recall@1
For detailed evaluation results, please refer to our paper.
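For reference, the CLIP Similarity Score measures the cosine similarity between a text prompt and its rendered views in CLIP embedding space. The sketch below only illustrates the idea using the Hugging Face `transformers` CLIP model; the backbone choice (`openai/clip-vit-base-patch32`) and the view-averaging scheme are assumptions, and the official `evaluation/clipscore/compute.py` may differ.

```python
# Illustrative sketch of a CLIP similarity score between a prompt and its renderings.
# The CLIP variant is an assumption; the official script may use a different backbone.
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

def clip_similarity(prompt: str, image_paths: list[str]) -> float:
    """Average cosine similarity between one prompt and its rendered views."""
    images = [Image.open(p).convert("RGB") for p in image_paths]
    inputs = processor(text=[prompt], images=images, return_tensors="pt",
                       padding=True, truncation=True)
    with torch.no_grad():
        text_emb = model.get_text_features(input_ids=inputs["input_ids"],
                                           attention_mask=inputs["attention_mask"])
        image_emb = model.get_image_features(pixel_values=inputs["pixel_values"])
    # Normalize embeddings so the dot product is a cosine similarity.
    text_emb = text_emb / text_emb.norm(dim=-1, keepdim=True)
    image_emb = image_emb / image_emb.norm(dim=-1, keepdim=True)
    return (image_emb @ text_emb.T).mean().item()
```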
If you want to evaluate your own model, use the following script:
```bash
CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 python launch.py \
    --config <path_to_your_exp_config> \
    --export \
    system.exporter_type="multiprompt-mesh-exporter" \
    resume=<path_to_your_ckpt> \
    data.prompt_library="dreamfusion_415_prompt_library" \
    system.exporter.fmt=obj
```

After running the script, you will find the generated OBJ files in `outputs/<your_exp>/dreamfusion_415_prompt_library/save/<itXXXXX-export>`. Set this path as `<OBJ_DIR>`, and set `outputs/<your_exp>/dreamfusion_415_prompt_library/save/<itXXXXX-4views>` as `<VIEW_DIR>`. Then run:
```bash
SAVE_DIR=<VIEW_DIR>

python evaluation/mesh_visualize.py \
    <OBJ_DIR> \
    --save_dir $SAVE_DIR \
    --gpu 0,1,2,3,4,5,6,7

python evaluation/clipscore/compute.py \
    --result_dir $SAVE_DIR
```

The evaluation results will be displayed in your terminal once the computation is complete.
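CLIP Recall@1 measures whether, for each rendered view, the prompt that generated it is ranked first among all prompts by CLIP similarity. The sketch below illustrates that retrieval metric under the same assumptions as the similarity sketch above (backbone choice and retrieval direction are assumptions; the official script may handle multiple views per prompt differently).

```python
# Illustrative sketch of CLIP Recall@1: for each image, is its own prompt the
# nearest prompt in CLIP space? Backbone and retrieval direction are assumptions.
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

def clip_recall_at_1(prompts: list[str], image_paths: list[str]) -> float:
    """image_paths[i] is assumed to be a rendering generated from prompts[i]."""
    images = [Image.open(p).convert("RGB") for p in image_paths]
    inputs = processor(text=prompts, images=images, return_tensors="pt",
                       padding=True, truncation=True)
    with torch.no_grad():
        t = model.get_text_features(input_ids=inputs["input_ids"],
                                    attention_mask=inputs["attention_mask"])
        v = model.get_image_features(pixel_values=inputs["pixel_values"])
    t = t / t.norm(dim=-1, keepdim=True)
    v = v / v.norm(dim=-1, keepdim=True)
    sim = v @ t.T                      # (num_images, num_prompts) cosine similarities
    top1 = sim.argmax(dim=-1)          # best-matching prompt index per image
    targets = torch.arange(len(prompts))
    return (top1 == targets).float().mean().item()
```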
Use the provided download script to get all necessary files:
```bash
python scripts/prepare/download_full.py
```

This will download:
- Stable Diffusion 2.1 Base
- Stable Diffusion 1.5
- MVDream 4-view checkpoint
- RichDreamer checkpoint
- Text prompt datasets (3DTopia and DALLE+Midjourney)
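If the download script cannot be used on your machine (for example, when files must be staged manually), the pretrained TriplaneTurbo adapter referenced in the evaluation section can also be fetched directly with `huggingface_hub`. This is only a minimal sketch; the teacher checkpoints listed above live in their own repositories and would need to be downloaded in the same way.

```python
# Minimal sketch: fetch the TriplaneTurbo adapter directly with huggingface_hub.
# Repo ID and filename are taken from the evaluation instructions above; the
# teacher models (SD, MVDream, RichDreamer) are hosted in separate repos.
from huggingface_hub import hf_hub_download

ckpt_path = hf_hub_download(
    repo_id="ZhiyuanthePony/TriplaneTurbo",
    filename="triplane_turbo_sd_v1.pth",
    local_dir="./pretrained",
)
print(f"Checkpoint saved to {ckpt_path}")
```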
To train with the 3DTopia prompt library:

```bash
# Single GPU
CUDA_VISIBLE_DEVICES=0 python launch.py \
    --config configs/TriplaneTurbo_v0_acc-2.yaml \
    --train \
    data.prompt_library="3DTopia_prompt_library" \
    data.condition_processor.cache_dir=".threestudio_cache/text_embeddings_3DTopia" \
    data.guidance_processor.cache_dir=".threestudio_cache/text_embeddings_3DTopia"
```
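The `data.condition_processor.cache_dir` and `data.guidance_processor.cache_dir` options point to directories where text embeddings for the prompt library are cached, so they are not recomputed on every run. The sketch below only illustrates the general idea of precomputing and caching prompt embeddings with a CLIP text encoder; the actual encoder and file layout used by this repo's processors are defined in its code and may differ.

```python
# Illustration only: cache text embeddings for a list of prompts so they can be
# reused across runs. The cache path, encoder, and file layout here are
# assumptions, not the repo's actual format.
import os
import torch
from transformers import CLIPTextModel, CLIPTokenizer

cache_dir = ".threestudio_cache/text_embeddings_example"  # hypothetical path
os.makedirs(cache_dir, exist_ok=True)

tokenizer = CLIPTokenizer.from_pretrained("openai/clip-vit-large-patch14")
encoder = CLIPTextModel.from_pretrained("openai/clip-vit-large-patch14")

prompts = ["a DSLR photo of a corgi wearing a top hat"]  # example prompt
for i, prompt in enumerate(prompts):
    tokens = tokenizer(prompt, padding="max_length", truncation=True, return_tensors="pt")
    with torch.no_grad():
        emb = encoder(**tokens).last_hidden_state  # (1, 77, 768) token embeddings
    torch.save(emb, os.path.join(cache_dir, f"{i:06d}.pt"))
```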
For multi-GPU training:

```bash
# 8 GPUs with 48GB+ memory each
CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 python launch.py \
    --config configs/TriplaneTurbo_v1_acc-2.yaml \
    --train \
    data.prompt_library="3DTopia_361k_prompt_library" \
    data.condition_processor.cache_dir=".threestudio_cache/text_embeddings_3DTopia" \
    data.guidance_processor.cache_dir=".threestudio_cache/text_embeddings_3DTopia"
```

To train with the DALLE + Midjourney prompt library, choose the appropriate command based on your GPU configuration:
```bash
# Single GPU
CUDA_VISIBLE_DEVICES=0 python launch.py \
    --config configs/TriplaneTurbo_v0_acc-2.yaml \
    --train \
    data.prompt_library="DALLE_Midjourney_prompt_library" \
    data.condition_processor.cache_dir=".threestudio_cache/text_embeddings_DE+MJ" \
    data.guidance_processor.cache_dir=".threestudio_cache/text_embeddings_DE+MJ"
```

For multi-GPU training (higher performance):
```bash
# 8 GPUs with 48GB+ memory each
CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 python launch.py \
    --config configs/TriplaneTurbo_v1_acc-2.yaml \
    --train \
    data.prompt_library="DALLE_Midjourney_prompt_library" \
    data.condition_processor.cache_dir=".threestudio_cache/text_embeddings_DE+MJ" \
    data.guidance_processor.cache_dir=".threestudio_cache/text_embeddings_DE+MJ"
```
- **Memory Requirements**:
  - The v1 configuration requires GPUs with 48GB+ memory.
  - The v0 configuration works with GPUs that have less memory (46GB+), but with reduced performance.
- **Acceleration Options**:
  - Use the `_acc-2.yaml` configs for gradient accumulation to reduce memory usage.
- **Advanced Options**:
  - For highest quality, use `configs/TriplaneTurbo_v1.yaml` with `system.parallel_guidance=true` (requires 98GB+ memory GPUs).
  - To disable certain guidance components, add `guidance.rd_weight=0 guidance.sd_weight=0` to the command.
If you find this work helpful, please consider citing our paper:
```bibtex
@inproceedings{ma2025progressive,
  title={Progressive Rendering Distillation: Adapting Stable Diffusion for Instant Text-to-Mesh Generation without 3D Data},
  author={Ma, Zhiyuan and Liang, Xinyue and Wu, Rongyuan and Zhu, Xiangyu and Lei, Zhen and Zhang, Lei},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
  year={2025}
}
```
Our code is heavily based on the following works:
- ThreeStudio: A clean and extensible codebase for 3D generation via Score Distillation.
- MVDream: Used as one of our multi-view teachers.
- RichDreamer: Serves as another multi-view teacher for normal and depth supervision.
- 3DTopia: Its text caption dataset is used in our training and comparison.
- DiffMC: Our solution uses its differentiable marching cubes for mesh rasterization.
- NeuS: We implement its SDF-based volume rendering for dual rendering in our solution.
