Hang Guo, Zhaoyang Jia, Jiahao Li, Bin Li, Yuanhao Cai, Jiangshan Wang, Yawei Li, Yan Lu
Dummy Forcing is built on the observation that about 25% of attention heads in existing autoregressive video diffusion models are "dummy", attending almost exclusively to the current frame despite having access to historical context. Based on this observation, Dummy Forcing automatically identifies dummy heads and allocates varying context lengths per head. Leveraging this "dummy property" enables (1) efficient video generation at 24.3 FPS real-time speed; (2) high-resolution video generation supporting 720P and 1080P with a 2.0x speedup; and (3) long-context video generation that enlarges the context window by 6.58x without losing efficiency.
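As a rough illustration of how a dummy head might be detected, the sketch below flags a head whose softmax attention mass concentrates on current-frame keys. The `is_dummy_head` helper, the 0.95 threshold, and the averaging scheme are illustrative assumptions, not the paper's exact criterion:

```python
import math

def softmax(scores):
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]

def is_dummy_head(attn_scores, current_start, threshold=0.95):
    """Flag a head as 'dummy' if, averaged over queries, the attention
    probability mass on current-frame keys exceeds `threshold`.

    attn_scores: per-query attention logits over [history | current] keys.
    current_start: index where current-frame keys begin.
    NOTE: threshold and averaging are illustrative assumptions.
    """
    masses = []
    for scores in attn_scores:
        probs = softmax(scores)
        masses.append(sum(probs[current_start:]))
    return sum(masses) / len(masses) > threshold

# A head whose logits strongly favor current-frame keys (indices 2, 3) is dummy:
head = [[0.0, 0.1, 8.0, 8.2], [0.2, 0.0, 7.9, 8.1]]
print(is_dummy_head(head, current_start=2))  # → True
```

A dummy head identified this way only needs current-frame keys/values cached, which is what allows the context allocation to vary per head.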
⭐If this work is helpful for you, please help star this repo. Thanks!🤗
Dummy Forcing generates videos at 24.3 FPS in real time; see the side-by-side comparisons below!
[Video comparisons: Self Forcing (baseline), 17.5 FPS vs. Dummy Forcing (ours), 24.3 FPS]
1️⃣ About 25% of heads in existing autoregressive video diffusion models are "dummy"
2️⃣ Training-Free Efficient Video Generation (480P/720P/1080P, up to 2.0x speedup)
3️⃣ Enlarge Context Window w/o increasing overhead (6.58x longer context)
- 2026-01-28: arXiv paper available.
- 2026-01-29: We have open sourced all our code.
- 2026-03-28: Support new model: Causal-Forcing!
- arXiv version available
- Release all code
- Support TeaCache for more aggressive speedup -> over 30 FPS!
- Support Causal-Forcing for higher quality video generation!
NOTE: We have unified the Self-Forcing, LongLive, and Causal-Forcing video generation pipelines into this single repository, so you can flexibly switch between models by changing the configuration file :D
NOTE: At least 40GB GPU memory is needed.
Create a conda environment and install dependencies (we tested our code with CUDA 13.0 on an H100; you may need to adjust the torch version for your setup):
git clone https://github.com/cshguo/DummyForcing
cd ./DummyForcing
conda create -n dummyforcing python=3.10 -y
conda activate dummyforcing
pip install torch==2.9.0 torchvision==0.24.0 torchaudio==2.9.0 --index-url https://download.pytorch.org/whl/cu130
pip install -r requirements.txt
pip install flash-attn --no-build-isolation
Download the pre-trained ckpt (each model will be saved in a separate folder under /path/to/pretrained):
cd /path/to/pretrained
bash /path/to/DummyForcing/hfd.sh Wan-AI/Wan2.1-T2V-1.3B
bash /path/to/DummyForcing/hfd.sh Efficient-Large-Model/LongLive-1.3B
bash /path/to/DummyForcing/hfd.sh gdhe17/Self-Forcing
After downloading the above checkpoints, you may need to update the checkpoint paths specified in configs/longlive_inference.yaml and utils/wan_wrapper.py.
Example inference command with Self-Forcing model:
python inference.py --config_path configs/self_forcing_inference.yaml
Example inference command with LongLive model:
python inference.py --config_path configs/longlive_inference.yaml
Example inference command with Causal-Forcing model:
python inference.py --config_path configs/causal_forcing_inference.yaml
You can also modify the text prompts in ./prompts/example_prompts.txt for customization.
The generated videos are stored in the ./videos folder.
Generation speed on a single H100 GPU:
Profiling results:
- Initialization/caching time: 2.47 ms (0.03%)
- Diffusion generation time: 3409.26 ms (43.05%)
- Block 0 generation time: 404.37 ms (11.86% of diffusion)
- Block 1 generation time: 487.63 ms (14.30% of diffusion)
- Block 2 generation time: 543.08 ms (15.93% of diffusion)
- Block 3 generation time: 492.40 ms (14.44% of diffusion)
- Block 4 generation time: 493.01 ms (14.46% of diffusion)
- Block 5 generation time: 494.33 ms (14.50% of diffusion)
- Block 6 generation time: 494.13 ms (14.49% of diffusion)
- VAE decoding time: 4507.72 ms (56.92%)
- Total time: 7919.44 ms
Each AR step generates 12 frames, so the generation speed is 24.3 frames per second. Including the VAE decoding time, Dummy Forcing generates a 5-second video in about 8 seconds.
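As a sanity check, the reported speed can be recomputed from the profile above: averaging the seven block times gives roughly 24-25 frames per second (close to the reported 24.3 FPS, which may use slightly different accounting), and the ~7.9 s total matches "a 5 s video in 8 s":

```python
# Throughput arithmetic recomputed from the profiling numbers above (ms).
frames_per_block = 12                      # pixel frames per AR step
block_times_ms = [404.37, 487.63, 543.08, 492.40, 493.01, 494.33, 494.13]
avg_block_s = sum(block_times_ms) / len(block_times_ms) / 1000.0
fps = frames_per_block / avg_block_s       # steady-state generation speed
total_s = 7919.44 / 1000.0                 # diffusion + VAE decoding
print(f"{fps:.1f} FPS, {total_s:.1f} s total")
```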
For quantitative evaluation on VBench, one can run the following command:
# for self-forcing model
torchrun --nproc_per_node=1 --master_port=39500 sample_vbench.py --config_path configs/self_forcing_vbench.yaml
# for longlive model
torchrun --nproc_per_node=1 --master_port=29500 sample_vbench.py --config_path configs/longlive_vbench.yaml
# for causal-forcing model
torchrun --nproc_per_node=1 --master_port=29500 sample_vbench.py --config_path configs/causal_forcing_vbench.yaml
The above commands generate 5 videos per prompt, and all videos are saved in one folder for subsequent VBench evaluation. In our experience, the full VBench generation finishes overnight.
After obtaining the generated videos, please see the official repo of VBench for evaluation details.
We also support the widely used TeaCache technique for more aggressive speedup (over 30 FPS generation speed).
To enable TeaCache, set teacache_enabled to true and it will be used automatically in the model forward pass. Generation speed with TeaCache on our machine:
Profiling results:
- Initialization/caching time: 2.78 ms (0.04%)
- Diffusion generation time: 2749.15 ms (38.58%)
- Block 0 generation time: 326.90 ms (11.89% of diffusion)
- Block 1 generation time: 389.67 ms (14.17% of diffusion)
- Block 2 generation time: 458.02 ms (16.66% of diffusion)
- Block 3 generation time: 394.99 ms (14.37% of diffusion)
- Block 4 generation time: 392.39 ms (14.27% of diffusion)
- Block 5 generation time: 393.77 ms (14.32% of diffusion)
- Block 6 generation time: 393.06 ms (14.30% of diffusion)
- VAE decoding time: 4373.46 ms (61.38%)
- Total time: 7125.39 ms
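The speedup implied by the two profiles can be recomputed directly; the 7 blocks x 12 frames accounting below is the same assumption used for the baseline FPS figure:

```python
# Speedup ratios from the baseline vs. TeaCache profiles above (ms).
base_total, tc_total = 7919.44, 7125.39    # total time
base_diff, tc_diff = 3409.26, 2749.15      # diffusion time
diff_speedup = base_diff / tc_diff         # ~1.24x on the diffusion stage
e2e_speedup = base_total / tc_total        # ~1.11x end to end (VAE dominates)
fps = 7 * 12 / (tc_diff / 1000.0)          # 7 blocks x 12 frames -> over 30 FPS
print(f"{diff_speedup:.2f}x diffusion, {e2e_speedup:.2f}x end-to-end, {fps:.1f} FPS")
```

Note that end-to-end gains are bounded by VAE decoding, which dominates the total time in both profiles.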
To generate a 30-second video, set the num_output_frames parameter in ./configs/longlive_inference.yaml to 120, which will generate ~480 frames.
After this modification, run the command below:
python inference.py --config_path configs/longlive_inference.yaml
Note that Self-Forcing is not specifically trained for long videos, so its quality may not match LongLive's; however, you can apply the same modification to test the Self-Forcing model.
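The frame arithmetic for the long-video setting works out as follows; the 4x temporal upsampling and 16 FPS playback rate are assumptions (consistent with 120 output frames decoding to ~480 pixel frames over 30 seconds):

```python
# Back-of-envelope for the long-video setting above.
num_output_frames = 120          # value set in longlive_inference.yaml
temporal_upsample = 4            # ASSUMED VAE temporal upsampling factor
pixel_frames = num_output_frames * temporal_upsample
playback_fps = 16                # ASSUMED playback rate
seconds = pixel_frames / playback_fps
print(pixel_frames, seconds)     # 480 480.0/16 = 30.0 s
```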
python interactive_inference.py --config_file configs/longlive_interactive_inference.yaml
The results will be saved in ./interactive_videos.
You can also modify the interactive prompts in prompts/interactive_example.jsonl to generate other story telling videos.
High-resolution video generation, e.g., 720P and 1080P, can be achieved simply by changing the shape of the initial Gaussian noise.
Concretely, set the resolution parameter in longlive_inference.yaml or self_forcing_inference.yaml to 720 or 1080 to enable high-resolution generation!
For example, for 720P video generation, after changing the resolution, one can run:
python inference.py --config_path configs/self_forcing_inference.yaml
python inference.py --config_path configs/longlive_inference.yaml
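Changing the resolution changes the spatial size of the initial latent noise. The sketch below assumes an 8x spatial compression factor (typical of Wan-style VAEs); the exact latent sizes may differ depending on the model's patchify settings:

```python
# Rough latent-shape arithmetic for high-resolution generation.
# The 8x spatial stride is an ASSUMPTION, not read from the repo's configs.
def latent_hw(height: int, width: int, spatial_stride: int = 8):
    """Spatial size of the initial Gaussian noise in latent space."""
    return height // spatial_stride, width // spatial_stride

for name, (h, w) in {"480P": (480, 832),
                     "720P": (720, 1280),
                     "1080P": (1080, 1920)}.items():
    print(name, latent_hw(h, w))
```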
Please cite us if our work is useful for your research.
@article{guo2026efficient,
title={Efficient Autoregressive Video Diffusion with Dummy Head},
author={Guo, Hang and Jia, Zhaoyang and Li, Jiahao and Li, Bin and Cai, Yuanhao and Wang, Jiangshan and Li, Yawei and Lu, Yan},
journal={arXiv preprint arXiv:2601.20499},
year={2026}
}
Our code is released under the Apache-2.0 license. Users should also follow the licenses of the backbone models we use: Self-Forcing (Apache-2.0), LongLive (Apache-2.0), and Causal-Forcing (Apache-2.0).
If you have any questions while reproducing our results, feel free to contact me at cshguo@gmail.com.