I tried to run the VibeVoice 1.5B multi-speaker model on a 12 GB GPU with a script whose generated audio exceeds 10 minutes.
Expected behavior:
- The model generates audio for the entire script.
Actual behavior:
- The process crashes with a CUDA Out of Memory (OOM) error after a few minutes.
Steps to reproduce:
- Clone the VibeVoice repository.
- Install dependencies as per the instructions.
- Run inference with the 1.5B multi-speaker model on a script longer than 10 minutes.
Environment:
- OS: Ubuntu 22.04
- Python: 3.11
- CUDA: 12.1
- GPU: NVIDIA RTX 3060 12GB
- VibeVoice model: 1.5B multi-speaker
Additional notes:
- Reducing the script length allows the inference to succeed.
- Suggestion: consider memory optimizations for long-form audio generation, e.g. generating the script in chunks and releasing cached GPU memory between chunks.
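For illustration, the chunked-generation idea could look roughly like the sketch below. `generate_audio` is a placeholder callable standing in for the actual VibeVoice inference call, not its real API; chunk size and the memory-release step are assumptions.

```python
def split_script(script: str, max_lines: int = 20) -> list[str]:
    """Split a multi-speaker script into chunks of at most max_lines
    non-empty lines, so each inference call stays within GPU memory."""
    lines = [ln for ln in script.splitlines() if ln.strip()]
    return ["\n".join(lines[i:i + max_lines])
            for i in range(0, len(lines), max_lines)]


def generate_long_form(script: str, generate_audio, max_lines: int = 20) -> list:
    """Run inference chunk by chunk instead of in one pass.

    generate_audio is a stand-in for the real model call. In a real run
    you would also release cached GPU memory between chunks, e.g. with
    torch.cuda.empty_cache(), before starting the next one.
    """
    outputs = []
    for chunk in split_script(script, max_lines):
        outputs.append(generate_audio(chunk))
    return outputs
```

The chunk outputs would then be concatenated into the final audio; exact crossfade/joining logic is out of scope here.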