Describe the bug
When launching the ACE-Step-1.5 backend with the 4B model and CPU offloading enabled, the application immediately consumes ~23.5 GB of VRAM (on an RTX 4090) at startup. This happens at idle, before any generation task has been initiated. The offload_to_cpu setting appears to be ignored or ineffective: the model seems to be fully loaded into GPU memory, nearly maxing out the card's 24 GB capacity.
To Reproduce
Steps to reproduce the behavior:
- Configure the backend (`acestep/gpu_config.py` or via launch arguments) with:
  - LM Backend: PT
  - LM Model: acestep-5Hz-lm-4B
  - offload_to_cpu: true
- Start the application (e.g., `uv run acestep` or `start_gradio_ui.bat`).
- Open Windows Task Manager and navigate to the Performance tab for the GPU.
- Observe that Dedicated GPU memory usage stays at ~23.5 GB even while the system is idle.
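For anyone reproducing this without Task Manager, the same idle reading can be taken from `nvidia-smi` directly. This is a small helper I used (not part of the project; the parsing path is separated out so it can be checked without a GPU):

```python
import subprocess

def query_vram_used_mib(output=None):
    """Return per-GPU dedicated memory usage in MiB.

    If `output` is None, runs nvidia-smi; otherwise parses the given
    string, which makes the function checkable on machines without
    an NVIDIA GPU.
    """
    if output is None:
        output = subprocess.check_output(
            ["nvidia-smi", "--query-gpu=memory.used",
             "--format=csv,noheader,nounits"],
            text=True,
        )
    # One line per GPU, each line a bare MiB figure.
    return [int(line.strip()) for line in output.splitlines() if line.strip()]

if __name__ == "__main__":
    print(query_vram_used_mib())  # e.g. [23512] while the backend idles
```

On my machine this reports ~23500 MiB immediately after startup, matching the Task Manager reading.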
Expected behavior
With offload_to_cpu = true, VRAM usage at idle should be significantly lower.
Desktop (please complete the following information):
- OS: Windows 11
- GPU: NVIDIA GeForce RTX 4090 (24GB)
- Python Version: 3.11
- PyTorch Version: 2.7.1 (CUDA 12.8)
Additional context
- The issue occurs immediately upon model initialization at startup.
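For clarity on what I expected, here is a minimal sketch of the semantics the offload_to_cpu flag implies (a hypothetical wrapper, not ACE-Step's actual implementation; it is duck-typed on `.to(device)` so it runs without torch, but with PyTorch `module` would be an `nn.Module` and `device` would be `"cuda"`):

```python
class CpuOffloadWrapper:
    """Keep a module's weights in system RAM while idle and occupy
    VRAM only for the duration of a call."""

    def __init__(self, module, device="cuda"):
        self.module = module.to("cpu")  # idle state: nothing on the GPU
        self.device = device

    def __call__(self, *args, **kwargs):
        self.module.to(self.device)  # page weights in for this call only
        try:
            return self.module(*args, **kwargs)
        finally:
            self.module.to("cpu")    # release VRAM as soon as the call ends
```

With behavior like this, idle VRAM usage would be near zero rather than ~23.5 GB; the observed numbers suggest the model is moved to the GPU at initialization and never offloaded.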