[Bug] High VRAM usage (23.5GB) at startup with acestep-5Hz-lm-4B despite offload_to_cpu = true #378

@devsynck

Description

Describe the bug
When the ACE-Step-1.5 backend is launched with the 4B model and CPU offloading enabled, it consumes ~23.5 GB of VRAM (on an RTX 4090) immediately at startup, while the system is still idle and before any generation task has run. The offload_to_cpu setting appears to be ignored or ineffective: the model seems to be loaded fully into GPU memory, nearly maxing out the card's 24 GB.

To Reproduce
Steps to reproduce the behavior:

  1. Configure the backend (acestep/gpu_config.py or via launch arguments) with:
    • LM Backend: PT
    • LM Model: acestep-5Hz-lm-4B
    • offload_to_cpu: true
  2. Start the application (e.g., uv run acestep or start_gradio_ui.bat).
  3. Open Windows Task Manager and navigate to the Performance tab for the GPU.
  4. Observe that Dedicated GPU memory usage stays at ~23.5 GB even while the system is idle.
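For a reading independent of Task Manager, the same check can be done from Python using PyTorch's allocator counters. A minimal sketch (run inside the backend's process, e.g. right after model initialization — the function name is mine, not part of ACE-Step):

```python
import torch

def allocated_vram_gib() -> float:
    """Allocated VRAM in GiB, as tracked by PyTorch's caching allocator.

    Returns 0.0 on machines without a CUDA device, so the snippet is safe
    to run anywhere. Note: Task Manager reports dedicated GPU memory,
    which also includes the allocator's cached (reserved) blocks, so it
    can read higher than memory_allocated().
    """
    if not torch.cuda.is_available():
        return 0.0
    return torch.cuda.memory_allocated() / 1024**3

print(f"allocated VRAM at idle: {allocated_vram_gib():.2f} GiB")
```

If this reports ~23 GB at idle, the weights really are resident on the GPU rather than offloaded.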

Expected behavior
With offload_to_cpu = true, idle VRAM usage should be far lower, with most model weights held in system RAM until they are needed for generation.

Screenshots

(screenshot: Task Manager GPU panel showing ~23.5 GB dedicated GPU memory at idle)

Desktop (please complete the following information):

  • OS: Windows 11
  • GPU: NVIDIA GeForce RTX 4090 (24GB)
  • Python Version: 3.11
  • PyTorch Version: 2.7.1 (CUDA 12.8)

Additional context

  • The issue occurs immediately upon model initialization at startup.
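To confirm where the weights actually live at idle, counting parameter elements per device is a quick check. A minimal sketch (the `nn.Linear` here is a stand-in for illustration, not the real acestep-5Hz-lm-4B model handle):

```python
import torch

def params_per_device(model: torch.nn.Module) -> dict[str, int]:
    """Count parameter elements per device type ('cpu', 'cuda', ...)."""
    counts: dict[str, int] = {}
    for p in model.parameters():
        counts[p.device.type] = counts.get(p.device.type, 0) + p.numel()
    return counts

# With offload_to_cpu = true, most elements should sit under 'cpu' at idle.
print(params_per_device(torch.nn.Linear(4, 4)))  # → {'cpu': 20}
```

Running this against the loaded 4B model would show whether the offload path ever moved the weights off the GPU.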
