[Bug] High VRAM usage (23.5GB) at startup with acestep-5Hz-lm-4B despite offload_to_cpu = true #378

@devsynck

Description

Describe the bug
When the ACE-Step-1.5 backend is launched with the 4B model and CPU offloading enabled, it consumes ~23.5 GB of VRAM (on an RTX 4090) immediately at startup, while the system is still idle and before any generation task has run. The offload_to_cpu setting appears to be ignored or ineffective: the model seems to be loaded fully into GPU memory, nearly maxing out the card's 24 GB.

To Reproduce
Steps to reproduce the behavior:

  1. Configure the backend (acestep/gpu_config.py or via launch arguments) with:
    • LM Backend: PT
    • LM Model: acestep-5Hz-lm-4B
    • offload_to_cpu: true
  2. Start the application (e.g., uv run acestep or start_gradio_ui.bat).
  3. Open Windows Task Manager and navigate to the Performance tab for the GPU.
  4. Observe that Dedicated GPU memory usage stays at ~23.5 GB even while the system is idle.
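For a reading independent of Task Manager, the same check can be done from Python using PyTorch's allocator counters. A minimal sketch (run inside the backend's process, e.g. right after model initialization — the function name is mine, not part of ACE-Step):

```python
import torch

def allocated_vram_gib() -> float:
    """Allocated VRAM in GiB, as tracked by PyTorch's caching allocator.

    Returns 0.0 on machines without a CUDA device, so the snippet is safe
    to run anywhere. Note: Task Manager reports dedicated GPU memory,
    which also includes the allocator's cached (reserved) blocks, so it
    can read higher than memory_allocated().
    """
    if not torch.cuda.is_available():
        return 0.0
    return torch.cuda.memory_allocated() / 1024**3

print(f"allocated VRAM at idle: {allocated_vram_gib():.2f} GiB")
```

If this reports ~23 GB at idle, the weights really are resident on the GPU rather than offloaded.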

Expected behavior
With offload_to_cpu = true, idle VRAM usage should be far lower, with most model weights held in system RAM until they are needed for generation.

Screenshots

(screenshot: Task Manager GPU panel showing ~23.5 GB dedicated GPU memory at idle)

Desktop (please complete the following information):

  • OS: Windows 11
  • GPU: NVIDIA GeForce RTX 4090 (24GB)
  • Python Version: 3.11
  • PyTorch Version: 2.7.1 (CUDA 12.8)

Additional context

  • The issue occurs immediately upon model initialization at startup.
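To confirm where the weights actually live at idle, counting parameter elements per device is a quick check. A minimal sketch (the `nn.Linear` here is a stand-in for illustration, not the real acestep-5Hz-lm-4B model handle):

```python
import torch

def params_per_device(model: torch.nn.Module) -> dict[str, int]:
    """Count parameter elements per device type ('cpu', 'cuda', ...)."""
    counts: dict[str, int] = {}
    for p in model.parameters():
        counts[p.device.type] = counts.get(p.device.type, 0) + p.numel()
    return counts

# With offload_to_cpu = true, most elements should sit under 'cpu' at idle.
print(params_per_device(torch.nn.Linear(4, 4)))  # → {'cpu': 20}
```

Running this against the loaded 4B model would show whether the offload path ever moved the weights off the GPU.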
