[WIP] Fix NotImplementedError during model.to(device) #402

Copilot · 2026-02-09T20:00:07Z

Fix NotImplementedError with AffineQuantizedTensor on CUDA

Problem

After the model is quantized with torchao, moving it to CUDA fails with a NotImplementedError, because direct .to(device) calls are not supported for AffineQuantizedTensor on some torch versions.
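For context, here is a small illustration in plain PyTorch of the per-parameter check that nn.Module.to() runs in recent PyTorch versions. It uses a private function, torch._has_compatible_shallow_copy_type. The traceback in the issue shows this same operator being routed to AffineQuantizedTensor's dispatch, which does not implement it. The sketch only uses ordinary dense tensors, so it shows the check itself and does not reproduce the failure.

```python
import torch
from torch import nn

# nn.Module.to() calls nn.Module._apply(), which (in recent PyTorch versions)
# compares each original parameter with its converted copy using this private
# check to decide whether the parameter can be updated in place via `.data`.
param = nn.Linear(4, 4).weight
converted = param.to(torch.float64)
print(torch._has_compatible_shallow_copy_type(param, converted))  # True: both are dense CPU tensors
```

A tensor subclass has to handle this operator in its dispatch for module.to() to work on modules that hold it.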

Root Cause

The code moves the model with direct .to(device) calls BEFORE quantization. After torch.compile() and quantize_(), however, there is no explicit device transfer through _recursive_to_device(), the safe helper that handles AffineQuantizedTensor properly.
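The implementation of _recursive_to_device() is not shown in this PR. The sketch below shows one way such a helper could work. It assumes the helper re-assigns each parameter and buffer individually instead of calling nn.Module.to(), and so never goes through nn.Module._apply and its shallow-copy check. The name recursive_to_device and its signature are hypothetical. Whether AffineQuantizedTensor supports tensor.to(device) and nn.Parameter wrapping on a given torchao version is an assumption that should be verified.

```python
import torch
from torch import nn


def recursive_to_device(module: nn.Module, device) -> nn.Module:
    # Hypothetical stand-in for the project's _recursive_to_device(); the real
    # helper's signature and behaviour are not shown in this PR.
    # Instead of module.to(device), which goes through nn.Module._apply and its
    # per-parameter torch._has_compatible_shallow_copy_type() check, every
    # parameter and buffer is converted with tensor.to(device) and re-assigned.
    device = torch.device(device)
    moved = {}  # id(original tensor) -> converted tensor, so tied weights stay tied
    with torch.no_grad():
        for submodule in module.modules():
            for name, param in list(submodule.named_parameters(recurse=False)):
                if id(param) not in moved:
                    moved[id(param)] = nn.Parameter(
                        param.to(device), requires_grad=param.requires_grad
                    )
                setattr(submodule, name, moved[id(param)])
            for name, buf in list(submodule.named_buffers(recurse=False)):
                if id(buf) not in moved:
                    moved[id(buf)] = buf.to(device)
                setattr(submodule, name, moved[id(buf)])
    return module
```

On a CUDA machine the call would be recursive_to_device(model, "cuda"). The `moved` dictionary matters when a tensor is shared by several submodules. Without it, each submodule would receive its own copy and the weights would no longer be tied.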

Plan

- Investigate the issue and understand the codebase
- Identify the root cause: direct .to() calls don't work with AffineQuantizedTensor
- Replace direct .to(device) calls with _recursive_to_device() during model initialization
- Ensure the model is on the correct device after quantization
- Test the fix with quantized models
- Run code review and security checks
- Verify the solution works
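The plan's step of ensuring the model is on the correct device after quantization could be checked with a small helper like the one below. It is a hypothetical sketch, and the name tensors_off_device does not come from the codebase. The helper lists every parameter and buffer that is not on the target device. It only reads each tensor's .device attribute and does not call any conversion operator.

```python
import torch
from torch import nn


def tensors_off_device(module: nn.Module, device) -> list[str]:
    # Hypothetical post-quantization check: returns the qualified names of all
    # parameters and buffers that are not on `device`. A device given without an
    # index (e.g. "cuda") matches any index of that type (cuda:0, cuda:1, ...).
    target = torch.device(device)
    misplaced = []
    named = list(module.named_parameters()) + list(module.named_buffers())
    for name, tensor in named:
        wrong_type = tensor.device.type != target.type
        wrong_index = target.index is not None and tensor.device.index != target.index
        if wrong_type or wrong_index:
            misplaced.append(name)
    return misplaced
```

After the model has been moved, a check such as `assert not tensors_off_device(model, "cuda")` would flag any tensor left behind on the CPU.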

Original prompt

This section details the original issue you should resolve

<issue_title>BUG - commit #17825ee(?) (NotImplementedError related to AffineQuantizedTensor when attempting to move the quantized model to CUDA)</issue_title>
<issue_description>Describe the bug
After updating to the latest commit (7aa2737 - "Revert 'fix: implement platform-specific audio playback reset logic'"), music generation fails with a NotImplementedError related to AffineQuantizedTensor when attempting to move the quantized model to CUDA. The error occurs in torchao's quantization layer during the model.to(device) operation, because the aten._has_compatible_shallow_copy_type operator is not implemented for AffineQuantizedTensor types.

I can confirm that reverting to c3dcf14 resolves this issue. I am VERY sure the issue was introduced somewhere in 17825ee.

To Reproduce
Steps to reproduce the behavior:

1. Fresh install of ACE-Step-1.5 on Ubuntu 24.04 with RTX 50 Series GPU (working ~8-10 hours ago)
2. Pull latest updates from repository (git pull)
3. Launch the Gradio UI
4. Attempt to generate music with any text prompt
5. See error: NotImplementedError: AffineQuantizedTensor dispatch: attempting to run unimplemented operator/function: func=<OpOverload(op='aten._has_compatible_shallow_copy_type', overload='default')>

Expected behavior
Music generation should proceed normally, as it did before the update. The model should successfully move to the CUDA device and generate audio output.

Desktop (please complete the following information):

- OS: Ubuntu 24.04
- GPU: NVIDIA RTX 50 Series
- Python: 3.11.11
- Installation: fresh install less than 10 hours ago; worked until the git pull update

Additional context
The issue appears to have been introduced in commit #17825ee ("Merge mainline commits as of 2026/02/08 05:16 UTC with MPS optimizations and do optimization checks"). The installation was fully functional approximately 8 hours ago, before pulling the latest updates. The error occurs specifically when torchao's AffineQuantizedTensor (used for model quantization) is moved to CUDA, which suggests a compatibility issue between the quantization implementation and PyTorch's device-transfer mechanisms.

2026-02-08 09:44:21.118 | INFO     | acestep.handler:_load_model_context:880 - [_load_model_context] Offloaded vae to CPU in 0.1504s
2026-02-08 09:44:21.119 | ERROR    | acestep.handler:generate_music:3509 - [generate_music] Generation failed
Traceback (most recent call last):

  File "/home/ubuntuuser/.local/share/uv/python/cpython-3.11.11-linux-x86_64-gnu/lib/python3.11/threading.py", line 1002, in _bootstrap
    self._bootstrap_inner()
    │    └ <function Thread._bootstrap_inner at 0x7e0db48bf880>
    └ <WorkerThread(AnyIO worker thread, started daemon 138580555499200)>
  File "/home/ubuntuuser/.local/share/uv/python/cpython-3.11.11-linux-x86_64-gnu/lib/python3.11/threading.py", line 1045, in _bootstrap_inner
    self.run()
    │    └ <function WorkerThread.run at 0x7e09cd6d4680>
    └ <WorkerThread(AnyIO worker thread, started daemon 138580555499200)>
  File "/media/ubuntuuser/Encrypted1/ACE-Step-1.5/.venv/lib/python3.11/site-packages/anyio/_backends/_asyncio.py", line 986, in run
    result = context.run(func, *args)
             │       │   │      └ (<generator object setup_event_handlers.<locals>.generation_wrapper at 0x7e09d5e057e0>,)
             │       │   └ <function run_sync_iterator_async at 0x7e09d9805800>
             │       └ <method 'run' of '_contextvars.Context' objects>
             └ <_contextvars.Context object at 0x7e0960e66300>
  File "/media/ubuntuuser/Encrypted1/ACE-Step-1.5/.venv/lib/python3.11/site-packages/gradio/utils.py", line 835, in run_sync_iterator_async
    return next(iterator)
                └ <generator object setup_event_handlers.<locals>.generation_wrapper at 0x7e09d5e057e0>
  File "/media/ubuntuuser/Encrypted1/ACE-Step-1.5/.venv/lib/python3.11/site-packages/gradio/utils.py", line 1019, in gen_wrapper
    response = next(iterator)
                    └ <generator object setup_event_handlers.<locals>.generation_wrapper at 0x7e0960ec3ca0>

  File "/media/ubuntuuser/Encrypted1/ACE-Step-1.5/acestep/gradio_ui/events/__init__.py", line 539, in generation_wrapper
    yield from res_h.generate_with_batch_management(dit_handler, llm_handler, *args)
               │     │                              │            │             └ ('A brief, clean melodic phrase played on a bright, metallic mallet percussion instrument, resembling a xylophone or marimba....
               │     │                              │            └ <acestep.llm_inference.LLMHandler object at 0x7e09d785d650>
               │     │                              └ <acestep.handler.AceStepHandler object at 0x7e09d7a1ff50>
               │     └ ...

</details>




- Fixes ace-step/ACE-Step-1.5#334


Initial plan

9b658e3

Copilot AI assigned Copilot and ChuxiJ Feb 9, 2026

Copilot started work on behalf of ChuxiJ February 9, 2026 20:00

ChuxiJ closed this Feb 9, 2026

Copilot AI requested a review from ChuxiJ February 9, 2026 20:02

Copilot stopped work on behalf of ChuxiJ due to an error February 9, 2026 20:02
The session was cancelled by the user.


2 participants
