Feature/reference audio prompt #28
Conversation
…e selection. Update argument handling in `run_music_generation.py` and improve `HeartMuLaGenPipeline` class for better input processing and model execution.
…odec model. Update `run_lyrics_transcription.py` to dynamically select device based on availability, and modify `HeartCodec` to determine device from input tensor or model parameters. Improve `HeartMuLaGenPipeline` to support autocast on MPS for better performance.
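The device-selection logic described in this commit (prefer the input tensor's device, fall back to the model's parameters, then to the best available accelerator) could look roughly like the sketch below. The helper name `resolve_device` is illustrative, not the actual `HeartCodec` API.

```python
import torch


def resolve_device(tensor=None, model=None):
    """Pick a torch.device from an input tensor, a model, or availability."""
    if tensor is not None:
        return tensor.device
    if model is not None:
        try:
            # Use the device of the first parameter found on the model.
            return next(model.parameters()).device
        except StopIteration:
            pass  # model has no parameters; fall through to auto-detection
    if torch.cuda.is_available():
        return torch.device("cuda")
    if torch.backends.mps.is_available():
        return torch.device("mps")
    return torch.device("cpu")
```

The fallback chain keeps tensors and weights co-located without hard-coding a device string into the pipeline.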
…mize audio token padding. Introduce a context manager for autocast that gracefully handles unsupported cases, and preallocate buffers for audio tokens to enhance performance during generation.
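The "context manager for autocast that gracefully handles unsupported cases" mentioned above could be sketched as follows. This is an assumption about the approach, not the pipeline's actual implementation; the name `maybe_autocast` is made up.

```python
import contextlib

import torch


@contextlib.contextmanager
def maybe_autocast(device_type: str, dtype=torch.bfloat16):
    """Enter torch.autocast when the backend supports it, else run normally."""
    try:
        ctx = torch.autocast(device_type=device_type, dtype=dtype)
        ctx.__enter__()
    except (RuntimeError, ValueError):
        ctx = None  # autocast unsupported here: fall back to full precision
    try:
        yield
    finally:
        if ctx is not None:
            ctx.__exit__(None, None, None)
```

Wrapping the forward pass in `with maybe_autocast("mps", torch.float16):` then degrades to a no-op on backends where autocast raises, instead of crashing generation.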
…ce on MPS. Update `pyproject.toml` to include the optimizer package directory. Enhance `HeartMuLaGenPipeline` to optionally enable Metal optimizations during model execution, improving performance for Llama blocks.
…w Metal kernels and Python wrappers. Update `pyproject.toml` to remove the optimizer package directory. Enhance runtime detection for Metal support and build tools availability.
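The runtime detection of Metal support and build tools described above might boil down to checks like these (helper names are illustrative; the exact probes in the branch may differ):

```python
import shutil

import torch


def metal_available() -> bool:
    """True when the MPS (Metal) backend is both compiled in and usable."""
    mps = getattr(torch.backends, "mps", None)
    return bool(mps and mps.is_built() and mps.is_available())


def build_tools_available() -> bool:
    """True when the macOS toolchain needed to compile Metal kernels is on PATH."""
    # xcrun drives the Metal compiler on macOS; clang covers any C/C++ shims.
    return shutil.which("xcrun") is not None and shutil.which("clang") is not None
```

Gating the custom-kernel path on both checks lets the pipeline fall back to plain PyTorch ops on machines without Xcode command-line tools.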
…add optional dependencies for MuQ-MuLan. Modify `README.md` to reflect new Python version recommendations and installation instructions for optional features. Enhance `run_music_generation.py` and `HeartMuLaGenPipeline` to support reference audio conditioning and auto-download of MuQ-MuLan, improving music generation capabilities.
Is this to try to figure out how to do style transfer from one song to a new one? Are you getting shimmer in this model too? The problem in Suno appeared to be the 10 Hz generation rate and the overfit on the highs, due to a lot of music having rise and fall of pads in that band. The fix then is to move to a 32 Hz codec space and use an RNN-type network (think RWKV-X) instead of straight transformers or diffusion. But that means a NEW model architecture.
@frink Thanks for that information! This is something I didn't know about. And yes, that shimmer is still there. Though since you say it "appeared" to be, suggesting that Suno solved it, we may be talking about a different shimmer. I am talking about a metallic sound that makes hi-hats and cymbals sound very unnatural, though it can be present in vocals and other instruments too. I always figured it had something to do with bitrate encoding artifacts sneaking in?
@TheApeMachine that's a different issue. The way to solve that problem is to use something like AirWindows Average to sand off the highs, then use a harmonic exciter to bring them back. The problem is that these codecs put noise above about 8-12 kHz, which makes everything up there sound like ice picks. So like I said, you have to sand it off and then put it back yourself. I've used Amp Head plugins to excite harmonics, and various other things. You can also blur and smooth with reverb and delay. But it's just ugly artifacts of the model and the way the AI works. That's not shimmer; it's white noise with a low shelf set very high. It's also the biggest giveaway that the music is AI generated. So you have to polish it off if you want to pass things off as real.
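The "sand off the highs, then bring them back" trick described above could be sketched offline as a low-pass filter followed by a saturation-based exciter. This is a loose interpretation of the plugin workflow, not anyone's mastering chain; cutoff, drive, and mix values are illustrative starting points.

```python
import numpy as np
from scipy.signal import butter, sosfilt


def polish_highs(audio: np.ndarray, sr: int, cutoff: float = 9000.0,
                 drive: float = 2.0, mix: float = 0.3) -> np.ndarray:
    """Low-pass away the noisy top band, then regenerate highs via saturation."""
    lo_sos = butter(4, cutoff, btype="low", fs=sr, output="sos")
    hi_sos = butter(4, cutoff, btype="high", fs=sr, output="sos")
    lows = sosfilt(lo_sos, audio)  # "sand off" the codec noise above the cutoff
    # tanh saturation adds harmonics above the cutoff; keep only those new highs.
    excited = sosfilt(hi_sos, np.tanh(drive * lows))
    return lows + mix * excited
```

The idea is that the regenerated highs are harmonically related to the clean signal below the cutoff, unlike the codec's white-noise band.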
Notes: Not sure what your intentions are with the reference audio, but it seems to work pretty well as far as I can tell. I'm opening it as a pull request to see what your thoughts are, but feel free to ignore/close it if it's not interesting; I just wanted to experiment with it :) Also, there may be some small pieces of code in this branch that are related to an analysis harness I was working on to try and pinpoint the reason for the AI "shimmer" that seems to be common in music generation models, so apologies for that. Finally, I had to base this on my other pull request's branch, as I do not have a CUDA-compatible machine here at the moment, so I can only work on Metal.