Feature/reference audio prompt #28
Conversation
…e selection. Update argument handling in `run_music_generation.py` and improve `HeartMuLaGenPipeline` class for better input processing and model execution.
…odec model. Update `run_lyrics_transcription.py` to dynamically select device based on availability, and modify `HeartCodec` to determine device from input tensor or model parameters. Improve `HeartMuLaGenPipeline` to support autocast on MPS for better performance.
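The device-selection logic described in this commit (prefer the input tensor's device, fall back to the model's parameters, then to the best available accelerator) could look roughly like the sketch below. The helper name `resolve_device` is illustrative, not the actual `HeartCodec` API.

```python
import torch


def resolve_device(tensor=None, model=None):
    """Pick a torch.device from an input tensor, a model, or availability."""
    if tensor is not None:
        return tensor.device
    if model is not None:
        try:
            # Use the device of the first parameter found on the model.
            return next(model.parameters()).device
        except StopIteration:
            pass  # model has no parameters; fall through to auto-detection
    if torch.cuda.is_available():
        return torch.device("cuda")
    if torch.backends.mps.is_available():
        return torch.device("mps")
    return torch.device("cpu")
```

The fallback chain keeps tensors and weights co-located without hard-coding a device string into the pipeline.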
…mize audio token padding. Introduce a context manager for autocast that gracefully handles unsupported cases, and preallocate buffers for audio tokens to enhance performance during generation.
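The "context manager for autocast that gracefully handles unsupported cases" mentioned above could be sketched as follows. This is an assumption about the approach, not the pipeline's actual implementation; the name `maybe_autocast` is made up.

```python
import contextlib

import torch


@contextlib.contextmanager
def maybe_autocast(device_type: str, dtype=torch.bfloat16):
    """Enter torch.autocast when the backend supports it, else run normally."""
    try:
        ctx = torch.autocast(device_type=device_type, dtype=dtype)
        ctx.__enter__()
    except (RuntimeError, ValueError):
        ctx = None  # autocast unsupported here: fall back to full precision
    try:
        yield
    finally:
        if ctx is not None:
            ctx.__exit__(None, None, None)
```

Wrapping the forward pass in `with maybe_autocast("mps", torch.float16):` then degrades to a no-op on backends where autocast raises, instead of crashing generation.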
…ce on MPS. Update `pyproject.toml` to include the optimizer package directory. Enhance `HeartMuLaGenPipeline` to optionally enable Metal optimizations during model execution, improving performance for Llama blocks.
…w Metal kernels and Python wrappers. Update `pyproject.toml` to remove the optimizer package directory. Enhance runtime detection for Metal support and build tools availability.
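The runtime detection of Metal support and build tools described above might boil down to checks like these (helper names are illustrative; the exact probes in the branch may differ):

```python
import shutil

import torch


def metal_available() -> bool:
    """True when the MPS (Metal) backend is both compiled in and usable."""
    mps = getattr(torch.backends, "mps", None)
    return bool(mps and mps.is_built() and mps.is_available())


def build_tools_available() -> bool:
    """True when the macOS toolchain needed to compile Metal kernels is on PATH."""
    # xcrun drives the Metal compiler on macOS; clang covers any C/C++ shims.
    return shutil.which("xcrun") is not None and shutil.which("clang") is not None
```

Gating the custom-kernel path on both checks lets the pipeline fall back to plain PyTorch ops on machines without Xcode command-line tools.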
…add optional dependencies for MuQ-MuLan. Modify `README.md` to reflect new Python version recommendations and installation instructions for optional features. Enhance `run_music_generation.py` and `HeartMuLaGenPipeline` to support reference audio conditioning and auto-download of MuQ-MuLan, improving music generation capabilities.
Is this to try to figure out how to do style transfer from one song to a new one? Are you getting shimmer in this model too? The problem in Suno appeared to be the 10 Hz generation rate and the overfit on the highs, due to a lot of music having rise and fall of pads in that band. The fix then is to move to a 32 Hz codec space and use an RNN-type network (think RWKV-X) instead of straight transformers or diffusion. But that means a NEW model architecture.
@frink Thanks for that information! This is something I didn't know about. And yes, that shimmer is still there. Though since you say it "appeared" to be, suggesting that Suno solved it, we may be talking about a different shimmer. I am talking about a metallic sound that makes hi-hats and cymbals sound very unnatural, though it can be present in vocals and other instruments too. I always figured it had something to do with bitrate encoding artifacts sneaking in?
@TheApeMachine that's a different issue. The way to solve that problem is to use something like AirWindows Average to sand off the highs, then use a harmonic exciter to bring them back. The problem is that these codecs put noise above about 8-12 kHz, which makes everything up there sound like ice picks. So like I said, you have to sand it off and then put it back yourself. I've used Amp Head plugins to excite harmonics, and various other things. You can also blur and smooth with reverb and delay. But it's just ugly artifacts of the model and the way the AI works. That's not shimmer; it's white noise with a low shelf set very high. It's also the biggest giveaway that the music is AI generated. So you have to polish it off if you want to pass things off as real.
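The "sand off the highs, then bring them back" trick described above could be sketched offline as a low-pass filter followed by a saturation-based exciter. This is a loose interpretation of the plugin workflow, not anyone's mastering chain; cutoff, drive, and mix values are illustrative starting points.

```python
import numpy as np
from scipy.signal import butter, sosfilt


def polish_highs(audio: np.ndarray, sr: int, cutoff: float = 9000.0,
                 drive: float = 2.0, mix: float = 0.3) -> np.ndarray:
    """Low-pass away the noisy top band, then regenerate highs via saturation."""
    lo_sos = butter(4, cutoff, btype="low", fs=sr, output="sos")
    hi_sos = butter(4, cutoff, btype="high", fs=sr, output="sos")
    lows = sosfilt(lo_sos, audio)  # "sand off" the codec noise above the cutoff
    # tanh saturation adds harmonics above the cutoff; keep only those new highs.
    excited = sosfilt(hi_sos, np.tanh(drive * lows))
    return lows + mix * excited
```

The idea is that the regenerated highs are harmonically related to the clean signal below the cutoff, unlike the codec's white-noise band.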
Notes: Not sure what your intentions are with the reference audio, but it seems to work pretty well as far as I can tell. I'm opening it as a pull request to see what your thoughts are, but feel free to ignore/close it if it's not interesting; I just wanted to experiment with it :) Also, there may be some small pieces of code in this branch that are related to an analysis harness I was working on to try and pinpoint the reason for the AI "shimmer" that seems to be common in music generation models, so apologies for that. Finally, I had to base this on my other pull request's branch, as I do not have a CUDA-compatible machine here at the moment, so I can only work on Metal.