Having this available as a transformers-compatible model would vastly simplify usage, reducing the generation code to something roughly like:
```python
from transformers import pipeline
import scipy.io.wavfile

synthesizer = pipeline("text-to-audio", model="ASLP-lab/DiffRhythm2")

lyrics = "This is a song about how great DiffRhythm 2 is..."
music = synthesizer(
    "cinematic, stunning",
    forward_params={"do_sample": True, "lyrics": lyrics},
)

scipy.io.wavfile.write("music.wav", rate=music["sampling_rate"], data=music["audio"])
```