Access to Music VAE #11

@AdrLfv

Hello,

Thank you for the excellent work on DiffRhythm 2 and for releasing the model weights!

We are researchers at VITA Lab (EPFL) exploring the use of DiffRhythm 2's DiT backbone.

We noticed that the released checkpoints include:

- The DiT model (model.safetensors) from ASLP-lab/DiffRhythm2
- The BigVGAN decoder (decoder.bin) from the same repo
- The DiffRhythm v1 oobleck VAE (vae_model.pt) from ASLP-lab/DiffRhythm-vae

However, as described in Section 3.2 of the paper, the DiffRhythm 2 DiT was trained on latents from the v2 Music VAE encoder (24 kHz input, 4800× compression, 5 Hz frame rate), which is architecturally different from the v1 oobleck VAE (44.1 kHz, 2048× compression, ~21.5 Hz). The v2 Music VAE encoder does not appear to be included in the released checkpoints.
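
For reference, the frame-rate mismatch follows directly from the figures quoted above (sample rate divided by compression factor). Below is a minimal sketch of that arithmetic. The helper names are ours, not part of the DiffRhythm code, and the floor-based frame count is an assumption, since the actual encoders may pad or trim the input:

```python
def latent_frame_rate(sample_rate_hz: float, compression: int) -> float:
    """Latent frames per second = audio samples per second / samples per latent frame."""
    return sample_rate_hz / compression


def num_latent_frames(duration_s: float, sample_rate_hz: int, compression: int) -> int:
    """Number of whole latent frames for a clip (assumes no padding: floor division)."""
    return int(duration_s * sample_rate_hz) // compression


# Figures taken from the issue text above.
v2_rate = latent_frame_rate(24_000, 4800)  # v2 Music VAE
v1_rate = latent_frame_rate(44_100, 2048)  # v1 oobleck VAE

print(f"v2 Music VAE:   {v2_rate:.2f} Hz")
print(f"v1 oobleck VAE: {v1_rate:.2f} Hz")

# A 10-second clip yields very different sequence lengths under the two VAEs.
print(num_latent_frames(10, 24_000, 4800))
print(num_latent_frames(10, 44_100, 2048))
```

Under these assumptions, a 10-second clip maps to 50 latent frames with the v2 Music VAE but to more than 200 with the v1 oobleck VAE, so v1 latents cannot simply stand in for the sequences the DiT was trained on.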

Would it be possible to release the Music VAE encoder checkpoint? We need it to produce the 5 Hz latent representations that match what the DiT was trained on.

We would greatly appreciate any help with this. Thank you for your time!

Best regards,
Adrien Lefèvre
