Description
Model/Pipeline/Scheduler description
TorToise is a multi-voice text-to-speech system. Its paper describes a way to apply recent advances in image generation to speech synthesis. It would be great to have this model in diffusers.
I would love to contribute this.
Open source status
- The model implementation is available
- The model weights are available (Only relevant if addition is not a scheduler).
Provide useful links for the implementation
Paper - https://arxiv.org/pdf/2305.07243.pdf
GitHub repo - https://github.com/neonbjb/tortoise-tts