
[Feature request] Auralis optimisations of XTTS #181

Open
erew123 opened this issue Nov 30, 2024 · 2 comments

erew123 commented Nov 30, 2024

🚀 Feature Description

Hi @eginhard. Hope you are keeping well!

It's erew123 from AllTalk.

Someone has pointed this out to me: https://www.astramind.ai/post/auralis and I think this is the GitHub repo: https://github.com/astramind-ai/Auralis

It's a little beyond my pay grade, but maybe it's of interest for the Coqui scripts. I don't know if you have already seen this, or whether the author is posting on here with you, but I thought you might like to see it.

I fed their write-up into an AI for a quick "here is what they claim" summary:


The author claims to have optimized XTTS-v2, a text-to-speech model, making it faster, more resource-efficient, asynchronous, and safer for production environments. Here are the key points and the performance gains:

What They Did

  1. Understanding the Code and Challenges:

    • Overcame a lack of prior experience in audio tech.
    • Debugged and worked around outdated dependencies and repos.
  2. Tokenizer Optimization:

    • Replaced a custom tokenizer with a Hugging Face-compatible fast tokenizer (PreTrainedTokenizerFast; see the sketch after this list).
    • Improved the token-splitting logic to maintain audio quality while handling memory-efficient truncation.
  3. Model Reorganization:

    • Refactored the original architecture, which used GPT-2-like models and a HiFi-GAN vocoder, to eliminate unnecessary computations during inference.
    • Optimized the HiFi-GAN component to use in-place operations, drastically reducing memory usage (see the in-place sketch after this list).
  4. Integration of vLLM for GPT-2:

    • Overcame challenges in adapting vLLM for multimodal GPT-2, including token cache management and continuous batching.
    • Addressed vLLM's limitations on repetition penalties and hidden state collection, customizing its behavior for audio-specific tasks.
  5. Asynchronous Execution:

    • Made components non-blocking using asyncio (see the asyncio sketch after this list).
  6. Optimized Workflow:

    • Avoided redundant token and embedding calculations during iterative decoding.
    • Adapted position ID tracking to align with unique conditioning inputs for multimodal tasks.
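
For concreteness, point 2 presumably boils down to something like the sketch below. The actual Hugging Face class is PreTrainedTokenizerFast; the file name and special tokens here are my placeholders, not Auralis's real code:

```python
# Minimal sketch: a Hugging Face "fast" tokenizer loaded from a pre-built
# tokenizer.json. File name and special tokens are placeholders.
from transformers import PreTrainedTokenizerFast

tokenizer = PreTrainedTokenizerFast(
    tokenizer_file="tokenizer.json",  # assumed export from the `tokenizers` library
    bos_token="[START]",
    eos_token="[STOP]",
    pad_token="[PAD]",
)

# Memory-efficient truncation: cap tokens per chunk instead of feeding the
# decoder arbitrarily long inputs.
enc = tokenizer("A very long passage of input text...", truncation=True, max_length=400)
print(len(enc["input_ids"]))
```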
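
Point 3's in-place trick, in generic PyTorch terms, looks roughly like this (my own illustration, not the Auralis HiFi-GAN code):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ResBlockInference(nn.Module):
    # Inference-only residual block illustrating the in-place idea.
    def __init__(self, channels: int, kernel_size: int = 3):
        super().__init__()
        self.conv = nn.Conv1d(channels, channels, kernel_size,
                              padding=kernel_size // 2)

    @torch.inference_mode()  # no autograd graph, so buffers may be mutated freely
    def forward(self, x: torch.Tensor) -> torch.Tensor:
        out = self.conv(F.leaky_relu(x, 0.1))  # x is reused below, so keep it intact
        out.add_(x)  # in-place residual add: saves one full-size intermediate tensor
        return out

# e.g. y = ResBlockInference(80)(torch.randn(1, 80, 200))
```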
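
And point 5 is presumably the standard asyncio pattern below; synthesize is a hypothetical stand-in for a blocking model call, not an actual Auralis API:

```python
import asyncio
import time

def synthesize(text: str) -> bytes:
    # Stand-in for a blocking model call (GPT-2 decode + HiFi-GAN vocoder).
    time.sleep(1.0)
    return b"\x00" * 16000

async def synthesize_async(text: str) -> bytes:
    # asyncio.to_thread (Python 3.9+) runs the blocking call in a worker
    # thread, so the event loop stays responsive for other requests.
    return await asyncio.to_thread(synthesize, text)

async def main() -> None:
    # Two requests now overlap instead of queueing behind one another.
    clips = await asyncio.gather(synthesize_async("Hello"), synthesize_async("World"))
    print([len(c) for c in clips])

asyncio.run(main())
```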

Performance Gains

  1. Speed:

    • Leveraging vLLM and deduplicating computations significantly reduced inference time.
  2. Resource Efficiency:

    • Memory consumption was slashed by optimizing HiFi-GAN for inference.
    • Reduced overhead by restructuring the GPT-2 and conditioning modules.
  3. Production Suitability:

    • Ensured asynchronous, non-blocking execution for smoother integration into UI frameworks like Pulsar.
    • Increased safety by moving from .pth to safer formats and handling positional encoding appropriately (see the safetensors sketch below this list).
  4. Accessibility:

    • Made the enhancements available to the open-source community for broader adoption.
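
On the .pth-to-safer-formats point, the usual conversion with the safetensors library looks like this (file names are placeholders, not taken from their repo):

```python
import torch
from safetensors.torch import save_file, load_file

# One-time conversion: load the legacy pickle-based checkpoint...
state_dict = torch.load("xtts_v2.pth", map_location="cpu", weights_only=True)
# ...keep only tensors (safetensors stores raw tensors, and requires them
# to be contiguous)...
tensors = {k: v.contiguous() for k, v in state_dict.items()
           if isinstance(v, torch.Tensor)}
# ...and write the safetensors file.
save_file(tensors, "xtts_v2.safetensors")

# Loading safetensors never executes arbitrary code, unlike unpickling .pth.
restored = load_file("xtts_v2.safetensors")
```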

The overall result is a production-ready, optimized XTTS-v2 that is significantly faster and more memory-efficient, with asynchronous capabilities enabling smoother integration into applications.


Thanks, erew123

eginhard (Member) commented Dec 1, 2024

Thanks for sharing! I wasn't aware of it. Will take a look.

erew123 (Author) commented Dec 1, 2024

@eginhard FYI, their requirements bump PyTorch to 2.5.1... I have no idea if they are actually using something from that version of PyTorch. Just so you are aware. Thought it interesting though!

Thanks

eginhard changed the title from "[Feature request]" to "[Feature request] Auralis optimisations of XTTS" on Dec 6, 2024