MusicGen
so-vits-svc fork with realtime support, improved interface and more features.
The PyTorch-based audio source separation toolkit for researchers
ModelScope: bring the notion of Model-as-a-Service to life.
Implementation of SoundStorm, Efficient Parallel Audio Generation from Google DeepMind, in PyTorch
A simple GUI application that slices audio with silence detection
Audiocraft is a library for audio processing and generation with deep learning. It features the state-of-the-art EnCodec audio compressor / tokenizer, along with MusicGen, a simple and controllable…
Neural building blocks for speaker diarization: speech activity detection, speaker change detection, overlapped speech detection, speaker embedding
Easily train a good VC model with voice data <= 10 mins!
The official Python API for ElevenLabs Text to Speech.
🔊 Text-Prompted Generative Audio Model
Automatic Speech Recognition with Speaker Diarization based on OpenAI Whisper
An advanced singing voice synthesis system with high fidelity, expressiveness, controllability and flexibility based on DiffSinger: Singing Voice Synthesis via Shallow Diffusion Mechanism
Real-time end-to-end singing voice conversion system based on DDSP (Differentiable Digital Signal Processing)
An easy-to-understand TTS / SVS / SVC framework
🐸💬 - a deep learning toolkit for Text-to-Speech, battle-tested in research and production
A WebUI for different audio-related neural networks
A collection of pre-trained, state-of-the-art models in the ONNX format
A WebUI to create song covers with any RVC v2 trained AI voice from YouTube videos or audio files.
DiffSinger dataset processing tools, including audio processing and labeling.
A collection of neural vocoders suitable for singing voice synthesis tasks.
A simple, high-quality voice conversion tool focused on ease of use and performance.
Versatile AI-driven audio upscaler to enhance the quality of any audio.