Add AllTalk TTS and whisperX #12
Comments
This is absolutely awesome! Thanks so much, I'll get to work implementing this ASAP 🙇‍♂️
Couple more thoughts:
I just pushed an updated version with local whisper, thanks again for the help! No AllTalk integration yet though. To your points:
Thanks again, super appreciate your help!
Hi, just pulled but I can't find the config option for local whisper.
Also, it's saying `Failed to start the recorder: module 'config' has no attribute 'TRANSCRIPTION_API'`
Edit your config file and uncomment the following (and remove/comment out the previous transcription setting).
Edit: Ah, I think I know what happened. You are probably still using your old config file. Go to the example config file, copy its contents over to your config file, and apply the above edits.
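For reference, the transcription entries in the example config look roughly like the sketch below. `TRANSCRIPTION_API` comes from the error message above; the other names and values are illustrative guesses, so copy the real ones from the example config file rather than from here.

```python
# config.py -- transcription settings (illustrative sketch; only TRANSCRIPTION_API
# is known from the error message, the rest are assumed names/values).

# Pick the transcription backend:
# TRANSCRIPTION_API = "openai"    # hosted Whisper via the OpenAI API
TRANSCRIPTION_API = "whisperx"    # local transcription

# Local-only settings (example values):
WHISPER_MODEL = "tiny"            # tiny / base / small / medium / large
TRANSCRIPTION_DEVICE = "cuda"     # or "cpu"
```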
Makes sense to set up some sort of TTL for the whisper model! Maybe even make it configurable in config.py?
Yeah, it's kind of a pain that you need to re-copy the config file whenever I add something new to the config example. Ideally I could push new features to the config without you having to copy over the new config entries each time; not sure how best to do that though. Yeah, I'll add TTL to the todo list. At the end of the day it could be optional, but it would be good to have.
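One common pattern for this (just a sketch, not something that exists in the repo; the module names are assumptions) is to keep the shipped defaults importable and let the user's config only override what it actually defines:

```python
# config_loader.py -- illustrative sketch only; module names are assumptions.
# Defaults ship with the repo (and gain new entries on update); the user's
# config only needs to define the values they want to override.
import config_default  # shipped example/defaults, assumed module name
import config          # the user's local overrides, assumed module name

def get_setting(name):
    """Return the user's value if set, otherwise fall back to the shipped default."""
    return getattr(config, name, getattr(config_default, name))

# e.g. transcription_api = get_setting("TRANSCRIPTION_API")
```

With something like this, new config entries would just need a default in the shipped file, and existing user configs would keep working untouched.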
Anyone able to ELI5 this setup for me? I have the OG package setup, but I'm interested in trying WhisperX. Thanks.
I have a bunch of changes in the works; hopefully by this time tomorrow they will all be merged, and there will be a simplified setup process too. In the next couple of days I'll also make a new video on how to set it up on Windows and Linux.
Is the local whisper option only able to use the embedded whisperx model, or is there any way we can point to our own whisper integration? I have a whisper server that I'd love to use with this.
I've actually recently removed whisperX in favour of faster-whisper, which is much more lightweight and seems to have fewer dependency issues. So is your whisper server callable through a network API? Could you give an example of how you would call it via code? I refactored the transcription system, so it shouldn't be much work to set it up with a new transcription system.
Ah nice, yes it is very similar to faster-whisper. In fact I believe that was built off of this one, which is just the original (https://github.com/openai/whisper). It supports calls via a web API, and also passing a .wav to the .exe, which I think is what faster-whisper does. Honestly, the only reason I wanted to use the local whisper is so I can point it to my own model that I already have. Not that the whisper models are all that big, but I'm just hitting the point where I have soooo many AI models, and any application that can point to an existing model is a wonderful thing.
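If the server exposes an HTTP endpoint, the call from Python could look roughly like the sketch below. This is illustrative only: the URL, route, and field names depend entirely on which whisper server is running (the OpenAI-style `/v1/audio/transcriptions` route shown here is just one common convention), so treat them as assumptions.

```python
# Sketch of posting a .wav to a whisper HTTP server; the route and field names
# are assumptions and must be adjusted to match the actual server.
import requests

def transcribe_remote(wav_path, base_url="http://localhost:9000"):
    """Send a .wav file to a whisper server and return the transcribed text."""
    with open(wav_path, "rb") as audio_file:
        response = requests.post(
            f"{base_url}/v1/audio/transcriptions",  # OpenAI-style route (assumed)
            files={"file": audio_file},
            data={"model": "whisper-1"},
        )
    response.raise_for_status()
    return response.json()["text"]
```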
Good point. I'm not as familiar with faster-whisper, but I'm sure there would be a way to optionally point to an existing model; I'll add it to the todo list and see if I can find a way to set that up. Also, side note: whisperX seems to have some dependencies that give faster-whisper buggy results, so when you swap to faster-whisper you will want to delete your venv and run through the setup again.
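For what it's worth, faster-whisper's `WhisperModel` accepts a local directory path in place of a model size, as long as the model is in CTranslate2 format, so pointing it at an existing model could look roughly like this (the path and settings below are illustrative):

```python
# Sketch: loading an existing CTranslate2-converted model with faster-whisper.
from faster_whisper import WhisperModel

# Pass either a size name ("tiny", "base", ...) or a path to a converted model directory.
model = WhisperModel("/path/to/my-whisper-model", device="cuda", compute_type="float16")

segments, info = model.transcribe("recording.wav")
print(" ".join(segment.text for segment in segments))
```

Original openai/whisper checkpoints would need converting to CTranslate2 format first; the faster-whisper docs describe the conversion step.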
How do you edit the config to use the AllTalk TTS? I'm a bit lost on what 'relevant' settings I need to add.
I don't think the AllTalk TTS code got added, so it won't be enough to edit the config. Either wait for support to be added or drop in my code as a replacement for one of the existing TTS APIs.
Hey, I've introduced the following two modifications for my own use and figured you may want to take a look and see if it's something you'd like to implement. This is pretty crude and needs some refinement for sure, but it works. The following code is a drop-in replacement (you will probably want to add the relevant config.py settings). The first snippet is for whisperX; the second one adds AllTalk TTS support. AllTalk TTS is a little more demanding than Piper but offers way better voice quality. WhisperX lets you run this app 100% offline. With 12 GB VRAM I'm running the tiny whisper model, a 7B/8B LLM (currently testing wizardlm2 and llama3 via Ollama), and my custom AllTalk model.
The latter snippet is not really an efficient solution, as there is no need to copy the AllTalk-generated wavs over to the AlwaysReddy audio_files directory. It would make more sense to change AUDIO_FILE_DIR in config.py to point to the AllTalk output folder, or to change the output directory in AllTalk to point to AUDIO_FILE_DIR. If you think this may come in handy in any way, please feel free to use this code as you see fit.
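The actual snippets don't appear in this excerpt, but to give a rough idea of the shape of the AllTalk side, a call to a locally running AllTalk server might look something like the sketch below. The endpoint, port, and field names are assumptions from memory of AllTalk's HTTP API and should be checked against its documentation; this is not the drop-in code referred to above.

```python
# Rough illustration of requesting speech from a local AllTalk TTS server.
# NOTE: the port, route, and field names are assumptions -- verify them against
# the AllTalk API docs for your version. This is not the original snippet.
import requests

ALLTALK_URL = "http://127.0.0.1:7851/api/tts-generate"  # assumed default endpoint

def generate_speech(text, voice="female_01.wav"):
    """Ask AllTalk to synthesise `text` and return the path of the generated wav."""
    response = requests.post(ALLTALK_URL, data={
        "text_input": text,               # assumed field name
        "character_voice_gen": voice,     # assumed field name
        "language": "en",
        "output_file_name": "alwaysreddy_tts",
    })
    response.raise_for_status()
    return response.json()["output_file_path"]  # assumed response key
```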