Snippet for voice detection in Discord
- py-cord[voice]
A code snippet that took me more time than I'd like to admit. It detects when a user is speaking in a voice chat and generates a .wav file with the audio. When the user stops speaking for 0.5 sec, it creates the audio file and resets to start a new recording.
Usage: go to a voice channel and run /join, then /start. When done, run /stop, then /leave.
- The bot starts a recording on the voice channel (/join & /start)
- Recording goes to a sink
- The sink contains user and audio data
- We then check if the audio data is present
- Then every 0.5 sec we check the size of the audio data: if it has increased, the user is speaking; if it stays the same, the user has stopped speaking
- When the user stops speaking we dump the data to a file and reset the audio in the sink to start fresh
- YOU DON'T NEED /stop! The recording is dumped automatically when the user stops speaking; that is the main reason for creating this code
- /stop stops the whole process
- retrieve data from the sink : for user_id, audio_data in vc.sink.audio_data.items()
- get audio size : curr_file_size = audio_data.file.tell()
- extract the data : audio_data.file.seek(0), then data_to_write = audio_data.file.read()
- reset data in the sink : audio_data.file = io.BytesIO()
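The polling logic above can be sketched as a small function that is independent of Discord. This is a minimal sketch, not the original snippet: the names collect_finished_utterances and write_wav are hypothetical, and the .wav parameters (2 channels, 16-bit samples, 48 kHz) are an assumption about the raw PCM that py-cord puts in the sink, so check them against your setup.

```python
import io
import wave


def collect_finished_utterances(audio_data_by_user, last_sizes):
    """Run one polling pass (meant to be called every 0.5 sec).

    audio_data_by_user: mapping user_id -> object with a `.file` BytesIO,
        like vc.sink.audio_data.
    last_sizes: dict user_id -> size seen on the previous pass (updated in place).
    Returns {user_id: raw_bytes} for users whose buffer stopped growing.
    """
    finished = {}
    for user_id, audio_data in audio_data_by_user.items():
        curr_file_size = audio_data.file.tell()
        if curr_file_size == 0:
            continue  # no audio data present yet
        if curr_file_size > last_sizes.get(user_id, 0):
            last_sizes[user_id] = curr_file_size  # size increased: still speaking
            continue
        # Size unchanged since the last pass: the user stopped speaking.
        audio_data.file.seek(0)
        finished[user_id] = audio_data.file.read()
        audio_data.file = io.BytesIO()  # reset the sink buffer to start fresh
        last_sizes[user_id] = 0
    return finished


def write_wav(target, pcm_bytes, channels=2, sample_width=2, rate=48000):
    """Wrap raw PCM bytes in a .wav container (parameters are assumptions)."""
    with wave.open(target, "wb") as wav_file:
        wav_file.setnchannels(channels)
        wav_file.setsampwidth(sample_width)
        wav_file.setframerate(rate)
        wav_file.writeframes(pcm_bytes)
```

In the bot, this would typically be called from a loop that sleeps 0.5 sec between passes, passing vc.sink.audio_data and one shared last_sizes dict, then calling write_wav for each returned entry.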
First step towards a Discord voice assistant. Next steps:
- add a call to an "api" to indicate that a file has been generated
- convert the audio file to text using whisper (probably faster-whisper or something even faster)
- (optional) detect a wake word using porcupine, or just parse the text to detect commands or keywords like "remember this :" to save it to a RAG
- once transcribed, send the text to an LLM (mistral or llama3, a small model for fast inference, with an injected prompt for conversational behavior)
- grab the response from the LLM and convert it back to audio using piper or similar
- send the audio to the bot to play it
- (target : the whole round trip should take less than a second)
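A rough sketch of how these next steps could be wired together and timed against the one-second target. Nothing here calls whisper, an LLM, or piper directly: transcribe, generate_reply, synthesize, and remember are hypothetical callables you would supply, and extract_memory_note and run_round_trip are illustrative names, not part of any library.

```python
import re
import time

# Keyword taken from the list above; matching is case-insensitive.
REMEMBER_PATTERN = re.compile(r"remember this\s*:\s*(.*)", re.IGNORECASE | re.DOTALL)


def extract_memory_note(text):
    """Return the text after "remember this :", or None if the keyword is absent."""
    match = REMEMBER_PATTERN.search(text)
    return match.group(1).strip() if match else None


def run_round_trip(wav_path, transcribe, generate_reply, synthesize, remember):
    """Run speech -> text -> LLM -> speech and report the elapsed seconds.

    All four callables are placeholders for whatever backends are chosen.
    """
    start = time.perf_counter()
    text = transcribe(wav_path)            # e.g. a whisper-based function
    note = extract_memory_note(text)
    if note is not None:
        remember(note)                     # e.g. store the note for the RAG
    reply_text = generate_reply(text)      # e.g. a small local LLM
    reply_audio = synthesize(reply_text)   # e.g. a piper-based function
    elapsed = time.perf_counter() - start
    return reply_audio, elapsed
```

Passing the backends in as callables keeps the timing code unchanged while different speech-to-text or text-to-speech engines are tried.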