Last audio chunks are repeated twice. #48
Comments
Any update on this?
Have the same issue here.
So the issue is either in how the incoming audio bytes are processed on the client (for example, a chunk of a particular size gets played twice) or on the server. I doubt it's the client, though: if you ask the model to say "Hi, how can I help you today?", it repeats the sentence perfectly, as many times as you like, without the annoying "you today?" repetition. That leads me to conclude it's a server issue. I guess we'll have to wait until the model is no longer under the "exp" label and Google releases a polished version.
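One way to gather evidence for either hypothesis is to log whether the client ever receives the same audio bytes twice in a row. Below is a minimal sketch; `DuplicateChunkDetector` and `sameBytes` are hypothetical helpers, not part of this project or any Gemini SDK, and it assumes each incoming audio chunk is available as a `Uint8Array`. A byte-identical duplicate would point at transport or client-side handling; the absence of one would not rule out the server generating the repeated words as fresh audio.

```typescript
// Hypothetical diagnostic helper (not part of the project or any SDK).
// Assumption: each incoming audio chunk is available as a Uint8Array.

// Byte-wise equality check for two chunks.
function sameBytes(a: Uint8Array, b: Uint8Array): boolean {
  if (a.length !== b.length) return false;
  for (let i = 0; i < a.length; i++) {
    if (a[i] !== b[i]) return false;
  }
  return true;
}

class DuplicateChunkDetector {
  private previous: Uint8Array | null = null;

  // Returns true when `chunk` is byte-identical to the chunk seen just before it.
  check(chunk: Uint8Array): boolean {
    const isDuplicate =
      this.previous !== null && sameBytes(this.previous, chunk);
    // Keep a copy so later mutation of the caller's buffer cannot affect us.
    this.previous = chunk.slice();
    return isDuplicate;
  }
}
```

You would call `check()` on every chunk as it arrives and log a warning whenever it returns `true`.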
That makes sense. I was considering doing it the traditional way, using Deepgram and Azure, but if the non-experimental version of Gemini 2.0 is released soon with fixes, that might end up being wasted effort. Do you have any idea when it will be released?
I guess in a "few weeks", haha. They said that on Twitter a week or two ago. As a temporary fix, you could add an if statement that checks whether it is the first response and the message is "Hi, how can I help you today?", and if the audio bytes are longer than expected, cut off the end. I think something like that could work. Or use a speech-to-text model (sacrificing speed) to convert the model's output to text first and check whether the last words are repeated.
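The speech-to-text idea could be sketched roughly like this. It assumes you already have a transcript of the model's audio as a string (how you obtain it is up to you); `repeatedTailLength` is a hypothetical helper, not an existing API, and the word normalization only handles plain English text. It returns how many trailing words are an immediate repeat of the words just before them.

```typescript
// Hypothetical helper (not part of the project or any SDK).
// Assumption: `transcript` is the speech-to-text output for one model turn.

// Lowercase a word and drop punctuation so "today?" matches "today".
function normalizeWord(word: string): string {
  return word.toLowerCase().replace(/[^a-z0-9']/g, "");
}

// Returns the number of trailing words that immediately repeat the words
// before them (0 if there is no repeated tail). Only tails of up to
// `maxTail` words are considered, longest first.
function repeatedTailLength(transcript: string, maxTail: number = 4): number {
  const words = transcript
    .split(/\s+/)
    .map(normalizeWord)
    .filter((w) => w.length > 0);

  const longest = Math.min(maxTail, Math.floor(words.length / 2));
  for (let n = longest; n >= 1; n--) {
    const tail = words.slice(words.length - n);
    const before = words.slice(words.length - 2 * n, words.length - n);
    if (tail.every((w, i) => w === before[i])) return n;
  }
  return 0;
}

// Example: the response described in this issue.
console.log(repeatedTailLength("Hello, how can I help you today? you today")); // 2
console.log(repeatedTailLength("Hi, how can I help you today?")); // 0
```

Note that this also flags legitimate repetitions such as "bye bye", so it is only a heuristic, and you would still need to map the repeated words back to a position in the audio before trimming anything.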
Yes, but there's one more thing I wanted to ask: were you able to set system instructions? That doesn't work for me either.
It's in Altair.tsx in the components folder.
Facing the same issue here as well; it would be great to get this fixed. Love the Gemini model, even compared to GPT-Realtime.
Description of the bug:
To replicate the bug, simply type "hi". The model will respond "Hello, how can I help you today?" and then say "you today" again afterwards. It mostly happens at the start of the session, but I have noticed it happening in the middle too.
There seems to be an issue with how the audio chunks are being processed.
If you ask something like "How are you?" and the model responds with a longer output (thanking me, asking me back, etc., i.e. there are longer audio chunks), then it works fine.
Actual vs expected behavior:
No response
Any other information you'd like to share?
No response