Last audio chunks are repeated twice. #48

ViaAnthroposBenevolentia · 2025-01-09T08:13:28Z

Description of the bug:

To replicate the bug, simply type "hi". The model will respond "Hello, how can I help you today?" and then repeat "you today" afterwards. It mostly happens at the start of the session, but I noticed it happening in the middle too.

There seems to be an issue with how the audio chunks are being processed.
If you ask something like "How are you?" and the model responds with a longer output (thanking me, asking me back, etc., i.e. there are longer audio chunks), then it works fine.

Actual vs expected behavior:

No response

Any other information you'd like to share?

No response

Hemanth-TS · 2025-01-13T04:10:21Z

Any update on this?

herrkaefer · 2025-01-13T05:31:26Z

Have the same issue here.

ViaAnthroposBenevolentia · 2025-01-13T06:27:12Z

> Any update on this?

So the issue might be in how the incoming audio bytes are processed. Like maybe when an audio chunk is a certain size, that chunk gets played twice.

But I doubt this, because if you ask the model to say "Hi, how can I help you today?" then it repeats it perfectly (and can do so many times) without the annoying "you today?" repetition.

That leaves a server issue as the only conclusion I can see. I guess we'll have to wait until the model is no longer under the "exp" label and Google releases the polished model.
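
One way to check the client-versus-server question is to fingerprint every audio chunk as it arrives and see whether the repeated audio is already duplicated in the incoming data. The snippet below is only a rough sketch: `fnv1a` and `findRepeatedChunks` are hypothetical helper names, not part of this repo or of any Gemini SDK, and it assumes the chunks are available as `Uint8Array` values in arrival order.

```typescript
// Hypothetical diagnostic helpers (not part of the repo or any SDK).

// 32-bit FNV-1a hash of a byte array, used as a cheap fingerprint.
function fnv1a(bytes: Uint8Array): number {
  let hash = 0x811c9dc5;
  for (let i = 0; i < bytes.length; i++) {
    hash ^= bytes[i];
    hash = Math.imul(hash, 0x01000193) >>> 0;
  }
  return hash;
}

// Returns the indices of chunks whose length and fingerprint match
// an earlier chunk in the same list.
function findRepeatedChunks(chunks: Uint8Array[]): number[] {
  const seen: Record<string, boolean> = {};
  const repeats: number[] = [];
  for (let i = 0; i < chunks.length; i++) {
    const key = chunks[i].length + ":" + fnv1a(chunks[i]);
    if (seen[key]) {
      repeats.push(i);
    } else {
      seen[key] = true;
    }
  }
  return repeats;
}
```

If the indices of the last chunks show up in the result, the duplicate bytes were already in what the client received; if nothing shows up but the audio still repeats, the playback side is the more likely suspect. A matching fingerprint is only a hint, since hash collisions and identical silent chunks can also produce matches.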

Hemanth-TS · 2025-01-14T04:24:12Z

That makes sense. I was considering doing it the traditional way, using Deepgram and Azure, but if the non-experimental version of Gemini 2.0 is released soon, that might end up being wasted effort if it is released with fixes. When do you think it will be released, any idea?

ViaAnthroposBenevolentia · 2025-01-14T04:42:01Z

> When do you think it will be released, any idea?

I guess in a "few weeks" haha. They said that on Twitter like a week (or two?) ago.

As for a temporary fix, well, maybe add an if statement to check whether it is the first response and the message is "Hi, how can I help you today?", and if the bytes are longer than expected, cut the end? Yeah, I think something like that could be made. Or use a speech-to-text model (sacrificing speed) to first convert the model's output to text and check whether the last words are repeated.
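
The text-based check could look roughly like the sketch below. It assumes a transcript of the model's audio is already available as a string (how to obtain it is out of scope here), and `stripRepeatedTail` and `normalizeWord` are hypothetical names made up for illustration. It handles plain English text only, because words are normalized to ASCII letters and digits.

```typescript
// Hypothetical helpers (not part of the repo or any SDK).

// Lowercase a word and drop punctuation so "today?" matches "today".
function normalizeWord(word: string): string {
  return word.toLowerCase().replace(/[^a-z0-9']/g, "");
}

// If the last k words repeat the k words right before them
// (for some k up to maxWords), drop the repeated tail.
function stripRepeatedTail(text: string, maxWords: number = 5): string {
  const words = text.trim().split(/\s+/).filter((w) => w.length > 0);
  const norm = words.map(normalizeWord);
  const limit = Math.min(maxWords, Math.floor(words.length / 2));
  for (let k = limit; k >= 1; k--) {
    let same = true;
    for (let i = 0; i < k; i++) {
      if (norm[norm.length - k + i] !== norm[norm.length - 2 * k + i]) {
        same = false;
        break;
      }
    }
    if (same) {
      return words.slice(0, words.length - k).join(" ");
    }
  }
  return words.join(" ");
}
```

For the greeting from the bug report, `stripRepeatedTail("Hello, how can I help you today? you today")` gives back the sentence without the trailing "you today". Note that this only cleans up the text; to fix the audio you would still have to cut the matching bytes, and legitimate repetitions such as "bye bye" would be trimmed as well.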

Hemanth-TS · 2025-01-14T04:52:15Z

Yes, but there's one more thing I wanted to ask: were you able to set system instructions? That doesn't work for me either.

ViaAnthroposBenevolentia · 2025-01-14T04:56:16Z

It's in Altair.tsx in the components folder.

naveengovind · 2025-01-21T07:12:40Z

Facing the same issue here as well. It would be great to get this fixed. Love the Gemini model, even compared to GPT-Realtime.
