Add inference bargein and examples #4032
Conversation
theomonnom left a comment:
Nice work!
Let's move this file to `other`
I will remove it before merging. I don't think the user will need it.
examples/voice_agents/basic_agent.py (Outdated)
```python
    metrics,
    room_io,
)
from livekit.agents.inference.bargein import BargeinDetector
```
Can it just be:

```diff
-from livekit.agents.inference.bargein import BargeinDetector
+from livekit.agents.inference import BargeinDetector
```
examples/voice_agents/basic_agent.py (Outdated)
```python
    # See more at https://docs.livekit.io/agents/build/turns
    turn_detection=MultilingualModel(),
    vad=ctx.proc.userdata["vad"],
    bargein_detector=BargeinDetector(),
```
I like this pattern (it follows how we use STT/TTS in the inference gateway in our docs).

```diff
-bargein_detector=BargeinDetector(),
+bargein_detector=inference.BargeinDetector(),
```
Tho I'm also wondering if it should always be there by default.
```python
# emit the preceding sentinel event immediately before this event
# assuming *only one* sentinel event could precede the current event
# ignore if the previous event is not a sentinel event
logger.debug(
```
I see that we're only re-emitting events inside `_transcript_buffer` when we receive a new STT event. Shouldn't we also re-emit them if we detect it was a barge-in? The STT may not send us any additional events, so there is a case where we just ignore this buffered event.
Yes, there is no `return` in this `elif` branch, so the after-the-barge-in event will be processed normally.
What I mean is what if this scenario happens:
- should_hold_event is True
- we buffer the transcript
- we detected it was indeed a barge-in
- we never receive new transcripts from the STT
In this case it seems like we just lose the buffered transcript and never trigger a new generation?
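To make the concern concrete, here is a minimal toy model of the flow described above (the names `HoldingBuffer`, `should_hold`, etc. are hypothetical, not the actual livekit internals): if buffered transcripts are only flushed when a *new* STT event arrives, a confirmed barge-in with no follow-up transcript leaves them stuck.

```python
class HoldingBuffer:
    """Toy model: transcripts are held while should_hold is True and are
    only re-emitted when a *new* STT event arrives afterwards."""

    def __init__(self) -> None:
        self.held: list[str] = []
        self.emitted: list[str] = []

    def on_stt_event(self, text: str, should_hold: bool) -> None:
        if should_hold:
            # potential barge-in in progress: buffer instead of emitting
            self.held.append(text)
            return
        # a new non-held event flushes anything buffered, then emits itself
        self.emitted.extend(self.held)
        self.held.clear()
        self.emitted.append(text)


buf = HoldingBuffer()
buf.on_stt_event("stop for a second", should_hold=True)  # buffered
# ... barge-in is confirmed, but STT never sends another event ...
# the transcript stays stuck in `held` and never reaches the pipeline
```
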
Great question! Here is a breakdown of what I am thinking when there is no new transcript:
Scenario 1: If there is no actual user speech/barge-in
This is a barge-in false positive. If resume_false_interruption is enabled, the audio will pause and then resume. If not enabled, it will interrupt with no new transcript.
Scenario 2: If there is actual user speech/barge-in but no new transcript
Case 1: This is an STT failure. But it should behave similarly to Scenario 1.
Case 2: The transcript for the barge-in comes before the inference is done
The typical audio the model needs for a barge-in detection is around 300ms, which means we can lose the transcript for that 300ms window. I think it is very rare to have a true barge-in of just 300ms of speech, and rarer still for the STT to finish transcribing it before the barge-in is detected.
But there is a non-zero probability of barge-in detection being late, in which case, we might lose more transcript. What I can do here is to re-emit transcripts back until the last speaking point (if we have the timing information), but it might include non-barge-in transcript if the user says something like "Right, right [bc], we should ...[barge-in]".
Okay, I have updated it so that the buffer is flushed either at the end of agent speech or on new STT events, going back as far as the last overlapping speech start.
A detailed spec can be found in the Notion page.