STT: ElevenLabs STTv2 (Scribe v2 Realtime) support #3954
- Add STTv2 class with full Scribe v2 Realtime API support
- Support word-level timestamps (include_timestamps parameter)
- Support both VAD and manual commit strategies
- Emit INTERIM_TRANSCRIPT events for real-time UI feedback
- Handle committed_transcript_with_timestamps events
- Add update_options() method for dynamic reconfiguration
- Comprehensive error handling and logging
- Full docstrings with examples
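To illustrate how interim and committed results might be told apart, here is a minimal sketch of a message dispatcher. It is not the code in stt_v2.py. The `message_type`, `text`, and `words` field names, the `partial_transcript` and `committed_transcript` type strings, and the `EventKind`, `TranscriptEvent`, and `to_event` names are all assumptions made for illustration; only `committed_transcript_with_timestamps` and INTERIM_TRANSCRIPT are named in this PR.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class EventKind(Enum):
    # Hypothetical stand-ins for the framework's speech event types.
    INTERIM_TRANSCRIPT = "interim_transcript"
    FINAL_TRANSCRIPT = "final_transcript"


@dataclass
class TranscriptEvent:
    kind: EventKind
    text: str
    words: Optional[list] = None  # word-level timestamps, when present


def to_event(message: dict) -> Optional[TranscriptEvent]:
    """Map one decoded server message to a transcript event, or None."""
    msg_type = message.get("message_type")  # assumed field name
    text = message.get("text", "")  # assumed field name
    if msg_type == "partial_transcript":  # assumed type string
        return TranscriptEvent(EventKind.INTERIM_TRANSCRIPT, text)
    if msg_type == "committed_transcript":  # assumed type string
        return TranscriptEvent(EventKind.FINAL_TRANSCRIPT, text)
    if msg_type == "committed_transcript_with_timestamps":
        return TranscriptEvent(
            EventKind.FINAL_TRANSCRIPT, text, message.get("words")
        )
    # Anything else (session metadata, unknown types) is ignored here.
    return None
```

Interim events can drive live UI text, while only final events should be treated as committed speech.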
ElevenLabs has fixed the issue elevenlabs/elevenlabs-python#686, and the latest test results look good.
livekit-plugins/livekit-plugins-elevenlabs/livekit/plugins/elevenlabs/stt_v2.py
    audio_format: STTAudioFormat = "pcm_16000",
    commit_strategy: str = "vad",
    include_timestamps: bool = False,
    vad_silence_threshold_secs: float = 1.5,
What is the default value from ElevenLabs? Can we use NOT_GIVEN as the default? Also, during my testing the transcripts didn't get committed after 1.5s. Is that related to the configuration?
Yeah, this ended up being a config issue.
- ElevenLabs’ default commit_strategy="manual" basically turns off all the VAD settings. So when we tested with the API defaults, vad_silence_threshold_secs wasn’t doing anything, which is why transcripts weren’t auto-committing.
- The PR fixes that by switching the default to commit_strategy="vad", which brings back the 1.5s auto-commit behavior.
- We tried a few settings in prod to make things snappier, especially for quick phrases like “Hello?”:
  vad_silence_threshold_secs = 0.6 (much faster than the 1.5s default)
  min_silence_duration_ms = 150 (more stable than 100ms)
The old 1.5s threshold was causing single-word phrases to stall for up to 22 seconds because background noise blocked clean silence detection. Dropping it to 0.6s fixed the lag without hurting accuracy.
Also, should we keep the default at 0.6 in the PR? We can either match ElevenLabs’ default, use NOT_GIVEN, or just go with 0.6, since that aligns with the vad strategy and seems to work best in practice. I’m leaning toward setting it to 0.6 by default.
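Our stt config:
from livekit.plugins import elevenlabs

# commit_strategy is left at the PR's default ("vad"), so the VAD settings below take effect.
stt_instance = elevenlabs.STTv2(
    model_id="scribe_v2_realtime",
    vad_silence_threshold_secs=0.6,
    vad_threshold=0.4,
    min_silence_duration_ms=150,
)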
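For the NOT_GIVEN option raised above, a minimal sketch of how unset options could be left out of the request so the server's own defaults apply, while flagging VAD settings that would be silently ignored. This is an assumption-laden illustration, not the plugin's code: `_NotGiven`, `NOT_GIVEN`, `VAD_ONLY_OPTIONS`, and `build_query_params` are hypothetical names defined here, and the "manual" server default is taken from this thread.

```python
from typing import Any, Dict, List, Tuple


class _NotGiven:
    """Stand-in sentinel meaning 'the caller did not set this option'."""

    def __repr__(self) -> str:
        return "NOT_GIVEN"


NOT_GIVEN = _NotGiven()

# Options that, per this thread, only matter when commit_strategy="vad".
VAD_ONLY_OPTIONS = (
    "vad_silence_threshold_secs",
    "vad_threshold",
    "min_silence_duration_ms",
)


def build_query_params(
    commit_strategy: Any = NOT_GIVEN, **options: Any
) -> Tuple[Dict[str, Any], List[str]]:
    """Return (params to send, names of VAD options that would be ignored)."""
    params: Dict[str, Any] = {}
    if commit_strategy is not NOT_GIVEN:
        params["commit_strategy"] = commit_strategy
    for name, value in options.items():
        if value is not NOT_GIVEN:  # omit unset options entirely
            params[name] = value

    # Assumption from the thread: the server default is "manual".
    effective_strategy = params.get("commit_strategy", "manual")
    ignored: List[str] = []
    if effective_strategy != "vad":
        ignored = [name for name in VAD_ONLY_OPTIONS if name in params]
    return params, ignored
```

With everything defaulted to NOT_GIVEN, passing only `vad_silence_threshold_secs=0.6` would report that option as ignored, which is the situation hit during testing; defaulting `commit_strategy` to "vad" in the plugin avoids it.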
…venlabs/stt_v2.py Co-authored-by: Long Chen <[email protected]>
Summary
Adds support for ElevenLabs Scribe v2 Realtime streaming STT with ~150ms latency.
Features
API Options
- model_id: Model selection (default: scribe_v2_realtime)
- language_code: Language support (optional)
- commit_strategy: "vad" (default) or "manual"
- include_timestamps: Enable word-level timestamps
Implementation Details
Known Issues
The ElevenLabs API currently returns duplicate transcripts in some scenarios. I've reported this to ElevenLabs (elevenlabs/elevenlabs-python#686). No explicit deduplication logic was added, as it risks removing valid repeated content.
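To make the deduplication risk concrete, a small sketch of a naive filter that drops any transcript identical to the previous one. `naive_dedupe` is a hypothetical helper written for this illustration and is not part of the PR.

```python
from typing import List


def naive_dedupe(transcripts: List[str]) -> List[str]:
    """Drop any transcript identical to the one immediately before it."""
    kept: List[str] = []
    for text in transcripts:
        if kept and kept[-1] == text:
            continue  # treated as an API duplicate
        kept.append(text)
    return kept


# The speaker really said "yes" twice; nothing here was duplicated by the API.
spoken = ["are you sure", "yes", "yes", "okay"]
deduped = naive_dedupe(spoken)
```

Here the second genuine "yes" is lost, since text equality alone cannot separate an API duplicate from a real repetition.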
Documentation
STT - Realtime: https://elevenlabs.io/docs/cookbooks/speech-to-text/streaming, https://elevenlabs.io/docs/api-reference/speech-to-text/v-1-speech-to-text-realtime