@varghesepaul commented Nov 16, 2025

Summary

Adds support for ElevenLabs Scribe v2 Realtime streaming STT with ~150ms latency.

Features

  • WebSocket-based streaming transcription
  • Configurable commit strategies (VAD/manual)
  • Word-level timestamp support
  • Automatic reconnection handling
  • Comprehensive error handling
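"Automatic reconnection handling" for a WebSocket stream usually means retrying the connect with a growing delay. Below is a minimal sketch of that idea. It assumes a hypothetical async `connect()` callable, and the retry limits and the choice of `OSError` as the retryable error are illustrative assumptions. None of these come from the PR itself.

```python
import asyncio
from typing import Awaitable, Callable, TypeVar

T = TypeVar("T")


async def connect_with_retries(
    connect: Callable[[], Awaitable[T]],
    max_attempts: int = 3,
    base_delay: float = 0.5,
) -> T:
    """Call `connect` until it succeeds, doubling the delay after each failure.

    `connect` is a hypothetical coroutine factory (e.g. one that opens the
    Scribe WebSocket); which errors are retryable is an assumption here.
    """
    for attempt in range(max_attempts):
        try:
            return await connect()
        except OSError:  # ConnectionError and friends subclass OSError
            # Give up after the last attempt and surface the original error.
            if attempt == max_attempts - 1:
                raise
            # Exponential backoff: base_delay, 2*base_delay, 4*base_delay, ...
            await asyncio.sleep(base_delay * (2 ** attempt))
    raise RuntimeError("max_attempts must be >= 1")
```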

API Options

  • model_id: Model selection (default: scribe_v2_realtime)
  • language_code: Language support (optional)
  • commit_strategy: "vad" (default) or "manual"
  • include_timestamps: Enable word-level timestamps
  • VAD parameters: threshold, silence duration, speech duration
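The option list above can be pictured as a small typed container with the stated defaults. This is a hypothetical sketch, not the PR's actual class. `model_id`, `language_code`, `commit_strategy`, and `include_timestamps` come from the list. `vad_threshold`, `vad_silence_threshold_secs`, and `min_silence_duration_ms` appear later in this thread. `min_speech_duration_ms` is an assumed name for "speech duration". All VAD fields default to `None` here, meaning "not set".

```python
from dataclasses import dataclass
from typing import Literal, Optional

CommitStrategy = Literal["vad", "manual"]


@dataclass
class ScribeRealtimeOptions:
    """Hypothetical container for the Scribe v2 Realtime options listed above."""

    model_id: str = "scribe_v2_realtime"
    language_code: Optional[str] = None  # optional; None means not specified
    commit_strategy: CommitStrategy = "vad"
    include_timestamps: bool = False  # word-level timestamps
    # VAD tuning; only meaningful when commit_strategy == "vad".
    vad_threshold: Optional[float] = None
    vad_silence_threshold_secs: Optional[float] = None
    min_silence_duration_ms: Optional[int] = None
    min_speech_duration_ms: Optional[int] = None  # assumed name

    def __post_init__(self) -> None:
        # Reject anything other than the two supported strategies early.
        if self.commit_strategy not in ("vad", "manual"):
            raise ValueError(f"unknown commit_strategy: {self.commit_strategy!r}")
```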

Implementation Details

  • Follows Deepgram STTv2 pattern for consistency
  • Uses RecognizeStream base class (modern API)
  • Proper usage tracking via RECOGNITION_USAGE events
  • Session override support via update_options()

Known Issues

The ElevenLabs API currently returns duplicate transcripts in some scenarios. I've reported this to ElevenLabs (elevenlabs/elevenlabs-python#686). I haven't added explicit deduplication logic, since it risks removing valid repeated content.
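To see why automatic deduplication is risky, consider the obvious approach: drop a committed transcript whenever it equals the previous one. The hypothetical `naive_dedupe` helper below is not part of the PR. It shows that this approach cannot tell an API-side duplicate from a speaker who genuinely repeats themselves.

```python
from typing import List


def naive_dedupe(committed: List[str]) -> List[str]:
    """Drop any transcript identical to the one immediately before it."""
    out: List[str] = []
    for text in committed:
        if out and out[-1] == text:
            continue  # treated as a duplicate, whether or not it really was
        out.append(text)
    return out


# An API-side duplicate is removed as intended...
api_duplicate = ["Book a table for two.", "Book a table for two."]
# ...but so is a caller who really said "Hello?" twice while waiting.
real_repeat = ["Hello?", "Hello?"]

print(naive_dedupe(api_duplicate))  # ['Book a table for two.']
print(naive_dedupe(real_repeat))    # ['Hello?']
```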

Documentation

STT Realtime: https://elevenlabs.io/docs/cookbooks/speech-to-text/streaming, https://elevenlabs.io/docs/api-reference/speech-to-text/v-1-speech-to-text-realtime

@CLAassistant commented Nov 16, 2025

CLA assistant check
All committers have signed the CLA.

@varghesepaul force-pushed the elevenlabs-scribeV2-realtime branch 2 times, most recently from 75f1045 to 5567d85 on November 17, 2025 01:52
- Add STTv2 class with full Scribe v2 Realtime API support
- Support word-level timestamps (include_timestamps parameter)
- Support both VAD and manual commit strategies
- Emit INTERIM_TRANSCRIPT events for real-time UI feedback
- Handle committed_transcript_with_timestamps events
- Add update_options() method for dynamic reconfiguration
- Comprehensive error handling and logging
- Full docstrings with examples
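The commit bullets above describe turning server messages into interim and final transcript events. Below is a minimal dispatch sketch. Only the `committed_transcript_with_timestamps` type name appears in this PR. The `message_type`, `text`, and `words` field names, the `partial_transcript` and `committed_transcript` types, and the per-word `text`/`start`/`end` keys are all assumptions, so check them against the ElevenLabs realtime API reference. `TranscriptEvent` is a hypothetical stand-in for the INTERIM_TRANSCRIPT and final transcript events the PR emits.

```python
import json
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class TranscriptEvent:
    # Hypothetical event shape; "interim" corresponds to INTERIM_TRANSCRIPT.
    kind: str  # "interim" or "final"
    text: str
    # (word, start_secs, end_secs) triples when timestamps were requested.
    words: List[Tuple[str, float, float]] = field(default_factory=list)


def handle_message(raw: str) -> Optional[TranscriptEvent]:
    """Map one server message to a transcript event, or None to ignore it."""
    msg = json.loads(raw)
    mtype = msg.get("message_type")  # assumed field name
    text = msg.get("text", "")
    if mtype == "partial_transcript":  # assumed type name
        return TranscriptEvent("interim", text)
    if mtype == "committed_transcript":  # assumed type name
        return TranscriptEvent("final", text)
    if mtype == "committed_transcript_with_timestamps":
        words = [
            (w["text"], float(w["start"]), float(w["end"]))
            for w in msg.get("words", [])
        ]
        return TranscriptEvent("final", text, words)
    # Other message types (session setup, errors) would be handled elsewhere.
    return None
```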
@varghesepaul force-pushed the elevenlabs-scribeV2-realtime branch from 5567d85 to e976fb8 on November 17, 2025 01:59
@varghesepaul
Author

ElevenLabs has fixed the issue elevenlabs/elevenlabs-python#686, and the latest test results look good.

audio_format: STTAudioFormat = "pcm_16000",
commit_strategy: str = "vad",
include_timestamps: bool = False,
vad_silence_threshold_secs: float = 1.5,
Contributor

What is ElevenLabs' default value? Can we use NOT_GIVEN as the default? Also, during my testing the transcripts didn't get committed after 1.5s; is that related to the configuration?

Author

Yeah, this ended up being a config issue.

  • ElevenLabs’ default commit_strategy="manual" basically turns off all the VAD settings. So when we tested with the API defaults, vad_silence_threshold_secs wasn’t doing anything, which is why transcripts weren’t auto-committing.

  • The PR fixes that by switching the default to commit_strategy="vad", which brings back the 1.5s auto-commit behavior.

  • We tried a few settings in prod to make things snappier, especially for quick phrases like “Hello?”:
    - vad_silence_threshold_secs = 0.6 (much faster than the 1.5s default)
    - min_silence_duration_ms = 150 (more stable than 100ms)

The old 1.5s threshold was causing single-word phrases to stall for up to 22 seconds because background noise blocked clean silence detection. Dropping it to 0.6s fixed the lag without hurting accuracy.
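The stall described above follows from how silence-based auto-commit works. The commit fires only after an unbroken run of silence reaches the threshold, and any frame the VAD scores as speech, including loud background noise, resets that run. Under commit_strategy="manual", the VAD never commits at all. Below is a toy model of that logic. It is our simplification, not ElevenLabs' actual VAD. The 0.1 s frame size, the speech/noise labels, and the `auto_commit_time` helper are all hypothetical.

```python
from typing import Optional, Sequence


def auto_commit_time(
    is_speech: Sequence[bool],
    commit_strategy: str,
    silence_threshold_secs: float,
    frame_secs: float = 0.1,
) -> Optional[float]:
    """Return when a toy VAD would auto-commit (in seconds), or None.

    `is_speech[i]` says whether frame i was classified as speech; noise that
    the VAD mistakes for speech counts as True and resets the silence run.
    """
    if commit_strategy != "vad":
        return None  # "manual": nothing commits until the client asks
    silent_frames = 0
    heard_speech = False
    for i, speech in enumerate(is_speech):
        if speech:
            heard_speech = True
            silent_frames = 0
        else:
            silent_frames += 1
            # Small epsilon guards against float error in frames * frame_secs.
            if heard_speech and silent_frames * frame_secs >= silence_threshold_secs - 1e-9:
                return round((i + 1) * frame_secs, 3)
    return None


# "Hello?" (0.3 s of speech), then five 0.8 s pauses each broken by a noise
# frame, then 2 s of clean silence.
frames = [True] * 3 + ([False] * 8 + [True]) * 5 + [False] * 20
print(auto_commit_time(frames, "vad", 0.6))     # 0.9
print(auto_commit_time(frames, "vad", 1.5))     # 6.3
print(auto_commit_time(frames, "manual", 0.6))  # None
```

With these made-up frames, the 0.6 s threshold commits during the first pause. The 1.5 s threshold waits for the final clean stretch, because none of the 0.8 s gaps between noise bursts is long enough.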

Also, should we keep the default at 0.6 in the PR? We can match ElevenLabs’ default, use NOT_GIVEN, or just go with 0.6, since that aligns with the vad strategy and seems to work best in practice. I’m leaning toward 0.6 as the default.
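One way to read the NOT_GIVEN option is this: leave a parameter out of the connection request entirely so the service applies its own default, and send it only when a value is set. The sketch below assumes the settings travel as WebSocket URL query parameters. The endpoint is a deliberate placeholder, not the real ElevenLabs URL, and `build_url` and `_NotGiven` are hypothetical helpers.

```python
from typing import Dict, Union
from urllib.parse import urlencode


class _NotGiven:
    """Sentinel: the parameter was not set, so it is left out of the request."""


NOT_GIVEN = _NotGiven()

# Placeholder endpoint; substitute the real one from the ElevenLabs docs.
BASE_URL = "wss://example.invalid/speech-to-text/realtime"


def build_url(**params: Union[str, int, float, bool, _NotGiven]) -> str:
    """Encode only the parameters that were actually given."""
    given: Dict[str, str] = {}
    for key, value in params.items():
        if isinstance(value, _NotGiven):
            continue  # omitted, so the server-side default applies
        # Send booleans as lowercase "true"/"false".
        given[key] = str(value).lower() if isinstance(value, bool) else str(value)
    return f"{BASE_URL}?{urlencode(given)}" if given else BASE_URL


url = build_url(
    model_id="scribe_v2_realtime",
    commit_strategy="vad",
    vad_silence_threshold_secs=0.6,  # pin the tuned value...
    language_code=NOT_GIVEN,         # ...but let the service pick this one
)
```

Using NOT_GIVEN as the plugin default defers to ElevenLabs' value. Using 0.6 pins the tuned one. That choice is the open question in this thread.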

Our STT config:

from livekit.plugins import elevenlabs

stt_instance = elevenlabs.STTv2(
    model_id="scribe_v2_realtime",
    vad_silence_threshold_secs=0.6,
    vad_threshold=0.4,
    min_silence_duration_ms=150,
)

[Screenshot from 2025-11-18 at 9:45:41 AM]

@varghesepaul requested a review from longcw on November 18, 2025 20:43
@varghesepaul force-pushed the elevenlabs-scribeV2-realtime branch from 14f8bb2 to 5d99999 on November 19, 2025 06:11