add use_realtime to elevenlabs stt and support scribe v2 realtime model #4041
Conversation
class VADOptions(TypedDict, total=False):
    vad_silence_threshold_secs: float | None
    """Silence threshold in seconds for VAD. Default to 1.5"""
This seems kind of long for realtime. Does it mean that it'll mark the end of speech only after 1.5s of silence?
1.5s is the default value from ElevenLabs, and yes, I think it's too long. Their server VAD is also very sensitive to background noise; in my testing it tended to never stop.
Their default commit strategy is manual, so I think the proper way to use their realtime model is to combine it with a local VAD (#4043).
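The idea of pairing the realtime model with a local VAD and manual commits can be sketched roughly as follows. This is a minimal illustration, not the plugin's implementation: `SilenceCommitter`, its parameters, and the 500 ms threshold are hypothetical, and the per-frame speech/silence decision is assumed to come from whatever local VAD is in use.

```python
from dataclasses import dataclass, field


@dataclass
class SilenceCommitter:
    """Hypothetical helper that decides when to send a manual commit."""

    silence_threshold_ms: int = 500  # illustrative; shorter than the 1.5s server default
    frame_ms: int = 20  # assumed duration of each audio frame
    _silence_ms: int = field(default=0, init=False)
    _in_speech: bool = field(default=False, init=False)

    def push(self, is_speech: bool) -> bool:
        """Feed one local-VAD decision; return True when a commit should be sent."""
        if is_speech:
            # Speech resets the silence counter and opens (or continues) an utterance.
            self._in_speech = True
            self._silence_ms = 0
            return False
        if not self._in_speech:
            # Leading silence: nothing has been spoken, so there is nothing to commit.
            return False
        self._silence_ms += self.frame_ms
        if self._silence_ms >= self.silence_threshold_ms:
            # Enough trailing silence: close the utterance and signal a commit.
            self._in_speech = False
            self._silence_ms = 0
            return True
        return False
```

With this shape, the caller would send the commit message to the STT service whenever `push` returns True, so the end-of-speech latency is governed by the local threshold rather than by the server VAD.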
So to be clear, they do not send a final transcript until the VAD is clear?
yes
        *,
        language: NotGivenOr[str] = NOT_GIVEN,
        conn_options: APIConnectOptions = DEFAULT_API_CONNECT_OPTIONS,
    ) -> SpeechStream:
Should we throw here if use_realtime isn't set to true?
stt.capabilities.streaming will be False if use_realtime isn't set, so ideally the agent framework won't call stt.stream in this case.
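The two options discussed here (a capability flag the caller checks, and a defensive error inside `stream`) can be sketched side by side. Everything below is a hypothetical stand-in for illustration: `SketchSTT`, `Capabilities`, `transcribe`, and the returned strings are not the plugin's or the framework's actual code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Capabilities:
    streaming: bool


class SketchSTT:
    """Hypothetical stand-in for an STT class with an opt-in realtime mode."""

    def __init__(self, *, use_realtime: bool = False) -> None:
        self._use_realtime = use_realtime
        # Streaming is only advertised when realtime mode was requested.
        self.capabilities = Capabilities(streaming=use_realtime)

    def stream(self) -> str:
        # Defensive check: fail loudly if a caller ignores the capability flag.
        if not self._use_realtime:
            raise RuntimeError("stream() requires use_realtime=True")
        return "realtime-stream"


def transcribe(stt: SketchSTT) -> str:
    """Caller-side check: only call stream() when streaming is advertised."""
    if stt.capabilities.streaming:
        return stt.stream()
    return "batch-recognize"
```

The capability check keeps well-behaved callers off the streaming path, while the explicit error would only surface when `stream` is called directly without realtime mode enabled.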
chenghao-mou
left a comment
LGTM. Tested locally. Though I did see a bunch of hallucinated transcripts.
clean up #3909
close #3881