Allow stt suppression by vad #3961
Open
As discussed in #3918, some STTs may misfire and produce false speech recognition. To fix this, @chenghao-mou suggested rewriting `StreamAdapter` so that it works with stream-capable STTs and only forwards STT events if `VADEvent.START_OF_SPEECH` was generated.

The implementation is a bit different from the suggestion:

- We `push_frame()` to both the STT and VAD streams instead of starting to send frames to the STT only after the corresponding VAD events. We decided to go this route because some STTs may cache previously received frames to improve prediction, and sending only after the VAD event is guaranteed to "eat" at least one chunk of user speech. We don't want to hurt accuracy at all.
- Instead of calling `stt._recognize_impl()` to get a `NotImplementedError` for stream-only STTs, we check `stt.capabilities` and a new parameter called `force_stream` in `StreamAdapter`. Calling `stt._recognize_impl()` when it is implemented would introduce an unnecessary API call, and since that call would happen during initialization it would affect performance. It's also better to let users decide which mode they'd like to use.

By default, `force_stream=False` in both `StreamAdapter` and `StreamAdapterWrapper` for backward compatibility. Unless it is set to `True`, the old logic is used, so nothing changes for anyone who already relies on it.
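
For reviewers who want the gating idea in isolation, here is a minimal, self-contained sketch. It is not the code in this PR and does not use the library's classes: `VADEventType`, `VADEvent`, `STTEvent`, `gate_stt_events` and `use_streaming` are hypothetical stand-ins defined only for this illustration. It assumes VAD and STT events arrive interleaved in one ordered sequence, and that `force_stream=True` on an STT without streaming support falls back to the old logic (the description above does not specify that case).

```python
from dataclasses import dataclass
from enum import Enum, auto
from typing import Iterable, Iterator, Union


class VADEventType(Enum):
    # Hypothetical stand-in for the VAD event types mentioned above.
    START_OF_SPEECH = auto()
    END_OF_SPEECH = auto()


@dataclass
class VADEvent:
    type: VADEventType


@dataclass
class STTEvent:
    text: str


def use_streaming(supports_streaming: bool, force_stream: bool) -> bool:
    """Pick the mode from capabilities plus the opt-in flag.

    Assumption: without streaming support the old logic is used
    even when force_stream is True.
    """
    return force_stream and supports_streaming


def gate_stt_events(
    events: Iterable[Union[VADEvent, STTEvent]],
) -> Iterator[STTEvent]:
    """Forward STT events only while VAD reports active speech.

    Audio is assumed to have been pushed to both streams already;
    only the STT *events* are suppressed, never the frames.
    """
    speaking = False
    for event in events:
        if isinstance(event, VADEvent):
            if event.type is VADEventType.START_OF_SPEECH:
                speaking = True
            elif event.type is VADEventType.END_OF_SPEECH:
                speaking = False
        elif speaking:
            yield event
        # STT events outside a speech segment are dropped as misfires.


if __name__ == "__main__":
    timeline = [
        STTEvent("noise"),  # misfire before any speech was detected
        VADEvent(VADEventType.START_OF_SPEECH),
        STTEvent("hello"),
        VADEvent(VADEventType.END_OF_SPEECH),
        STTEvent("ghost"),  # misfire after speech ended
    ]
    for stt_event in gate_stt_events(timeline):
        print(stt_event.text)
```

Because the STT still receives every frame in this model, any frame caching it does internally is unaffected; only what is surfaced to the caller changes.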