Commit 50b75ef

docs: update mkdocs yml, add source refs to models
1 parent be38bc7 commit 50b75ef

3 files changed: +30 additions, -17 deletions

docs/speech_to_speech/agents/asr.md

Lines changed: 0 additions & 17 deletions
@@ -57,23 +57,6 @@ Adds a custom VAD model to a processing pipeline.
 
 The `'stop'` pipeline is present for forward compatibility. It currently doesn't affect Agent's functioning.
 
-### `_on_new_sample()`
-
-Callback function triggered for each new audio sample. Determines:
-
-- If recording should start
-- Whether to continue buffering
-- If grace period has ended
-- When to start transcription threads
-
-### `_transcription_thread(identifier)`
-
-Handles transcription for a given buffer in a background thread. Uses locks to ensure safe access to transcription model.
-
-### `_should_record(audio_data, input_parameters)`
-
-Evaluates the `should_record_pipeline` models to determine if recording should begin.
-
 ## Best Practices
 
 1. **Graceful Shutdown**: Always call `stop()` to ensure transcription threads complete.
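
The entries removed from asr.md in this hunk describe how the ASR agent reacts to each incoming audio sample (start recording, keep buffering, detect the end of a grace period, launch transcription threads) and note that `_transcription_thread(identifier)` uses locks to protect the transcription model. Below is a minimal, hypothetical sketch of that flow; it is not the rai_s2s implementation. The class `ToyASRBuffer`, its parameters (`should_record`, `transcribe`, `grace_chunks`), and the idea of counting silent chunks as the grace period are assumptions made for illustration.

```python
import threading
from typing import Callable, Dict, List


class ToyASRBuffer:
    """Hypothetical sketch (not rai_s2s code): per-chunk recording decisions
    plus background transcription guarded by a lock."""

    def __init__(
        self,
        should_record: Callable[[List[int]], bool],
        transcribe: Callable[[List[int]], str],
        grace_chunks: int = 2,
    ) -> None:
        self.should_record = should_record  # stands in for the VAD / wake-word pipeline
        self.transcribe = transcribe        # stands in for the transcription model
        self.grace_chunks = grace_chunks    # silent chunks tolerated before ending a recording
        self.model_lock = threading.Lock()  # serializes access to the shared model
        self.results_lock = threading.Lock()
        self.buffer: List[int] = []
        self.recording = False
        self.silent_chunks = 0
        self.next_id = 0
        self.threads: Dict[int, threading.Thread] = {}
        self.results: Dict[int, str] = {}

    def on_new_sample(self, chunk: List[int]) -> None:
        """Decide whether to start, continue, or finish recording."""
        if self.should_record(chunk):
            self.recording = True       # start (or keep) recording
            self.silent_chunks = 0
        elif self.recording:
            self.silent_chunks += 1     # silence inside the grace period
        if not self.recording:
            return
        self.buffer.extend(chunk)       # continue buffering
        if self.silent_chunks > self.grace_chunks:
            self._start_transcription()  # grace period has ended

    def _start_transcription(self) -> None:
        identifier = self.next_id
        self.next_id += 1
        audio, self.buffer = self.buffer, []
        self.recording = False
        self.silent_chunks = 0
        thread = threading.Thread(
            target=self._transcription_thread, args=(identifier, audio)
        )
        self.threads[identifier] = thread
        thread.start()

    def _transcription_thread(self, identifier: int, audio: List[int]) -> None:
        with self.model_lock:           # only one thread uses the model at a time
            text = self.transcribe(audio)
        with self.results_lock:
            self.results[identifier] = text

    def stop(self) -> None:
        """Wait for every transcription thread to finish."""
        for thread in list(self.threads.values()):
            thread.join()
```

For example, with `should_record=lambda c: max(c) > 100` and the chunks `[0, 0]`, `[500, 600]`, `[700, 0]` followed by three silent `[0, 0]` chunks, the third silent chunk exceeds `grace_chunks=2`, so one thread transcribes the 10 buffered samples. Calling `stop()` afterwards joins that thread, which mirrors the Graceful Shutdown advice above.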

docs/speech_to_speech/models/overview.md

Lines changed: 28 additions & 0 deletions
@@ -42,35 +42,59 @@ This method takes raw audio data encoded as 2-byte integers and returns the corr
 - No additional setup required
 - Returns a confidence value indicating the presence of speech in the audio
 
+??? info "SileroVAD"
+
+::: rai_s2s.asr.models.silero_vad.SileroVAD
+
 ### OpenWakeWord
 
 - Open source project: [GitHub](https://github.com/dscripka/openWakeWord)
 - Supports predefined and custom wake words
 - Returns `True` when the specified wake word is detected in the audio
 
+??? info "OpenWakeWord"
+
+::: rai_s2s.asr.models.open_wake_word.OpenWakeWord
+
 ### OpenAIWhisper
 
 - Cloud-based transcription model: [Documentation](https://platform.openai.com/docs/guides/speech-to-text)
 - Requires setting the `OPEN_API_KEY` environment variable
 - Offers language and model customization via the API
 
+??? info "OpenAIWhisper"
+
+::: rai_s2s.asr.models.open_ai_whisper.OpenAIWhisper
+
 ### LocalWhisper
 
 - Local deployment of OpenAI Whisper: [GitHub](https://github.com/openai/whisper)
 - Supports GPU acceleration
 - Same configuration interface as OpenAIWhisper
 
+??? info "LocalWhisper"
+
+::: rai_s2s.asr.models.local_whisper.LocalWhisper
+
 ### FasterWhisper
 
 - Optimized Whisper variant: [GitHub](https://github.com/SYSTRAN/faster-whisper)
 - Designed for high speed and low memory usage
 - Follows the same API as Whisper models
 
+??? info "FasterWhisper"
+
+::: rai_s2s.asr.models.local_whisper.FasterWhisper
+
 ### ElevenLabs
 
 - Cloud-based TTS model: [Website](https://elevenlabs.io/)
 - Requires the environment variable `ELEVENLABS_API_KEY` with a valid key
 
+??? info "ElevenLabs"
+
+::: rai_s2s.tts.models.elevenlabs_tts.ElevenLabsTTS
+
 ### OpenTTS
 
 - Open source TTS solution: [GitHub](https://github.com/synesthesiam/opentts)
@@ -83,6 +107,10 @@ This method takes raw audio data encoded as 2-byte integers and returns the corr
 - Provides a TTS server running on port 5500
 - Supports multiple voices and configurations
 
+??? info "OpenTTS"
+
+::: rai_s2s.tts.models.open_tts.OpenTTS
+
 ## Custom Models
 
 ### Voice Detection Models
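
The hunk header for models/overview.md quotes a method that "takes raw audio data encoded as 2-byte integers" (the rest of that sentence is cut off in this view). Below is a minimal standard-library sketch of preparing such input from floating-point samples. It assumes signed 16-bit PCM with samples scaled from [-1.0, 1.0]. The helpers `float_to_pcm16` and `pcm16_from_bytes` are hypothetical and not part of rai_s2s.

```python
import array
from typing import Iterable


def float_to_pcm16(samples: Iterable[float]) -> array.array:
    """Clip float samples to [-1.0, 1.0] and scale them to signed 16-bit ints."""
    pcm = array.array("h")  # "h" = signed short, 2 bytes on common platforms
    for sample in samples:
        clipped = max(-1.0, min(1.0, sample))
        pcm.append(int(round(clipped * 32767)))
    return pcm


def pcm16_from_bytes(raw: bytes) -> array.array:
    """Reinterpret a raw byte buffer (native byte order) as 16-bit samples."""
    pcm = array.array("h")
    pcm.frombytes(raw)  # raises ValueError if len(raw) is not a multiple of 2
    return pcm
```

Scaling by 32767 keeps +1.0 and -1.0 symmetric, at the cost of never producing -32768. A round trip through `tobytes()` and `pcm16_from_bytes()` returns the same samples.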

mkdocs.yml

Lines changed: 2 additions & 0 deletions
@@ -120,6 +120,8 @@ nav:
 - Overview: speech_to_speech/overview.md
 - Agents:
 - Overview: speech_to_speech/agents/overview.md
+- Automatic Speech Recognition: speech_to_speech/agents/asr.md
+- Text To Speech: speech_to_speech/agents/tts.md
 - Models:
 - Overview: speech_to_speech/models/overview.md
 - SoundDevice Connector: speech_to_speech/sounddevice.md
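
The mkdocs.yml hunk adds two pages under `Agents:` in the `nav` tree. Below is a small, hypothetical sketch for listing nav targets and spotting any that have no matching file under the docs directory. It assumes the nav has already been loaded into Python lists and dicts (for example, with a YAML parser), and the helper names are illustrative only. The `agents_nav` value mirrors only the `Agents` entries visible in the hunk, because the indentation of the surrounding entries cannot be recovered from this view.

```python
from pathlib import Path
from typing import Any, List, Tuple


def flatten_nav(nav: List[Any], prefix: Tuple[str, ...] = ()) -> List[Tuple[str, str]]:
    """Return (breadcrumb title, page path) pairs for every page in a nav list."""
    pages: List[Tuple[str, str]] = []
    for entry in nav:
        if isinstance(entry, str):  # bare "page.md" entries carry no title
            pages.append((" / ".join(prefix), entry))
            continue
        for title, target in entry.items():  # {"Title": "page.md"} or {"Section": [...]}
            if isinstance(target, list):
                pages.extend(flatten_nav(target, prefix + (title,)))
            else:
                pages.append((" / ".join(prefix + (title,)), target))
    return pages


def missing_pages(nav: List[Any], docs_dir: str) -> List[str]:
    """List nav targets that do not exist as files under docs_dir."""
    root = Path(docs_dir)
    return [path for _, path in flatten_nav(nav) if not (root / path).is_file()]


agents_nav = [
    {
        "Agents": [
            {"Overview": "speech_to_speech/agents/overview.md"},
            {"Automatic Speech Recognition": "speech_to_speech/agents/asr.md"},
            {"Text To Speech": "speech_to_speech/agents/tts.md"},
        ]
    }
]
```

This sketch treats every non-list target as a file path, so nav entries that point to external URLs would be reported as missing.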
