@@ -42,35 +42,59 @@ This method takes raw audio data encoded as 2-byte integers and returns the corr
- No additional setup required
- Returns a confidence value indicating the presence of speech in the audio

+ ??? info "SileroVAD"
+
+     ::: rai_s2s.asr.models.silero_vad.SileroVAD
+
### OpenWakeWord

- Open source project: [GitHub](https://github.com/dscripka/openWakeWord)
- Supports predefined and custom wake words
- Returns `True` when the specified wake word is detected in the audio

+ ??? info "OpenWakeWord"
+
+     ::: rai_s2s.asr.models.open_wake_word.OpenWakeWord
+
### OpenAIWhisper

- Cloud-based transcription model: [Documentation](https://platform.openai.com/docs/guides/speech-to-text)
- Requires setting the `OPENAI_API_KEY` environment variable
- Offers language and model customization via the API

+ ??? info "OpenAIWhisper"
+
+     ::: rai_s2s.asr.models.open_ai_whisper.OpenAIWhisper
+
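Cloud-based models fail at request time if their API key is missing, so it can help to check the environment before constructing one. The sketch below is illustrative only: `require_env` is a hypothetical helper, not part of `rai_s2s`, and it uses nothing beyond the Python standard library.

```python
import os


def require_env(name: str) -> str:
    """Return the value of an environment variable, or raise a clear error.

    Hypothetical helper for illustration; it is not part of rai_s2s.
    """
    value = os.environ.get(name, "").strip()
    if not value:
        raise RuntimeError(f"Environment variable {name} is not set")
    return value
```

Calling `require_env("OPENAI_API_KEY")` before creating the model surfaces a missing key early. `OPENAI_API_KEY` is the variable the official OpenAI Python client reads by default.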
### LocalWhisper

- Local deployment of OpenAI Whisper: [GitHub](https://github.com/openai/whisper)
- Supports GPU acceleration
- Same configuration interface as OpenAIWhisper

+ ??? info "LocalWhisper"
+
+     ::: rai_s2s.asr.models.local_whisper.LocalWhisper
+
### FasterWhisper

- Optimized Whisper variant: [GitHub](https://github.com/SYSTRAN/faster-whisper)
- Designed for high speed and low memory usage
- Follows the same API as Whisper models

+ ??? info "FasterWhisper"
+
+     ::: rai_s2s.asr.models.local_whisper.FasterWhisper
+
### ElevenLabs

- Cloud-based TTS model: [Website](https://elevenlabs.io/)
- Requires the environment variable `ELEVENLABS_API_KEY` with a valid key

+ ??? info "ElevenLabs"
+
+     ::: rai_s2s.tts.models.elevenlabs_tts.ElevenLabsTTS
+
### OpenTTS

- Open source TTS solution: [GitHub](https://github.com/synesthesiam/opentts)
@@ -83,6 +107,10 @@ This method takes raw audio data encoded as 2-byte integers and returns the corr
- Provides a TTS server running on port 5500
- Supports multiple voices and configurations

+ ??? info "OpenTTS"
+
+     ::: rai_s2s.tts.models.open_tts.OpenTTS
+
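To check that the OpenTTS server is reachable independently of the `OpenTTS` class, you can call its HTTP API directly. The sketch below assumes the server runs on `localhost:5500` and exposes the `/api/tts` endpoint with `voice` and `text` query parameters, as described in the OpenTTS README. The helper names and the `espeak:en` default voice are illustrative assumptions, not part of `rai_s2s`.

```python
from urllib.parse import urlencode
from urllib.request import urlopen


def opentts_url(text: str, voice: str = "espeak:en",
                host: str = "localhost", port: int = 5500) -> str:
    """Build the request URL for the OpenTTS /api/tts endpoint."""
    query = urlencode({"voice": voice, "text": text})
    return f"http://{host}:{port}/api/tts?{query}"


def synthesize(text: str, **kwargs) -> bytes:
    """Fetch synthesized audio (WAV bytes) from a running OpenTTS server."""
    with urlopen(opentts_url(text, **kwargs), timeout=30) as response:
        return response.read()
```

With a server running, `synthesize("Hello world")` returns the raw audio bytes, which can be written to a `.wav` file for a quick listening test.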
## Custom Models
### Voice Detection Models