Omar-Abuzaid-stack/kokoro-windows-gpu


=====================================================
UNIFIED AI VOICE SERVER (TTS + STT) - WINDOWS RTX GPU
=====================================================

This folder contains a unified server that runs BOTH Text-to-Speech (Kokoro) and Speech-to-Text (Faster-Whisper) at the same time on a Windows PC with an RTX 3060 GPU.

--- STEP 1: PREREQUISITES ---
1. You must have Python installed on your Windows PC. 
   - Go to python.org/downloads and download Python 3.10 or 3.11.
   - VERY IMPORTANT: Check the box that says "Add Python to PATH" during installation.
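
A quick way to confirm the prerequisite is to ask Python itself which version is on your PATH. The snippet below is a minimal sketch, not part of this repository: the helper name `is_supported` is hypothetical, and treating every version other than 3.10 and 3.11 as unsupported is an assumption based only on the recommendation above.

```python
import sys

# Versions recommended in Step 1. Treating anything else as unsupported
# is an assumption of this sketch, not a documented limit of the server.
SUPPORTED_MINORS = {(3, 10), (3, 11)}


def is_supported(version_info=sys.version_info):
    """Return True if the (major, minor) pair is Python 3.10 or 3.11."""
    return (version_info[0], version_info[1]) in SUPPORTED_MINORS


# Report the interpreter that is currently running this script.
print("Python", sys.version.split()[0], "supported:", is_supported())
```

Save it as, for example, `check_python.py` and run `python check_python.py` in a Command Prompt. If the command is not found at all, Python was most likely not added to PATH.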

--- STEP 2: START THE SERVER ---
1. Double-click the `start_server.bat` file in this folder.
2. A black Command Prompt window will open. It will automatically:
   - Create a virtual environment.
   - Install FastAPI, WebSockets, Kokoro, and Faster-Whisper.
   - Download the Kokoro AI models and the Faster-Whisper base.en model.
   - Start the server on port 5000.
3. Wait until the window shows "Application startup complete." Keep this window open; closing it stops the server.
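
If you are unsure whether the server really came up, you can test from a second window whether anything is listening on port 5000. This is a minimal sketch using only the Python standard library. The helper name `is_port_open` is hypothetical, and it only checks that the TCP port accepts connections, not that Kokoro or Faster-Whisper loaded correctly.

```python
import socket


def is_port_open(host: str = "127.0.0.1", port: int = 5000,
                 timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds."""
    try:
        # create_connection raises OSError if nothing is listening
        # or the attempt times out.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

Calling `is_port_open()` with no arguments checks the local server on port 5000, the port named in Step 2.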

--- STEP 3: EXPOSE TO THE INTERNET WITH NGROK ---
Vapi runs in the cloud and must reach the server on your PC to stream audio over WebSockets, so you need a secure public tunnel to port 5000.

1. Go to https://ngrok.com/download and download the Windows version.
2. Unzip it and double-click `ngrok.exe`.
3. In the new Ngrok window, type:
   ngrok http 5000
4. Press Enter.
5. Ngrok will show a "Forwarding" URL (e.g., https://a1b2c3d4.ngrok-free.app). Copy it for Step 4.
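
Instead of copying the Forwarding URL by hand, you can read it from the ngrok agent's local inspection API. The sketch below assumes that API is enabled and listening at its usual default address, http://127.0.0.1:4040, which may differ on your setup. The helper names `pick_https_url` and `current_forwarding_url` are hypothetical.

```python
import json
from urllib.request import urlopen

# Assumed default address of the ngrok agent's local API.
NGROK_API = "http://127.0.0.1:4040/api/tunnels"


def pick_https_url(payload: dict):
    """Return the first https public URL in a tunnels payload, or None."""
    for tunnel in payload.get("tunnels", []):
        url = tunnel.get("public_url", "")
        if url.startswith("https://"):
            return url
    return None


def current_forwarding_url(api_url: str = NGROK_API, timeout: float = 2.0):
    """Ask the running ngrok agent for its current https Forwarding URL."""
    with urlopen(api_url, timeout=timeout) as resp:
        return pick_https_url(json.load(resp))
```

`current_forwarding_url()` only works while the Ngrok window from this step is still open. It raises an error if the agent is not running.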

--- STEP 4: UPDATE VAPI ---

FOR TEXT-TO-SPEECH (Custom Voice):
1. In your Vapi Dashboard, set your Voice Provider to "Custom".
2. Set the Server URL to: https://[your-ngrok-url].ngrok-free.app/vapi/tts

FOR SPEECH-TO-TEXT (Custom Transcriber):
1. In your Vapi Dashboard, set your Transcriber to "Custom".
2. Set the Server URL to your Ngrok WebSocket URL. IMPORTANT: You must change "https://" to "wss://".
   Example: wss://[your-ngrok-url].ngrok-free.app/vapi/stt
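
The two Server URLs differ only in scheme and path, so both can be derived from the single Forwarding URL. The sketch below shows that conversion. The helper name `vapi_urls` is hypothetical, while the `/vapi/tts` and `/vapi/stt` paths are the ones given above.

```python
from urllib.parse import urlsplit


def vapi_urls(forwarding_url: str) -> dict:
    """Build the TTS (https) and STT (wss) URLs from an ngrok Forwarding URL."""
    parts = urlsplit(forwarding_url.strip())
    if parts.scheme != "https" or not parts.netloc:
        raise ValueError("expected a URL like https://a1b2c3d4.ngrok-free.app")
    host = parts.netloc
    return {
        # Custom Voice: plain HTTPS endpoint.
        "tts": f"https://{host}/vapi/tts",
        # Custom Transcriber: same host, WebSocket scheme.
        "stt": f"wss://{host}/vapi/stt",
    }
```

For example, `vapi_urls("https://a1b2c3d4.ngrok-free.app")["stt"]` gives `wss://a1b2c3d4.ngrok-free.app/vapi/stt`.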

Your RTX 3060 is now powering BOTH the hearing (STT) and speaking (TTS) of your real estate agent in real time!
