Welcome to TranscribeX, a platform for real-time audio transcription powered by state-of-the-art AI models and modern web technologies.
TranscribeX enables users to transcribe audio files efficiently using NVIDIA's CUDA technology, Transformers, and Flash Attention v2. This platform eliminates the reliance on third-party APIs by leveraging in-house models and infrastructure, providing robust performance and data privacy.
- Real-time Transcription: Transcribe audio files instantly with high accuracy.
- Customizable AI Models: Utilize NVIDIA CUDA and advanced Transformer models.
- Web Interface: User-friendly web interface for uploading and managing audio files.
- Data Privacy: Host and process data securely without relying on external services.
- Backend: Python, FastAPI, Modal framework
- Frontend: React, Tailwind CSS
- AI Models: Hugging Face Transformers, Flash Attention v2
- Infrastructure: NVIDIA CUDA, Docker
To run transcribeX locally:
- Clone the repository:
  git clone https://github.com/yourusername/transcribeX.git
  cd transcribeX
To deploy the WhisperV3 backend on Modal:
- Create a Virtual Environment:
  python3 -m venv whisperenv
  source whisperenv/bin/activate
- Install Dependencies:
  pip3 install modal==0.62.181 fastapi==0.110.0
- Configure Modal Credentials:
  - Ensure your Modal credentials are set up correctly in your environment.
  - Create a token in the Modal dashboard, then register it from the terminal:
    modal token set --token-id <token-id> --token-secret <token-secret>
- Deploy the Backend Server:
  cd <modal directory>
  modal deploy modal_app.py
- Start the Development Server:
  npm run dev    (or: bun run dev)
- Access TranscribeX:
  - The frontend will be accessible at http://localhost:3000.
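The setup steps above depend on a few command-line tools being available on your PATH. Here is a minimal sketch of a pre-flight check, assuming the executables are named git, modal, npm, and bun. The helper name missing_tools and the tool lists are illustrative and not part of the repository.

```python
import shutil

# Assumption: these executable names match how the tools are installed locally.
REQUIRED = ("git", "modal")
# Either of these can start the frontend development server.
ONE_OF = ("npm", "bun")


def missing_tools(which=shutil.which):
    """Return the names of tools that cannot be found on PATH.

    `which` is injectable so the check can be exercised without
    touching the real environment.
    """
    missing = [name for name in REQUIRED if which(name) is None]
    # Only one JavaScript runner is needed, so report them as a pair.
    if all(which(name) is None for name in ONE_OF):
        missing.append(" or ".join(ONE_OF))
    return missing
```

Because pip installs the modal CLI inside the virtual environment, run the check after activating whisperenv. An empty list from missing_tools() means every tool was found.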
- POST /transcribe: Upload an audio file for transcription.
- GET /stats: View real-time statistics on transcription operations.
- POST /call_id: Retrieve the status of a transcription task using its call ID.
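The endpoints above can be exercised from a small client. The following is a sketch only, using the Python standard library. The base URL, the multipart field name "file", the JSON body {"call_id": ...}, and the "status" / "pending" values in the response are all assumptions for illustration; check modal_app.py for the actual request and response shapes.

```python
import json
import time
import uuid
import urllib.request

# Assumption: replace with the URL printed by `modal deploy`.
BASE_URL = "https://example.invalid"


def build_transcribe_request(filename, audio, base_url=BASE_URL, field="file"):
    """Build a multipart/form-data POST for /transcribe (field name is assumed)."""
    boundary = uuid.uuid4().hex
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{field}"; filename="{filename}"\r\n'
        "Content-Type: application/octet-stream\r\n\r\n"
    ).encode()
    tail = f"\r\n--{boundary}--\r\n".encode()
    return urllib.request.Request(
        f"{base_url}/transcribe",
        data=head + audio + tail,
        headers={"Content-Type": f"multipart/form-data; boundary={boundary}"},
        method="POST",
    )


def build_status_request(call_id, base_url=BASE_URL):
    """Build a JSON POST for /call_id (the body shape is assumed)."""
    return urllib.request.Request(
        f"{base_url}/call_id",
        data=json.dumps({"call_id": call_id}).encode(),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request):
    """Send a request and decode the JSON response body."""
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def wait_for_result(call_id, fetch=None, interval=2.0, max_polls=30):
    """Poll /call_id until the (assumed) "status" field is no longer "pending"."""
    fetch = fetch or (lambda cid: send(build_status_request(cid)))
    for _ in range(max_polls):
        reply = fetch(call_id)
        if reply.get("status") != "pending":
            return reply
        time.sleep(interval)
    raise TimeoutError(f"call {call_id} still pending after {max_polls} polls")
```

A typical flow would read the audio bytes from disk, pass build_transcribe_request(...) to send(), take the call ID from the reply, and hand it to wait_for_result(). The fetch parameter exists so the polling loop can be tested without a live backend.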
Contributions are welcome! Please fork the repository and submit pull requests for any enhancements or bug fixes.
This project is licensed under the MIT License - see the LICENSE file for details.
Special thanks to the contributors and libraries that make WhisperV3 possible.
- https://modal.com/docs/examples/hello_world
- https://github.com/katspaugh/wavesurfer.js