# TalkToMe

A cross-platform desktop application built with Tauri and Svelte that provides real-time speech-to-text transcription with live translation capabilities.

## Features
- **Real-time Speech Recognition**: Convert spoken words to text using the OpenAI Whisper API
- **Live Language Translation**: Translate transcribed text into multiple languages using GPT models
- **Voice Activity Detection (VAD)**: Smart audio processing with noise filtering and speech detection
- **System Tray Integration**: Always-on functionality with system tray controls
- **Global Hotkeys**: Configurable keyboard shortcuts for hands-free operation
- **Multi-language Support**: 20+ languages for both input and output
- **Cross-platform**: Works on Windows, macOS, and Linux
- **Secure API Key Storage**: Encrypted storage of API credentials
- **Audio Device Selection**: Choose from available microphone inputs
- **Customizable Themes**: Light and dark mode support
- **Debug Logging**: Comprehensive logging for troubleshooting
## Architecture

TalkToMe is built using a modern hybrid architecture:

- **Frontend**: Svelte + TypeScript + TailwindCSS
- **Backend**: Rust with the Tauri framework
- **Audio Processing**: CPAL (Cross-Platform Audio Library) with custom VAD
- **APIs**: OpenAI Whisper for STT, GPT models for translation
### Core Components

1. **Audio Pipeline** (`src-tauri/src/audio.rs`)
   - Real-time audio capture using CPAL
   - Voice Activity Detection with configurable thresholds
   - Audio chunking with overlap handling
   - Signal conditioning (high-pass filter, AGC, noise gate)

2. **Speech-to-Text Service** (`src-tauri/src/stt.rs`)
   - OpenAI Whisper API integration
   - Audio encoding to WAV format
   - Retry logic and error handling
   - Quality filtering for audio chunks

3. **Translation Service** (`src-tauri/src/translation.rs`)
   - OpenAI GPT model integration
   - Grammar correction and translation
   - Support for 20+ languages
   - Configurable translation models

4. **Settings Management** (`src-tauri/src/settings.rs`)
   - Secure API key storage
   - Portable data directory support
   - Cross-platform configuration

5. **Debug System** (`src-tauri/src/debug_logger.rs`)
   - Comprehensive logging system
   - WAV dump capability for audio debugging
   - Runtime log level control
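The VAD at the heart of the audio pipeline classifies audio frames by signal energy against configurable thresholds. As a rough illustration only (not the project's actual `audio.rs` code), an RMS-based check might look like this:

```rust
/// Root-mean-square energy of an audio frame.
fn rms(samples: &[f32]) -> f32 {
    if samples.is_empty() {
        return 0.0;
    }
    let sum_sq: f32 = samples.iter().map(|s| s * s).sum();
    (sum_sq / samples.len() as f32).sqrt()
}

/// Returns true when frame energy crosses the speech threshold.
/// The real detector also uses a separate silence threshold for
/// hysteresis, plus duration limits and chunk overlap.
fn is_speech(samples: &[f32], speech_threshold: f32) -> bool {
    rms(samples) >= speech_threshold
}

fn main() {
    let silence = vec![0.0f32; 160]; // 10 ms of silence at 16 kHz
    // A quiet sine tone whose RMS (~0.07) sits well above the 0.02 default
    let tone: Vec<f32> = (0..160).map(|i| 0.1 * (i as f32 * 0.2).sin()).collect();
    assert!(!is_speech(&silence, 0.02));
    assert!(is_speech(&tone, 0.02));
    println!("VAD sketch OK");
}
```

The two-threshold design (speech above 0.02, silence below 0.01) prevents the detector from flapping when energy hovers near a single cutoff.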
## Prerequisites

- Node.js (v18 or later)
- Rust (latest stable)
- System dependencies:
  - Windows: Windows 10/11, WebView2
  - macOS: macOS 10.13+
  - Linux: WebKit2GTK, various system libraries
## Installation

1. Clone the repository:

   ```bash
   git clone https://github.com/bgeneto/TalkToMe.git
   cd TalkToMe
   ```

2. Install dependencies:

   ```bash
   npm install
   ```

3. Install Rust dependencies:

   ```bash
   cd src-tauri
   cargo build
   cd ..
   ```

4. Run in development mode:

   ```bash
   npm run tauri dev
   ```

## Building

```bash
# Build for current platform
npm run tauri build
```

The built application will be in `src-tauri/target/release/`.
## Configuration

1. **OpenAI API Key**: Required for speech recognition and translation
   - Go to the API Settings page in the application
   - Enter your OpenAI API key
   - Configure the API endpoint (default: `https://api.openai.com/v1`)
   - Select the STT model (e.g., `whisper-large-v3`)
   - Select the translation model (e.g., `gpt-3.5-turbo`)

2. **Language Configuration**:
   - Set the spoken language (auto-detect recommended)
   - Set the target translation language
   - Configure quick-access languages

3. **Audio Settings**:
   - Select a microphone device
   - Test audio levels
   - Configure VAD parameters

4. **Hotkeys**:
   - Hands-free recording: `Ctrl+Shift+Space` by default
### VAD Tuning

Fine-tune voice detection in the settings store:

```typescript
vad: {
  speechThreshold: 0.02,    // Energy threshold for speech
  silenceThreshold: 0.01,   // Energy threshold for silence
  maxChunkDurationMs: 5000, // Maximum chunk duration
  silenceTimeoutMs: 500,    // Silence timeout
  overlapMs: 220,           // Overlap to prevent word cutting
  sampleRate: 16000         // Audio sample rate
}
```

## Usage

1. **Start the application**: Launch TalkToMe from the desktop or Start menu
2. **Configure the API**: Set up your OpenAI API key in API Settings
3. **Select languages**: Choose input and output languages
4. **Start recording**: Click the record button or use the hotkey
5. **Speak naturally**: Talk into your microphone
6. **View results**: See the transcription and translation in real time
7. **Copy/Export**: Use the built-in tools to copy or export text
## System Tray

The application runs in the system tray with these options:
- Show Main Window
- Preferences
- API Settings
- Language Settings
- Audio Settings
- About
- Quit

Global hotkey:

- Hands-Free: Toggle recording on/off
## Advanced Configuration

The audio processing pipeline can be customized in `audio.rs`:

```rust
// VAD configuration
VoiceActivityDetector {
    speech_threshold: 0.02,
    silence_threshold: 0.01,
    min_speech_duration_ms: 350,
    max_speech_duration_ms: 5000,
    silence_timeout_ms: 500,
    overlap_ms: 220,
    // Signal processing
    target_rms: 0.1,
    max_gain: 8.0,
    noise_gate: 0.005,
}
```

### Supported APIs

TalkToMe supports OpenAI-compatible APIs:
- OpenAI (official)
- Azure OpenAI
- Local Whisper servers
- Custom implementations
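Because these providers follow the OpenAI URL layout, switching between them is mostly a matter of changing the base endpoint in API Settings. As an illustrative sketch (a hypothetical helper, not the project's actual `stt.rs` code), the transcription URL could be derived from a configurable base like this:

```rust
/// Joins a configurable base endpoint with the OpenAI-style
/// transcription path, tolerating a trailing slash in the base.
/// Illustrative helper only.
fn transcription_url(base: &str) -> String {
    format!("{}/audio/transcriptions", base.trim_end_matches('/'))
}

fn main() {
    // The application's default base endpoint
    assert_eq!(
        transcription_url("https://api.openai.com/v1"),
        "https://api.openai.com/v1/audio/transcriptions"
    );
    // A hypothetical local Whisper-compatible server
    assert_eq!(
        transcription_url("http://localhost:8000/v1/"),
        "http://localhost:8000/v1/audio/transcriptions"
    );
    println!("endpoint sketch OK");
}
```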
## Troubleshooting

1. **No Audio Input**:
   - Check microphone permissions
   - Select the correct audio device in settings
   - Test the microphone in audio settings

2. **API Errors**:
   - Verify the API key is correct
   - Check the API endpoint URL
   - Ensure sufficient API credits

3. **Poor Recognition**:
   - Adjust VAD thresholds
   - Check microphone quality
   - Reduce background noise

4. **Performance Issues**:
   - Enable debug logging
   - Check system resources
   - Adjust chunk duration
### Debug Logging

Enable debug logging in Preferences to troubleshoot issues:

- Log location (Windows): `%APPDATA%/TalkToMe/logs/`
- Contains detailed pipeline information
- Includes WAV dumps for audio analysis
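For reference when inspecting those dumps: a WAV file is just raw PCM samples behind a 44-byte RIFF header. A minimal header builder for 16-bit mono audio (an illustrative sketch assuming the 16 kHz rate configured above, not the project's actual dump writer) might look like this:

```rust
/// Builds a minimal 44-byte RIFF/WAVE header for 16-bit mono PCM.
/// Illustrative sketch only.
fn wav_header(sample_rate: u32, data_len: u32) -> Vec<u8> {
    let byte_rate = sample_rate * 2; // mono, 16-bit => 2 bytes per sample
    let mut h = Vec::with_capacity(44);
    h.extend_from_slice(b"RIFF");
    h.extend_from_slice(&(36 + data_len).to_le_bytes()); // RIFF chunk size
    h.extend_from_slice(b"WAVE");
    h.extend_from_slice(b"fmt ");
    h.extend_from_slice(&16u32.to_le_bytes()); // fmt chunk size
    h.extend_from_slice(&1u16.to_le_bytes());  // audio format: PCM
    h.extend_from_slice(&1u16.to_le_bytes());  // channels: mono
    h.extend_from_slice(&sample_rate.to_le_bytes());
    h.extend_from_slice(&byte_rate.to_le_bytes());
    h.extend_from_slice(&2u16.to_le_bytes());  // block align
    h.extend_from_slice(&16u16.to_le_bytes()); // bits per sample
    h.extend_from_slice(b"data");
    h.extend_from_slice(&data_len.to_le_bytes());
    h
}

fn main() {
    let h = wav_header(16_000, 32_000); // one second of 16 kHz mono audio
    assert_eq!(h.len(), 44);
    assert_eq!(&h[0..4], b"RIFF");
    assert_eq!(&h[8..12], b"WAVE");
    println!("WAV header sketch OK");
}
```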
## Project Structure

```
TalkToMe/
├── src/                          # Svelte frontend
│   ├── lib/
│   │   └── stores/
│   │       └── settingsStore.ts  # Settings management
│   ├── routes/                   # Page components
│   │   ├── +layout.svelte        # Main layout
│   │   ├── +page.svelte          # Home page
│   │   └── preferences/          # Settings pages
│   │       ├── api-settings/
│   │       ├── language-settings/
│   │       ├── audio-settings/
│   │       └── about/
│   └── app.html                  # HTML template
├── src-tauri/                    # Rust backend
│   ├── src/
│   │   ├── lib.rs                # Main application logic
│   │   ├── audio.rs              # Audio capture & VAD
│   │   ├── stt.rs                # Speech-to-text service
│   │   ├── translation.rs        # Translation service
│   │   ├── text_insertion.rs     # Text insertion utilities
│   │   ├── system_audio.rs       # System audio controls
│   │   ├── settings.rs           # Settings management
│   │   └── debug_logger.rs       # Debug logging system
│   ├── Cargo.toml                # Rust dependencies
│   └── tauri.conf.json           # Tauri configuration
├── static/                       # Static assets
├── package.json                  # Node.js dependencies
└── README.md                     # This file
```
## Contributing

1. Fork the repository
2. Create a feature branch: `git checkout -b feature-name`
3. Make your changes
4. Add tests if applicable
5. Commit: `git commit -am 'Add feature'`
6. Push: `git push origin feature-name`
7. Create a Pull Request
## License

This project is licensed under the MIT License - see the LICENSE file for details.
## Acknowledgments

- Tauri - Cross-platform desktop framework
- Svelte - Frontend framework
- OpenAI Whisper - Speech recognition
- CPAL - Cross-platform audio
- TailwindCSS - Styling framework
## Support

- Issues: GitHub Issues
- Discussions: GitHub Discussions
- Documentation: Project Wiki
---

**TalkToMe** - Bridging languages through voice technology