A simple command-line tool that transcribes audio files to Markdown and plain text using either the OpenAI Whisper API or a local Whisper model.
- Two transcription modes: cloud via the OpenAI Whisper API, or fully offline with a local model
- Automatic chunking: files over 25MB are split with ffmpeg and reassembled seamlessly
- Dual output: generates both `.txt` (raw text) and `.md` (with metadata header) for every transcription
- Broad format support: `.m4a`, `.mp3`, `.mp4`, `.wav`, `.ogg`, `.flac`
- `.env` support: reads `OPENAI_API_KEY` from a `.env` file in the script directory
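
The automatic chunking feature can be pictured with a short sketch. This is not the tool's actual implementation: the function names, the 10% safety margin, and the assumption of a roughly constant bitrate are all illustrative, and the audio duration is assumed to be known already (for example from `ffprobe`).

```python
import math
import subprocess
from pathlib import Path

MAX_UPLOAD_BYTES = 25 * 1024 * 1024  # the 25MB threshold from the feature list


def needs_chunking(path: Path, limit: int = MAX_UPLOAD_BYTES) -> bool:
    """Return True when the file is larger than the size limit."""
    return path.stat().st_size > limit


def chunk_seconds(size_bytes: int, duration_s: float,
                  limit: int = MAX_UPLOAD_BYTES) -> int:
    """Pick a segment length so each piece should stay under the limit.

    Assumes a roughly constant bitrate; the 0.9 factor is a hypothetical
    safety margin, not something the tool documents.
    """
    pieces = math.ceil(size_bytes / (limit * 0.9))
    return max(1, math.floor(duration_s / pieces))


def build_split_command(src: Path, out_dir: Path, seconds: int) -> list[str]:
    """Build an ffmpeg command that cuts src into fixed-length segments."""
    pattern = out_dir / f"{src.stem}_%03d{src.suffix}"
    return [
        "ffmpeg", "-i", str(src),
        "-f", "segment", "-segment_time", str(seconds),
        "-c", "copy",  # copy the audio stream instead of re-encoding
        str(pattern),
    ]


def split_audio(src: Path, out_dir: Path, seconds: int) -> list[Path]:
    """Run ffmpeg and return the resulting chunk files in order."""
    out_dir.mkdir(parents=True, exist_ok=True)
    subprocess.run(build_split_command(src, out_dir, seconds), check=True)
    return sorted(out_dir.glob(f"{src.stem}_*{src.suffix}"))
```

Each chunk would then be transcribed separately and the resulting texts joined in order to form the final transcript.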
- Python 3.10+
- ffmpeg (required for chunking large files and for local Whisper)
# macOS
brew install ffmpeg
# Ubuntu/Debian
sudo apt install ffmpeg
# OpenAI API mode
pip install openai
# Local Whisper mode
pip install openai-whisper
# Clone the repo
git clone https://github.com/youruser/audioscribe.git
cd audioscribe
# (Optional) Create a .env file with your API key
echo "OPENAI_API_KEY=sk-..." > .env
# Transcribe with the OpenAI API
python transcribe.py recording.m4a
# Transcribe offline with a local model
python transcribe.py recording.m4a --local
# Use a larger local model for better accuracy
python transcribe.py recording.m4a --local --model medium
# Specify an output directory
python transcribe.py recording.m4a -o ./transcripts/

Usage:
python transcribe.py <audio_file> [options]
| Option | Description |
|---|---|
| `--local` | Use a local Whisper model instead of the OpenAI API |
| `--model {tiny,base,small,medium,large}` | Local model size (default: `base`) |
| `--output-dir`, `-o` | Output directory (default: same as input file) |
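
For readers who want to see how these options fit together, here is a minimal `argparse` sketch that accepts the same command line. It is an assumption about how such a parser could look, not a copy of `transcribe.py`; the `build_parser` name and the help strings are illustrative.

```python
import argparse
from pathlib import Path

MODEL_SIZES = ["tiny", "base", "small", "medium", "large"]


def build_parser() -> argparse.ArgumentParser:
    """Build a parser mirroring the documented options (hypothetical layout)."""
    parser = argparse.ArgumentParser(
        prog="transcribe.py",
        description="Transcribe an audio file to .txt and .md",
    )
    parser.add_argument("audio_file", type=Path, help="audio file to transcribe")
    parser.add_argument(
        "--local",
        action="store_true",
        help="use a local Whisper model instead of the OpenAI API",
    )
    parser.add_argument(
        "--model",
        choices=MODEL_SIZES,
        default="base",
        help="local model size (default: base)",
    )
    parser.add_argument(
        "--output-dir", "-o",
        type=Path,
        default=None,  # None means: write next to the input file
        help="output directory (default: same as input file)",
    )
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args()
    print(args)
```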
When using --local, larger models are more accurate but slower and use more memory:
| Model | Parameters | Relative Speed | English Accuracy |
|---|---|---|---|
| `tiny` | 39M | ~32x | Good |
| `base` | 74M | ~16x | Better |
| `small` | 244M | ~6x | Great |
| `medium` | 769M | ~2x | Excellent |
| `large` | 1550M | 1x | Best |
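
As a rough illustration of local mode, the sketch below loads one of the models from the table with the `openai-whisper` package and returns the transcribed text. The `transcribe_local` helper is hypothetical and may differ from what `transcribe.py` does internally; `whisper.load_model` and `model.transcribe` are the package's documented entry points.

```python
MODEL_SIZES = ("tiny", "base", "small", "medium", "large")


def transcribe_local(audio_path: str, model_name: str = "base") -> str:
    """Transcribe audio_path with a local Whisper model and return the text."""
    if model_name not in MODEL_SIZES:
        raise ValueError(
            f"unknown model {model_name!r}; expected one of {', '.join(MODEL_SIZES)}"
        )

    # Imported lazily so the API-only path does not need openai-whisper installed.
    import whisper

    model = whisper.load_model(model_name)  # downloads the weights on first use
    result = model.transcribe(audio_path)   # requires ffmpeg on the PATH
    return result["text"].strip()
```

Starting with `base` and moving up to `medium` or `large` only when accuracy is insufficient keeps memory use and run time down.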
Each transcription produces two files alongside the source audio (or in the directory specified with -o):
recording.m4a
recording.txt # Raw transcription text
recording.md # Markdown with metadata header
The Markdown file includes a header with the source filename and timestamp:
# Transcription: recording
**Source:** `recording.m4a`
**Transcribed:** 2026-04-14 10:30:00
---
The transcribed text appears here...

License: MIT