pitboss19/AudioScribe

AudioScribe

A simple command-line tool that transcribes audio files to Markdown and plain text using either the OpenAI Whisper API or a local Whisper model.

Features

  • Two transcription modes — cloud via OpenAI Whisper API, or fully offline with a local model
  • Automatic chunking — files over 25MB are split with ffmpeg and reassembled seamlessly
  • Dual output — generates both .txt (raw text) and .md (with metadata header) for every transcription
  • Broad format support: .m4a, .mp3, .mp4, .wav, .ogg, .flac
  • .env support — reads OPENAI_API_KEY from a .env file in the script directory

Requirements

  • Python 3.10+
  • ffmpeg (required for chunking large files and for local Whisper)

# macOS
brew install ffmpeg

# Ubuntu/Debian
sudo apt install ffmpeg
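
To show why ffmpeg is needed for large files, here is a minimal sketch of how chunking might be planned. It is not the code in transcribe.py: `plan_chunks`, `ffmpeg_segment_cmd`, the 10% safety margin, and reading 25MB as 25 * 1024 * 1024 bytes are all assumptions. The audio duration is passed in as a parameter; obtaining it (for example with ffprobe) is left out.

```python
import math
from pathlib import Path

# Assumed interpretation of the 25MB threshold from the feature list.
MAX_CHUNK_BYTES = 25 * 1024 * 1024


def plan_chunks(size_bytes: int, duration_s: float,
                limit: int = MAX_CHUNK_BYTES,
                margin: float = 0.9) -> tuple[int, float]:
    """Return (chunk_count, seconds_per_chunk) for a file of the given size."""
    if size_bytes <= limit:
        return 1, duration_s
    # Aim below the limit because bitrate is rarely perfectly constant.
    count = math.ceil(size_bytes / (limit * margin))
    return count, duration_s / count


def ffmpeg_segment_cmd(src: Path, out_dir: Path, segment_s: float) -> list[str]:
    """Build an ffmpeg command that splits src without re-encoding."""
    return [
        "ffmpeg", "-i", str(src),
        "-f", "segment", "-segment_time", f"{segment_s:.0f}",
        "-c", "copy",
        str(out_dir / f"chunk_%03d{src.suffix}"),
    ]
```

The resulting list can be passed to `subprocess.run(cmd, check=True)`. Each chunk would then be transcribed in order and the texts joined to form the final transcript.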

OpenAI API mode (default)

pip install openai

Local mode

pip install openai-whisper

Quick Start

# Clone the repo
git clone https://github.com/pitboss19/AudioScribe.git
cd AudioScribe

# (Optional) Create a .env file with your API key
echo "OPENAI_API_KEY=sk-..." > .env

# Transcribe with the OpenAI API
python transcribe.py recording.m4a

# Transcribe offline with a local model
python transcribe.py recording.m4a --local

# Use a larger local model for better accuracy
python transcribe.py recording.m4a --local --model medium

# Specify an output directory
python transcribe.py recording.m4a -o ./transcripts/

Usage

python transcribe.py <audio_file> [options]

| Option | Description |
| --- | --- |
| --local | Use a local Whisper model instead of the OpenAI API |
| --model {tiny,base,small,medium,large} | Local model size (default: base) |
| --output-dir, -o | Output directory (default: same as input file) |
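
The options above map naturally onto Python's argparse. The following is a sketch of one possible parser, not the actual code in transcribe.py; `build_parser` is a hypothetical name, and only the flags, choices, and defaults come from the usage listing.

```python
import argparse
from pathlib import Path


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(
        prog="transcribe.py",
        description="Transcribe an audio file to .txt and .md",
    )
    parser.add_argument("audio_file", type=Path, help="Audio file to transcribe")
    parser.add_argument(
        "--local", action="store_true",
        help="Use a local Whisper model instead of the OpenAI API",
    )
    parser.add_argument(
        "--model", choices=["tiny", "base", "small", "medium", "large"],
        default="base", help="Local model size (default: base)",
    )
    parser.add_argument(
        "--output-dir", "-o", type=Path, default=None,
        help="Output directory (default: same as input file)",
    )
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args()
    print(args)
```

With this layout, `args.output_dir` being `None` signals that the output should go next to the input file.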

Model Size Reference

When using --local, larger models are more accurate but slower and use more memory:

| Model | Parameters | Relative Speed | English Accuracy |
| --- | --- | --- | --- |
| tiny | 39M | ~32x | Good |
| base | 74M | ~16x | Better |
| small | 244M | ~6x | Great |
| medium | 769M | ~2x | Excellent |
| large | 1550M | 1x | Best |
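
For orientation, local mode with one of these model sizes could be implemented roughly as follows, using the `load_model` and `transcribe` calls from the openai-whisper package. The wrapper `transcribe_local` and the up-front name check are assumptions for illustration, not the tool's actual code.

```python
from pathlib import Path

# Model sizes accepted by --model in the usage listing.
LOCAL_MODELS = ("tiny", "base", "small", "medium", "large")


def transcribe_local(audio_path: Path, model_name: str = "base") -> str:
    """Transcribe audio_path with a local Whisper model and return the text."""
    if model_name not in LOCAL_MODELS:
        raise ValueError(f"Unknown model {model_name!r}; choose from {LOCAL_MODELS}")
    # Imported lazily so API-only users do not need openai-whisper installed.
    import whisper

    model = whisper.load_model(model_name)  # downloads weights on first use
    result = model.transcribe(str(audio_path))
    return result["text"].strip()
```

Larger models take longer to load and need more memory, so starting with `base` and moving up only if the accuracy is insufficient is a reasonable default.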

Output

Each transcription produces two files alongside the source audio (or in the directory specified with -o):

recording.m4a
recording.txt          # Raw transcription text
recording.md           # Markdown with metadata header

The Markdown file includes a header with the source filename and timestamp:

# Transcription: recording

**Source:** `recording.m4a`
**Transcribed:** 2026-04-14 10:30:00

---

The transcribed text appears here...

License

MIT
