From field tape to rough cut in one place.
A local web app for audio journalism production. Upload a field interview, get a timestamped transcript with speaker detection, mark clips, cut them, add narration, and assemble a rough cut — all in the browser.
Built for radio journalists and podcast producers who work with field recordings and want to go from raw interview to rough cut without jumping between five different tools.
- Transcribe — Upload a WAV/MP3 interview. Whisper transcribes it with word-level timestamps. Supports Hebrew, English, Arabic, and more.
- Speaker detection (optional) — Automatically identifies who's talking and color-codes speakers throughout the UI.
- Mark clips — Click words in the transcript to mark clip boundaries. Clips are numbered automatically.
- Cut — One click cuts the source audio into separate WAV files using ffmpeg.
- Narration (optional) — Upload your narration recording, transcribe it, and mark narration clips the same way.
- Assemble — Drag interview clips and narration into order, hit assemble, get a single rough-cut WAV.
- Export — Download clips as a ZIP, transcript as a Word doc (with speaker colors), or the final rough cut.
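The mark-and-cut steps above can be sketched as follows. This is an illustration only, not Field Cut's actual code: the `Word` type and the `clip_bounds` and `ffmpeg_cut_command` helpers are hypothetical names. It assumes each transcribed word carries a start and end time in seconds, and that a clip runs from the first selected word's start to the last selected word's end. The ffmpeg flags used (`-y`, `-i`, `-ss`, `-t`, `-c:a pcm_s16le`) are standard options.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Word:
    """One transcribed word with its timestamps in seconds (hypothetical shape)."""
    text: str
    start: float
    end: float


def clip_bounds(words: List[Word], first_idx: int, last_idx: int) -> Tuple[float, float]:
    """Return (start, end) in seconds for a clip spanning the selected words."""
    if not 0 <= first_idx <= last_idx < len(words):
        raise ValueError("clip indices must satisfy 0 <= first <= last < len(words)")
    return words[first_idx].start, words[last_idx].end


def ffmpeg_cut_command(source: str, start: float, end: float, out_path: str) -> List[str]:
    """Build an ffmpeg argument list that writes [start, end] of source as 16-bit PCM WAV."""
    if end <= start:
        raise ValueError("end must be greater than start")
    duration = end - start
    return [
        "ffmpeg", "-y",            # overwrite the output file if it exists
        "-i", source,
        "-ss", f"{start:.3f}",     # seek to the clip start (output option, so decoding is accurate)
        "-t", f"{duration:.3f}",   # keep this many seconds
        "-c:a", "pcm_s16le",       # uncompressed 16-bit WAV audio
        out_path,
    ]
```

The returned list can be passed to `subprocess.run(cmd, check=True)`. Building the command as a list rather than a shell string avoids quoting problems with file names that contain spaces.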
- Waveform visualization with zoom, pan, and clip regions
- Speaker diarization with color-coded badges (rename or reassign speakers)
- Clip boundary trimming (fine-tune start/end times)
- Paper edit export (Word doc matching your assembly order)
- Multi-project support (save, load, duplicate projects)
- Export folder — auto-copy outputs to Google Drive, Dropbox, or any local folder
- Bilingual UI (English / Hebrew), easy to add more languages
- Built-in demo interview to try the full pipeline without your own audio
- First-run setup wizard for API keys
- Dark and light themes
- Works on macOS, Windows, and Linux
- Python 3.9+
- ffmpeg installed and in your PATH
- An OpenAI API key (for Whisper transcription)
- Optional: HuggingFace token (for speaker detection)
```bash
# Clone the repo
git clone https://github.com/shaulams/FieldCut.git
cd FieldCut

# Create a virtual environment
python3 -m venv .venv
source .venv/bin/activate   # on Windows: .venv\Scripts\activate

# Install dependencies
pip install -r requirements.txt

# Optional: speaker detection (pulls ~2 GB of PyTorch models)
pip install -r requirements-speaker.txt

# Run
python app.py
```

Open http://localhost:5555 in your browser. On first run, the app will ask for your API keys.
Or try the built-in demo — click "Try a demo interview" to experience the full pipeline without uploading your own audio.
The only cost is OpenAI's Whisper API:
- $0.006 per minute of audio
- A 60-minute interview costs about $0.36
Speaker detection runs locally on your machine (free). Everything else is local too.
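To budget for a batch of recordings, the arithmetic above can be written as a small helper. This is a sketch that assumes cost scales linearly with duration at the $0.006 per minute rate quoted here; the function name is hypothetical, and actual billing granularity may differ.

```python
from decimal import Decimal

# Rate quoted in this README; check OpenAI's pricing page for the current value.
WHISPER_USD_PER_MINUTE = Decimal("0.006")


def estimate_transcription_cost(duration_seconds: float) -> Decimal:
    """Estimate the Whisper API cost in USD for a recording of the given length."""
    if duration_seconds < 0:
        raise ValueError("duration_seconds must not be negative")
    minutes = Decimal(str(duration_seconds)) / Decimal(60)
    # Round to a tenth of a cent so short clips do not collapse to zero.
    return (minutes * WHISPER_USD_PER_MINUTE).quantize(Decimal("0.001"))
```

For example, `estimate_transcription_cost(3600)` gives the $0.36 figure for a 60-minute interview.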
Speaker detection uses pyannote and requires a free HuggingFace account:
- Create a token at huggingface.co/settings/tokens (Read permission)
- Accept the terms for these models:
- Enter the token in the setup wizard, or add `HUGGINGFACE_TOKEN=hf_...` to your `.env` file
On Apple Silicon Macs, speaker detection uses the GPU automatically for faster processing.
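To check that the token is where the app can find it, a minimal lookup might look like the sketch below. It is an assumption about how such a lookup could work, not a description of Field Cut's own loader: the helper names are hypothetical, and the sketch prefers a real environment variable over the `.env` file.

```python
import os
from pathlib import Path
from typing import Dict, Optional


def read_env_file(path: str = ".env") -> Dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blanks and # comments."""
    values: Dict[str, str] = {}
    env_path = Path(path)
    if not env_path.exists():
        return values
    for raw_line in env_path.read_text(encoding="utf-8").splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop optional surrounding quotes around the value.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def find_huggingface_token(env_path: str = ".env") -> Optional[str]:
    """Return HUGGINGFACE_TOKEN from the environment, else from the .env file."""
    return os.environ.get("HUGGINGFACE_TOKEN") or read_env_file(env_path).get("HUGGINGFACE_TOKEN")
```

If `find_huggingface_token()` returns `None`, speaker detection has no token to use, and the setup wizard is the simplest way to add one.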
- Backend: Python / Flask
- Frontend: Single HTML file, vanilla JS (no build step, no framework)
- Audio processing: ffmpeg
- Transcription: OpenAI Whisper API
- Speaker detection: pyannote.audio (runs locally)
- State: JSON file (no database needed)
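Keeping state in a JSON file works well as long as writes cannot leave a half-written file behind. The sketch below shows one common way to do that: write to a temporary file, then rename it into place. The function names and the example state are hypothetical and do not describe Field Cut's actual state format.

```python
import json
import os
import tempfile
from pathlib import Path
from typing import Any, Dict


def save_state(path: str, state: Dict[str, Any]) -> None:
    """Write state as JSON via a temp file and an atomic rename."""
    target = Path(path)
    target.parent.mkdir(parents=True, exist_ok=True)
    fd, tmp_name = tempfile.mkstemp(dir=target.parent, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as handle:
            # ensure_ascii=False keeps Hebrew or Arabic text readable in the file.
            json.dump(state, handle, ensure_ascii=False, indent=2)
        os.replace(tmp_name, target)  # atomic when both paths are on the same filesystem
    except BaseException:
        if os.path.exists(tmp_name):
            os.unlink(tmp_name)
        raise


def load_state(path: str) -> Dict[str, Any]:
    """Return the saved state, or an empty dict if no file exists yet."""
    target = Path(path)
    if not target.exists():
        return {}
    with target.open(encoding="utf-8") as handle:
        return json.load(handle)
```

Because the rename either happens completely or not at all, a crash during saving leaves the previous state file intact.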
Field Cut ships with English and Hebrew. Adding a new language takes ~10 minutes:
- Open `static/lang.js`
- Copy the `en` block and paste it as a new key (e.g. `fr` for French, `ar` for Arabic)
- Translate every string value, keeping the keys unchanged
- Set `_meta.name` to the language's own name (e.g. `"Français"`) and `_meta.dir` to `"ltr"` or `"rtl"`
- The language picker in the top bar will automatically include it
Contributions are welcome! Feel free to open issues or submit pull requests.