kyma-dub

Dub any video into another language with a natural, time-aligned AI voiceover — one command, one Kyma key.

kyma-dub composes ffmpeg + a Whisper-class ASR + an LLM + a natural text-to-speech voice into a single command that takes a video speaking one language and hands you back the same video speaking another. The new voice lands where the original speaker was — it tracks the picture instead of drifting — and it never puts words in their mouth: when the audio is unclear, it stays generic rather than inventing a name, place, or fact that was never said.

kyma-dub talk.mov --from vi --voice charlie
# -> talk [EN dub].mp4

Any source language Whisper can hear, any target language you ask for. Works on mp4, mov, mkv, webm — anything ffmpeg reads.

What you can make

One command in, a finished artifact out:

Run	Get back
`kyma-dub talk.mov`	the same video, now speaking English, synced to the original timing
`kyma-dub talk.mov --voice will`	a different voice — browse them with `kyma-dub voices`
`kyma-dub talk.mov --to es`	a Spanish dub
`kyma-dub talk.mov --bilingual --burn`	captions burned in: target on top, the original below
`kyma-dub subs talk.mov --to en`	a translated `.srt` / `.vtt` subtitle file
`kyma-dub recommend --for "young female, TikTok"`	a voice (and model) matched to your audience

How it works

video ─▶ extract audio (ffmpeg)
      ─▶ transcribe + word timing (Whisper)
      ─▶ group into chunks at natural speech pauses
      ─▶ translate each chunk, fitted to its seconds (LLM)
      ─▶ speak each chunk in a locked voice (TTS)
      ─▶ speed each clip ONLY up to fit its slot, reassemble on the timeline
      ─▶ mux the new audio back over the original video

Two ideas keep it in sync and natural:

Per-chunk timestamp anchoring — it never translates one long block and reads it straight (that drifts further off with every sentence). Each chunk is voiced and placed back at its original timestamp.
Speed-up-only isochrony — a chunk is only ever sped up to fit its slot (capped by --max-speed), never slowed, because slowing drags the audio and feels delayed. Leftover time becomes a natural pause. The translation length is budgeted per chunk from a per-language characters-per-second model, so the timing holds across dense scripts (CJK/Thai) and verbose ones (Latin) alike.

Why this exists

Dubbing a video usually means a subscription product, an upload, and trusting a black box with your footage — and the polished ones still hallucinate, smoothing a mis-heard word into a confident wrong fact (turning a plain "university" into "Stanford," or a vague location into a city that was never said). For a clip about you, that's not a rough edge; it's a lie in your voice.

kyma-dub is one local command instead. Faithfulness is the first rule — unclear audio stays generic, it never guesses a proper noun, and it translates only what's there. It runs the whole pipeline on a single Kyma key (transcribe, translate, and voice), so there's no provider juggling. And it's open source, so you can read exactly what it does to your words before you publish them. The voice and the captions come out of the same command — not an editor.

Install

curl -fsSL https://github.com/sonpiaz/kyma-dub/releases/latest/download/install.sh | bash

Then set one key:

# ~/.config/kyma-dub/env  (created by the installer)
KYMA_API_KEY=kyma-xxxxxxxx        # transcribe + translate + voice
# ELEVENLABS_API_KEY=xi-xxxxxxxx  # optional: direct voice path / deep fallback

Get a Kyma key at kymaapi.com — 60 seconds, no card, free credit at signup.

Dependencies: ffmpeg, ffprobe, curl, python3 (brew install ffmpeg).

Dub

kyma-dub <video> [options]

  --from <lang>            source language code (default: auto-detect)
  --to <lang>              target language code (default: en)
  --voice <name|id>        charlie | will | liam | brian | rachel | adam | jessica
                           or any voice id from `kyma-dub voices` (default: charlie)
  --tts <engine>           kyma (default — one key) | elevenlabs (direct)
  --model <id>             translation model (default: qwen-3.7-max)
  --max-speed <n>          max voice speed-up to fit a slot (default: 1.5)
  --chunk-sec <n>          max seconds per dub chunk (default: 22)
  --allow-voice-fallback   permit an independent voice if every primary path is down
  --srt                    also write a .srt timed to the dubbed audio
  --bilingual              bilingual captions: target on top, cleaned source below
  --burn                   burn the captions into the output video
  --out <path>             output file
  --keep-temp              keep intermediate files
  --version / -h

Subtitles & bilingual captions

Generate translated subtitles without changing the audio, or burn bilingual captions into the video:

kyma-dub subs talk.mov --to en --format both   # talk.en.srt + talk.en.vtt
kyma-dub talk.mov --srt                        # dub + a .srt timed to the dubbed audio
kyma-dub talk.mov --bilingual --burn           # dub + burned captions: target on top, original below
kyma-dub subs talk.mov --bilingual --burn      # bilingual captions on a non-dubbed video

Bilingual captions put the target language on top and a tidied-up version of the original below (smaller and dimmer), one line each, inside a centred band that clears a corner webcam. The original line is cleaned by AI (recognition junk removed, light punctuation) but never changed in meaning.

Burning needs a libass-enabled ffmpeg. Homebrew's ffmpeg ships without it, so run kyma-dub setup-ffmpeg once — it drops a static libass ffmpeg into ~/.kyma-dub/bin/ and never touches your system ffmpeg. Without it, --burn falls back to writing the subtitle file, so nothing breaks.

Discover

Models and voices on Kyma change over time, so kyma-dub reads them live — you (or an agent driving it) always pick from what's current and matched to the audience:

kyma-dub models                                  # translation models on Kyma now
kyma-dub voices --gender female --age young --use-case social_media
kyma-dub voices --library --lang es              # search the shared voice library
kyma-dub preview <voice|id> ["sample text"]      # hear a voice before committing
kyma-dub recommend --for "energetic, for TikTok" --smart
kyma-dub whatsnew                                # what Kyma added since last check

Filter voices by gender, age, accent, use-case, descriptive, language. --smart lets a model pick from a free-text description.

Routing & fallback

By default the whole pipeline runs on one Kyma key. The voice engine is locked once at job start so it never changes mid-video, with a fallback chain underneath: if the primary voice path is unavailable, it falls back through equivalent paths that preserve the voice, and only swaps to an independent voice as a last resort — and only when you opt in with --allow-voice-fallback, because a tool should never silently change how you sound. Pass --tts elevenlabs to voice straight from ElevenLabs (one less hop; needs ELEVENLABS_API_KEY).

Desktop (MCP)

Drive kyma-dub from Claude desktop, Cursor, or ChatGPT desktop via the published MCP server @sonpiaz/kyma-dub-mcp — job-based dubbing plus live list_models / list_voices / recommend_voice / preview_voice / whatsnew tools and kyma-dub://models + kyma-dub://voices resources.

{
  "mcpServers": {
    "kyma-dub": {
      "command": "npx",
      "args": ["-y", "@sonpiaz/kyma-dub-mcp"],
      "env": {
        "KYMA_API_KEY": "kyma-xxxxxxxx",
        "KYMA_DUB_BIN": "/Users/you/.local/bin/kyma-dub"
      }
    }
  }
}

The MCP server shells out to the kyma-dub CLI, so install the CLI first. Desktop apps spawn with a minimal PATH, so set KYMA_DUB_BIN to the absolute CLI path (which kyma-dub). See mcp-server/README.md.

Notes & limits

Best for narration, presentation, and b-roll. It replaces the audio — there's no lip-sync, so for tight talking-head shots the mouth won't match the new language.
Background music is replaced along with the speech (source-separation to keep the music bed isn't in yet).

License

MIT © Son Piaz

Name		Name	Last commit message	Last commit date
Latest commit History 16 Commits
bin		bin
examples		examples
lib		lib
mcp-server		mcp-server
.env.example		.env.example
.gitignore		.gitignore
LICENSE		LICENSE
README.md		README.md
SKILL.md		SKILL.md
install.sh		install.sh

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

kyma-dub

What you can make

How it works

Why this exists

Install

Dub

Subtitles & bilingual captions

Discover

Routing & fallback

Desktop (MCP)

Notes & limits

License

About

Uh oh!

Releases 3

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

kyma-dub

What you can make

How it works

Why this exists

Install

Dub

Subtitles & bilingual captions

Discover

Routing & fallback

Desktop (MCP)

Notes & limits

License

About

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases 3

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages