Voice conversion: voice_convert() #11

Merged. TroyHernandez merged 3 commits into main from voice-conversion on Jun 12, 2026.

@TroyHernandez (Contributor)

Ports ChatterboxVC (vc.py), the speech-to-speech path, from Python chatterbox. Source speech goes through the S3 tokenizer at full length, which preserves the source's timing, and S3Gen re-renders the tokens with the target voice's conditioning. No T3 and no text are involved.

    res <- voice_convert(model, "someone_talking.wav", "target_voice.wav")
    write_audio(res$audio, res$sample_rate, "converted.wav")

  • Target voice accepts a voice_embedding (incl. ones from load_voice_embedding()) or a reference audio path.
  • Turbo models are rejected with a clear error (VC uses the standard CFM decoder, matching Python).
  • Validated against the 0.1.7 container with the same source and target: durations match to within 0.01 s (7.56 s for both) and amplitude is comparable (std 0.048 R / 0.051 Py; CFM noise draws differ by construction). A/B wavs in ~/Sync: vc_jfk_to_reference.wav (R) vs vc_py_jfk_to_reference.wav (Python).
  • Guarded error-path tests in test_vc.R; full suite passes.
  • Version bump to 0.1.0.7 + NEWS in a separate commit.
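
The duration and amplitude comparison described in the validation bullet can be sketched in a few lines of base R. This is a minimal illustration, not the check that was run for this PR: `compare_renders()` and its tolerances are hypothetical, the 24 kHz sample rate is an assumption, and the two vectors are synthetic noise standing in for the decoded R and Python outputs.

```r
# Hypothetical helper (not part of the package): summarise how closely two
# mono renders agree in duration and overall amplitude.
compare_renders <- function(a, b, sample_rate,
                            dur_tol = 0.01, std_ratio_tol = 0.25) {
  stopifnot(is.numeric(a), is.numeric(b), length(a) > 1, length(b) > 1)
  dur <- c(a = length(a), b = length(b)) / sample_rate  # seconds
  std <- c(a = stats::sd(a), b = stats::sd(b))          # amplitude spread
  list(
    duration_s     = dur,
    std            = std,
    duration_match = abs(dur[["a"]] - dur[["b"]]) <= dur_tol,
    # Sample-wise equality is not expected because the CFM decoder draws
    # fresh noise, so only the overall level is compared (relative difference).
    std_in_family  = abs(std[["a"]] - std[["b"]]) / max(std) <= std_ratio_tol
  )
}

# Synthetic stand-ins for the two decoded outputs (assumed 24 kHz, 7.56 s).
set.seed(1)
sr     <- 24000
n      <- round(7.56 * sr)
r_out  <- rnorm(n, sd = 0.048)
py_out <- rnorm(n, sd = 0.051)

cmp <- compare_renders(r_out, py_out, sr)
str(cmp)
```

With real files, the two vectors would come from whatever reader decodes the A/B wavs. The tolerances above are illustrative and are not the thresholds used for the validation reported here.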

Source speech -> S3 tokenizer (full length, keeps source timing) ->
S3Gen with the target voice's ref_dict. Validated against the 0.1.7
container on the same source/target: durations match to 0.01 s
(7.56 s), amplitude in family (std 0.048 R vs 0.051 Python; CFM noise
draws differ by construction).
@TroyHernandez TroyHernandez merged commit 086ba21 into main Jun 12, 2026
4 checks passed
@TroyHernandez TroyHernandez deleted the voice-conversion branch June 12, 2026 18:49