This project provides tools for generating dubbed audio from subtitles (SRT files) and merging the dubbed audio with the original video. It also includes a script for transcribing video content using OpenAI's Whisper model.
- Text-to-Speech (TTS) Dubbing: Generate dubbed audio from SRT files using a TTS model.
- Video Dubbing: Merge dubbed audio with the original video, applying auto-ducking to reduce the volume of the original audio during dubbed segments.
- Video Transcription: Transcribe video content into SRT format using OpenAI's Whisper model.
To install the necessary Python dependencies, run:
pip install -r requirements.txt
The requirements.txt file includes the following dependencies:
torch>=1.10.0
torchaudio>=0.10.0
TTS>=0.0.13  # Coqui TTS (published on PyPI as "TTS")
openai-whisper>=1.0.0
toml>=0.10.2
numpy>=1.21.0
soundfile>=0.12.1
srt>=3.5.0
cached-path>=0.2.0
omegaconf>=2.3.0
- FFmpeg: Required for the mix_audio.sh script. Install it using your system's package manager:
  - Ubuntu/Debian: sudo apt-get install ffmpeg
  - macOS: brew install ffmpeg
  - Windows: Download and install from the official FFmpeg website.
Use the whisper.sh script to transcribe a video into an SRT file:
./scripts/whisper.sh path/to/video.mp4
This will generate a transcript.srt file in the current directory.
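Under the hood, whisper.sh presumably runs OpenAI's Whisper and writes the timed segments out as SRT. The following is a minimal sketch of that conversion step; the segment keys (`start`, `end`, `text`) match what openai-whisper returns, while the commented usage lines and file names are assumptions:

```python
def format_timestamp(seconds: float) -> str:
    """Format seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    total = int(seconds)
    ms = int(round((seconds - total) * 1000))
    return f"{total // 3600:02d}:{(total % 3600) // 60:02d}:{total % 60:02d},{ms:03d}"

def segments_to_srt(segments) -> str:
    """Convert Whisper-style segments (dicts with start/end/text) to SRT text."""
    blocks = []
    for i, seg in enumerate(segments, start=1):
        blocks.append(
            f"{i}\n"
            f"{format_timestamp(seg['start'])} --> {format_timestamp(seg['end'])}\n"
            f"{seg['text'].strip()}\n"
        )
    return "\n".join(blocks)

# Hypothetical usage with openai-whisper (downloads a model on first run):
# import whisper
# result = whisper.load_model("base").transcribe("path/to/video.mp4")
# with open("transcript.srt", "w") as f:
#     f.write(segments_to_srt(result["segments"]))
```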
Use the srt2voice.py script to generate dubbed audio from an SRT file:
python srt2voice.py -c config.toml
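To synthesize speech per subtitle, the script first has to parse the SRT cues into timed text segments. A minimal, dependency-free sketch of that parsing step (the actual script likely uses the `srt` package from requirements.txt; the `Cue` type here is illustrative):

```python
import re
from typing import NamedTuple

class Cue(NamedTuple):
    index: int
    start: float  # seconds
    end: float    # seconds
    text: str

_TS = re.compile(r"(\d{2}):(\d{2}):(\d{2}),(\d{3})")

def parse_timestamp(ts: str) -> float:
    """Convert an SRT timestamp (HH:MM:SS,mmm) to seconds."""
    h, m, s, ms = map(int, _TS.match(ts.strip()).groups())
    return h * 3600 + m * 60 + s + ms / 1000

def parse_srt(content: str) -> list[Cue]:
    """Parse SRT text into a list of timed cues."""
    cues = []
    for block in content.strip().split("\n\n"):
        lines = block.strip().splitlines()
        if len(lines) < 3:
            continue  # skip malformed blocks
        start, _, end = lines[1].partition(" --> ")
        cues.append(Cue(int(lines[0]), parse_timestamp(start),
                        parse_timestamp(end), "\n".join(lines[2:])))
    return cues
```

Each cue's start/end times tell the TTS stage where to place the generated audio on the output timeline.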
The config.toml file specifies the TTS model, reference audio, and other settings. Here's an example configuration:
# F5-TTS | E2-TTS
model = "F5TTS_v1_Base"
ref_audio = "./assets/voice.flac"
ref_text = ""
gen_text = ""
gen_file = "transcript.srt"
remove_silence = false
output_dir = "dubbed"
speed = 1
Use the mix_audio.sh script to merge the dubbed audio with the original video:
./scripts/mix_audio.sh path/to/video.mp4 path/to/dubbed_audio.wav
This script will:
- Extract the original audio from the video.
- Apply auto-ducking to the original audio.
- Merge the dubbed audio with the video.
- Save the final dubbed video in the output_video directory.
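The ducking-and-mix steps above map naturally onto ffmpeg's `sidechaincompress` and `amix` filters. Here is a hedged sketch of how such a command could be assembled from Python; the filter thresholds, output path, and function name are assumptions, not the script's actual settings:

```python
def build_mix_command(video: str, dubbed_audio: str, output: str) -> list[str]:
    """Build an ffmpeg command approximating what mix_audio.sh does:
    duck the original audio under the dubbed track, then mux with the video."""
    filter_graph = (
        # split the dubbed track: one copy drives the compressor's sidechain,
        # the other is mixed into the output
        "[1:a]asplit=2[sc][dub];"
        # lower the original audio (0:a) whenever the dubbed track is loud
        "[0:a][sc]sidechaincompress=threshold=0.05:ratio=10[ducked];"
        # combine the ducked original with the dubbed audio
        "[ducked][dub]amix=inputs=2:duration=first[aout]"
    )
    return [
        "ffmpeg", "-y",
        "-i", video,
        "-i", dubbed_audio,
        "-filter_complex", filter_graph,
        "-map", "0:v", "-map", "[aout]",
        "-c:v", "copy",  # keep the video stream untouched
        output,
    ]

# import subprocess
# subprocess.run(build_mix_command("video.mp4", "dubbed.wav",
#                                  "output_video/video_dubbed.mp4"), check=True)
```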
1. Transcribe the video:
./scripts/whisper.sh path/to/video.mp4
2. Generate the dubbed audio:
python srt2voice.py -c config.toml
3. Merge the dubbed audio with the video:
./scripts/mix_audio.sh path/to/video.mp4 path/to/dubbed_audio.wav
This will produce a final video file with the dubbed audio synchronized with the original video.
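The three-step workflow can also be driven from a single Python helper. This sketch only assembles the commands shown above; the dubbed-audio path depends on your config and is passed in explicitly:

```python
import subprocess

def pipeline_commands(video: str, dubbed_audio: str,
                      config: str = "config.toml") -> list[list[str]]:
    """Return the three pipeline commands in order (paths as shown in this README)."""
    return [
        ["./scripts/whisper.sh", video],
        ["python", "srt2voice.py", "-c", config],
        ["./scripts/mix_audio.sh", video, dubbed_audio],
    ]

def dub_video(video: str, dubbed_audio: str) -> None:
    """Run the full pipeline, stopping on the first failing step."""
    for cmd in pipeline_commands(video, dubbed_audio):
        subprocess.run(cmd, check=True)
```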
The config.toml file allows you to customize the TTS model, reference audio, and other settings. Refer to the story-sample.toml file for an example configuration.
If you'd like to contribute to this project, please fork the repository and submit a pull request. Ensure your changes are well-documented and tested.
This project is licensed under the MIT License. See the LICENSE file for details.