eddieoz/srt2voice

Video Dubbing and Transcription Tool

This project provides tools for generating dubbed audio from subtitles (SRT files) and merging the dubbed audio with the original video. It also includes a script for transcribing video content using OpenAI's Whisper model.

Features

  • Text-to-Speech (TTS) Dubbing: Generate dubbed audio from SRT files using a TTS model.
  • Video Dubbing: Merge dubbed audio with the original video, applying auto-ducking to reduce the volume of the original audio during dubbed segments.
  • Video Transcription: Transcribe video content into SRT format using OpenAI's Whisper model.

Installation and Requirements

Dependencies

To install the necessary Python dependencies, run:

pip install -r requirements.txt

The requirements.txt file includes the following dependencies:

  • torch>=1.10.0
  • torchaudio>=0.10.0
  • coqui-ai/TTS>=0.0.13
  • openai-whisper>=1.0.0
  • toml>=0.10.2
  • numpy>=1.21.0
  • soundfile>=0.12.1
  • srt>=3.5.0
  • cached-path>=0.2.0
  • omegaconf>=2.3.0

System Requirements

  • FFmpeg: Required for the mix_audio.sh script. Install it using your system's package manager:
    • Ubuntu/Debian:
      sudo apt-get install ffmpeg
    • macOS:
      brew install ffmpeg
    • Windows: Download and install from the official FFmpeg website.
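Before running the scripts, it can help to confirm that FFmpeg is actually discoverable on your PATH. A minimal check (illustrative only, not part of the repository):

```python
import shutil

def have_ffmpeg() -> bool:
    """Return True if the ffmpeg binary is discoverable on PATH."""
    return shutil.which("ffmpeg") is not None

print("ffmpeg available:", have_ffmpeg())
```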

Usage

1. Transcribe Video Content

Use the whisper.sh script to transcribe a video into an SRT file:

./scripts/whisper.sh path/to/video.mp4

This will generate a transcript.srt file in the current directory.
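The srt package listed in the dependencies handles parsing of the generated file; for illustration, the SRT timestamp format itself (HH:MM:SS,mmm) can be decoded with a few lines of standard-library Python (a sketch, not code from this repository):

```python
def parse_srt_timestamp(ts: str) -> float:
    """Convert an SRT timestamp such as '00:01:02,500' to seconds."""
    hms, millis = ts.split(",")
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    return hours * 3600 + minutes * 60 + seconds + int(millis) / 1000

print(parse_srt_timestamp("00:01:02,500"))  # 62.5
```

These timestamps are what the dubbing step uses to place each synthesized segment on the timeline.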

2. Generate Dubbed Audio

Use the srt2voice.py script to generate dubbed audio from an SRT file:

python srt2voice.py -c config.toml

The config.toml file specifies the TTS model, reference audio, and other settings. Here's an example configuration:

# F5-TTS | E2-TTS
model = "F5TTS_v1_Base"
ref_audio = "./assets/voice.flac"
ref_text = ""
gen_text = ""
gen_file = "transcript.srt"
remove_silence = false
output_dir = "dubbed"
speed = 1

3. Merge Dubbed Audio with Video

Use the mix_audio.sh script to merge the dubbed audio with the original video:

./scripts/mix_audio.sh path/to/video.mp4 path/to/dubbed_audio.wav

This script will:

  • Extract the original audio from the video.
  • Apply auto-ducking to the original audio.
  • Merge the dubbed audio with the video.
  • Save the final dubbed video in the output_video directory.
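The script implements ducking with FFmpeg; conceptually, auto-ducking reduces to gating the original signal's gain on the dub track's amplitude. A simplified pure-Python sketch of the idea (not the script's actual filter chain):

```python
def duck_and_mix(original, dub, threshold=0.001, duck_gain=0.2):
    """Attenuate `original` wherever `dub` is audible, then sum the two tracks."""
    mixed = []
    for orig_sample, dub_sample in zip(original, dub):
        # Duck the original only while the dub carries signal above the threshold.
        gain = duck_gain if abs(dub_sample) > threshold else 1.0
        mixed.append(orig_sample * gain + dub_sample)
    return mixed

print(duck_and_mix([1.0, 1.0], [0.0, 0.5]))
```

In practice FFmpeg applies smooth attack/release envelopes rather than this hard per-sample gate, which avoids audible pumping at segment boundaries.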

Example Workflow

  1. Transcribe the video:

    ./scripts/whisper.sh path/to/video.mp4
  2. Generate the dubbed audio:

    python srt2voice.py -c config.toml
  3. Merge the dubbed audio with the video:

    ./scripts/mix_audio.sh path/to/video.mp4 path/to/dubbed_audio.wav

This will produce a final video file with the dubbed audio synchronized with the original video.
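The three commands above could also be chained from Python with subprocess; a hypothetical wrapper (the function name and the dubbed-audio path are placeholders, not values defined by the repository):

```python
import subprocess

def dub_video(video: str, dubbed_audio: str, config: str = "config.toml") -> None:
    """Run the three-step workflow: transcribe, synthesize, then mix."""
    subprocess.run(["./scripts/whisper.sh", video], check=True)           # writes transcript.srt
    subprocess.run(["python", "srt2voice.py", "-c", config], check=True)  # writes audio to output_dir
    subprocess.run(["./scripts/mix_audio.sh", video, dubbed_audio], check=True)
```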

Configuration

The config.toml file allows you to customize the TTS model, reference audio, and other settings. Refer to the story-sample.toml file for an example configuration.

Contributing

If you'd like to contribute to this project, please fork the repository and submit a pull request. Ensure your changes are well-documented and tested.

License

This project is licensed under the MIT License. See the LICENSE file for details.
