[magpietts][lhotse] add a script for creating text context manifest for riva speakers and jhh. #14649

XuesongYang · 2025-09-04T16:24:13Z

No description provided.

…or riva speakers and jhh. Signed-off-by: Xuesong Yang <[email protected]>

Copilot

Pull Request Overview

This PR introduces a script that creates text context manifests from existing Lhotse audio manifests for specific speaker datasets (the Riva speakers and JHH). The script extracts speaker and emotion information from segment IDs and reformats it as text context for TTS training.

- Adds functionality to process Lhotse cut manifests and replace audio context with text context
- Implements dataset-specific logic for extracting speaker suffixes from segment IDs
- Includes shard verification and validation to ensure data integrity
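
The three points above can be illustrated with a minimal sketch. It is not the code from `create_text_context_lhotse_manifest.py`. The segment-ID layout, the `context_audio_*` and `context_text` field names, the context string format, and all function names are assumptions chosen for illustration. Cuts are treated as plain dicts, as they appear in a JSONL manifest, so no Lhotse call is needed.

```python
import re
from typing import Optional

# ASSUMPTION: segment IDs look like "<dataset>_<speaker>_<emotion>_<index>".
# The real datasets may use a different layout.
SEGMENT_ID_PATTERN = re.compile(
    r"^(?P<dataset>[a-z]+)_(?P<speaker>[A-Za-z0-9]+)_(?P<emotion>[a-z]+)_\d+$"
)

# ASSUMPTION: hypothetical names of the audio-context fields stored under "custom".
AUDIO_CONTEXT_KEYS = ("context_audio_filepath", "context_audio_duration")


def text_context_from_segment_id(segment_id: str) -> Optional[str]:
    """Build a text context string from a segment ID, or return None if it does not match."""
    match = SEGMENT_ID_PATTERN.match(segment_id)
    if match is None:
        return None
    # ASSUMPTION: the output format is illustrative only.
    return f"speaker: {match['speaker']} | emotion: {match['emotion']}"


def replace_audio_context(cut: dict) -> dict:
    """Return a copy of a cut dict whose audio-context fields are replaced by a text context."""
    text = text_context_from_segment_id(cut["id"])
    if text is None:
        raise ValueError(f"Unrecognized segment ID: {cut['id']!r}")
    custom = dict(cut.get("custom", {}))  # copy, so the input cut is not mutated
    for key in AUDIO_CONTEXT_KEYS:
        custom.pop(key, None)
    custom["context_text"] = text
    return {**cut, "custom": custom}


def verify_shard(original: list, converted: list) -> None:
    """Check that a converted shard has the same cuts, in the same order, as the original."""
    if len(original) != len(converted):
        raise ValueError(
            f"Cut count mismatch: {len(original)} original vs {len(converted)} converted"
        )
    for old, new in zip(original, converted):
        if old["id"] != new["id"]:
            raise ValueError(f"Cut order mismatch: {old['id']!r} vs {new['id']!r}")
        if "context_text" not in new.get("custom", {}):
            raise ValueError(f"Missing text context for cut {new['id']!r}")
```

A converted shard would then be produced with `[replace_audio_context(c) for c in cuts]` and checked with `verify_shard(cuts, converted)` before it is written out. Raising on an unrecognized segment ID, rather than skipping the cut, keeps the original and converted shards aligned one-to-one.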


scripts/magpietts/create_text_context_lhotse_manifest.py

Signed-off-by: Xuesong Yang <[email protected]>

blisc · 2025-09-05T15:33:08Z

Can you docstring this file and its functions so we understand what it does?

[magpietts][lhotse] add a script for creating text context manifest f…

e38ad95

…or riva speakers and jhh. Signed-off-by: Xuesong Yang <[email protected]>

XuesongYang requested review from Copilot, subhankar-ghosh, blisc and paarthneekhara September 4, 2025 16:24

Copilot AI reviewed Sep 4, 2025

added copyright header

adafb8f

Signed-off-by: Xuesong Yang <[email protected]>

XuesongYang added TTS Run CICD labels Sep 4, 2025

XuesongYang enabled auto-merge (squash) September 4, 2025 17:43

Merge branch 'magpietts_2508' into xueyang/magpietts_2508

11ef327

github-actions bot removed the TTS label Sep 6, 2025

chtruong814 added Run CICD and removed Run CICD labels Sep 6, 2025

chtruong814 temporarily deployed to test September 6, 2025 18:54 — with GitHub Actions Inactive
