XuesongYang
Collaborator

No description provided.

…or riva speakers and jhh.

Signed-off-by: Xuesong Yang <[email protected]>
Contributor

@Copilot Copilot AI left a comment

Pull Request Overview

This PR introduces a script that creates text context manifests from existing Lhotse audio manifests for specific speaker datasets (Riva and JHH). The script extracts speaker and emotion information from segment IDs and reformats it as text context for TTS training.

  • Adds functionality to process Lhotse cut manifests and replace audio context with text context
  • Implements dataset-specific logic for extracting speaker suffixes from segment IDs
  • Includes shard verification and validation to ensure data integrity
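
To make the overview above concrete, here is a minimal sketch of the idea of turning a segment ID into text context. It is not the code from this PR: the segment-ID layout (`<dataset>_<speaker>_<emotion>_<utterance>`), the field names `context_audio` and `context_text`, the output string format, and both function names are hypothetical, and cuts are modeled as plain dictionaries rather than Lhotse objects.

```python
import re
from typing import Optional

# ASSUMPTION: hypothetical segment-ID layout "<dataset>_<speaker>_<emotion>_<utterance>".
# The real IDs handled by the PR's script may be structured differently.
SEGMENT_ID_PATTERN = re.compile(
    r"^(?P<dataset>[a-z]+)_(?P<speaker>[A-Za-z0-9]+)_(?P<emotion>[a-z]+)_(?P<utt>\d+)$"
)


def text_context_from_segment_id(segment_id: str) -> Optional[str]:
    """Build a text context string from a segment ID, or return None if it does not parse."""
    match = SEGMENT_ID_PATTERN.match(segment_id)
    if match is None:
        return None
    # ASSUMPTION: the output format is illustrative only.
    return f"speaker: {match['speaker']} | emotion: {match['emotion']}"


def replace_audio_context(cut: dict) -> dict:
    """Return a copy of a cut dict with the audio context swapped for text context."""
    context = text_context_from_segment_id(cut["id"])
    if context is None:
        # Fail loudly rather than silently writing a cut without context.
        raise ValueError(f"Cannot parse segment ID: {cut['id']!r}")
    # ASSUMPTION: "context_audio" and "context_text" are hypothetical field names.
    updated = {key: value for key, value in cut.items() if key != "context_audio"}
    updated["context_text"] = context
    return updated


example_cut = {"id": "riva_spk01_happy_0001", "context_audio": "a.wav", "text": "hi"}
example_out = replace_audio_context(example_cut)
```

Raising on an unparseable ID, instead of skipping the cut, is one way to keep the per-shard cut counts unchanged so that a later shard check can compare input and output directly.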

Signed-off-by: Xuesong Yang <[email protected]>
@blisc
Collaborator

blisc commented Sep 5, 2025

Can you docstring this file and its functions so we understand what it does?

3 participants