cifkao commented Oct 30, 2025

Some datasets have annotated regions that exceed the audio duration ever so slightly (e.g., due to rounding). During training, this goes undetected until we try to load audio from the out-of-bounds region, which crashes the training run. With the proposed change, we can set audio_crop_mode="pad" to handle such cases gracefully.
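
To illustrate the idea, here is a minimal, self-contained sketch of what a "pad" crop mode could look like on a mono signal stored as a plain list of samples. The function name `crop_samples`, its parameters, and the `"raise"`/`"pad"` mode names are hypothetical and chosen for illustration only; this is not the code from the PR nor the library's actual API.

```python
from typing import List


def crop_samples(
    samples: List[float],
    sample_rate: int,
    start: float,
    end: float,
    mode: str = "raise",
) -> List[float]:
    """Hypothetical helper: return the samples between `start` and `end` (seconds).

    mode="raise": fail if the requested region extends past the end of the audio.
    mode="pad":   keep the available samples and fill the remainder with zeros.
    """
    num_samples = len(samples)
    start_idx = int(round(start * sample_rate))
    end_idx = int(round(end * sample_rate))

    if start_idx < 0 or end_idx <= start_idx:
        raise ValueError(f"invalid region [{start}, {end}]")

    # Region lies entirely inside the audio: nothing special to do.
    if end_idx <= num_samples:
        return samples[start_idx:end_idx]

    if mode == "raise":
        raise ValueError(
            f"requested region ends at sample {end_idx}, "
            f"but the audio has only {num_samples} samples"
        )

    if mode == "pad":
        # Slicing past the end is safe and simply returns fewer samples
        # (possibly none, if the region starts after the audio ends).
        available = samples[start_idx:num_samples]
        missing = (end_idx - start_idx) - len(available)
        return available + [0.0] * missing

    raise ValueError(f"unknown mode: {mode!r}")
```

With a 1-second file at 16 kHz and a region annotated as ending at 1.001 s, `mode="pad"` returns the requested length with the last 16 samples set to zero, whereas `mode="raise"` fails. Note that a region starting after the end of the audio would come back as all zeros in this sketch.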

hbredin (Member) commented Nov 3, 2025

Thanks for the PR. To avoid any weird behavior (e.g. chunks full of zeros), I'd rather find a solution that fixes the original problem (annotated regions after audio end time).

Therefore, I'm going to close this PR -- but I'd be happy to investigate the original problem if you can open an issue and share faulty audio files / annotations.

hbredin closed this Nov 3, 2025