Add `audio_crop_mode` parameter to speaker diarization #1944

cifkao · 2025-10-30T15:23:39Z

Some datasets have annotated regions that extend slightly past the end of the audio (e.g., due to rounding). During training, this goes undetected until we try to load audio from the out-of-bounds region, which crashes the training run. With the proposed change, we can set `audio_crop_mode="pad"` to handle such cases gracefully.
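
To make the two behaviors concrete, here is a minimal sketch of what "raise" versus "pad" cropping could look like on a mono signal. It is not the code from this PR: `crop_samples` and its arguments are hypothetical names chosen for illustration, and the signal is a plain Python list rather than a tensor.

```python
def crop_samples(samples, sample_rate, start, end, mode="raise"):
    """Hypothetical helper: return the samples covering [start, end) seconds.

    mode="raise": fail if the requested region leaves the signal.
    mode="pad":   fill the out-of-bounds part with zeros instead.
    """
    if mode not in ("raise", "pad"):
        raise ValueError(f"unknown mode: {mode!r}")
    if end <= start:
        raise ValueError("end must be greater than start")

    num_samples = len(samples)
    start_idx = round(start * sample_rate)
    end_idx = round(end * sample_rate)

    out_of_bounds = start_idx < 0 or end_idx > num_samples
    if out_of_bounds and mode == "raise":
        raise ValueError(
            f"requested [{start}, {end}) s but audio lasts "
            f"{num_samples / sample_rate} s"
        )

    # Start from silence, then copy whatever part of the request overlaps the audio.
    out = [0.0] * (end_idx - start_idx)
    src_lo = max(start_idx, 0)
    src_hi = min(end_idx, num_samples)
    if src_hi > src_lo:
        out[src_lo - start_idx:src_hi - start_idx] = samples[src_lo:src_hi]
    return out


# One second of audio at 16 kHz, with a region annotated 10 ms past the end.
samples = [1.0] * 16000
padded = crop_samples(samples, 16000, 0.5, 1.01, mode="pad")
```

In "pad" mode the trailing 160 samples of `padded` are zeros, so a region lying mostly or entirely outside the file would yield a chunk that is mostly or entirely silent.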

hbredin · 2025-11-03T21:27:47Z

Thanks for the PR. To avoid any weird behavior (e.g. chunks full of zeros), I'd rather find a solution that fixes the original problem (annotated regions after audio end time).

Therefore, I'm going to close this PR -- but I'd be happy to investigate the original problem if you can open an issue and share faulty audio files / annotations.
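
As a rough illustration of how the faulty annotations could be located before opening an issue, the sketch below lists regions that fall outside the audio. It is an assumption-laden example, not part of the library: `find_out_of_bounds`, the `(start, end)` tuple layout, and the `tolerance` parameter are all hypothetical.

```python
def find_out_of_bounds(regions, duration, tolerance=0.0):
    """Hypothetical check: return the (start, end) regions, in seconds,
    that begin before 0 or end after `duration`, beyond `tolerance`."""
    return [
        (start, end)
        for start, end in regions
        if start < -tolerance or end > duration + tolerance
    ]


# Annotated regions for a 10-second file (made-up values).
regions = [(0.0, 5.0), (9.5, 10.004), (10.2, 11.0)]

every_violation = find_out_of_bounds(regions, duration=10.0)
beyond_rounding = find_out_of_bounds(regions, duration=10.0, tolerance=0.01)
```

With no tolerance, both the 4 ms overshoot and the region lying fully past the end are reported; with a 10 ms tolerance only the latter remains, which separates rounding noise from genuinely wrong annotations.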

Add audio_crop_mode parameter to speaker diarization

134dc52

hbredin closed this Nov 3, 2025
