fix: auto-resample audio to 24kHz instead of asserting #233

Open
haosenwang1018 wants to merge 2 commits into QwenLM:main from haosenwang1018:fix/auto-resample-audio

@haosenwang1018

Problem

extract_mels() in finetuning/dataset.py raises "AssertionError: Only support 24kHz audio" when training data uses a different sample rate (e.g. 16kHz, 44.1kHz, 48kHz). The failure only surfaces at runtime, once training is already under way, with no early warning and no helpful error message.

Related to #204 (bug 2: "No sample rate validation until runtime")

Fix

Replace the assert with automatic resampling via librosa.resample(). librosa is already imported in the module and used for loading audio, so no new dependency is added. Audio at any other rate is resampled transparently to the target rate (24000 Hz) before mel extraction; librosa.resample uses high-quality resampling by default.

Changes

  • finetuning/dataset.py: Replace assert sr == 24000 with automatic resampling when sr != 24000
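
For readers who want to see the shape of the change, here is a minimal sketch of the resample-instead-of-assert logic. It is not the diff from this PR: ensure_target_rate and its resample_fn parameter are hypothetical names introduced for illustration, and the PR itself edits extract_mels() in place. The only library call assumed is librosa.resample with its orig_sr and target_sr keyword arguments.

```python
TARGET_SR = 24000  # rate required before mel extraction, per the PR description


def ensure_target_rate(audio, sr, target_sr=TARGET_SR, resample_fn=None):
    """Return (audio, sr) at target_sr, resampling only when needed.

    Hypothetical helper for illustration; not a function in the repository.
    """
    if sr == target_sr:
        # Already at the expected rate, so the audio is passed through as-is.
        return audio, sr
    if resample_fn is None:
        # Imported lazily so the pass-through path does not need librosa.
        import librosa
        resample_fn = librosa.resample
    # librosa.resample takes orig_sr and target_sr as keyword arguments.
    resampled = resample_fn(audio, orig_sr=sr, target_sr=target_sr)
    return resampled, target_sr
```

The injectable resample_fn is only there to make the sketch easy to exercise without real audio; in the module, calling librosa.resample directly at the point where the assert used to be would have the same effect.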

The speaker_encoder weights were explicitly deleted from the state dict
before saving checkpoints (lines 150-153). When resuming training from
a checkpoint, model.speaker_encoder becomes None, causing a crash on
the first forward pass.

Keep speaker_encoder in checkpoints so that training can resume
correctly. Users who want smaller inference-only models can strip
these weights separately.
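
A rough sketch of the idea, under stated assumptions: checkpoint_state is a hypothetical helper, and the "speaker_encoder." key prefix is assumed from the attribute name rather than taken from the repository. It shows the default of keeping every weight so training can resume, with stripping left as an explicit opt-in for inference-only exports.

```python
def checkpoint_state(state_dict, strip_speaker_encoder=False,
                     prefix="speaker_encoder."):
    """Return the mapping of weights to write into a checkpoint.

    Hypothetical helper; the key prefix is an assumption for illustration.
    """
    if not strip_speaker_encoder:
        # Default: keep all weights so a resumed run can rebuild speaker_encoder.
        return dict(state_dict)
    # Opt-in: drop speaker_encoder weights for a smaller inference-only file.
    return {k: v for k, v in state_dict.items() if not k.startswith(prefix)}
```

For example, checkpoint_state(model.state_dict()) would be used for resumable training checkpoints, and checkpoint_state(model.state_dict(), strip_speaker_encoder=True) as a separate export step.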

Fixes QwenLM#204 (bug 1)

The extract_mels method crashes with an assertion error when audio is not
exactly 24kHz. Since librosa is already imported and used for loading,
automatically resample to 24kHz instead of failing.

This gives users a clear, working path without requiring manual audio
preprocessing. librosa.resample uses high-quality resampling by default.

Related to QwenLM#204 (bug 2)