fix: preserve speaker_encoder in checkpoints to allow training resume#232

Open
haosenwang1018 wants to merge 1 commit into QwenLM:main from haosenwang1018:fix/keep-speaker-encoder-in-checkpoint

Conversation

@haosenwang1018

Problem

In finetuning/sft_12hz.py, the speaker_encoder weights are explicitly deleted from the state dict before saving checkpoints (lines 150-153). When training is resumed from one of these checkpoints, model.speaker_encoder is None, causing a crash on the first forward pass.

Related to #204 (bug 1: "Speaker encoder is removed from checkpoints – breaks resume")
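
The failure mode is easy to reproduce in miniature. A minimal sketch (class and method names are illustrative, not the repo's actual code):

```python
class Model:
    def __init__(self):
        # After resuming from a stripped checkpoint, the speaker
        # encoder weights are never restored, so the module is None.
        self.speaker_encoder = None

    def forward(self, audio):
        # First forward pass after resume crashes here:
        # AttributeError: 'NoneType' object has no attribute 'encode'
        return self.speaker_encoder.encode(audio)
```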

Root Cause

# finetuning/sft_12hz.py, lines 150-153: all speaker_encoder weights
# are dropped from the state dict before the checkpoint is saved
drop_prefix = "speaker_encoder"
keys_to_drop = [k for k in state_dict.keys() if k.startswith(drop_prefix)]
for k in keys_to_drop:
    del state_dict[k]

Fix

Remove the speaker_encoder deletion. Checkpoints now include all model weights, allowing training to resume correctly. Users who need smaller inference-only exports can strip the speaker_encoder weights in a separate post-processing step.
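For users who still want lighter inference-only exports, the stripping can be done offline on a saved checkpoint. A minimal sketch (the helper name is hypothetical; it assumes a flat PyTorch-style state dict keyed by parameter path):

```python
def strip_speaker_encoder(state_dict, prefix="speaker_encoder"):
    """Return a copy of the state dict without speaker_encoder weights.

    Run this only on inference-only exports, never on a checkpoint
    you intend to resume training from.
    """
    return {k: v for k, v in state_dict.items() if not k.startswith(prefix)}

# Typical offline usage (torch calls shown for context):
#   sd = torch.load("checkpoint.pt", map_location="cpu")
#   torch.save(strip_speaker_encoder(sd), "checkpoint_infer.pt")
```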

Impact

  • Checkpoint size: Increases by the size of the speaker_encoder weights, which is small relative to the full model
  • Resume: Now works without crashes
  • Backward compatible: Existing code that loads checkpoints without speaker_encoder will still work (missing keys are handled by the model's load logic)
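
The backward-compatibility point can be sketched with a tolerant load, mirroring the behavior of PyTorch's load_state_dict(..., strict=False) (the helper below is illustrative, not code from the repo):

```python
def merge_checkpoint(model_state, checkpoint_state):
    """Merge checkpoint weights into the model's current state.

    Keys absent from the checkpoint (e.g. speaker_encoder in old,
    stripped checkpoints) keep their initialized values, as with
    torch.nn.Module.load_state_dict(..., strict=False).
    Returns the merged state and the list of missing keys.
    """
    missing = [k for k in model_state if k not in checkpoint_state]
    merged = dict(model_state)
    merged.update(
        {k: v for k, v in checkpoint_state.items() if k in model_state}
    )
    return merged, missing
```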

The speaker_encoder weights were explicitly deleted from the state dict
before saving checkpoints (lines 150-153). When resuming training from
a checkpoint, model.speaker_encoder becomes None, causing a crash on
the first forward pass.

Keep speaker_encoder in checkpoints so that training can resume
correctly. Users who want smaller inference-only models can strip
these weights separately.

Fixes QwenLM#204 (bug 1)
