Solves #2598 #2599

nouranali · 2025-04-15T19:03:20Z

No description provided.

Co-authored-by: Joe Cummings <[email protected]>

pytorch-bot · 2025-04-15T19:03:24Z

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/torchtune/2599

📄 Preview Python docs built from this PR

Note: Links to docs will display an error until the docs builds have been completed.

This comment was automatically generated by Dr. CI and updates every 15 minutes.

ebsmothers

Thanks for the PR! Before giving a more detailed review, one high-level comment: I wonder if we really want to handle this for people in general. E.g. with pad_logits it is possible for me to run KD on two totally different tokenizers (and vocab spaces) without seeing any errors, right? This goes beyond just the case of "vocab of teacher model is subset of vocab of student model" (or vice versa).

Anyways, I agree that mismatched vocab is a challenge (especially for Qwen models); it's come up in a couple of issues on the repo before. But I also want to make sure we don't do something behind the scenes that lets someone accidentally shoot themselves in the foot. Would be interested to hear your thoughts on this, though.
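One way to keep padding from silently hiding a tokenizer mismatch is to check up front that the smaller vocabulary is contained in the larger one with identical token ids (the "subset" case mentioned above). The sketch below assumes both vocabularies are available as plain `dict[str, int]` token-to-id mappings (how to obtain them depends on the tokenizer). `check_vocab_compatible` is a hypothetical helper for illustration, not part of torchtune or this PR.

```python
def check_vocab_compatible(teacher_vocab: dict[str, int], student_vocab: dict[str, int]) -> None:
    """Raise ValueError unless the smaller token->id mapping is contained in the
    larger one with identical ids. Hypothetical helper, not a torchtune API."""
    # Order the two mappings by size; on a tie the teacher vocab is treated as "small".
    small, large = sorted((teacher_vocab, student_vocab), key=len)
    # A token is mismatched if it is missing from the larger vocab or maps to a different id.
    mismatched = [tok for tok, idx in small.items() if large.get(tok) != idx]
    if mismatched:
        raise ValueError(
            f"{len(mismatched)} token(s) differ between teacher and student "
            f"vocabularies, e.g. {mismatched[:5]}"
        )


# Usage: the student vocab is a subset of the teacher vocab with the same ids, so this passes.
check_vocab_compatible({"a": 0, "b": 1, "<extra>": 2}, {"a": 0, "b": 1})
```

This only covers the tokenizer vocabulary. A model's output layer can be wider than its tokenizer vocabulary, so the logits may still need padding after this check passes.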

nouranali · 2025-04-16T07:50:44Z


When I run KD on Qwen models that don't have the same tokenizer (Qwen14B as teacher, Qwen3B as student), I get a size-mismatch exception. That's why I added the padding solution to my local code; once #2589 is accepted, other people who run the recipe will face the same error.
I also agree that this exception doesn't necessarily occur in every case, but in mine, using a chat_dataset in OpenAI format, it does happen because not all entries have the same size.
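A minimal pure-Python sketch of the padding idea for a single token position, assuming the logits are plain lists of floats: the shorter row is right-padded with a large negative value so the padded entries get near-zero probability, then a forward KL(teacher || student) is computed. `pad_to_vocab`, `PAD_VALUE`, and `forward_kl` are hypothetical names for illustration, not the code in this PR; a real recipe would operate on batched tensors.

```python
import math

# Hypothetical constant: a large negative logit gives ~0 probability after softmax
# while keeping the arithmetic finite (with -inf, (-inf) - (-inf) is NaN in the KL terms).
PAD_VALUE = -1e9


def pad_to_vocab(logits, target_size, pad_value=PAD_VALUE):
    """Right-pad one row of logits so that len(result) == target_size."""
    if len(logits) > target_size:
        raise ValueError("cannot pad logits to a smaller vocabulary")
    return list(logits) + [pad_value] * (target_size - len(logits))


def log_softmax(logits):
    """Numerically stable log-softmax over a list of floats."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]


def forward_kl(teacher_logits, student_logits):
    """KL(teacher || student) for one token after aligning vocab sizes."""
    size = max(len(teacher_logits), len(student_logits))
    t = log_softmax(pad_to_vocab(teacher_logits, size))
    s = log_softmax(pad_to_vocab(student_logits, size))
    # Padded teacher entries have exp(tp) == 0.0, so they contribute nothing.
    return sum(math.exp(tp) * (tp - sp) for tp, sp in zip(t, s))


# Usage: a 4-entry teacher row vs. a 3-entry student row that gets padded.
print(forward_kl([2.0, 1.0, 0.5, -1e9], [2.0, 1.0, 0.5]))  # prints 0.0
```

If the teacher assigns noticeable probability to an index the student lacks, that term is scaled by roughly 1e9 and dominates the loss, which is one concrete way the silent-mismatch concern can surface.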

nouranali and others added 10 commits April 14, 2025 11:33

add kd recipe for qwen models

daf2a20

Update recipes/configs/qwen2_5/14B_to_7B_KD_lora_single_device.yaml

bb241f3

Co-authored-by: Joe Cummings <[email protected]>

Update recipes/configs/qwen2_5/14B_to_7B_KD_lora_single_device.yaml

3c9fb79

Co-authored-by: Joe Cummings <[email protected]>

Update recipes/configs/qwen2_5/14B_to_7B_KD_lora_single_device.yaml

f15558a

Co-authored-by: Joe Cummings <[email protected]>

Update recipes/configs/qwen2_5/14B_to_7B_KD_lora_single_device.yaml

ba4b870

Co-authored-by: Joe Cummings <[email protected]>

Merge branch 'pytorch:main' into main

819fa9f

edit file names

d064dff

Update recipes/configs/qwen2_5/14B_to_7B_KD_lora_single_device.yaml

dac943a

Co-authored-by: Joe Cummings <[email protected]>

Update recipes/configs/qwen2_5/14B_to_7B_KD_lora_single_device.yaml

a71678b

Co-authored-by: Joe Cummings <[email protected]>

add padding logits to handle logits mismatch in kd

aa9cd0a

facebook-github-bot added the CLA Signed label Apr 15, 2025

ebsmothers reviewed Apr 16, 2025


ebsmothers mentioned this pull request Apr 16, 2025

Qwen Student and teacher logits mismatch during KD, needs padding #2598

Open
