Hello, I'm new to VLMs and I'm confused about how the labels are generated.
In sft.py:
This means that every token except pad_token_id and image_token contributes to the loss, including the system prompt, <|im_start|>user\n, and so on. I noticed that LLaVA-CoT instead builds its labels by searching for each token individually, and that it also masks tokens such as <|im_start|>assistant\n. Could masking only pad_token_id and image_token lead to unexpected errors?
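To make the difference concrete, here is a minimal sketch of the two masking strategies as I understand them. It is not the code from sft.py or from LLaVA-CoT: the token IDs, the function names, and the assumption that an assistant turn ends at <|im_end|> are all hypothetical and chosen only for illustration. The one fixed fact is that -100 is the default ignore_index of PyTorch's CrossEntropyLoss.

```python
# Label value that torch.nn.CrossEntropyLoss ignores by default.
IGNORE_INDEX = -100

# Hypothetical toy token IDs (not taken from any real tokenizer).
PAD, IMAGE, IM_START, IM_END = 0, 1, 2, 3
SYSTEM, USER, ASSISTANT, NEWLINE = 10, 11, 12, 13


def mask_pad_and_image(input_ids, pad_token_id, image_token_id):
    """Strategy A: mask only padding and image tokens; everything else is supervised."""
    masked = (pad_token_id, image_token_id)
    return [IGNORE_INDEX if tok in masked else tok for tok in input_ids]


def mask_all_but_assistant(input_ids, assistant_header, end_token_id):
    """Strategy B: supervise only the tokens of each assistant reply.

    The header itself (e.g. <|im_start|>assistant\\n) stays masked; the reply
    up to and including the end token is kept as a label.
    """
    labels = [IGNORE_INDEX] * len(input_ids)
    n = len(assistant_header)
    i = 0
    while i < len(input_ids):
        if input_ids[i:i + n] == assistant_header:
            j = i + n
            while j < len(input_ids):
                labels[j] = input_ids[j]
                if input_ids[j] == end_token_id:
                    break
                j += 1
            i = j + 1
        else:
            i += 1
    return labels


# system turn, user turn with two image tokens, assistant turn, then padding
input_ids = [
    IM_START, SYSTEM, NEWLINE, 50, IM_END,
    IM_START, USER, NEWLINE, IMAGE, IMAGE, 51, IM_END,
    IM_START, ASSISTANT, NEWLINE, 60, 61, IM_END,
    PAD, PAD,
]

labels_a = mask_pad_and_image(input_ids, PAD, IMAGE)
labels_b = mask_all_but_assistant(input_ids, [IM_START, ASSISTANT, NEWLINE], IM_END)

supervised_a = sum(label != IGNORE_INDEX for label in labels_a)
supervised_b = sum(label != IGNORE_INDEX for label in labels_b)
print(supervised_a, supervised_b)
```

In this toy sequence, strategy A leaves 16 of the 20 positions supervised (system prompt, user text, and chat-template tokens included), while strategy B supervises only the 3 positions of the assistant reply.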