Confusion Regarding Label Generation Process #153

Asunatan · 2025-03-12T08:30:19Z

Hello, I'm new to vision-language models (VLMs) and I'm confused about how the labels are generated.
In sft.py, they are built like this:

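  # Start from a copy of the input ids, so every token is a target by default.
  labels = batch["input_ids"].clone()
  # Ignore padding positions (-100 is the default ignore_index of PyTorch's cross-entropy loss).
  labels[labels == processor.tokenizer.pad_token_id] = -100
  # Ignore the image placeholder tokens as well.
  image_token_id = processor.tokenizer.convert_tokens_to_ids(processor.image_token)
  labels[labels == image_token_id] = -100
  batch["labels"] = labels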

This means that every token except pad_token_id and the image token contributes to the loss, including the system prompt, <|im_start|>user\n, and so on. In LLaVA-CoT, by contrast, labels are generated by searching for each token individually, and tokens like <|im_start|>assistant\n are masked. Could masking only pad_token_id and the image token lead to unexpected problems?
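
To make the comparison concrete, here is a minimal sketch of what response-only masking could look like. It is not the code from sft.py or from LLaVA-CoT. The function name `mask_non_assistant` and all token ids are hypothetical, and it works on plain Python lists so it can run without a tokenizer. It assumes each assistant turn starts with a fixed header (such as the ids of <|im_start|>assistant\n) and ends with a single end-of-turn token.

```python
IGNORE_INDEX = -100  # default ignore_index of PyTorch's CrossEntropyLoss


def mask_non_assistant(input_ids, assistant_header, end_id, ignore_index=IGNORE_INDEX):
    """Return labels that supervise only the assistant replies.

    input_ids: list of token ids for one sequence.
    assistant_header: non-empty list of ids that opens an assistant turn
        (hypothetical stand-in for the ids of "<|im_start|>assistant\\n").
    end_id: id of the end-of-turn token (hypothetical stand-in for "<|im_end|>").
    """
    labels = [ignore_index] * len(input_ids)  # mask everything by default
    n = len(assistant_header)
    i = 0
    while i <= len(input_ids) - n:
        if input_ids[i:i + n] == assistant_header:
            j = i + n  # first reply token, right after the header
            while j < len(input_ids):
                labels[j] = input_ids[j]  # supervise this reply token
                if input_ids[j] == end_id:
                    break  # the end-of-turn token is supervised too
                j += 1
            i = j + 1
        else:
            i += 1
    return labels


# Toy ids (all hypothetical): 1 = <|im_start|>, 2 = <|im_end|>, 3 = "user",
# 4 = "assistant", 5 = "\n", 0 = padding, 10+ = ordinary text tokens.
example_ids = [1, 3, 5, 10, 11, 2, 1, 4, 5, 20, 21, 2, 0, 0]
example_labels = mask_non_assistant(example_ids, assistant_header=[1, 4, 5], end_id=2)
print(example_labels)
```

In this sketch the user turn, the assistant header, and the padding all stay at -100, while the reply tokens and the closing end-of-turn token keep their ids. Supervising the end-of-turn token is a design choice here, made so that the model is trained to stop. With the masking from sft.py, the user turn and the header would be supervised as well.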
