would give the following error on the final process_logits call:
ValueError: All stacks are empty, so the only token accepted is EOS(2) but got 29898
This happens because the Llama tokenizer adds a dummy whitespace at the start of the sequence. When encoding answer, the tokenizer therefore produces token_id 1234 instead of 12011.
After parsing the first (empty) prefix, the set of allowed next tokens does not contain [1234]; it only contains token_ids corresponding to tokens without the prefix space. That makes parsing of the second prefix illegal, so only the EOS token is accepted as the next token, and the third prefix raises the error above.
It would be very helpful if this prefix-whitespace problem could be handled.
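A minimal toy sketch of the mismatch (not the real tokenizer or parser; the vocabulary and ids 1234/12011 are taken from the example above, and the SentencePiece-style "▁" prefix piece is an assumption about how the dummy space is represented):

```python
# Hypothetical SentencePiece-style vocab: the same surface word maps to
# different ids depending on whether the dummy prefix space ("▁") is present.
vocab = {"answer": 12011, "▁answer": 1234, "</s>": 2}

def encode(text, add_dummy_prefix=True):
    # Mimics SentencePiece's add_dummy_prefix option: a "▁" is prepended
    # to the input before the vocabulary lookup.
    piece = ("▁" + text) if add_dummy_prefix else text
    return vocab[piece]

# After consuming the empty first prefix, the guided parser only allows the
# no-prefix-space token (and EOS)...
allowed = {vocab["answer"], vocab["</s>"]}

# ...but the tokenizer emits the prefix-space variant, which is rejected.
emitted = encode("answer")
print(emitted, emitted in allowed)  # 1234 False
```

Once the mismatch occurs, every stack in the parser is empty, so only EOS(2) remains legal, which is exactly the state the ValueError reports.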