phi3.5 genai converted model output garbage results with input length around 3000 and 8000. #954
Comments
Thanks for reporting this. Does it matter which prompt you use, or does any long prompt produce this output?
It seems related only to the length of the prompt. I have several prompts around 3000 tokens with this issue, e.g. lengths 3824, 3613, 3918... I also have some samples that are correct at 4000 and 5000 tokens.
Thank you. Can you share the prompts that produce garbage? The 3000-length and the 8000-length ones, so that we can repro.
Sorry, I can't provide the prompts because they are customer data.
No problem. I did reproduce garbage output for a prompt length of 3402. We are investigating.
We are investigating a fix for this issue.
Any updates? Thanks.
Sorry, I haven't made much progress on this so far; I will prioritize it this week. Thanks.
Hi @ajindal1 , |
Hi @yufang67, |
We are adding this fix into GenAI (huggingface/transformers#33129); it should resolve the issue.
Great, thanks. How can I use this latest fix? Or will there be a new release soon?
Hi @ajindal1 , |
Sorry, the fix is not yet available. We are working on it; it will be part of the next release (0.6.0), or you can build from source on the main branch once the fix is merged. I will update here once it is complete.
Hi @ajindal1, |
@yufang67 We have a working solution for this using the RewindTo feature in GenAI. Essentially, the user needs to rewind to an earlier state and re-add the newly generated tokens. Since at this point the model switches from the short factor to the long factor, it will take some time to generate the output, as it needs to do some re-computation. Here is an example fix that works in Python. Let me know if you still face any issue with this:
A couple more things on the above comment: first, this only works with the main branch, as the RewindTo feature was not part of the 0.5 release. Second, we are also working on a fix inside our repo so that users don't have to handle this scenario in their own code.
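The actual example referenced above is not included in this thread. As a rough illustration only, the control flow of the workaround might look like the toy sketch below. Everything here is an assumption for illustration, not the real onnxruntime-genai API: `ToyGenerator`, `next_token`, and the method names stand in for the library's objects, and the 4096 threshold is Phi-3-mini-128k's `original_max_position_embeddings`, the point where the model switches from the short RoPE factor to the long one.

```python
# Toy sketch of the rewind workaround (NOT the real onnxruntime-genai API).
# Idea: once prompt + generated tokens cross the short->long RoPE factor
# boundary, rewind to the end of the prompt and re-append the tokens
# generated so far, so the whole sequence is re-processed under the
# long factor instead of mixing the two.

CONTEXT_SWITCH_LEN = 4096  # original_max_position_embeddings (assumed)

class ToyGenerator:
    """Stand-in for the GenAI generator; tracks tokens and rewind count."""
    def __init__(self, prompt_tokens):
        self.tokens = list(prompt_tokens)
        self.rewinds = 0

    def next_token(self):
        return 0  # placeholder for a real decode step

    def rewind_to(self, length):
        self.tokens = self.tokens[:length]
        self.rewinds += 1

    def append_tokens(self, toks):
        self.tokens.extend(toks)

def generate(prompt_tokens, max_length):
    gen = ToyGenerator(prompt_tokens)
    prompt_len = len(prompt_tokens)
    switched = False
    while len(gen.tokens) < max_length:
        gen.append_tokens([gen.next_token()])
        # First time we cross the boundary: rewind and re-append so the
        # full sequence is recomputed with the long factor.
        if not switched and len(gen.tokens) > CONTEXT_SWITCH_LEN:
            generated = gen.tokens[prompt_len:]
            gen.rewind_to(prompt_len)
            gen.append_tokens(generated)
            switched = True
    return gen

g = generate(list(range(3424)), 4200)
print(len(g.tokens), g.rewinds)  # 4200 1
```

The rewind is a one-time cost: after the switch, decoding continues normally with the long factor for the rest of the sequence.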
Hi @ajindal1 , |
@yufang67 we have merged the above PR; can you please check whether it resolves your issue? The fix will be part of the 0.6.0 release, which is scheduled in the next few weeks.
Thanks @ajindal1 for the update. I tried to compile but ran into an issue with the gcc compiler version (I followed this doc: https://onnxruntime.ai/docs/genai/howto/build-from-source.html). Is there a base image I can use? Thanks.
@yufang67 We don't have an image that contains the package, but I am using this image
Hi @ajindal1, are there any changes to this usage?
@yufang67 if you are building from source and using the latest version of the example, you shouldn't be seeing this error. Can you share more details on how I can replicate this error on my end?
@ajindal1 could you double-check whether, in the wheel from latest main, GeneratorParams still has the attribute input_ids?
I got: for 0.6.0.dev0. Thanks.
@yufang67 We did have an API change, and it was also mentioned here; there are two ways to fix this:
Describe the bug
Currently, I use onnxgenai==0.4.0 to convert phi_3_5_mini_instruct (fp16, CUDA) and run inference with onnxgenai on an A100 80G.
I observed that for some input lengths around 3000 (and 8000), the result runs up to the fixed max_length and is full of "\n".
For example, with max_length fixed at 12K and an input of 3424 tokens, the output is 8576 tokens filled with the following:
n0.\n.\n0.\n.\n.\n0.\n\n\n\n\n\n2.\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n.\n.\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n.\n2.\n\n\n\n\n2.\n2.\n\n\n.\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n.\n\n\n\n\n\n\n.\n\n\n\n\n\n\n\n\n\n\n.\n\n\n\n\n\n.\n0.\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n2.\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n0.
I compared with the transformers API and didn't get this kind of result with the same model.
Any clue about this issue? (I have seen similar issues reported for vLLM/transformers: https://huggingface.co/microsoft/Phi-3-mini-128k-instruct/discussions/85, vllm-project/vllm#8254.)
Thanks
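As background on why lengths just above ~3000-4000 are the ones affected: Phi-3-mini-128k's LongRoPE configuration ships two sets of rotary-embedding scaling factors, and inference code selects one based on the total sequence length. A prompt of ~3400 tokens starts under the threshold, but generation pushes the total past it mid-stream, which is exactly where the switch has to be handled correctly. Below is a minimal, simplified sketch of the selection rule; the 4096 threshold is `original_max_position_embeddings` from the model's config, and the factor values shown are illustrative, not the real config values.

```python
# Simplified sketch of LongRoPE factor selection for Phi-3-mini-128k.
# The real implementation (e.g. in HF transformers) applies these factors
# to the rotary embedding frequencies; here we only show which set is picked.

ORIGINAL_MAX_POSITION_EMBEDDINGS = 4096  # from the model's config.json

def pick_rope_factor(seq_len, short_factor, long_factor):
    """Return the short factors while the sequence fits the original
    context window, and the long factors once it exceeds it."""
    if seq_len > ORIGINAL_MAX_POSITION_EMBEDDINGS:
        return long_factor
    return short_factor

# Illustrative (made-up) factor values:
short, long_ = [1.0], [2.5]

print(pick_rope_factor(3424, short, long_))        # prompt alone: short factors
print(pick_rope_factor(3424 + 800, short, long_))  # after generation: long factors
```

This is why only certain input lengths trigger the garbage output: prompts well under or well over 4096 stay on one factor set for the whole generation, while prompts near the boundary cross it mid-generation.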