
phi3.5 genai converted model output garbage results with input length around 3000 and 8000. #954

Open
yufang67 opened this issue Oct 3, 2024 · 27 comments

@yufang67

yufang67 commented Oct 3, 2024

Describe the bug
Currently, I use a phi_3_5_mini_instruct model converted with onnxruntime-genai==0.4.0 (fp16, CUDA) and run inference with onnxruntime-genai on an A100 80GB.
I observed that for some input lengths around 3000 (and 8000), the result runs up to the fixed max_length and is full of "\n".

For example, with max_length fixed at 12K, an input of 3424 tokens gives an output of 8576 tokens filled with the following:

n0.\n.\n0.\n.\n.\n0.\n\n\n\n\n\n2.\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n.\n.\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n.\n2.\n\n\n\n\n2.\n2.\n\n\n.\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n.\n\n\n\n\n\n\n.\n\n\n\n\n\n\n\n\n\n\n.\n\n\n\n\n\n.\n0.\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n2.\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n0.

I compared with the transformers API and didn't get this kind of result with the same model.

Any clue about this issue? (I have seen similar issues reported for vLLM/transformers: https://huggingface.co/microsoft/Phi-3-mini-128k-instruct/discussions/85, vllm-project/vllm#8254.)
Thanks
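
For reference, a minimal sketch of the setup described above, assuming the 0.4.0 high-level Python API; the model path and prompt are placeholders, not the actual data:

import onnxruntime_genai as og

# Placeholder path to the fp16/CUDA phi_3_5_mini_instruct model directory.
model = og.Model("phi-3.5-mini-instruct-cuda-fp16")
tokenizer = og.Tokenizer(model)

params = og.GeneratorParams(model)
params.set_search_options(max_length=12288)  # the fixed 12K max_length

prompt = "<|user|>\n...<|end|>\n<|assistant|>"  # placeholder; problem prompts are ~3000-8000 tokens
params.input_ids = tokenizer.encode(prompt)

# model.generate returns one token sequence per input; decode the first.
output_tokens = model.generate(params)
print(tokenizer.decode(output_tokens[0]))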

@natke
Contributor

natke commented Oct 3, 2024

Thanks for reporting this. Does it matter which prompt you use, or does any long prompt produce this output?

@yufang67
Author

yufang67 commented Oct 4, 2024

It seems related only to the length of the prompt: I have several prompts of length around 3000 with this issue, like 3824, 3613, 3918... And I also have some samples that are correct at lengths of 4000 and 5000.
Not sure if it's related to the issue linked above, where it happens when the input length is < 4096 but input + output > 4096. But since in ONNX we use max_length, we don't have control over the output length.

@natke
Contributor

natke commented Oct 4, 2024

Thank you. Can you share the prompts that produce garbage (the 3000-length and the 8000-length ones) so that we can repro?

@yufang67
Author

yufang67 commented Oct 4, 2024

Sorry, I can't provide the prompts because they are customer data.

@natke
Contributor

natke commented Oct 7, 2024

No problem. I did reproduce garbage output for a prompt length of 3402. We are investigating.

@natke
Contributor

natke commented Oct 15, 2024

We are investigating a fix for this issue.

@yufang67
Author

Any updates? Thanks

@ajindal1
Collaborator

Sorry, I haven't made much progress on this so far; I will prioritize it this week. Thanks.

@yufang67
Author

Hi @ajindal1,
any clue?
We found a similar issue with the BioMistral model after fp16 ONNX conversion, for input lengths around 6K and 7K.
Not sure if it's the same issue.

@yufang67
Author

yufang67 commented Nov 4, 2024

Hi @ajindal1,
any update?
Thanks

@ajindal1
Collaborator

ajindal1 commented Nov 7, 2024

Hi @yufang67,
Apologies for the late response; I was traveling. Looking into this today.

@ajindal1
Collaborator

We are adding this fix into GenAI: huggingface/transformers#33129. It should resolve the issue.

@yufang67
Author

Great, thanks. How can I use this latest fix? Or will there be a new release soon?
Thanks

@yufang67
Author

Hi @ajindal1,
I still get this issue with onnxruntime_genai 0.5.1. Could you confirm whether the fix is included in the 0.5.1 release?
Thanks

@ajindal1
Collaborator

Sorry, the fix is not yet available. We are working on it; it will be part of the next release (0.6.0), or you can use the main branch (build from source) once it is added. I will update once it is complete.

@yufang67
Author

yufang67 commented Dec 9, 2024

Hi @ajindal1,
is there anything I can test?
Thanks

@ajindal1
Collaborator

@yufang67 We have a working solution for this using the RewindTo feature in GenAI. Essentially, the user needs to rewind to an earlier state and re-append the input plus the tokens generated so far. Since at this point the model switches from the short factor to the long factor, it will take some time to generate the output, as it needs to redo some computation. Here is the example fix, which works in Python. Let me know if you still face any issues with this:

# Needs to go inside the `while not generator.is_done()` loop.
# Once the total sequence length reaches the 4K boundary, rewind to the start
# and re-append everything so the whole sequence is processed with the long factor.
if len(new_tokens) + len(input_tokens) == 4097:
    generator.rewind_to(0)  # discard the cached KV state
    generator.append_tokens(input_tokens + new_tokens)  # re-prefill input + generated tokens
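
For context, a hedged sketch of how this workaround might sit inside a full token-by-token generation loop, assuming the main-branch Generator API (append_tokens, rewind_to, generate_next_token, get_next_tokens); the model path and prompt are placeholders:

import onnxruntime_genai as og

model = og.Model("phi-3.5-mini-instruct-cuda-fp16")  # placeholder path
tokenizer = og.Tokenizer(model)

params = og.GeneratorParams(model)
params.set_search_options(max_length=12288)

prompt = "<|user|>\n...<|end|>\n<|assistant|>"  # placeholder prompt
input_tokens = list(tokenizer.encode(prompt))

generator = og.Generator(model, params)
generator.append_tokens(input_tokens)

new_tokens = []
while not generator.is_done():
    generator.generate_next_token()
    new_tokens.append(generator.get_next_tokens()[0])
    # Workaround: when the total sequence reaches the 4K boundary, rewind and
    # re-append everything so the long RoPE factor applies to the whole sequence.
    if len(new_tokens) + len(input_tokens) == 4097:
        generator.rewind_to(0)
        generator.append_tokens(input_tokens + new_tokens)

print(tokenizer.decode(new_tokens))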

@ajindal1
Collaborator

A couple more things on the above comment: first, this only works with the main branch, as the RewindTo feature was not part of the 0.5 release. Second, we are also working on a fix inside our repo so that the user doesn't have to handle this scenario in their own code.

@yufang67
Author

Hi @ajindal1,
thanks for the response. Do you have a branch or a wheel I can try?
Thanks

@ajindal1
Collaborator

@yufang67 you can look at this PR; it is WIP but works on the Phi-3 model: #1161

@ajindal1
Collaborator

ajindal1 commented Jan 8, 2025

@yufang67 we have merged the above PR; can you please check whether it resolves your issue? The fix will be part of the 0.6.0 release, which is scheduled in the next few weeks.

@yufang67
Author

yufang67 commented Jan 9, 2025

Thanks @ajindal1 for the update. I tried to compile but got an issue with the GCC compiler version (I followed this doc: https://onnxruntime.ai/docs/genai/howto/build-from-source.html). Is there a base image I can use? Thanks

@ajindal1
Collaborator

ajindal1 commented Jan 9, 2025

@yufang67 We don't have an image that contains the package, but I am using the image nvcr.io/nvidia/pytorch:24.11-py3 and the build command works fine there. Let me know if you face any issues.

@yufang67
Author

Hi @ajindal1,
I encountered an error:

AttributeError: 'onnxruntime_genai.onnxruntime_genai.GeneratorParams' object has no attribute 'input_ids'

with the following usage:

generator_params.input_ids = input_ids
model.generate(generator_params)

Have there been any changes to this usage?

@ajindal1
Collaborator

@yufang67 if you are building from source and using the latest version of the example, you shouldn't be seeing this error. Can you share more details on how I can replicate this error on my end?

@yufang67
Author

yufang67 commented Jan 16, 2025

@ajindal1 could you double-check the wheel from the latest main, to see whether GeneratorParams has the attribute input_ids?

from onnxruntime_genai import GeneratorParams
GeneratorParams.__dict__

I got the following.

For 0.4.0:

mappingproxy({'__init__': <instancemethod __init__ at 0x7f61fa6fbe50>, '__doc__': None, '__module__': 'onnxruntime_genai.onnxruntime_genai', 'pad_token_id': <property object at 0x7f61fa534090>, 'eos_token_id': <property object at 0x7f61fa534130>, 'vocab_size': <property object at 0x7f61fa5341d0>, 'input_ids': <property object at 0x7f61fa5342c0>, 'whisper_input_features': <property object at 0x7f61fa5343b0>, 'set_inputs': <instancemethod set_inputs at 0x7f61fa538130>, 'set_model_input': <instancemethod set_model_input at 0x7f61fa538190>, 'set_search_options': <instancemethod set_search_options at 0x7f61fa5381f0>, 'try_use_cuda_graph_with_max_batch_size': <instancemethod try_use_cuda_graph_with_max_batch_size at 0x7f61fa538250>, 'try_graph_capture_with_max_batch_size': <instancemethod try_graph_capture_with_max_batch_size at 0x7f61fa5382b0>})

For 0.6.0.dev0 (note: no input_ids property):

mappingproxy({'__init__': <instancemethod __init__ at 0x7fb2dd840220>, '__doc__': None, '__module__': 'onnxruntime_genai.onnxruntime_genai', '_pybind11_conduit_v1_': <instancemethod _pybind11_conduit_v1_ at 0x7fb2dd840190>, 'pad_token_id': <property object at 0x7fb2dd82b1f0>, 'eos_token_id': <property object at 0x7fb2dd82b290>, 'vocab_size': <property object at 0x7fb2dd82b330>, 'whisper_input_features': <property object at 0x7fb2dd82b420>, 'alignment_heads': <property object at 0x7fb2dd82b560>, 'set_inputs': <instancemethod set_inputs at 0x7fb2dd840490>, 'set_model_input': <instancemethod set_model_input at 0x7fb2dd8404f0>, 'set_search_options': <instancemethod set_search_options at 0x7fb2dd840550>, 'try_use_cuda_graph_with_max_batch_size': <instancemethod try_use_cuda_graph_with_max_batch_size at 0x7fb2dd8405b0>, 'try_graph_capture_with_max_batch_size': <instancemethod try_graph_capture_with_max_batch_size at 0x7fb2dd840610>})

Thanks

@ajindal1
Collaborator

ajindal1 commented Jan 16, 2025

@yufang67 We did have an API change, which was also mentioned here; there are two ways to fix this:

  1. Build from source using the main branch and use the example from the main branch (it will contain the fix for the Phi-3 garbage output on long outputs, 4K tokens or more); see the sketch below.
  2. Use a stable release and also get the example from the release branch (it will NOT contain the fix for the Phi-3 error).
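
To illustrate the API change, a hedged sketch of the main-branch pattern that replaces GeneratorParams.input_ids: the tokens now go to a Generator via append_tokens (the model path and prompt are placeholders):

import onnxruntime_genai as og

model = og.Model("phi-3.5-mini-instruct-cuda-fp16")  # placeholder path
tokenizer = og.Tokenizer(model)

params = og.GeneratorParams(model)
params.set_search_options(max_length=12288)

# Old (0.4.x):
#   params.input_ids = input_ids
#   output_tokens = model.generate(params)
# New (main / 0.6.0): append the tokens to a Generator instead.
generator = og.Generator(model, params)
generator.append_tokens(tokenizer.encode("<|user|>\n...<|end|>\n<|assistant|>"))  # placeholder prompt

while not generator.is_done():
    generator.generate_next_token()

print(tokenizer.decode(generator.get_sequence(0)))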
