Workaround to get correct results from llama2 mxr #3602

Closed
wants to merge 2 commits into from

@turneram turneram commented Nov 8, 2024

Fixes #3596

@turneram turneram requested a review from causten as a code owner November 8, 2024 17:33
@causten causten requested review from pfultz2 and CharlieL7 November 8, 2024 18:32

codecov bot commented Nov 8, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 92.13%. Comparing base (4b96e1c) to head (3469919).
Report is 4 commits behind head on develop.

Additional details and impacted files
@@           Coverage Diff            @@
##           develop    #3602   +/-   ##
========================================
  Coverage    92.13%   92.13%           
========================================
  Files          512      512           
  Lines        21424    21424           
========================================
  Hits         19740    19740           
  Misses        1684     1684           


@migraphx-bot

| Test | Batch | Rate new (346991) | Rate old (4b96e1) | Diff | Compare |
| --- | --- | --- | --- | --- | --- |
| torchvision-resnet50 | 64 | 3,259.45 | 3,260.40 | -0.03% | |
| torchvision-resnet50_fp16 | 64 | 6,992.07 | 6,981.88 | 0.15% | |
| torchvision-densenet121 | 32 | 2,436.39 | 2,436.50 | -0.00% | |
| torchvision-densenet121_fp16 | 32 | 4,089.04 | 4,081.96 | 0.17% | |
| torchvision-inceptionv3 | 32 | 1,638.45 | 1,638.04 | 0.03% | |
| torchvision-inceptionv3_fp16 | 32 | 2,762.22 | 2,760.86 | 0.05% | |
| cadene-inceptionv4 | 16 | 776.94 | 776.56 | 0.05% | |
| cadene-resnext64x4 | 16 | 810.97 | 811.67 | -0.09% | |
| slim-mobilenet | 64 | 7,535.10 | 7,540.50 | -0.07% | |
| slim-nasnetalarge | 64 | 211.47 | 211.49 | -0.01% | |
| slim-resnet50v2 | 64 | 3,505.29 | 3,506.73 | -0.04% | |
| bert-mrpc-onnx | 8 | 1,150.10 | 1,147.08 | 0.26% | |
| bert-mrpc-tf | 1 | 463.30 | 465.87 | -0.55% | |
| pytorch-examples-wlang-gru | 1 | 427.90 | 423.73 | 0.98% | |
| pytorch-examples-wlang-lstm | 1 | 397.41 | 389.07 | 2.14% | |
| torchvision-resnet50_1 | 1 | 780.07 | 788.22 | -1.03% | |
| cadene-dpn92_1 | 1 | 400.03 | 402.19 | -0.54% | |
| cadene-resnext101_1 | 1 | 383.01 | 382.83 | 0.05% | |
| onnx-taau-downsample | 1 | 342.65 | 343.07 | -0.12% | |
| dlrm-criteoterabyte | 1 | 33.32 | 33.34 | -0.04% | |
| dlrm-criteoterabyte_fp16 | 1 | 52.74 | 52.75 | -0.01% | |
| agentmodel | 1 | 8,360.30 | 8,325.15 | 0.42% | |
| unet_fp16 | 2 | 58.79 | 58.80 | -0.01% | |
| resnet50v1_fp16 | 1 | 955.34 | 953.06 | 0.24% | |
| resnet50v1_int8 | 1 | 1,053.66 | 1,005.99 | 4.74% | 🔆 |
| bert_base_cased_fp16 | 64 | 1,170.92 | 1,170.44 | 0.04% | |
| bert_large_uncased_fp16 | 32 | 363.44 | 363.37 | 0.02% | |
| bert_large_fp16 | 1 | 200.61 | 198.99 | 0.81% | |
| distilgpt2_fp16 | 16 | 2,203.46 | 2,201.23 | 0.10% | |
| yolov5s | 1 | 535.78 | 536.00 | -0.04% | |
| tinyllama | 1 | 43.44 | 43.45 | -0.01% | |
| vicuna-fastchat | 1 | 178.64 | 174.10 | 2.61% | |
| whisper-tiny-encoder | 1 | 418.21 | 418.74 | -0.13% | |
| whisper-tiny-decoder | 1 | 427.95 | 425.97 | 0.47% | |

Check results before merge 🔆

@migraphx-bot


✅ bert-mrpc-onnx: PASSED: MIGraphX meets tolerance
✅ bert-mrpc-tf: PASSED: MIGraphX meets tolerance
✅ pytorch-examples-wlang-gru: PASSED: MIGraphX meets tolerance
✅ pytorch-examples-wlang-lstm: PASSED: MIGraphX meets tolerance
✅ torchvision-resnet50_1: PASSED: MIGraphX meets tolerance
✅ cadene-dpn92_1: PASSED: MIGraphX meets tolerance
✅ cadene-resnext101_1: PASSED: MIGraphX meets tolerance
✅ dlrm-criteoterabyte: PASSED: MIGraphX meets tolerance
✅ agentmodel: PASSED: MIGraphX meets tolerance
✅ unet: PASSED: MIGraphX meets tolerance
✅ resnet50v1: PASSED: MIGraphX meets tolerance
✅ bert_base_cased_fp16: PASSED: MIGraphX meets tolerance
🔴 bert_large_uncased_fp16: FAILED: MIGraphX is not within tolerance - check verbose output
✅ bert_large: PASSED: MIGraphX meets tolerance
✅ yolov5s: PASSED: MIGraphX meets tolerance
✅ tinyllama: PASSED: MIGraphX meets tolerance
✅ vicuna-fastchat: PASSED: MIGraphX meets tolerance
✅ whisper-tiny-encoder: PASSED: MIGraphX meets tolerance
✅ whisper-tiny-decoder: PASSED: MIGraphX meets tolerance
✅ distilgpt2_fp16: PASSED: MIGraphX meets tolerance


@CharlieL7 CharlieL7 left a comment


I get that there's an issue with the data pointers. I don't follow how this workaround fixes it. We should meet to discuss.


- std::vector<instruction_ref> attn_probs_inputs{id, pres_k, pres_v, inputs.at(5)};
+ std::vector<instruction_ref> attn_probs_inputs{concat, pres_k, pres_v, inputs.at(5), rotary_qkv};

Why can't you do:

auto id = mpm.get_module().insert_instruction(ins, make_op("identity"), concat, rotary_qkv);
std::vector<instruction_ref> attn_probs_inputs{id, pres_k, pres_v, inputs.at(5)};

?

class Params>
__device__ void compute_attention_probabilities(Output output,
Query query,
Passthrough, // Used for shape info and graph ordering only

I don't really follow how these parameters match what is done in prefuse_ops. You don't really change the first parameter, but somehow it is changed here. Is the query input supposed to be the concat or the rotary_qkv?
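
As background for the "graph ordering only" comment in the kernel signature above, here is a minimal, self-contained sketch of how an otherwise unused input can pin one node after another in a dataflow graph. It is a toy model, not MIGraphX code: `toy_instruction`, `is_valid_order`, `depends_on`, and `make_toy_program` are hypothetical names, and the wiring between `concat`, `rotary_qkv`, and `attn_probs` is an assumption made for illustration rather than the wiring in this PR.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Toy stand-in for a graph instruction: a name plus the indices of its inputs.
// This is NOT the MIGraphX instruction type; it only illustrates the idea.
struct toy_instruction
{
    std::string name;
    std::vector<std::size_t> inputs;
};

// True if every instruction appears after all of its inputs, i.e. the
// vector order is a valid execution order for the graph.
bool is_valid_order(const std::vector<toy_instruction>& program)
{
    for(std::size_t i = 0; i < program.size(); ++i)
    {
        for(std::size_t in : program[i].inputs)
        {
            if(in >= i)
                return false;
        }
    }
    return true;
}

// True if `consumer` depends on `producer`, directly or transitively.
bool depends_on(const std::vector<toy_instruction>& program,
                std::size_t consumer,
                std::size_t producer)
{
    for(std::size_t in : program[consumer].inputs)
    {
        if(in == producer || depends_on(program, in, producer))
            return true;
    }
    return false;
}

// Builds a three-node toy program. With `with_passthrough` set, attn_probs
// also lists rotary_qkv as an input, which adds a dependency edge even if
// the consumer never reads that input's data.
std::vector<toy_instruction> make_toy_program(bool with_passthrough)
{
    std::vector<toy_instruction> program;
    program.push_back({"rotary_qkv", {}}); // index 0
    program.push_back({"concat", {}});     // index 1
    std::vector<std::size_t> attn_inputs{1};
    if(with_passthrough)
        attn_inputs.push_back(0);
    program.push_back({"attn_probs", attn_inputs}); // index 2
    return program;
}
```

In this toy model, `depends_on(make_toy_program(false), 2, 0)` is false, so a scheduler is free to reorder the two nodes, while `depends_on(make_toy_program(true), 2, 0)` is true, so `attn_probs` must come after `rotary_qkv`. Whether that mechanism is what fixes the data-pointer issue in this PR is exactly the open question raised in the review.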

@turneram turneram closed this Nov 12, 2024
@pfultz2 pfultz2 deleted the mxr-workaround branch November 12, 2024 18:26
Development

Successfully merging this pull request may close these issues.

GroupQueryAttention produces incorrect results when loaded from mxr
4 participants