Workaround to get correct results from llama2 mxr #3602

turneram · 2024-11-08T17:33:23Z

codecov · 2024-11-08T19:06:50Z

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 92.13%. Comparing base (4b96e1c) to head (3469919).
Report is 4 commits behind head on develop.

Additional details and impacted files

@@           Coverage Diff            @@
##           develop    #3602   +/-   ##
========================================
  Coverage    92.13%   92.13%           
========================================
  Files          512      512           
  Lines        21424    21424           
========================================
  Hits         19740    19740           
  Misses        1684     1684


migraphx-bot · 2024-11-08T22:12:19Z

Test	Batch	Rate new 346991	Rate old 4b96e1	Diff	Compare
torchvision-resnet50	64	3,259.45	3,260.40	-0.03%	✅
torchvision-resnet50_fp16	64	6,992.07	6,981.88	0.15%	✅
torchvision-densenet121	32	2,436.39	2,436.50	-0.00%	✅
torchvision-densenet121_fp16	32	4,089.04	4,081.96	0.17%	✅
torchvision-inceptionv3	32	1,638.45	1,638.04	0.03%	✅
torchvision-inceptionv3_fp16	32	2,762.22	2,760.86	0.05%	✅
cadene-inceptionv4	16	776.94	776.56	0.05%	✅
cadene-resnext64x4	16	810.97	811.67	-0.09%	✅
slim-mobilenet	64	7,535.10	7,540.50	-0.07%	✅
slim-nasnetalarge	64	211.47	211.49	-0.01%	✅
slim-resnet50v2	64	3,505.29	3,506.73	-0.04%	✅
bert-mrpc-onnx	8	1,150.10	1,147.08	0.26%	✅
bert-mrpc-tf	1	463.30	465.87	-0.55%	✅
pytorch-examples-wlang-gru	1	427.90	423.73	0.98%	✅
pytorch-examples-wlang-lstm	1	397.41	389.07	2.14%	✅
torchvision-resnet50_1	1	780.07	788.22	-1.03%	✅
cadene-dpn92_1	1	400.03	402.19	-0.54%	✅
cadene-resnext101_1	1	383.01	382.83	0.05%	✅
onnx-taau-downsample	1	342.65	343.07	-0.12%	✅
dlrm-criteoterabyte	1	33.32	33.34	-0.04%	✅
dlrm-criteoterabyte_fp16	1	52.74	52.75	-0.01%	✅
agentmodel	1	8,360.30	8,325.15	0.42%	✅
unet_fp16	2	58.79	58.80	-0.01%	✅
resnet50v1_fp16	1	955.34	953.06	0.24%	✅
resnet50v1_int8	1	1,053.66	1,005.99	4.74%	🔆
bert_base_cased_fp16	64	1,170.92	1,170.44	0.04%	✅
bert_large_uncased_fp16	32	363.44	363.37	0.02%	✅
bert_large_fp16	1	200.61	198.99	0.81%	✅
distilgpt2_fp16	16	2,203.46	2,201.23	0.10%	✅
yolov5s	1	535.78	536.00	-0.04%	✅
tinyllama	1	43.44	43.45	-0.01%	✅
vicuna-fastchat	1	178.64	174.10	2.61%	✅
whisper-tiny-encoder	1	418.21	418.74	-0.13%	✅
whisper-tiny-decoder	1	427.95	425.97	0.47%	✅

Check results before merge 🔆
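
The Diff column above is consistent with the relative change of the new rate against the old one. A minimal sketch of that arithmetic follows, assuming this formula. The bot's own script and the threshold that triggers 🔆 are not shown in this thread, and the function names `relative_diff_percent` and `round2` are hypothetical.

```cpp
#include <cassert>
#include <cmath>

// Relative change of the new rate against the old one, in percent.
// Assumption: this mirrors how the Diff column appears to be derived;
// it is not taken from the bot's source.
inline double relative_diff_percent(double rate_new, double rate_old)
{
    assert(rate_old > 0.0);
    return (rate_new - rate_old) / rate_old * 100.0;
}

// Round to two decimals, the precision shown in the table.
inline double round2(double x) { return std::round(x * 100.0) / 100.0; }
```

For the flagged row, resnet50v1_int8, this gives (1,053.66 - 1,005.99) / 1,005.99, which rounds to 4.74%, matching the table.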

migraphx-bot · 2024-11-08T22:12:20Z

✅ bert-mrpc-onnx: PASSED: MIGraphX meets tolerance

✅ bert-mrpc-tf: PASSED: MIGraphX meets tolerance

✅ pytorch-examples-wlang-gru: PASSED: MIGraphX meets tolerance

✅ pytorch-examples-wlang-lstm: PASSED: MIGraphX meets tolerance

✅ torchvision-resnet50_1: PASSED: MIGraphX meets tolerance

✅ cadene-dpn92_1: PASSED: MIGraphX meets tolerance

✅ cadene-resnext101_1: PASSED: MIGraphX meets tolerance

✅ dlrm-criteoterabyte: PASSED: MIGraphX meets tolerance

✅ agentmodel: PASSED: MIGraphX meets tolerance

✅ unet: PASSED: MIGraphX meets tolerance

✅ resnet50v1: PASSED: MIGraphX meets tolerance

✅ bert_base_cased_fp16: PASSED: MIGraphX meets tolerance

🔴bert_large_uncased_fp16: FAILED: MIGraphX is not within tolerance - check verbose output

✅ bert_large: PASSED: MIGraphX meets tolerance

✅ yolov5s: PASSED: MIGraphX meets tolerance

✅ tinyllama: PASSED: MIGraphX meets tolerance

✅ vicuna-fastchat: PASSED: MIGraphX meets tolerance

✅ whisper-tiny-encoder: PASSED: MIGraphX meets tolerance

✅ whisper-tiny-decoder: PASSED: MIGraphX meets tolerance

✅ distilgpt2_fp16: PASSED: MIGraphX meets tolerance

CharlieL7

I get that there's an issue with the data pointers. I don't follow how this workaround fixes it. We should meet to discuss.

pfultz2 · 2024-11-11T23:08:37Z

src/targets/gpu/prefuse_ops.cpp


-        std::vector<instruction_ref> attn_probs_inputs{id, pres_k, pres_v, inputs.at(5)};
+        std::vector<instruction_ref> attn_probs_inputs{concat, pres_k, pres_v, inputs.at(5), rotary_qkv};


Why can't you do:

auto id = mpm.get_module().insert_instruction(ins, make_op("identity"), concat, rotary_qkv);
std::vector<instruction_ref> attn_probs_inputs{id, pres_k, pres_v, inputs.at(5)};

?
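
The idea behind the suggestion above, as far as it can be read from the snippet, is that an extra input on the identity instruction creates a data dependency, so anything consuming `id` is ordered after `rotary_qkv`. The sketch below illustrates that with a toy dependency graph. It does not use the MIGraphX API: `node`, `schedule`, `runs_before`, and `make_graph` are hypothetical names, and the toy assumes `rotary_qkv` is not otherwise reachable from the attention inputs.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Toy stand-in for an instruction: a name plus the indices of its inputs.
// This is NOT the MIGraphX API; it only models data dependencies.
struct node
{
    std::string name;
    std::vector<std::size_t> inputs;
};

// Depth-first walk: a node is emitted only after all of its inputs.
inline void visit(const std::vector<node>& g,
                  std::size_t i,
                  std::vector<bool>& seen,
                  std::vector<std::string>& order)
{
    if(seen[i])
        return;
    seen[i] = true;
    for(auto in : g[i].inputs)
        visit(g, in, seen, order);
    order.push_back(g[i].name);
}

// Schedule only what the given output transitively depends on.
inline std::vector<std::string> schedule(const std::vector<node>& g, std::size_t output)
{
    std::vector<bool> seen(g.size(), false);
    std::vector<std::string> order;
    visit(g, output, seen, order);
    return order;
}

// True if both names are scheduled and a comes before b.
inline bool runs_before(const std::vector<std::string>& order,
                        const std::string& a,
                        const std::string& b)
{
    auto ia = std::find(order.begin(), order.end(), a);
    auto ib = std::find(order.begin(), order.end(), b);
    return ia != order.end() && ib != order.end() && ia < ib;
}

// Indices: 0 concat, 1 rotary_qkv, 2 pres_k, 3 pres_v, 4 identity, 5 attn_probs.
// with_extra_input adds rotary_qkv as a second input of identity.
inline std::vector<node> make_graph(bool with_extra_input)
{
    std::vector<node> g;
    g.push_back(node{"concat", {}});
    g.push_back(node{"rotary_qkv", {}});
    g.push_back(node{"pres_k", {}});
    g.push_back(node{"pres_v", {}});
    g.push_back(node{"identity", {0}});
    g.push_back(node{"attn_probs", {4, 2, 3}});
    if(with_extra_input)
        g[4].inputs.push_back(1);
    return g;
}
```

In this toy model, `schedule(make_graph(true), 5)` places `rotary_qkv` before `attn_probs`, while `schedule(make_graph(false), 5)` never reaches `rotary_qkv` at all, so nothing constrains when it runs relative to the attention node. Whether the real pass pipeline preserves such an identity instruction is not established in this thread.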

pfultz2 · 2024-11-11T23:14:02Z

src/targets/gpu/kernels/include/migraphx/kernels/compute_attention_probabilities.hpp

          class Params>
 __device__ void compute_attention_probabilities(Output output,
-                                                Query query,
+                                                Passthrough, // Used for shape info and graph ordering only


I don't really follow how these parameters match what is done in prefuse_ops. You don't really change the first parameter there, but somehow it is changed here. Is the query input supposed to be the concat or the rotary_qkv?

turneram added 2 commits November 8, 2024 10:06

Add workaround for mxr issue

10a1610

Merge remote-tracking branch 'origin/develop' into mxr-workaround

3469919

turneram requested a review from causten as a code owner November 8, 2024 17:33

causten requested review from pfultz2 and CharlieL7 November 8, 2024 18:32

CharlieL7 reviewed Nov 11, 2024


pfultz2 reviewed Nov 11, 2024


turneram closed this Nov 12, 2024

pfultz2 deleted the mxr-workaround branch November 12, 2024 18:26
