HF model tracker #899

Open · pdhirajkumarprasad opened this issue Jan 9, 2025 · 4 comments

**@pdhirajkumarprasad** commented Jan 9, 2025

Total no. of models: 545

- PASS: 307 -> 408
- Numeric: 12 -> 37
- compilation
- compiled_inference
- setup and import

Detailed list

**@amd-vivekag** commented Feb 13, 2025

**Passing Summary**

TOTAL TESTS = 544

| Stage | # Passing | % of Total | % of Attempted |
| --- | --- | --- | --- |
| Setup | 532 | 97.8% | 97.8% |
| IREE Compilation | 457 | 84.0% | 85.9% |
| Gold Inference | 451 | 82.9% | 98.7% |
| IREE Inference Invocation | 445 | 81.8% | 98.7% |
| Inference Comparison (PASS) | 406 | 74.6% | 91.2% |

**Fail Summary**

TOTAL TESTS = 544

| Stage | # Failed at Stage | % of Total |
| --- | --- | --- |
| Setup | 12 | 2.2% |
| IREE Compilation | 75 | 13.8% |
| Gold Inference | 6 | 1.1% |
| IREE Inference Invocation | 6 | 1.1% |
| Inference Comparison | 39 | 7.2% |
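
The two summaries fit together as a funnel: each stage only attempts the tests that survived the previous stage, "% of Total" divides by all 544 tests, and "% of Attempted" divides by the previous stage's pass count. A minimal Python sketch (not part of the test suite; counts copied from the tables above) that reproduces both percentage columns and the per-stage failure counts:

```python
# Verify the stage-funnel arithmetic behind the two summary tables.
TOTAL = 544

# (stage, number passing) in pipeline order, from the Passing Summary.
stages = [
    ("Setup", 532),
    ("IREE Compilation", 457),
    ("Gold Inference", 451),
    ("IREE Inference Invocation", 445),
    ("Inference Comparison (PASS)", 406),
]

attempted = TOTAL
for name, passing in stages:
    failed = attempted - passing  # matches "# Failed at Stage" above
    print(f"{name:28s} passing={passing:3d} "
          f"of_total={passing / TOTAL:6.1%} "
          f"of_attempted={passing / attempted:6.1%} "
          f"failed={failed}")
    attempted = passing  # the next stage only attempts the survivors
```

For example, IREE Compilation: 457/544 = 84.0% of total, 457/532 = 85.9% of attempted, and 532 - 457 = 75 failures at that stage, matching both tables.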

GIST containing all the failures: https://gist.github.com/amd-vivekag/377a7b141b40c118f880b2ced176f95c

The following issues are failing on CPU:

| # | Issue type | Issue message | Issue no. | # Models | Models |
| --- | --- | --- | --- | --- | --- |
| 1 | setup | ImportError("Loading an AWQ quantized model requires auto-awq library (pip install autoawq) | 918 | 2 | hf_Midnight-Miqu-70B-v1.5-4bit, hf_Meta-Llama-3.1-8B-Instruct-AWQ-INT4 |
| 3 | setup | IndexError: index out of range in self | 920 | 1 | hf_ruRoPEBert-e5-base-2k |
| 5 | setup | importlib.metadata.PackageNotFoundError: No package metadata was found for bitsandbytes | 922 | 1 | hf_Meta-Llama-3.1-8B-Instruct-bnb-4bit |
| 6 | setup | RuntimeError: Error(s) in loading state_dict for DebertaV2ForMultipleChoice: | 923 | 1 | hf_fine-tuned-MoritzLaurer-deberta-v3-large-zeroshot-v2.0-arceasy |
| 7 | setup | TypeError: DisableCompileContextManager.enter....() got an unexpected keyword argument 'dtype' | 924 | 1 | hf_Llama3-8B-1.58-100B-tokens-GGUF |
| 8 | setup | torch.onnx.errors.UnsupportedOperatorError: Exporting the operator 'aten::bitwise_and' to ONNX opset version 14 is not supported (see the opset sketch after this table) | 925 | 1 | hf_Mistral-7B-Instruct-v0.2-GPTQ |
| 12 | import_model | Assertion node->outputs().size() < 4 failed | #929 | 1 | hf_nfnet_l0.ra2_in1k |
| 13 | compilation | error: failed to legalize operation 'torch.operator' that was explicitly marked illegal (onnx.If return type issue) | #930 | 45 | hf_1_microsoft_deberta_V1.0, hf_1_microsoft_deberta_V1.1, hf_checkpoints_10_1_microsoft_deberta_V1.1_384, hf_checkpoints_1_16, hf_checkpoints_26_9_microsoft_deberta_21_9, hf_checkpoints_28_9_microsoft_deberta_V2, hf_checkpoints_28_9_microsoft_deberta_V4, hf_checkpoints_28_9_microsoft_deberta_V5, hf_checkpoints_29_9_microsoft_deberta_V1, hf_checkpoints_30_9_microsoft_deberta_V1.0_384, hf_checkpoints_3_14, hf_content, hf_deberta-base, hf_deberta_finetuned_pii, hf_deberta-large-mnli, hf_Debertalarg_model_multichoice_Version2, hf_deberta-v2-base-japanese, hf_deberta-v2-base-japanese-char-wwm, hf_deberta-v3-base, hf_deberta-v3-base-absa-v1.1, hf_deberta-v3-base_finetuned_ai4privacy_v2, hf_deberta-v3-base-injection, hf_DeBERTa-v3-base-mnli-fever-anli, hf_deberta-v3-base-squad2, hf_deberta-v3-base-zeroshot-v1.1-all-33, hf_deberta-v3-large, hf_deberta-v3-large_boolq, hf_deberta-v3-large-squad2, hf_deberta-v3-large_test, hf_deberta-v3-large_test_9e-6, hf_deberta-v3-small, hf_deberta-v3-xsmall, hf_llm-mdeberta-v3-swag, hf_mdeberta-v3-base, hf_mDeBERTa-v3-base-mnli-xnli, hf_mdeberta-v3-base-squad2, hf_mDeBERTa-v3-xnli-ft-bs-multiple-choice, hf_Medical-NER, hf_mxbai-rerank-base-v1, hf_mxbai-rerank-xsmall-v1, hf_nli-deberta-v3-base, hf_output, hf_piiranha-v1-detect-personal-information, hf_splinter-base, hf_splinter-base-qass |
| 14 | compilation | error: failed to legalize unresolved materialization from ('i64') to ('index') that remained live after conversion | iree-org/iree#18899 | 3 | hf_deeplabv3-mobilevit-small, hf_deeplabv3-mobilevit-xx-small, hf_mobilevit-small |
| 15 | compilation | error: 'flow.dispatch.workgroups' op value set has 3 dynamic dimensions but only 2 dimension values are attached | iree-org/iree#20154 | 3 | hf_beit-base-patch16-224-pt22k, hf_beit-base-patch16-224-pt22k-ft22k, hf_pedestrian_gender_recognition |
| 16 | compilation | error: expected sizes to be non-negative, but got -1 | iree-org/iree#19501 | 7 | hf_swin_base_patch4_window7_224.ms_in22k_ft_in1k, hf_swin-tiny-patch4-window7-224, hf_yolos-base, hf_yolos-fashionpedia, hf_yolos-small, hf_yolos-small-finetuned-license-plate-detection, hf_yolos-small-rego-plates-detection |
| 17 | compilation | error: 'stream.async.dispatch' op has invalid Read access range | iree-org/iree#20155 | 1 | hf_dpt-large-ade |
| 18 | compilation | error: 'iree_linalg_ext.pack' op write affecting operations on global resources are restricted to workgroup distributed contexts. | iree-org/iree#20156 | 1 | hf_distilhubert |
| 19 | compilation | error: expected offsets to be non-negative, but got -1 | iree-org/iree#19935 | 1 | hf_pnasnet5large.tf_in1k |
| 23 | native_inference | [ONNXRuntimeError] : 2 : INVALID_ARGUMENT : Got invalid dimensions for input: pixel_values for the following indices | #941 | 1 | hf_mobilenet_v1_0.75_192 |
| 24 | native_inference | [ONNXRuntimeError] : 6 : RUNTIME_EXCEPTION : Non-zero status code returned while running Add node | #942 | 1 | hf_eva_large_patch14_196.in22k_ft_in22k_in1k |
| 26 | compiled_inference | :0: FAILED_PRECONDITION; onnx.Expand input has a dim that is not statically 1 | #944 | 2 | hf_phobert-base-finetuned, hf_phobert-large-finetuned |
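
On row 8 above: ONNX only introduced a BitwiseAnd operator in opset 18, so an export pinned to opset 14 has no way to express aten::bitwise_and. A hedged sketch of the usual workaround, re-exporting at a newer opset; the toy module is illustrative, not the test suite's actual export path, and it assumes a torch build whose exporter provides the opset-18 bitwise symbolics:

```python
# Sketch: aten::bitwise_and cannot be exported at opset 14 (no ONNX op exists
# for it there); ONNX added BitwiseAnd in opset 18, so export at >= 18.
import torch

class BitwiseAndModel(torch.nn.Module):
    def forward(self, a: torch.Tensor, b: torch.Tensor) -> torch.Tensor:
        return torch.bitwise_and(a, b)  # traced as aten::bitwise_and

model = BitwiseAndModel()
example_inputs = (torch.randint(0, 8, (4,)), torch.randint(0, 8, (4,)))

# opset_version=14 raises UnsupportedOperatorError for this op;
# opset_version=18 can map it to onnx::BitwiseAnd.
torch.onnx.export(model, example_inputs, "bitwise_and.onnx", opset_version=18)
```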

The following issues have been resolved:

| # | Issue type | Issue message | Issue no. | # Models | Models | Assignee | Status |
| --- | --- | --- | --- | --- | --- | --- | --- |
| 2 | setup | requests.exceptions.HTTPError: 401 Client Error: Unauthorized for url | 919 | 3 | hf_Multiple_Choice, hf_multiple_choice_model, hf_Multiple_Choice_EN | @amd-vivekag | Fixed in PR: nod-ai/SHARK-TestSuite#456 |
| 4 | setup | Unknown task: fill-mask | 921 | 2 | hf_multi-qa-mpnet-base-cos-v1, hf_all-mpnet-base-v1 | @amd-vivekag | Fixed in PR: nod-ai/SHARK-TestSuite#456 |
| 9 | import_model | Killed due to OOM | #926 | 1 | hf_StableBeluga2 | @amd-vivekag | Fixed in PR: nod-ai/SHARK-TestSuite#451 |
| 10 | import_model | assertNonNull: Assertion g.get() != nullptr failed | #927 | 5 | hf_esm2_t36_3B_UR50D, hf_Phi-3.5-mini-instruct, hf_Phi-3-mini-128k-instruct, hf_Phi-3-mini-4k-instruct, hf_zephyr-7b-beta | @amd-vivekag | Fixed in PR: nod-ai/SHARK-TestSuite#451 |
| 11 | import_model | assertInVersionRange: Assertion version >= version_range.first && version <= version_range.second failed | #928 | 8 | hf_llama-7b, hf_oasst-sft-4-pythia-12b-epoch-3.5, hf_Qwen2.5-1.5B-Instruct, hf_Qwen2.5-7B-Instruct, hf_Qwen2-7B-Instruct, hf_TinyLlama-1.1B-Chat-v1.0, hf_vicuna-7b-v1.5, hf_wasmai-7b-v1 | @amd-vivekag | Fixed in PR: nod-ai/SHARK-TestSuite#451 |
| 20 | construct_inputs | ValueError: Asking to pad but the tokenizer does not have a padding token (see the padding sketch after this table) | #938 | 4 | hf_distilgpt2, hf_gpt2, hf_llama-68m, hf_tiny-random-mistral | @amd-vivekag | Fixed in PR: nod-ai/SHARK-TestSuite#451 |
| 21 | construct_inputs | name 'tokens' is not defined | #939 | 1 | hf_wavlm-base-plus | @amd-vivekag | Fixed in PR: nod-ai/SHARK-TestSuite#442 |
| 22 | native_inference | IndexError: tuple index out of range | #940 | 14 | hf_bart-base, hf_gpt2-small-spanish, hf_ivila-row-layoutlm-finetuned-s2vl-v2, hf_opt-125m, hf_Qwen1.5-0.5B-Chat, hf_Qwen2-0.5B, hf_Qwen2.5-0.5B-Instruct, hf_really-tiny-falcon-testing, hf_tiny-dummy-qwen2, hf_tiny-Qwen2ForCausalLM-2.5, hf_tiny-random-GemmaForCausalLM, hf_tiny-random-LlamaForCausalLM, hf_tiny-random-mt5, hf_tiny-random-Phi3ForCausalLM | @amd-vivekag | Fixed in PR: nod-ai/SHARK-TestSuite#447 |
| 25 | compiled_inference | INVALID_ARGUMENT; function expected fewer input values; parsing input input.bin | #943 | 4 | hf_ko-sroberta-multitask, hf_robertuito-sentiment-analysis, hf_sbert_large_nlu_ru, hf_sentence-bert-base-ja-mean-tokens-v2 | @amd-vivekag | Fixed in PR: nod-ai/SHARK-TestSuite#453 |
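
On row 20 above: that ValueError is the stock Hugging Face message for GPT-2-family tokenizers, which ship without a pad token, so any batched padding call fails until one is assigned. The exact change in nod-ai/SHARK-TestSuite#451 isn't shown here; the common workaround is to reuse the EOS token for padding, sketched below (distilgpt2 is one of the impacted models):

```python
# Sketch: work around "Asking to pad but the tokenizer does not have a
# padding token" by reusing EOS as PAD. Whether PR #451 does exactly this
# is an assumption; it is the standard fix for GPT-2-style tokenizers.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("distilgpt2")

if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token  # reuse EOS as the padding token

batch = tokenizer(["short", "a somewhat longer input"],
                  padding=True, return_tensors="pt")
print(batch["input_ids"].shape)  # both rows padded to the same length
```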

**@zjgarvey** (Collaborator) commented Feb 13, 2025

I assume the most recent run is on CPU? Can you share the detail table in a gist? Can you also post the IREE version?

**@amd-vivekag** commented:

> I assume the most recent run is on CPU? Can you share the detail table in a gist? Can you also post the IREE version?

Yes, these runs are on CPU; I was getting more failures on GPU (around 40 more). I'm using the following IREE version:

```
IREE (https://iree.dev):
  IREE compiler version 3.2.0rc20250206 @ f3bef2de123f08b4fc3b0ce691494891bd6760d0
  LLVM version 20.0.0git
  Optimized build
```
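
(For reference, that block has the shape of `iree-compile --version` output.) A small sketch, assuming `iree-compile` is on PATH (e.g. from the iree-base-compiler wheel), for capturing the same string programmatically when filing reports:

```python
# Sketch: record the IREE compiler version alongside test results.
# Assumes the `iree-compile` binary is installed and on PATH.
import subprocess

version = subprocess.run(
    ["iree-compile", "--version"],
    capture_output=True, text=True, check=True,
).stdout
print(version)
```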

Here is the link to the detailed table:
https://gist.github.com/amd-vivekag/377a7b141b40c118f880b2ced176f95c

**@pdhirajkumarprasad** (Author) commented:

Here is the latest status of the HF models: https://gist.github.com/pdhirajkumarprasad/784eee989d6935d1074c217de2040477. We should focus on the 6/7 issues mentioned there.

@amd-vivekag, please list the issue numbers for the issues mentioned on the above page.

@zjgarvey, we need to focus on these; let's try to get clean by next week so that we are in good shape w.r.t. HF models.
