Enabled VLMs via CLI #287
Open
asmigosw wants to merge 14 commits into quic:main from asmigosw:image_text_support
Conversation
ochougul requested changes on Feb 27, 2025
quic-amitraj requested changes on Feb 27, 2025
quic-amitraj requested changes on Mar 3, 2025
quic-amitraj approved these changes on Mar 3, 2025
LGTM
Force-pushed from 9567658 to ad06845
Force-pushed from bc60d47 to 76e863a
Force-pushed from 76e863a to 8d99a93
Removing the onnx_defer_loading flag, which was originally removed in _[Removed onnx_defer_loading from Immutable Convertor Args. PR: 230]_ but was added back later in _[Mllama(single + dual) + InternVL(single) + Llava (single) PR: 267]_, possibly because of rebasing.
Signed-off-by: Shubham Agrawal <[email protected]>
Signed-off-by: Asmita Goswami <[email protected]>
This will create a config JSON file, which contains all the details about compilation and SDK versions. Currently, this code is added in the code block of QEFFAutoModelForCausalLM.compile. The config would look like below:

```json
{
    "huggingface_config": {
        "vocab_size": 50257, "n_positions": 1024, "n_embd": 768, "n_layer": 12, "n_head": 12, "n_inner": null,
        "activation_function": "gelu_new", "resid_pdrop": 0.1, "embd_pdrop": 0.1, "attn_pdrop": 0.1,
        "layer_norm_epsilon": 1e-05, "initializer_range": 0.02, "summary_type": "cls_index",
        "summary_use_proj": true, "summary_activation": null, "summary_first_dropout": 0.1,
        "summary_proj_to_labels": true, "scale_attn_weights": true, "use_cache": true,
        "scale_attn_by_inverse_layer_idx": false, "reorder_and_upcast_attn": false,
        "bos_token_id": 50256, "eos_token_id": 50256, "return_dict": true, "output_hidden_states": false,
        "output_attentions": false, "torchscript": false, "torch_dtype": null, "use_bfloat16": false,
        "tf_legacy_loss": false, "pruned_heads": {}, "tie_word_embeddings": true,
        "chunk_size_feed_forward": 0, "is_encoder_decoder": false, "is_decoder": false,
        "cross_attention_hidden_size": null, "add_cross_attention": false, "tie_encoder_decoder": false,
        "max_length": 20, "min_length": 0, "do_sample": false, "early_stopping": false,
        "num_beams": 1, "num_beam_groups": 1, "diversity_penalty": 0.0, "temperature": 1.0,
        "top_k": 50, "top_p": 1.0, "typical_p": 1.0, "repetition_penalty": 1.0, "length_penalty": 1.0,
        "no_repeat_ngram_size": 0, "encoder_no_repeat_ngram_size": 0, "bad_words_ids": null,
        "num_return_sequences": 1, "output_scores": false, "return_dict_in_generate": false,
        "forced_bos_token_id": null, "forced_eos_token_id": null, "remove_invalid_values": false,
        "exponential_decay_length_penalty": null, "suppress_tokens": null, "begin_suppress_tokens": null,
        "architectures": ["GPT2LMHeadModel"], "finetuning_task": null,
        "id2label": {"0": "LABEL_0", "1": "LABEL_1"}, "label2id": {"LABEL_0": 0, "LABEL_1": 1},
        "tokenizer_class": null, "prefix": null, "pad_token_id": null, "sep_token_id": null,
        "decoder_start_token_id": null,
        "task_specific_params": {"text-generation": {"do_sample": true, "max_length": 50}},
        "problem_type": null, "_name_or_path": "gpt2",
        "_commit_hash": "607a30d783dfa663caf39e06633721c8d4cfcd7e",
        "_attn_implementation_internal": "eager", "transformers_version": null,
        "model_type": "gpt2", "n_ctx": 1024
    },
    "qpc_config": {
        "QEff_config": {
            "pytorch_transforms": ["AwqToMatmulNbitsTransform", "GPTQToMatmulNbitsTransform", "CustomOpsTransform", "KVCacheTransform"],
            "onnx_transforms": ["FP16ClipTransform", "SplitTensorsTransform"],
            "onnx_path": "/root/.cache/qeff_models/GPT2LMHeadModel-36f0eca92731bb47/GPT2LMHeadModel.onnx"
        },
        "aic_compiler_config": {
            "apps_sdk_version": "1.20.0",
            "compile_dir": "/root/.cache/qeff_models/GPT2LMHeadModel-36f0eca92731bb47",
            "specializtions_file_path": "/root/.cache/qeff_models/GPT2LMHeadModel-36f0eca92731bb47/specializations.json",
            "prefill_seq_len": 32, "ctx_len": 128, "batch_size": 1, "full_batch_size": null,
            "num_devices": 1, "num_cores": 16, "mxfp6_matmul": false, "mxint8_kv_cache": false,
            "num_speculative_tokens": null
        },
        "qnn_config": {
            "enable_qnn": true, "qnn_config_path": "QEfficient/compile/qnn_config.json",
            "product": "QAIRT", "os": {"Ubuntu": 22.04, "Windows": 11}, "sdk_flavor": ["aic"],
            "version": "2.31.0", "build_id": "250109072054_3882", "qnn_backend_api_version": "2.18.0",
            "tensorflow": "2.10.1", "tflite": "2.3.0", "torch": "1.13.1", "onnx": "1.16.1",
            "onnxruntime": "1.17.1", "onnxsimplifier": "0.4.36", "android-ndk": "r26c",
            "platform": "AIC.1.20.0.14"
        }
    }
}
```

Note: The code structure may change.

---------
Signed-off-by: Abukhoyer Shaik <[email protected]>
Signed-off-by: Asmita Goswami <[email protected]>
… validation page (quic#303)
Signed-off-by: Abukhoyer Shaik <[email protected]>
Signed-off-by: Asmita Goswami <[email protected]>
These are just small fixes for printing a `QEFFAutoModelForCausalLM` instance, made by changing the `__repr__(self)` method.
Signed-off-by: Abukhoyer Shaik <[email protected]>
Signed-off-by: Asmita Goswami <[email protected]>
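For context, a `__repr__` override of this kind usually just returns a readable summary of the wrapped model instead of the default `<... object at 0x...>` string. A minimal sketch, not the exact code from this PR; the class and attribute names are stand-ins:

```python
class QEFFAutoModelExample:
    """Toy stand-in to illustrate the __repr__ fix described above."""

    def __init__(self, model, model_name: str):
        self.model = model
        self.model_name = model_name

    def __repr__(self) -> str:
        # Show the wrapper class, the model name, and the underlying module
        # rather than the default object-at-address representation.
        return f"{self.__class__.__name__}(model_name={self.model_name!r})\n{self.model}"
```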
Signed-off-by: Asmita Goswami <[email protected]>
Signed-off-by: Asmita Goswami <[email protected]>
Signed-off-by: Asmita Goswami <[email protected]>
Signed-off-by: Asmita Goswami <[email protected]>
Signed-off-by: Asmita Goswami <[email protected]>
Signed-off-by: Asmita Goswami <[email protected]>
Force-pushed from 3165896 to 1608804
Signed-off-by: Asmita Goswami <[email protected]>
Force-pushed from b08404d to ca55d42
Signed-off-by: Asmita Goswami <[email protected]>
Added support for enabling VLMs via CLI.
Sample command:
python -m QEfficient.cloud.infer --model_name meta-llama/Llama-3.2-11B-Vision-Instruct --batch_size 1 --prompt_len 32 --ctx_len 512 --num_cores 16 --device_group [0] --prompt "Describe the image?" --mos 1 --allocator_dealloc_delay 1 --image_url https://i.etsystatic.com/8155076/r/il/0825c2/1594869823/il_fullxfull.1594869823_5x0w.jpg
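If you want to drive the same command programmatically, a minimal wrapper around the CLI shown above could look like the sketch below. The flag values simply mirror the sample command; nothing here is a new API.

```python
import subprocess
import sys

# Mirror the sample CLI invocation above; adjust model, lengths, and image URL as needed.
cmd = [
    sys.executable, "-m", "QEfficient.cloud.infer",
    "--model_name", "meta-llama/Llama-3.2-11B-Vision-Instruct",
    "--batch_size", "1",
    "--prompt_len", "32",
    "--ctx_len", "512",
    "--num_cores", "16",
    "--device_group", "[0]",
    "--prompt", "Describe the image?",
    "--mos", "1",
    "--allocator_dealloc_delay", "1",
    "--image_url", "https://i.etsystatic.com/8155076/r/il/0825c2/1594869823/il_fullxfull.1594869823_5x0w.jpg",
]
subprocess.run(cmd, check=True)  # raises CalledProcessError if inference fails
```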