Add self-detecting on-the-fly bfloat16->float16 conversion pass #741
base: ovep-develop
Conversation
I am more aligned with this change.
Force-pushed from a02a919 to c594c4d.
@@ -453,6 +465,16 @@ BackendManager::GetModelProtoFromFusedNode(const onnxruntime::Node& fused_node,
    DumpOpenVINOEPModel(onnx_model_path_name, model_proto.get(), fused_node);
    ORT_ENFORCE(status.IsOK(), status.ErrorMessage());
    return model_proto;
  } else if (HasBf16(subgraph)) {
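For context, a minimal sketch of what a detection helper like `HasBf16` could look like, assuming it scans a raw ONNX `GraphProto` for bfloat16 initializers and value infos; the PR's actual helper may instead walk onnxruntime's `GraphViewer` and cover additional tensor locations:

```cpp
#include <onnx/onnx_pb.h>

// Hypothetical sketch: returns true if any initializer or value info in the
// graph uses the BFLOAT16 element type. Not the PR's actual implementation.
bool HasBf16(const onnx::GraphProto& graph) {
  for (const auto& initializer : graph.initializer()) {
    if (initializer.data_type() == onnx::TensorProto_DataType_BFLOAT16) {
      return true;
    }
  }
  for (const auto& value_info : graph.value_info()) {
    const auto& type = value_info.type();
    if (type.has_tensor_type() &&
        type.tensor_type().elem_type() == onnx::TensorProto_DataType_BFLOAT16) {
      return true;
    }
  }
  return false;
}
```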
Is a check needed for enable_qdq_optimizer? Should you check for GPU here? Please let me know if you support EP context graphs.
Not necessarily; this is a universal pass, which works for all the IPs.
UPD to an edited comment: this is the else condition for all the qdq_scales-related graph modifications. Overall, qdq_scales and bfloat16 are mutually exclusive, so the current logic is the following: if the qdq_scaling pass is requested, we take that path, with two different variants for NPU and GPU. Else, if the model has bfloat16 initializers, we convert them to fp16 in this pass. Otherwise, we transfer the model directly to OpenVINO.
Regarding EP context graphs: no, they're not supported, since they're basically an encapsulated OVIR and we can only redirect it to OV, nothing more. So if a customer requests support for bfloat16 EP context models, we'd need to solve it on the OV side.
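For illustration, the control flow described above could be sketched as follows; every name except HasBf16 is a hypothetical placeholder, not the PR's actual helper:

```cpp
#include <memory>

// Illustrative stubs standing in for the real EP machinery.
struct Subgraph {};
struct ModelProto {};
bool EnableQdqOptimizer();                                       // qdq_scales pass requested?
bool HasBf16(const Subgraph& g);                                 // any bfloat16 tensors?
std::unique_ptr<ModelProto> RunQdqScalesPass(const Subgraph&);   // NPU/GPU variants inside
std::unique_ptr<ModelProto> ConvertBf16ToFp16(const Subgraph&);  // the new pass from this PR
std::unique_ptr<ModelProto> PassThrough(const Subgraph&);

std::unique_ptr<ModelProto> PrepareModel(const Subgraph& subgraph) {
  if (EnableQdqOptimizer()) {
    // qdq_scales and bfloat16 are treated as mutually exclusive.
    return RunQdqScalesPass(subgraph);
  } else if (HasBf16(subgraph)) {
    return ConvertBf16ToFp16(subgraph);
  }
  return PassThrough(subgraph);  // hand the model to OpenVINO unchanged
}
```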
Changes look good. Please look at the review comments and see if you have subscribed to the coding style.
Please update the branch.
@mklimenk |
A follow-up to #740 with changed logic: instead of relying on an external configuration key, perform the bfloat16->float16 conversion whenever the model contains at least one bfloat16 tensor.
https://jira.devtools.intel.com/browse/CVS-170592
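For reference, a self-contained sketch of the per-element numeric technique such a pass relies on: widen bfloat16 to float32 (bfloat16 is the upper 16 bits of an IEEE-754 float32), then narrow to float16 with round-to-nearest-even. This illustrates the conversion, not the PR's code, and it simplifies subnormal handling by flushing to zero:

```cpp
#include <cstdint>
#include <cstdio>
#include <cstring>

// Widen bfloat16 bits to float32: a plain 16-bit left shift.
float Bf16ToFloat(uint16_t bf16_bits) {
  uint32_t f32_bits = static_cast<uint32_t>(bf16_bits) << 16;
  float result;
  std::memcpy(&result, &f32_bits, sizeof(result));
  return result;
}

// Narrow float32 to float16 bits with round-to-nearest-even.
// Simplified: finite overflow clamps to infinity, underflow flushes to zero.
uint16_t FloatToFp16(float value) {
  uint32_t bits;
  std::memcpy(&bits, &value, sizeof(bits));
  uint16_t sign = static_cast<uint16_t>((bits >> 16) & 0x8000);
  uint32_t exp_field = (bits >> 23) & 0xFF;
  uint32_t mantissa = bits & 0x7FFFFF;
  if (exp_field == 0xFF) {  // inf or NaN in the input
    return sign | 0x7C00 | (mantissa ? 0x0200 : 0);
  }
  int32_t exponent = static_cast<int32_t>(exp_field) - 127 + 15;
  if (exponent >= 31) return sign | 0x7C00;  // finite overflow -> infinity
  if (exponent <= 0) return sign;            // underflow: flush to zero (simplified)
  // Round to nearest-even when dropping the low 13 mantissa bits.
  uint32_t rounded = mantissa + 0x0FFF + ((mantissa >> 13) & 1);
  if (rounded & 0x800000) {  // rounding carried into the exponent
    rounded = 0;
    if (++exponent >= 31) return sign | 0x7C00;
  }
  return sign | static_cast<uint16_t>(exponent << 10) |
         static_cast<uint16_t>(rounded >> 13);
}

int main() {
  // 0x3FC0 in bfloat16 is 1.5; it converts to float16 0x3E00 (also 1.5).
  uint16_t fp16 = FloatToFp16(Bf16ToFloat(0x3FC0));
  std::printf("bf16 0x3FC0 -> fp16 0x%04X\n", fp16);
  return 0;
}
```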