Use VCL API to do offline compilation #30732
Conversation
This PR will be closed in a week because of 2 weeks of no activity.
This PR was closed because it has been stalled for 2 weeks with no activity.
```cpp
driver_compiler_utils::serializeIR(model,
                                   compilerVersion,
                                   maxOpsetVersion,
                                   updatedConfig.isAvailable(ov::intel_npu::use_base_model_serializer.name())
```
use_base_model_serializer is set to true by default, which means the older serializer, the one that copies all weights, is used. If we introduce the VCL interface to the CiP path, then we don't want major regressions to happen, so I believe we should set the default for use_base_model_serializer to false only when using the CiP adapter.
One way to do this: create a new function in plugin_compiler_adapter.cpp, maybe something like:
```cpp
void changeDefaultModelSerializer(FilteredConfig& config) {
    if (config.isAvailable(ov::intel_npu::use_base_model_serializer.name()) &&
        !config.hasOpt(ov::intel_npu::use_base_model_serializer.name())) {
        config.update({{ov::intel_npu::use_base_model_serializer.name(), "NO"}});
    }
}
```
and call it at the beginning of every method within PluginCompilerAdapter that may need to serialize the model.
FYI: you'll have to modify how use_base_model_serializer is handled after I merge this PR. I'll give you a notification when that happens and I'll try to help you adjust the code.
Signed-off-by: Kang, Wenjing <[email protected]>
```cpp
vcl_symbol_statement(vclLogHandleGetString)

//unsupported symbols with older ze_loader versions
```
...compiler versions this time. We will not support compiler versions that do not have these symbols, so I think we can remove the weak symbols list.
The vcl_weak_symbols_list needs to include vclAllocatedExecutableCreateWSOneShot to compile with Weightless. The other symbols are already used by the releases at https://github.com/openvinotoolkit/npu_compiler/releases, so they have been moved to the vcl_symbols_list above.
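The required/weak distinction discussed here can be sketched as follows. This is an illustrative mock, with a plain map standing in for the dlsym-style lookup; it is not the PR's actual vcl_symbol_statement machinery, and all names are hypothetical:

```cpp
#include <map>
#include <stdexcept>
#include <string>

// Stand-in for a loaded library's exported symbol table (hypothetical).
using SymbolTable = std::map<std::string, void*>;

// Required symbols: loading fails hard if the compiler build lacks them.
void* resolveRequired(const SymbolTable& lib, const std::string& name) {
    auto it = lib.find(name);
    if (it == lib.end()) {
        throw std::runtime_error("missing required symbol: " + name);
    }
    return it->second;
}

// Weak symbols: absence is tolerated so older compiler builds still load;
// callers must null-check before use (e.g. vclAllocatedExecutableCreateWSOneShot).
void* resolveWeak(const SymbolTable& lib, const std::string& name) {
    auto it = lib.find(name);
    return it == lib.end() ? nullptr : it->second;
}
```

Moving a symbol from the weak list to the required list therefore turns a runtime null-check into a load-time failure for compiler builds that predate that symbol.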
```cpp
    _logger.debug("compile end, blob size:%d", allocator.m_vec.size());
    return NetworkDescription(std::move(allocator.m_vec), std::move(metadata));
} else if (usedMajor >= 6 && usedMinor >= 1) {
```
No plan to support older compiler versions. I think we can remove these branches.
In order to support version 7.3 used in https://github.com/openvinotoolkit/npu_compiler/releases, the branches for versions lower than 6.1 were ultimately deleted, while those for versions 6.1 through 7.4 were retained.
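The retained version window can be expressed as one predicate. This is a hypothetical sketch (the function name and encoding are mine, not the PR's); note it also avoids the pitfall of checking major and minor independently, where a condition like `usedMajor >= 6 && usedMinor >= 1` would wrongly reject version 7.0:

```cpp
#include <cstdint>

// Hypothetical helper: accept compiler versions in the inclusive window
// [6.1, 7.4], matching the branches kept in this PR.
bool isSupportedCompilerVersion(uint32_t major, uint32_t minor) {
    const uint32_t v = major * 100 + minor;  // assumes minor < 100
    return v >= 601 && v <= 704;
}
```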
```cpp
bool is_option_supported(const std::string& option) const;

private:
    std::shared_ptr<VCLApi> _vclApi;
```
Is this actually being populated somewhere?
If CompilerImpl can get and keep a reference to the VCL API in its constructor, maybe this file should be renamed to compiler_impl.hpp? We should also hide VCLApi from the adapter if we only want it to be accessed through the compiler implementation.
The member std::shared_ptr<VCLApi> _vclApi indicated that the VCL API was initialized when creating the VCLCompilerImpl object. This member has now been removed, and vclApi::getInstance() is used to initialize and load the VCL library in the VCLCompilerImpl constructor.
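A minimal sketch of the getInstance() approach described here, assuming a Meyers-style singleton. The class name and body are illustrative; the real VCLCompilerImpl constructor also loads the VCL library, which is reduced to a comment below:

```cpp
#include <memory>

class VCLCompilerImplSketch {
public:
    // Thread-safe in C++11 and later: the static local is initialized
    // exactly once, on first call.
    static const std::shared_ptr<VCLCompilerImplSketch>& getInstance() {
        static std::shared_ptr<VCLCompilerImplSketch> instance(new VCLCompilerImplSketch());
        return instance;
    }
    VCLCompilerImplSketch(const VCLCompilerImplSketch&) = delete;
    VCLCompilerImplSketch& operator=(const VCLCompilerImplSketch&) = delete;

private:
    VCLCompilerImplSketch() {
        // The real constructor would dlopen/LoadLibrary the VCL library
        // and resolve its symbol table here.
    }
};
```

Every call returns the same instance, so all adapter methods share one loaded library without the adapter holding a member pointer.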
```cpp
_compiler = load_compiler(libPath);
_logger.info("Loading PLUGIN compiler");
try {
    auto vclCompilerPtr = VCLCompilerImpl::getInstance();
```
The order seems wrong here: first get the implementation then the library? Should the compiler implementation get and keep a reference to VCL API?
To avoid embedding more VCL-related information (such as symbol table definitions) into the plugin_adapter, the library is loaded in the VCLCompilerImpl instead.
The reason for keeping a reference to the VCL API is that when creating the corresponding _compiler, both the object and the library reference need to be passed in simultaneously (see https://github.com/openvinotoolkit/openvino/blob/master/src/inference/dev_api/openvino/runtime/so_ptr.hpp):
```cpp
_compiler = ov::SoPtr<intel_npu::ICompiler>(ptr, so);
SoPtr(const std::shared_ptr<T>& ptr, const std::shared_ptr<void>& so) : _ptr{ptr}, _so{so} {}
```
Although `SoPtr(const std::shared_ptr<T>& ptr) : _ptr{ptr}, _so{nullptr} {}` seems to allow passing nullptr to _so, considering the risk of premature unloading, we still choose to follow the same loading method as the MLIR compiler loading.
…MainNetworkDescriptions size Signed-off-by: Kang, Wenjing <[email protected]>
```diff
@@ -0,0 +1,130 @@
+// Copyright (C) 2018-2025 Intel Corporation
```
Suggested change:
```diff
-// Copyright (C) 2018-2025 Intel Corporation
+// Copyright (C) 2025 Intel Corporation
```
I think this is the right format since the file is new.
Updated in new commit 32fa2f8.
No, this is wrong! It has to be:
```cpp
// Copyright (C) 2018-2025 Intel Corporation
```
```cpp
namespace intel_npu {

bool isUseBaseModelSerializer(const FilteredConfig& config);
```
Are you using this only in compiler_impl.cpp? Then maybe it is better to keep it there in an anonymous namespace, and avoid adding this signature to the header needlessly.
Moved to an anonymous namespace in new commit 32fa2f8.
```cpp
 *
 * @param model Both source and target.
 */
void storeWeightlessCacheAttribute(const std::shared_ptr<ov::Model>& model);
```
I think you can also move isInitMetadata from both adapters here, since you created this file.
Moved isInitMetadata to the weightless_utils file in new commit 32fa2f8.
```cpp
// user pass model_serializer_version config
if (config.isAvailable(ov::intel_npu::model_serializer_version.name()) &&
    config.has(ov::intel_npu::use_base_model_serializer.name())) {
```
Suggested change:
```diff
-    config.has(ov::intel_npu::use_base_model_serializer.name())) {
+    config.has(ov::intel_npu::model_serializer_version.name())) {
```
I think you meant this.
Yes, updated in new commit 32fa2f8.
```cpp
        ov::intel_npu::ModelSerializerVersion::ALL_WEIGHTS_COPY);
}

// vcl serializer method is not set by user, will default to use it.
```
Suggested change:
```diff
-// vcl serializer method is not set by user, will default to use it.
+// No VCL serializer was chosen explicitly, will default to the "no weights copy" implementation
```
Updated in new commit 32fa2f8.
```cpp
_logger.debug("create build flags");
buildFlags += driver_compiler_utils::serializeIOInfo(model, true);
buildFlags += " ";
buildFlags += driver_compiler_utils::serializeConfig(config, compilerVersion);
```
In both compile & compileWSOneShot, the config should be handled the same. Here, the piece of code that registers the correct value for use_base_model_serializer/model_serializer_version is missing.
Updated in 32fa2f8.
```cpp
}

std::vector<std::shared_ptr<NetworkDescription>> networkDescrs;
for (uint32_t i = 0; i < allocator.m_vector.size(); i++) {
```
I think a `for (auto& blob : allocator.m_vector)` range-based loop is more adequate here.
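For illustration, the suggested range-based form on a stand-in vector of blobs. The element type is assumed to be std::vector<uint8_t> here; the PR's actual blob type may differ:

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Sums blob sizes with a range-based loop: no index bookkeeping and no
// signed/unsigned comparison warnings from an `i < size()` condition.
std::size_t totalBlobSize(const std::vector<std::vector<uint8_t>>& blobs) {
    std::size_t total = 0;
    for (const auto& blob : blobs) {
        total += blob.size();
    }
    return total;
}
```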
Updated in 32fa2f8.
```cpp
// serializer will be used as the default in the plugin adapter. You need to pass the serializer config;
// otherwise, you will encounter a deserialization issue within the compiler.
_logger.warning("Add serializer config");
if (updatedConfig.isAvailable(ov::intel_npu::use_base_model_serializer.name())) {
```
maybe this kind of stuff should also be moved in a separate function, since there's at least three of them that need it.
Extracted this part as a helper function in an anonymous namespace in 32fa2f8.
```cpp
mainNetworkDescription = initMainNetworkDescriptions.back();
initMainNetworkDescriptions.pop_back();
OPENVINO_ASSERT(initMainNetworkDescriptions.size() > 0,
                "The initMainNetworkDescriptions after getting mainNetworkDescription must not be empty!");
```
Suggested change:
```diff
-                "The initMainNetworkDescriptions after getting mainNetworkDescription must not be empty!");
+                "No init schedules have been returned by the compiler");
```
Updated in 32fa2f8.
```cpp
if (model) {
    mainNetworkMetadata.name = model.value()->get_friendly_name();
} else {
    _logger.warning("networkMeta name is empty in parse!");
```
"warning" sounds like too much. Only on the weights separation path do we expect a model to be provided here. I'd say "info" is more adequate.
Updated in 32fa2f8.
```cpp
std::vector<std::shared_ptr<NetworkDescription>> initNetworkDescriptions;
std::shared_ptr<NetworkDescription> mainNetworkDescription;
storeWeightlessCacheAttribute(model);
```
btw, this call is required only if the model needs to be serialized/deserialized, so only if we use the VCL interface. After this PR is merged, will we always be using the VCL interface, or can we still fall back to the old flow (I admit I didn't follow the library loading part of the code)? If the latter, then maybe we should move this call to the code dedicated to the VCL interface, i.e. to the weights separation functions.
Just to clarify, if the VCL interface is not used, then this call won't break anything, but it will waste some resources.
Moved storeWeightlessCacheAttribute to compiler_impl.cpp.
In the short term, it is possible to revert to the original MLIR compiler. The current compiler loading mechanism prioritizes linking to openvino_intel_npu_compiler (the VCL compiler); if that library file does not exist, it will attempt to load the MLIR compiler. If neither is present, an exception is thrown directly.
```cpp
_logger.debug("Graph initialize start");

if (_zeGraphExt == nullptr || _graphDesc._handle == nullptr) {
    if (!config.get<CREATE_EXECUTOR>() || config.get<DEFER_WEIGHTS_LOAD>()) {
```
I don't think we need this extra if here.
```cpp
    _logger.info("Failed to use the level zero graph handle: %s. Inference requests for this model are not "
                 "allowed. Only exports are available",
                 ex.what());
} catch (...) {
```
Do we still need this extra catch here?
```cpp
try {
    initGraphDesc = _zeGraphExt->getGraphDescriptor(tensor.data(), tensor.get_byte_size());
    initNetworkMeta = _zeGraphExt->getNetworkMeta(initGraphDesc);
} catch (...) {
```
Can you please align all these catch blocks? Btw, what about the name of the model, do we need it here as well?
```cpp
}

void WeightlessGraph::initialize(const Config& config) {
    if (!_zeroInitStruct) {
```
is this needed?
Details:
When the compiler type is MLIR, it will try to link the VCL library (openvino_intel_npu_compiler.dll or libopenvino_intel_npu_compiler.so) first. If the VCL library does not exist, it will check for the MLIR library. If the MLIR library is linked, it will be used for compilation. If neither exists, using MLIR as the compilerType will fail and throw an exception.
Added:
- npu_driver_compiler.h with detailed type definitions for VCL handles, enums, and structs, such as vcl_compiler_handle_t, vcl_result_t, and vcl_device_desc_t, to support the VCL API.
- A new vcl_api.hpp file to define and wrap VCL API functions, enabling dynamic symbol resolution and providing a singleton-based interface for accessing the API.
- A VCLCompilerImpl class to implement compiler-related operations using the VCL API, including methods for compilation, querying, and profiling (updates platform and device by passing the device string from the plugin).

Tickets: