
Conversation

@XinWangIntel commented May 27, 2025

Details:

  • When the compiler type is MLIR, the plugin will first try to link the VCL library (openvino_intel_npu_compiler.dll or libopenvino_intel_npu_compiler.so). If the VCL library does not exist, it will check for the MLIR library; if the MLIR library is linked, it will be used for compilation. If neither exists, using MLIR as the compilerType will fail and throw an exception. (See the sketch after this list.)

    • Added npu_driver_compiler.h with detailed type definitions for VCL handles, enums, and structs, such as vcl_compiler_handle_t, vcl_result_t, and vcl_device_desc_t, to support the VCL API.

    • Added a new vcl_api.hpp file to define and wrap VCL API functions, enabling dynamic symbol resolution and providing a singleton-based interface for accessing the API.

    • Introduced a VCLCompilerImpl class that implements compiler-related operations using the VCL API, including methods for compilation, querying, and profiling (the platform and device are updated by passing the device string from the plugin).
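A minimal sketch of the fallback described above, assuming a hypothetical tryLoadLibrary helper (the plugin's real loading utilities and the MLIR library name differ; this only illustrates the order):

// Sketch only: VCL-first loading with MLIR fallback, throwing if neither is found.
std::shared_ptr<void> compilerLib = tryLoadLibrary("openvino_intel_npu_compiler");  // VCL library
if (compilerLib == nullptr) {
    compilerLib = tryLoadLibrary("npu_mlir_compiler");  // hypothetical MLIR library name
}
if (compilerLib == nullptr) {
    OPENVINO_THROW("CompilerType is MLIR, but neither the VCL nor the MLIR compiler library could be loaded");
}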

Tickets:

@github-actions bot added the category: NPU (OpenVINO NPU plugin) label on May 27, 2025
@XinWangIntel force-pushed the serialize-on-plugin-adapter branch from 973bdc5 to 9979810 on May 30, 2025 07:17
@github-actions bot added the category: build (OpenVINO cmake script / infra) label on Jun 9, 2025
@XinWangIntel force-pushed the serialize-on-plugin-adapter branch 9 times, most recently from 79a430e to 466cbb9 on June 12, 2025 06:03
@XinWangIntel force-pushed the serialize-on-plugin-adapter branch 2 times, most recently from f6fcd1d to 698dfec on July 2, 2025 06:57
@XinWangIntel marked this pull request as ready for review on July 2, 2025 06:57
@XinWangIntel requested review from a team as code owners on July 2, 2025 06:57
@XinWangIntel force-pushed the serialize-on-plugin-adapter branch 2 times, most recently from a23728c to 0d0fd33 on July 10, 2025 01:20
@XinWangIntel changed the title from "Serialize IR and use VCL API to do offline compilation" to "Use VCL API to do offline compilation" on Jul 17, 2025
@XinWangIntel force-pushed the serialize-on-plugin-adapter branch 3 times, most recently from a38e20e to 9708ada on July 21, 2025 09:15
@github-actions bot commented:

This PR will be closed in a week because of 2 weeks of no activity.

@github-actions bot added the Stale label on Aug 22, 2025
@github-actions bot commented:

This PR was closed because it has been stalled for 2 weeks with no activity.

@github-actions bot closed this on Aug 30, 2025
@XinWangIntel reopened this on Sep 4, 2025
@github-actions bot removed the Stale label on Sep 5, 2025
driver_compiler_utils::serializeIR(model,
                                   compilerVersion,
                                   maxOpsetVersion,
                                   updatedConfig.isAvailable(ov::intel_npu::use_base_model_serializer.name())
@razvanapetroaie commented Nov 20, 2025:

use_base_model_serializer is set to true by default, which means the older serializer (the one that copies all weights) is used. If we introduce the VCL interface to the CiP path, we don't want major regressions, so I believe we should set the default for use_base_model_serializer to false only when using the CiP adapter.

One way to do this: create a new function in plugin_compiler_adapter.cpp, maybe something like:

void changeDefaultModelSerializer(FilteredConfig& config) {
    if (config.isAvailable(ov::intel_npu::use_base_model_serializer.name()) &&
        !config.hasOpt(ov::intel_npu::use_base_model_serializer.name())) {
        config.update({{ov::intel_npu::use_base_model_serializer.name(), "NO"}});
    }
}

and call it at the beginning of every method within PluginCompilerAdapter that may need to serialize the model.
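For instance, a call site might look like this (sketch only; the signature is abbreviated and the body is elided, and compile is just one of the affected methods):

NetworkDescription PluginCompilerAdapter::compile(const std::shared_ptr<const ov::Model>& model,
                                                  FilteredConfig& config) const {
    changeDefaultModelSerializer(config);  // flip the default before any serialization happens
    // ... existing compilation logic ...
}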

Follow-up review comment:

FYI: you'll have to modify how use_base_model_serializer is handled after I merge this PR. I'll give you a notification when that happens and I'll try to help you adjust the code.

@DanLiu2Intel force-pushed the serialize-on-plugin-adapter branch 7 times, most recently from 7b8b902 to 4c0fbb2 on November 21, 2025 12:21
@DanLiu2Intel force-pushed the serialize-on-plugin-adapter branch from 4c0fbb2 to b363ff6 on November 21, 2025 14:54
vcl_symbol_statement(vclLogHandleGetString)


//unsupported symbols with older ze_loader versions
Review comment:

... compiler versions this time.
We will not support compiler versions that do not have these symbols. I think we can remove the weak symbols list.

Reply:

The vcl_weak_symbols_list needs to include vclAllocatedExecutableCreateWSOneShot to compile with Weightless. The other symbols are already used by the releases at https://github.com/openvinotoolkit/npu_compiler/releases, so they have been moved to the vcl_symbols_list above.
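For context, a weak-symbol list in this style is usually resolved so that missing entries do not abort loading. A minimal sketch, assuming a dlsym-based loader (the struct and function below are illustrative, not the PR's actual implementation):

#include <dlfcn.h>

// Illustrative resolver: required symbols must be present; weak symbols may stay null.
struct VclSymbols {
    void* vclLogHandleGetString = nullptr;                  // required symbol
    void* vclAllocatedExecutableCreateWSOneShot = nullptr;  // weak: absent in older compilers
};

bool loadVclSymbols(void* libHandle, VclSymbols& syms) {
    syms.vclLogHandleGetString = dlsym(libHandle, "vclLogHandleGetString");
    if (syms.vclLogHandleGetString == nullptr) {
        return false;  // a required symbol is missing, so the load fails
    }
    // Weak symbol: tolerate absence and gate the Weightless path on it at runtime.
    syms.vclAllocatedExecutableCreateWSOneShot = dlsym(libHandle, "vclAllocatedExecutableCreateWSOneShot");
    return true;
}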


_logger.debug("compile end, blob size:%d", allocator.m_vec.size());
return NetworkDescription(std::move(allocator.m_vec), std::move(metadata));
} else if (usedMajor >= 6 && usedMinor >= 1) {
Review comment:

No plan to support older compiler versions. I think we can remove these branches.

Reply:

In order to support version 7.3, which is used by https://github.com/openvinotoolkit/npu_compiler/releases, the branches for versions lower than 6.1 were ultimately deleted, while the branches for versions 6.1 through 7.4 were retained.
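One caveat about the quoted condition: usedMajor >= 6 && usedMinor >= 1 misclassifies versions such as 7.0, whose minor component is 0. Comparing (major, minor) as a tuple avoids that; a minimal sketch (the helper is illustrative, not code from this PR):

#include <cstdint>
#include <tuple>

// True when the used compiler version is at least major.minor,
// e.g. isAtLeast(usedMajor, usedMinor, 6, 1).
bool isAtLeast(uint32_t usedMajor, uint32_t usedMinor, uint32_t major, uint32_t minor) {
    return std::tie(usedMajor, usedMinor) >= std::tie(major, minor);
}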

    bool is_option_supported(const std::string& option) const;

private:
    std::shared_ptr<VCLApi> _vclApi;
Review comment:

Is this actually being populated somewhere?

Review comment:

If CompilerImpl can get and keep a reference to the VCL API in its constructor, maybe this file should be renamed to compiler_impl.hpp? We should also hide VCLApi from the adapter if we only want it to be accessed through the compiler implementation.

Reply:

The member std::shared_ptr<VCLApi> _vclApi indicated that the VCL API was initialized when the VCLCompilerImpl object was created. This member has now been removed, and VCLApi::getInstance() is used to initialize and load the VCL library in the VCLCompilerImpl constructor.
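A minimal sketch of that pattern, assuming a Meyers-style singleton (the class name matches the discussion; the body is illustrative):

#include <memory>

class VCLApi {
public:
    // The library is loaded once, on first use; static local initialization is thread-safe.
    static const std::shared_ptr<VCLApi>& getInstance() {
        static std::shared_ptr<VCLApi> instance = std::make_shared<VCLApi>();
        return instance;
    }

    VCLApi() {
        // Load openvino_intel_npu_compiler and resolve the VCL symbols here.
    }
};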

_compiler = load_compiler(libPath);
_logger.info("Loading PLUGIN compiler");
try {
    auto vclCompilerPtr = VCLCompilerImpl::getInstance();
Review comment:

The order seems wrong here: first get the implementation, then the library? Should the compiler implementation get and keep a reference to the VCL API?

Reply:

To avoid embedding more VCL-related information (such as symbol table definitions) into the plugin_adapter, the library is loaded in VCLCompilerImpl instead.

The reason for keeping a reference to the VCL API is that when creating the corresponding _compiler, both the object and the library handle need to be passed in together; see https://github.com/openvinotoolkit/openvino/blob/master/src/inference/dev_api/openvino/runtime/so_ptr.hpp:

_compiler = ov::SoPtr<intel_npu::ICompiler>(ptr, so);
SoPtr(const std::shared_ptr<T>& ptr, const std::shared_ptr<void>& so) : _ptr{ptr}, _so{so} {}

Although SoPtr(const std::shared_ptr<T>& ptr) : _ptr{ptr}, _so{nullptr} {} seems to allow passing nullptr to _so, considering the risk of premature library unloading, we still follow the same loading approach as for the MLIR compiler.
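For context, the point of the second SoPtr argument is lifetime: the shared-object handle stored in _so must outlive the object whose code lives in that library, and SoPtr is written to release _ptr before _so for exactly that reason. A sketch of the intended usage (the loader and factory names are illustrative):

std::shared_ptr<void> so = loadSharedObject(libPath);            // illustrative loader
std::shared_ptr<intel_npu::ICompiler> ptr = createCompiler(so);  // illustrative factory
_compiler = ov::SoPtr<intel_npu::ICompiler>(ptr, so);
// When _compiler goes away, the compiler object is released before the library
// handle, so compiler code is never unloaded while it can still be called.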

@DanLiu2Intel force-pushed the serialize-on-plugin-adapter branch from a54dae0 to 7bb58a7 on November 24, 2025 11:17
@@ -0,0 +1,130 @@
// Copyright (C) 2018-2025 Intel Corporation
@razvanapetroaie commented Nov 24, 2025:

Suggested change:
- // Copyright (C) 2018-2025 Intel Corporation
+ // Copyright (C) 2025 Intel Corporation

I think this is the right format since the file is new.

Reply:

Updated in new commit 32fa2f8.

Review comment:

No, this is wrong! It has to be
// Copyright (C) 2018-2025 Intel Corporation


namespace intel_npu {

bool isUseBaseModelSerializer(const FilteredConfig& config);
Review comment:

Are you using this only in compiler_impl.cpp? Then maybe it is better to keep it there in an anonymous namespace and avoid adding this signature to the header needlessly.

Reply:

Moved to an anonymous namespace in new commit 32fa2f8.
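The resulting structure would look roughly like this (sketch; the accessor in the body is illustrative):

// compiler_impl.cpp
namespace {

// Visible only inside this translation unit; no header declaration needed.
bool isUseBaseModelSerializer(const FilteredConfig& config) {
    return config.get<USE_BASE_MODEL_SERIALIZER>();  // illustrative accessor name
}

}  // namespace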

*
* @param model Both source and target.
*/
void storeWeightlessCacheAttribute(const std::shared_ptr<ov::Model>& model);
Review comment:

I think you can also move isInitMetadata from both adapters here, since you created this file.

Reply:

Moved isInitMetadata to the weightless_utils file in new commit 32fa2f8.


// user pass model_serializer_version config
if (config.isAvailable(ov::intel_npu::model_serializer_version.name()) &&
    config.has(ov::intel_npu::use_base_model_serializer.name())) {
Review comment:

Suggested change:
-     config.has(ov::intel_npu::use_base_model_serializer.name())) {
+     config.has(ov::intel_npu::model_serializer_version.name())) {

I think you meant this.

Reply:

Yes, updated in new commit 32fa2f8.

ov::intel_npu::ModelSerializerVersion::ALL_WEIGHTS_COPY);
}

// vcl serializer method is not set by user, will default to use it.
Review comment:

Suggested change:
- // vcl serializer method is not set by user, will default to use it.
+ // No VCL serializer was chosen explicitly, will default to the "no weights copy" implementation

Reply:

Updated in new commit 32fa2f8.

_logger.debug("create build flags");
buildFlags += driver_compiler_utils::serializeIOInfo(model, true);
buildFlags += " ";
buildFlags += driver_compiler_utils::serializeConfig(config, compilerVersion);
Review comment:

In both compile and compileWSOneShot, the config should be handled the same way. Here, the piece of code that registers the correct value for use_base_model_serializer/model_serializer_version is missing.

Reply:

Updated in 32fa2f8.

}

std::vector<std::shared_ptr<NetworkDescription>> networkDescrs;
for (uint32_t i = 0; i < allocator.m_vector.size(); i++) {
Review comment:

I think a for (auto& blob : allocator.m_vector) style of iteration is more adequate here.

Reply:

Updated in 32fa2f8.
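After the change, the loop would read roughly as follows (sketch; the body is assumed, since only the loop header appears above):

std::vector<std::shared_ptr<NetworkDescription>> networkDescrs;
for (const auto& blob : allocator.m_vector) {
    // Illustrative body: wrap each compiled blob in a NetworkDescription.
    networkDescrs.push_back(std::make_shared<NetworkDescription>(blob, metadata));
}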

// serializer will be used as the default in the plugin adapter. You need to pass the serializer config;
// otherwise, you will encounter a deserialization issue within the compiler.
_logger.warning("Add serializer config");
if (updatedConfig.isAvailable(ov::intel_npu::use_base_model_serializer.name())) {
Review comment:

Maybe this kind of thing should also be moved into a separate function, since there are at least three places that need it.

Reply:

Extracted this part as a free function in an anonymous namespace in 32fa2f8.
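A sketch of what that helper might look like, modeled on the changeDefaultModelSerializer suggestion earlier in this thread (the hasOpt/update pattern and the "NO" default come from that suggestion; the function name and placement are illustrative):

namespace {

// Register the serializer option explicitly before serialization, so the
// compiler deserializes with the matching scheme.
void addSerializerConfig(FilteredConfig& config) {
    if (config.isAvailable(ov::intel_npu::use_base_model_serializer.name()) &&
        !config.hasOpt(ov::intel_npu::use_base_model_serializer.name())) {
        config.update({{ov::intel_npu::use_base_model_serializer.name(), "NO"}});
    }
}

}  // namespace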

mainNetworkDescription = initMainNetworkDescriptions.back();
initMainNetworkDescriptions.pop_back();
OPENVINO_ASSERT(initMainNetworkDescriptions.size() > 0,
                "The initMainNetworkDescriptions after getting mainNetworkDescription must not be empty!");
Review comment:

Suggested change:
- "The initMainNetworkDescriptions after getting mainNetworkDescription must not be empty!");
+ "No init schedules have been returned by the compiler");

Reply:

Updated in 32fa2f8.

if (model) {
    mainNetworkMetadata.name = model.value()->get_friendly_name();
} else {
    _logger.warning("networkMeta name is empty in parse!");
Review comment:

"warning" sounds like too much. Only on the weights separation path we expect a model to be provided here. I'd say "info" is more adequate.

Reply:

Updated in 32fa2f8.


std::vector<std::shared_ptr<NetworkDescription>> initNetworkDescriptions;
std::shared_ptr<NetworkDescription> mainNetworkDescription;
storeWeightlessCacheAttribute(model);
Review comment:

By the way, this call is required only if the model needs to be serialized/deserialized, i.e. only if we use the VCL interface. After this PR is merged, will we always be using the VCL interface, or can we still fall back to the old flow (I admit I didn't follow the library-loading part of the code)? If the latter, then maybe we should move this call, and all the weights separation functions, to the code dedicated to the VCL interface.

Follow-up review comment:

Just to clarify, if the VCL interface is not used, then this call won't break anything, but it will waste some resources.

Reply:

Moved storeWeightlessCacheAttribute to compiler_impl.cpp.
In the short term, it is possible to revert to the original MLIR compiler. The current compiler loading mechanism prioritizes linking to openvino_intel_npu_compiler (the VCL compiler); if that library file does not exist, it attempts to load the MLIR compiler. If neither is present, an exception is thrown directly.

_logger.debug("Graph initialize start");

if (_zeGraphExt == nullptr || _graphDesc._handle == nullptr) {
if (!config.get<CREATE_EXECUTOR>() || config.get<DEFER_WEIGHTS_LOAD>()) {
Review comment:

I don't think we need this extra if here.

_logger.info("Failed to use the level zero graph handle: %s. Inference requests for this model are not "
"allowed. Only exports are available",
ex.what());
} catch (...) {
Review comment:

Do we still need this extra catch here?

try {
    initGraphDesc = _zeGraphExt->getGraphDescriptor(tensor.data(), tensor.get_byte_size());
    initNetworkMeta = _zeGraphExt->getNetworkMeta(initGraphDesc);
} catch (...) {
Review comment:

Can you please align all these catch blocks? By the way, what about the name of the model, do we need it here as well?

}

void WeightlessGraph::initialize(const Config& config) {
    if (!_zeroInitStruct) {
Review comment:

Is this needed?
