Use VCL API to do offline compilation #30732
Conversation
This PR will be closed in a week because of 2 weeks of no activity.
This PR was closed because it has been stalled for 2 weeks with no activity.
```cpp
driver_compiler_utils::serializeIR(model,
                                   compilerVersion,
                                   maxOpsetVersion,
                                   updatedConfig.isAvailable(ov::intel_npu::use_base_model_serializer.name())
```
use_base_model_serializer is set to true by default, which means the older serializer, the one that copies all weights, is used. If we introduce the VCL interface to the CiP path, then we don't want major regressions to happen, so I believe we should set the default for use_base_model_serializer to false only when using the CiP adapter.
One way to do this: create a new function in plugin_compiler_adapter.cpp, maybe something like:
```cpp
void changeDefaultModelSerializer(FilteredConfig& config) {
    if (config.isAvailable(ov::intel_npu::use_base_model_serializer.name()) &&
        !config.hasOpt(ov::intel_npu::use_base_model_serializer.name())) {
        config.update({{ov::intel_npu::use_base_model_serializer.name(), "NO"}});
    }
}
```
and call it at the beginning of every method within PluginCompilerAdapter that may need to serialize the model.
FYI: you'll have to modify how use_base_model_serializer is handled after I merge this PR. I'll give you a notification when that happens and I'll try to help you adjust the code.
Signed-off-by: Kang, Wenjing <[email protected]>
```cpp
vcl_symbol_statement(vclLogHandleGetString)

//unsupported symbols with older ze_loader versions
```
...compiler versions this time. We will not support compiler versions that do not have these symbols, so I think we can remove the weak symbols list.
The vcl_weak_symbols_list needs to include vclAllocatedExecutableCreateWSOneShot to compile with Weightless. The other symbols are already used by the releases at https://github.com/openvinotoolkit/npu_compiler/releases, so they have been moved to the vcl_symbols_list above.
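The required/weak distinction discussed here can be sketched as follows. This is an illustrative mock, with a plain map standing in for the dlsym-style lookup; it is not the PR's actual vcl_symbol_statement machinery, and all names are hypothetical:

```cpp
#include <map>
#include <stdexcept>
#include <string>

// Stand-in for a loaded library's exported symbol table (hypothetical).
using SymbolTable = std::map<std::string, void*>;

// Required symbols: loading fails hard if the compiler build lacks them.
void* resolveRequired(const SymbolTable& lib, const std::string& name) {
    auto it = lib.find(name);
    if (it == lib.end()) {
        throw std::runtime_error("missing required symbol: " + name);
    }
    return it->second;
}

// Weak symbols: absence is tolerated so older compiler builds still load;
// callers must null-check before use (e.g. vclAllocatedExecutableCreateWSOneShot).
void* resolveWeak(const SymbolTable& lib, const std::string& name) {
    auto it = lib.find(name);
    return it == lib.end() ? nullptr : it->second;
}
```

Moving a symbol from the weak list to the required list therefore turns a runtime null-check into a load-time failure for compiler builds that predate that symbol.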
```cpp
    _logger.debug("compile end, blob size:%d", allocator.m_vec.size());
    return NetworkDescription(std::move(allocator.m_vec), std::move(metadata));
} else if (usedMajor >= 6 && usedMinor >= 1) {
```
No plan to support older compiler versions. I think we can remove these branches.
In order to support version 7.3 used in https://github.com/openvinotoolkit/npu_compiler/releases, the branches for versions lower than 6.1 were ultimately deleted, while those for versions 6.1 through 7.4 were retained.
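The retained version window can be expressed as one predicate. This is a hypothetical sketch (the function name and encoding are mine, not the PR's); note it also avoids the pitfall of checking major and minor independently, where a condition like `usedMajor >= 6 && usedMinor >= 1` would wrongly reject version 7.0:

```cpp
#include <cstdint>

// Hypothetical helper: accept compiler versions in the inclusive window
// [6.1, 7.4], matching the branches kept in this PR.
bool isSupportedCompilerVersion(uint32_t major, uint32_t minor) {
    const uint32_t v = major * 100 + minor;  // assumes minor < 100
    return v >= 601 && v <= 704;
}
```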
```cpp
bool is_option_supported(const std::string& option) const;

private:
    std::shared_ptr<VCLApi> _vclApi;
```
Is this actually being populated somewhere?
If CompilerImpl can get and keep a reference to the VCL API in its constructor, maybe this file should be renamed to compiler_impl.hpp? We should also hide VCLApi from the adapter if we only want it to be accessed through the compiler implementation.
The member std::shared_ptr<VCLApi> _vclApi indicated that the VCL API was initialized when creating the VCLCompilerImpl object. This member has now been removed, and vclApi::getInstance() is used to initialize and load the VCL library in the VCLCompilerImpl constructor.
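A minimal sketch of the getInstance() approach described here, assuming a Meyers-style singleton. The class name and body are illustrative; the real VCLCompilerImpl constructor also loads the VCL library, which is reduced to a comment below:

```cpp
#include <memory>

class VCLCompilerImplSketch {
public:
    // Thread-safe in C++11 and later: the static local is initialized
    // exactly once, on first call.
    static const std::shared_ptr<VCLCompilerImplSketch>& getInstance() {
        static std::shared_ptr<VCLCompilerImplSketch> instance(new VCLCompilerImplSketch());
        return instance;
    }
    VCLCompilerImplSketch(const VCLCompilerImplSketch&) = delete;
    VCLCompilerImplSketch& operator=(const VCLCompilerImplSketch&) = delete;

private:
    VCLCompilerImplSketch() {
        // The real constructor would dlopen/LoadLibrary the VCL library
        // and resolve its symbol table here.
    }
};
```

Every call returns the same instance, so all adapter methods share one loaded library without the adapter holding a member pointer.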
```cpp
_compiler = load_compiler(libPath);
_logger.info("Loading PLUGIN compiler");
try {
    auto vclCompilerPtr = VCLCompilerImpl::getInstance();
```
The order seems wrong here: first get the implementation then the library? Should the compiler implementation get and keep a reference to VCL API?
To avoid embedding more VCL-related information (such as symbol table definitions) into the plugin_adapter, the library is loaded in the VCLCompilerImpl instead.
The reason for keeping a reference to the VCL API is that when creating the corresponding _compiler, both the object and the library reference need to be passed in simultaneously (see https://github.com/openvinotoolkit/openvino/blob/master/src/inference/dev_api/openvino/runtime/so_ptr.hpp):
```cpp
_compiler = ov::SoPtr<intel_npu::ICompiler>(ptr, so);
SoPtr(const std::shared_ptr<T>& ptr, const std::shared_ptr<void>& so) : _ptr{ptr}, _so{so} {}
```
Although `SoPtr(const std::shared_ptr<T>& ptr) : _ptr{ptr}, _so{nullptr} {}` seems to allow passing nullptr to _so, considering the risk of premature unloading, we still choose to follow the same loading method as the MLIR compiler loading.
…MainNetworkDescriptions size Signed-off-by: Kang, Wenjing <[email protected]>
```diff
@@ -0,0 +1,130 @@
+// Copyright (C) 2018-2025 Intel Corporation
```
Suggested change:
```diff
-// Copyright (C) 2018-2025 Intel Corporation
+// Copyright (C) 2025 Intel Corporation
```
I think this is the right format since the file is new.
Updated in new commit 32fa2f8.
No, this is wrong! It has to be:
```cpp
// Copyright (C) 2018-2025 Intel Corporation
```
```cpp
namespace intel_npu {

bool isUseBaseModelSerializer(const FilteredConfig& config);
```
Are you using this only in compiler_impl.cpp? Then maybe it is better to keep it there in an anonymous namespace, and avoid adding this signature to the header needlessly.
Moved to an anonymous namespace in new commit 32fa2f8.
```cpp
 *
 * @param model Both source and target.
 */
void storeWeightlessCacheAttribute(const std::shared_ptr<ov::Model>& model);
```
I think you can also move isInitMetadata from both adapters here, since you created this file.
Moved isInitMetadata to the weightless_utils file in new commit 32fa2f8.
```cpp
// user pass model_serializer_version config
if (config.isAvailable(ov::intel_npu::model_serializer_version.name()) &&
    config.has(ov::intel_npu::use_base_model_serializer.name())) {
```
Suggested change:
```diff
-    config.has(ov::intel_npu::use_base_model_serializer.name())) {
+    config.has(ov::intel_npu::model_serializer_version.name())) {
```
I think you meant this.
Yes, updated in new commit 32fa2f8.
```cpp
        ov::intel_npu::ModelSerializerVersion::ALL_WEIGHTS_COPY);
}

// vcl serializer method is not set by user, will default to use it.
```
Suggested change:
```diff
-// vcl serializer method is not set by user, will default to use it.
+// No VCL serializer was chosen explicitly, will default to the "no weights copy" implementation
```
Updated in new commit 32fa2f8.
```cpp
_logger.debug("create build flags");
buildFlags += driver_compiler_utils::serializeIOInfo(model, true);
buildFlags += " ";
buildFlags += driver_compiler_utils::serializeConfig(config, compilerVersion);
```
In both compile & compileWSOneShot, the config should be handled the same. Here, the piece of code that registers the correct value for use_base_model_serializer/model_serializer_version is missing.
Updated in 32fa2f8.
```cpp
}

std::vector<std::shared_ptr<NetworkDescription>> networkDescrs;
for (uint32_t i = 0; i < allocator.m_vector.size(); i++) {
```
I think a `for (auto& blob : allocator.m_vector)` range-based loop is more adequate here.
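For illustration, the suggested range-based form on a stand-in vector of blobs. The element type is assumed to be std::vector<uint8_t> here; the PR's actual blob type may differ:

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Sums blob sizes with a range-based loop: no index bookkeeping and no
// signed/unsigned comparison warnings from an `i < size()` condition.
std::size_t totalBlobSize(const std::vector<std::vector<uint8_t>>& blobs) {
    std::size_t total = 0;
    for (const auto& blob : blobs) {
        total += blob.size();
    }
    return total;
}
```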
Updated in 32fa2f8.
```cpp
// serializer will be used as the default in the plugin adapter. You need to pass the serializer config;
// otherwise, you will encounter a deserialization issue within the compiler.
_logger.warning("Add serializer config");
if (updatedConfig.isAvailable(ov::intel_npu::use_base_model_serializer.name())) {
```
maybe this kind of stuff should also be moved in a separate function, since there's at least three of them that need it.
Extracted this part as a helper function in an anonymous namespace in 32fa2f8.
```cpp
mainNetworkDescription = initMainNetworkDescriptions.back();
initMainNetworkDescriptions.pop_back();
OPENVINO_ASSERT(initMainNetworkDescriptions.size() > 0,
                "The initMainNetworkDescriptions after getting mainNetworkDescription must not be empty!");
```
Suggested change:
```diff
-                "The initMainNetworkDescriptions after getting mainNetworkDescription must not be empty!");
+                "No init schedules have been returned by the compiler");
```
Updated in 32fa2f8.
```cpp
if (model) {
    mainNetworkMetadata.name = model.value()->get_friendly_name();
} else {
    _logger.warning("networkMeta name is empty in parse!");
```
"warning" sounds like too much. Only on the weights separation path do we expect a model to be provided here. I'd say "info" is more adequate.
Updated in 32fa2f8.
```cpp
std::vector<std::shared_ptr<NetworkDescription>> initNetworkDescriptions;
std::shared_ptr<NetworkDescription> mainNetworkDescription;
storeWeightlessCacheAttribute(model);
```
btw, this call is required only if the model needs to be serialized/deserialized, so only if we use the VCL interface. After this PR is merged, will we always be using the VCL interface, or can we still fall back to the old flow (I admit I didn't follow the library loading part of the code)? If the latter, then maybe we should move this call to the code dedicated to the VCL interface, i.e. to the weights separation functions.
Just to clarify, if the VCL interface is not used, then this call won't break anything, but it will waste some resources.
Moved storeWeightlessCacheAttribute to compiler_impl.cpp.
In the short term, it is possible to revert to the original MLIR compiler. The current compiler loading mechanism prioritizes linking to openvino_intel_npu_compiler (the VCL compiler); if that library file does not exist, it will attempt to load the MLIR compiler. If neither is present, an exception is thrown directly.
```cpp
_logger.debug("Graph initialize start");

if (_zeGraphExt == nullptr || _graphDesc._handle == nullptr) {
    if (!config.get<CREATE_EXECUTOR>() || config.get<DEFER_WEIGHTS_LOAD>()) {
```
I don't think we need this extra if here.
```cpp
    _logger.info("Failed to use the level zero graph handle: %s. Inference requests for this model are not "
                 "allowed. Only exports are available",
                 ex.what());
} catch (...) {
```
Do we still need this extra catch here?
```cpp
try {
    initGraphDesc = _zeGraphExt->getGraphDescriptor(tensor.data(), tensor.get_byte_size());
    initNetworkMeta = _zeGraphExt->getNetworkMeta(initGraphDesc);
} catch (...) {
```
Can you please align all these catch blocks? Btw, what about the name of the model, do we need it here as well?
```cpp
}

void WeightlessGraph::initialize(const Config& config) {
    if (!_zeroInitStruct) {
```
is this needed?
Details:
When the compiler type is MLIR, it will try to link the VCL library (openvino_intel_npu_compiler.dll or libopenvino_intel_npu_compiler.so) first. If the VCL library does not exist, it will check for the MLIR library. If the MLIR library is linked, it will be used for compilation. If neither exists, using MLIR as the compilerType will fail and throw an exception.
Added:
- npu_driver_compiler.h with detailed type definitions for VCL handles, enums, and structs, such as vcl_compiler_handle_t, vcl_result_t, and vcl_device_desc_t, to support the VCL API.
- A new vcl_api.hpp file to define and wrap VCL API functions, enabling dynamic symbol resolution and providing a singleton-based interface for accessing the API.
- A VCLCompilerImpl class to implement compiler-related operations using the VCL API, including methods for compilation, querying, and profiling (updates platform and device by passing the device string from the plugin).

Tickets: