Conversation

@mklimenk (Contributor)

Description of the issue

Integrated GPUs starting from Xe2 can benefit from reusing the host-side buffer for the weights. This avoids allocating a device-side buffer in the same physical memory, giving a significant memory footprint reduction with no runtime penalty. Previously this was enabled only for LNL (#31600), but for AI weights that don't benefit from compression there is no need to limit the functionality to that platform.
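
As a rough, hypothetical sketch of the idea (not the actual OpenVINO GPU plugin code; the buffer type and function names below are made up for illustration): on an integrated GPU the device-side copy would live in the same physical DRAM as the host buffer, so skipping it saves roughly the size of the weights without changing kernel access speed.

```cpp
// Hypothetical sketch only; not the actual OpenVINO GPU plugin implementation.
#include <memory>

struct memory_buffer { /* opaque handle to a USM allocation */ };

// On integrated Xe2+ GPUs, hand the host-side weights buffer straight to the
// kernels; otherwise fall back to the usual usm_device allocation plus copy.
std::shared_ptr<memory_buffer> prepare_weights(std::shared_ptr<memory_buffer> host_weights,
                                               bool host_buffer_reuse_allowed) {
    if (host_buffer_reuse_allowed) {
        // No device-side allocation and no copy: the footprint stays at one copy
        // of the weights, which is where the ~model-sized saving comes from.
        return host_weights;
    }
    // Discrete GPUs (and older iGPUs): allocate usm_device memory and copy into it.
    auto device_weights = std::make_shared<memory_buffer>();
    // ... copy host_weights into device_weights here ...
    return device_weights;
}
```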

Reproduction step and snapshot

  • $ benchmark_app -d GPU -hint latency -nireq 1 -t 30 -b 1 -m -ip f32 -op f32
    Check the "Compile model ram used" metric. For an fp16 Stable Diffusion model of roughly 600 MB, the footprint drops by ~600 MB on multiple platforms; more details in the ticket.

Checklist

  • Is it a proper fix? (not a workaround)
  • Did you include a test case for this fix, if necessary?
  • Did you review existing tests that could be extended to cover this scenario? Which tests did you review?

Tickets:

@mklimenk requested review from a team as code owners on November 18, 2025 15:27
@github-actions bot added the "category: GPU" (OpenVINO GPU plugin) label on Nov 18, 2025
@sys-openvino-ci added the "ExternalIntelPR" (External contributor from Intel) label on Nov 18, 2025
@mklimenk changed the title from "Allow host buffer access for Xe2+ iGPUs" to "[GPU] Allow host buffer access for Xe2+ iGPUs" on Nov 18, 2025
@p-durandin (Contributor)

build_jenkins

@Lyamin-Roman (Contributor) left a comment

Have there been any performance tests? We need to verify that this really doesn't cause any performance drops.

@mklimenk (Contributor, Author)

@Lyamin-Roman, yes, I've checked performance on a set of models, as well as some synthetic tests, such as a model consisting of a single GEMM with different dimensions. Every test demonstrated the same performance (±1%).

@isanghao (Contributor)

Did you check with the driver team about this change? This is actually different from what we heard from the driver team previously.

The memory footprint reduction is unexpected; I suspect there is an underlying memory footprint issue that is being hidden by this change.

@p-durandin added this pull request to the merge queue on Nov 19, 2025
```diff
 if (alloc_type == allocation_type::usm_host || alloc_type == allocation_type::usm_shared) {
-    // usm_device memory does not provide performance benefits on the LNL platform
-    if (get_engine().get_device_info().arch == gpu_arch::xe2 &&
+    // usm_device memory does not provide performance benefits on the integrated Xe2+ platforms
```
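
As a self-contained sketch, the relaxed eligibility check amounts to something like the following (the enums and the helper are stand-ins for the plugin's internal types, and the exact condition in the merged code may differ):

```cpp
// Stand-in types for illustration; the real definitions live in the GPU plugin's runtime.
enum class gpu_arch { xe, xe2, xe3 };                    // ordered so newer archs compare greater
enum class device_type { integrated_gpu, discrete_gpu };

struct device_info {
    gpu_arch arch;
    device_type dev_type;
};

// Old behaviour: only LNL (an Xe2 iGPU) kept the weights in the host-side buffer.
// New behaviour: any integrated Xe2-or-newer part qualifies.
bool host_buffer_eligible(const device_info& info) {
    return info.dev_type == device_type::integrated_gpu && info.arch >= gpu_arch::xe2;
}
```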

On PTL integrated parts we have end-to-end (e2e) compression available for device USM allocations. This means that if the data is nicely compressible, you may see compression benefits from using device USM.

@mklimenk (Contributor, Author)

Yes, but at the same time the trained weights aren't typically compressible.

Yes, then this would help to reduce the memory footprint and reduce the number of copies.

Merged via the queue into openvinotoolkit:master with commit a930596 Nov 19, 2025
212 of 216 checks passed