[GPU] Allow host buffer access for Xe2+ iGPUs (#32912)

mklimenk · web-flow · commit a930596d9801 · 2025-11-19T13:54:41.000Z
### Description of the issue

Integrated GPUs starting from Xe2 can benefit from reusing the host-side buffer for the weights. This avoids allocating a device-side buffer in the same physical memory, which significantly reduces the memory footprint with no runtime penalty. Previously this was enabled only for LNL (#31600), but for AI weights that don't benefit from compression there is no need to limit this functionality to that platform.

#### Reproduction step and snapshot

- $ benchmark_app -d GPU -hint latency -nireq 1 -t 30 -b 1 -m <model> -ip f32 -op f32

Check the "Compile model ram used" metric. For an fp16 Stable Diffusion model 600MB in size, there is a ~600MB reduction on multiple platforms; more details are in the ticket.

#### Checklist

- [x] Is it a proper fix? (not a workaround)
- [ ] Did you include test case for this fix, if necessary?
- [ ] Did you review existing test that can be extended to cover this scenario? Which test did you review?

### Tickets:

- [CVS-176845](https://jira.devtools.intel.com/browse/CVS-176845)

diff --git a/src/plugins/intel_gpu/src/graph/network.cpp b/src/plugins/intel_gpu/src/graph/network.cpp
@@ -1004,8 +1004,8 @@ void network::transfer_memory_to_device(std::shared_ptr<primitive_inst> instance
         return;
 
     if (alloc_type == allocation_type::usm_host || alloc_type == allocation_type::usm_shared) {
-        // usm_device memory does not provide performance benefits on the LNL platform
-        if (get_engine().get_device_info().arch == gpu_arch::xe2 &&
+        // usm_device memory does not provide performance benefits on the integrated Xe2+ platforms
+        if (get_engine().get_device_info().arch >= gpu_arch::xe2 &&
             get_engine().get_device_info().dev_type == device_type::integrated_gpu) {
             return;
         }
diff --git a/src/plugins/intel_gpu/src/graph/program.cpp b/src/plugins/intel_gpu/src/graph/program.cpp
@@ -691,8 +691,8 @@ void program::transfer_memory_to_device() {
             }
 
             if (alloc_type == allocation_type::usm_host || alloc_type == allocation_type::usm_shared) {
-                // usm_device memory does not provide performance benefits on the LNL platform
-                if (get_engine().get_device_info().arch == gpu_arch::xe2 &&
+                // usm_device memory does not provide performance benefits on the integrated Xe2+ platforms
+                if (get_engine().get_device_info().arch >= gpu_arch::xe2 &&
                     get_engine().get_device_info().dev_type == device_type::integrated_gpu) {
                     return;
                 }
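
Both hunks apply the same early-return condition, which can be read as a single predicate. The following is a minimal standalone sketch of that predicate, not the plugin's code: the `sketch` namespace, the `device_info` struct layout, the enumerator lists, and the helper name `keeps_host_buffer` are all hypothetical stand-ins. It also assumes, as the `>=` comparison in the diff implies, that `gpu_arch` enumerators are ordered so that newer architectures compare greater than `xe2`.

```cpp
#include <cstdint>

namespace sketch {

// Hypothetical stand-ins for the plugin's enums. The real definitions live in the
// intel_gpu plugin and may have different members and values. The only property
// relied on here is the assumed ordering: architectures newer than xe2 compare greater.
enum class gpu_arch : std::uint8_t { unknown = 0, xe_lp, xe_hpg, xe2, xe3 };
enum class device_type : std::uint8_t { integrated_gpu, discrete_gpu };
enum class allocation_type : std::uint8_t { usm_host, usm_shared, usm_device };

// Hypothetical subset of the device description queried in the diff.
struct device_info {
    gpu_arch arch;
    device_type dev_type;
};

// Returns true when a host-visible allocation should be left in place,
// i.e. when the transfer to usm_device memory would be skipped.
inline bool keeps_host_buffer(const device_info& info, allocation_type alloc) {
    const bool host_visible =
        alloc == allocation_type::usm_host || alloc == allocation_type::usm_shared;
    return host_visible &&
           info.arch >= gpu_arch::xe2 &&                  // was `== gpu_arch::xe2` before this change
           info.dev_type == device_type::integrated_gpu;  // discrete GPUs still copy to device memory
}

}  // namespace sketch
```

With the previous `==` comparison, a hypothetical integrated `xe3` device would still have copied the weights to a device buffer; with `>=` it skips the copy, while discrete devices and pre-Xe2 architectures behave as before.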
