Commit a930596

[GPU] Allow host buffer access for Xe2+ iGPUs (#32912)
### Description of the issue
Integrated GPUs starting from Xe2 can benefit from reusing the host-side buffer for the weights. This avoids allocating a device-side buffer in the same physical memory, which significantly reduces the memory footprint with no runtime penalty. Previously this was enabled only for LNL (#31600), but for AI weights that don't benefit from compression there is no need to limit this functionality to that platform.

#### Reproduction step and snapshot
- $ benchmark_app -d GPU -hint latency -nireq 1 -t 30 -b 1 -m <model> -ip f32 -op f32

Check the "Compile model ram used" metric. For an fp16 Stable Diffusion model with a size of 600MB, there is a ~600MB reduction on multiple platforms; more details are in the ticket.

#### Checklist
- [x] Is it a proper fix? (not a workaround)
- [ ] Did you include test case for this fix, if necessary?
- [ ] Did you review existing test that can be extended to cover this scenario? Which test did you review?

### Tickets:
- [CVS-176845](https://jira.devtools.intel.com/browse/CVS-176845)
1 parent a0c5fa3 commit a930596

2 files changed: +4 -4 lines changed

src/plugins/intel_gpu/src/graph/network.cpp

Lines changed: 2 additions & 2 deletions
@@ -1004,8 +1004,8 @@ void network::transfer_memory_to_device(std::shared_ptr<primitive_inst> instance
         return;

     if (alloc_type == allocation_type::usm_host || alloc_type == allocation_type::usm_shared) {
-        // usm_device memory does not provide performance benefits on the LNL platform
-        if (get_engine().get_device_info().arch == gpu_arch::xe2 &&
+        // usm_device memory does not provide performance benefits on the integrated Xe2+ platforms
+        if (get_engine().get_device_info().arch >= gpu_arch::xe2 &&
             get_engine().get_device_info().dev_type == device_type::integrated_gpu) {
             return;
         }

src/plugins/intel_gpu/src/graph/program.cpp

Lines changed: 2 additions & 2 deletions
@@ -691,8 +691,8 @@ void program::transfer_memory_to_device() {
             }

             if (alloc_type == allocation_type::usm_host || alloc_type == allocation_type::usm_shared) {
-                // usm_device memory does not provide performance benefits on the LNL platform
-                if (get_engine().get_device_info().arch == gpu_arch::xe2 &&
+                // usm_device memory does not provide performance benefits on the integrated Xe2+ platforms
+                if (get_engine().get_device_info().arch >= gpu_arch::xe2 &&
                     get_engine().get_device_info().dev_type == device_type::integrated_gpu) {
                     return;
                 }
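
The condition changed in both files can be read as a single predicate. The following is a minimal standalone sketch of that predicate, not the plugin's code: the enum members and their ordering, the `device_info` struct, and the `keep_host_buffer` helper are all hypothetical stand-ins introduced for illustration. It assumes that architectures newer than Xe2 compare greater than `gpu_arch::xe2`, which is what the `>=` comparison in the change relies on.

```cpp
#include <cstdint>

// Hypothetical stand-ins for the plugin's enums. The real definitions live in
// the intel_gpu plugin and may have different members and values. The only
// property assumed here is that newer architectures have larger values.
enum class gpu_arch : std::uint8_t { unknown = 0, xe_lp, xe_hpg, xe2, xe3 };
enum class device_type { integrated_gpu, discrete_gpu };
enum class allocation_type { unknown, cl_mem, usm_host, usm_shared, usm_device };

// Hypothetical subset of the device description used by the check.
struct device_info {
    gpu_arch arch;
    device_type dev_type;
};

// Returns true when the host-visible buffer should be kept as is, meaning the
// copy into usm_device memory is skipped. This mirrors the early return in the
// diff: host or shared USM, integrated GPU, architecture Xe2 or newer.
constexpr bool keep_host_buffer(const device_info& info, allocation_type alloc) {
    return (alloc == allocation_type::usm_host || alloc == allocation_type::usm_shared) &&
           info.arch >= gpu_arch::xe2 &&
           info.dev_type == device_type::integrated_gpu;
}
```

Under these assumptions, `keep_host_buffer({gpu_arch::xe3, device_type::integrated_gpu}, allocation_type::usm_host)` evaluates to true, whereas the previous `==` comparison would have matched only `gpu_arch::xe2`. Discrete GPUs and pre-Xe2 architectures still fall through to the device-side transfer.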
