SYCL runtime: Severe host overhead in sycl::get_kernel_bundle #15824

Open
@majing921201

For platform compatibility, we do not launch kernels with the device's maximum work-group size. Instead, we query the kernel-specific maximum work-group size through the SYCL API. The following is our code example:

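  // Look up the kernel's identifier, then fetch an executable bundle containing it.
  auto kid = ::sycl::get_kernel_id<KernelClass>();
  auto kbundle = ::sycl::get_kernel_bundle<::sycl::bundle_state::executable>(
      ctx, {dev}, {kid});
  // Get the kernel object and query its device-specific maximum work-group size.
  ::sycl::kernel k = kbundle.get_kernel(kid);
  int max_work_group_size = k.get_info<::sycl::info::kernel_device_specific::work_group_size>(dev);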

We found that this usage incurs significant host overhead in our application. We measured the CPU time of each call for one kernel; each API name in the table corresponds to a call in the example code:

| API | get_kernel_id | get_kernel_bundle | get_kernel | get_info |
| --- | --- | --- | --- | --- |
| time (us) | 0.434 | 42.481 | 4.241 | 1.125 |
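
For anyone who wants to reproduce per-call numbers like these, here is a minimal sketch of a host-side timing helper. It is not the harness used for the measurements above; `mean_host_time_us` is a hypothetical name, and the choice of `std::chrono::steady_clock` and averaging over many iterations are assumptions.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>

// Hypothetical helper (not part of SYCL): returns the mean wall-clock time,
// in microseconds, of `iterations` consecutive host-side calls to `fn`.
template <typename Fn>
double mean_host_time_us(Fn&& fn, std::size_t iterations = 1000) {
  assert(iterations > 0 && "need at least one iteration");
  using clock = std::chrono::steady_clock;
  const auto start = clock::now();
  for (std::size_t i = 0; i < iterations; ++i) {
    fn();
  }
  const auto stop = clock::now();
  // Convert the elapsed time to fractional microseconds before averaging.
  const std::chrono::duration<double, std::micro> total = stop - start;
  return total.count() / static_cast<double>(iterations);
}

// Intended use with the snippet above (needs a SYCL toolchain, so it is
// shown only as a comment):
//   double us = mean_host_time_us([&] {
//     auto kb = ::sycl::get_kernel_bundle<::sycl::bundle_state::executable>(
//         ctx, {dev}, {kid});
//   });
```

Averaging over repeated calls mostly reflects the steady-state cost; the first call may be slower and would need to be timed separately.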

We have also filed an internal Jira ticket to track this issue. Could you help evaluate this slow performance?
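
As a possible application-side mitigation while this is evaluated, the queried value could be cached per kernel and device so that `get_kernel_bundle` runs once instead of on every launch. The sketch below is only an illustration of that idea: `WorkGroupSizeCache`, `make_key`, and the integer device index are hypothetical names, not SYCL APIs, and it assumes the value does not change for a given kernel and device during the process lifetime.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <map>
#include <mutex>
#include <typeindex>
#include <typeinfo>
#include <utility>

// Hypothetical cache (not a SYCL API). The key pairs the kernel's C++ type
// with a caller-chosen device index.
class WorkGroupSizeCache {
 public:
  using Key = std::pair<std::type_index, std::size_t>;

  // Returns the cached value for `key`, calling `query` only on a miss.
  // The lock is held during `query`, which keeps the sketch simple but
  // serializes concurrent first-time lookups.
  std::size_t get_or_query(const Key& key,
                           const std::function<std::size_t()>& query) {
    std::lock_guard<std::mutex> lock(mutex_);
    auto it = cache_.find(key);
    if (it != cache_.end()) {
      return it->second;
    }
    const std::size_t value = query();
    assert(value > 0 && "a maximum work-group size should be positive");
    cache_.emplace(key, value);
    return value;
  }

 private:
  std::mutex mutex_;
  std::map<Key, std::size_t> cache_;
};

// Builds a cache key from a kernel type and a device index.
template <typename KernelClass>
WorkGroupSizeCache::Key make_key(std::size_t device_index) {
  return {std::type_index(typeid(KernelClass)), device_index};
}

// Intended use (needs a SYCL toolchain, so it is shown only as a comment):
//   std::size_t wgs = cache.get_or_query(make_key<KernelClass>(dev_idx), [&] {
//     auto kid = ::sycl::get_kernel_id<KernelClass>();
//     auto kb = ::sycl::get_kernel_bundle<::sycl::bundle_state::executable>(
//         ctx, {dev}, {kid});
//     return kb.get_kernel(kid)
//         .get_info<::sycl::info::kernel_device_specific::work_group_size>(dev);
//   });
```

This only hides the cost after the first query per kernel and device; it does not address the overhead inside `get_kernel_bundle` itself.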
