Skip to content

[NFCI][SYCL] Remove boost::mp11 dependency from invoke_simd.hpp #15713

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Oct 16, 2024

Conversation

aelovikov-intel
Copy link
Contributor

C++17 implementation is about as readable as boost-based one, and we want to drop the external boost dependency for upstreaming purposes.

C++17 implementation is about as readable as boost-based one, and we
want to drop the external boost dependency for upstreaming purposes.
Copy link
Contributor

@sarnex sarnex left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

wow cool, thanks

@bader
Copy link
Contributor

bader commented Oct 15, 2024

@aelovikov-intel, please, remove fetching boost::mp11 from cmake files.

@aelovikov-intel
Copy link
Contributor Author

@aelovikov-intel, please, remove fetching boost::mp11 from cmake files.

I haven't eliminated all the uses yet.

@bader
Copy link
Contributor

bader commented Oct 15, 2024

C++17 implementation is about as readable as boost-based one, and we want to drop the external boost dependency for upstreaming purposes.

The main reason to add boost::m11 is the compilation speed. If you won't be able to remove all uses of boost::m11, I'm not sure if it's worth removing it from invoke_simd.hpp.

@aelovikov-intel
Copy link
Contributor Author

C++17 implementation is about as readable as boost-based one, and we want to drop the external boost dependency for upstreaming purposes.

The main reason to add boost::m11 is the compilation speed. If you won't be able to remove all uses of boost::m11, I'm not sure if it's worth removing it from invoke_simd.hpp.

IIRC, that was true for sorting properties and then recently for typelist machinery in @rolandschulz 's PR. ESIMD use-case was added during C++14 timeframe and was done for readability mostly.

I've already prototyped complete removal of typelists usages, and the team has agreed that properties speedup is the sacrifice we're willing to make for the ease of upstreaming.

@bader
Copy link
Contributor

bader commented Oct 15, 2024

@AlexeySachkov and I are working on reducing the compilation time for SYCL sources. We would like to avoid patches making compilation slower.

I've already prototyped complete removal of typelists usages, and the team has agreed that properties speedup is the sacrifice we're willing to make for the ease of upstreaming.

Did you measure the compile time impact? We need to have some data to make the right decision.

@aelovikov-intel
Copy link
Contributor Author

aelovikov-intel commented Oct 15, 2024

@AlexeySachkov and I are working on reducing the compilation time for SYCL sources. We would like to avoid patches making compilation slower.

I've already prototyped complete removal of typelists usages, and the team has agreed that properties speedup is the sacrifice we're willing to make for the ease of upstreaming.

Did you measure the compile time impact? We need to have some data to make the right decision.

I just asked @uditagarwal97 , we didn't make any measurements for #8168, so that was driven by readability mostly.

As for my changes, I'm looking at $ time clang++ -fsycl -include 'sycl/sycl.hpp' -x c++ /dev/null -c -o /dev/null that stays stable.

I also think that whenever performance is an issue, the test has to be added as part of the optimization. Without such test, readability/maintainability trumps any speculated performance of an "optimization".

@bader
Copy link
Contributor

bader commented Oct 15, 2024

As for my changes, I'm looking at $ time clang++ -fsycl -include 'sycl/sycl.hpp' -x c++ /dev/null -c -o /dev/null that stays stable.

I suppose this way of testing doesn't instantiate property templates. I would check the impact on SYCL-CTS compile time.

I also think that whenever performance is an issue, the test has to be added as part of the optimization. Without such test, readability/maintainability trumps any speculated performance of an "optimization".

Agree.

return std::pair{num_found, found_sg_size};
}();

constexpr auto num_found = x.first;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: slightly less verbose with structured binding.

Copy link
Contributor Author

@aelovikov-intel aelovikov-intel Oct 15, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not allowed in constexpr context, sadly. I tried constexpr auto [...] = []() { .... }();.

@aelovikov-intel
Copy link
Contributor Author

I suppose this way of testing doesn't instantiate property templates. I would check the impact on SYCL-CTS compile time.

I thought the most problematic case for us is multiple small TUs where sycl.hpp processing "eats" most of the compile time. IMO, my command reflects that scenario good enough. Do I miss anything?

@bader
Copy link
Contributor

bader commented Oct 15, 2024

I suppose this way of testing doesn't instantiate property templates. I would check the impact on SYCL-CTS compile time.

I thought the most problematic case for us is multiple small TUs where sycl.hpp processing "eats" most of the compile time. IMO, my command reflects that scenario good enough. Do I miss anything?

If I get it right, we must instantiate template classes/methods using compile time properties to see the compilation time difference between boost::m11 and hand-written templates. W/o instantiating we measure only parsing time, which is also significant, but it won't expose the compile time benefits of boost::m11. I suppose the parse time for boost::m11 is even worse. @intel/dpcpp-cfe-reviewers can correct me if I wrong.

We see template instantiation takes significant time for the code we use internally, but I don't know if compile time properties are involved. They should be covered by SYCL-CTS.

@rolandschulz
Copy link
Contributor

rolandschulz commented Oct 15, 2024

I thought the most problematic case for us is multiple small TUs where sycl.hpp processing "eats" most of the compile time. IMO, my command reflects that scenario good enough. Do I miss anything?

We expect properties to be used more in the future, because we expect more extensions to depend on it and expect properties to become part of the core spec.

The current sorting isn't very efficient. q-sort is slow for (many) small lists (the common case). Merge sort (or insertion sort for small lists) is faster.

But it isn't just sorting which has an impact on compile speed. get_property and has_property is called a lot more in typical properties code. The current implementation iterates over the type list for both which is slower for larger number of properties. And the iteration isn't needed if one uses the fact that set/map lookup is fast by treating them as inheritance questions (e.g. is_base_of) which exists as efficient compiler primitive. Of course one doesn't need MP11 to have an efficient compile time set and map data structure. All the other meta programming libraries have them too (metal, brigand, kvasir, meta, ...). Are any of those others more appropriative for upstreaming? One of course can also implement our own set and map. It isn't much code. But why would that be better than just taking a small subset of mp11?

The version which uses set/map is: #13776

@aelovikov-intel
Copy link
Contributor Author

@bader , I think the discussion about properties is unrelated to this PR, and removing boost dependency from invoke_simd.hpp is less debatable. Any objections to me merging this?

@bader
Copy link
Contributor

bader commented Oct 16, 2024

@bader , I think the discussion about properties is unrelated to this PR, and removing boost dependency from invoke_simd.hpp is less debatable. Any objections to me merging this?

I think they are related. The purpose of this change to remove dependency on boost m11 eventually. If we unable to remove this dependency for the rest of the project, there is not much value in this change.
If we say that this change improves readability and reduce maintenance cost, but do not impact compilation speed or have other cons, I don't have strong objections. Let the code owner @sarnex to make the final decision.

@sarnex
Copy link
Contributor

sarnex commented Oct 16, 2024

Assuming the plan is to remove mp11 as a dep and that this doesn't have a significant effect on compilation time, it's fine with me.

@aelovikov-intel
Copy link
Contributor Author

If we say that this change improves readability and reduce maintenance cost, but do not impact compilation speed or have other cons, I don't have strong objections. Let the code owner @sarnex to make the final decision.

I think the former is true, and as for the "compilation speed or have other cons", CI is green, and the removed code doesn't mention any specific test cases the time has to be measured on prior to removal.

Anyway, @sarnex , WDYT?

@aelovikov-intel
Copy link
Contributor Author

If we say that this change improves readability and reduce maintenance cost, but do not impact compilation speed or have other cons, I don't have strong objections. Let the code owner @sarnex to make the final decision.

I think the former is true, and as for the "compilation speed or have other cons", CI is green, and the removed code doesn't mention any specific test cases the time has to be measured on prior to removal.

Anyway, @sarnex , WDYT?

E2E tests' time for the CI in this PR is very close the recent pulldown task at https://github.com/intel/llvm/actions/runs/11369991775. IMO, that's a good enough indication of "doesn't have a significant effect on compilation time".

@sarnex
Copy link
Contributor

sarnex commented Oct 16, 2024

I understand we don't have any tests or written requirements about the compilation time, but can you please just humor me and compile sycl/test-e2e/InvokeSimd/Spec/uniform_retval.cpp with and without this change and report the results?
You'll need these flags -fsycl -fno-sycl-device-code-split-esimd -Xclang -fsycl-allow-func-ptr. Thanks

@aelovikov-intel
Copy link
Contributor Author

I understand we don't have any tests or written requirements about the compilation time, but can you please just humor me and compile sycl/test-e2e/InvokeSimd/Spec/uniform_retval.cpp with and without this change and report the results? You'll need these flags -fsycl -fno-sycl-device-code-split-esimd -Xclang -fsycl-allow-func-ptr. Thanks

|    | origin/sycl | this PR merged with origin/sycl |
|real| 6.25s-6.29s | 6.28s-6.33s                     |
|user| 5.95s-6.0s  | 5.99s-6.08s                     |

@sarnex
Copy link
Contributor

sarnex commented Oct 16, 2024

thanks, seems within margin of error to me so im fine with this patch, thx again

@aelovikov-intel aelovikov-intel merged commit e94cfda into intel:sycl Oct 16, 2024
13 checks passed
@aelovikov-intel aelovikov-intel deleted the no-boost-in-esimd branch October 16, 2024 18:14
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants