[SYCL] Rewrite properties storage #13776

Closed
rolandschulz wants to merge 19 commits

@rolandschulz rolandschulz commented May 13, 2024

The goal is to improve compilation time.
All properties are stored as a map. The map is a type-list of type-lists: the first entry of each inner list is the property key and the second is the property value. This allows efficient lookup.
Runtime values are stored as base classes of the properties type, which allows efficient storage and retrieval.
std::tuple usage is removed because it is slow to instantiate.
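
To illustrate the map layout described above, here is a minimal sketch of a type-list of type-lists with non-recursive key lookup. All names (`type_list`, `map_bases`, `find_entry`, `map_find`, `alignment_key`, and so on) are hypothetical and not taken from the PR. The lookup relies on overload resolution over base classes, which is assumed here to be similar in spirit to what `mp_map_find` does.

```cpp
#include <type_traits>

// Hypothetical sketch; names are illustrative, not the PR's code.
template <class... Ts> struct type_list; // declaration only, nothing to instantiate
template <class T> struct identity { using type = T; };

// Every map entry (type_list<Key, Value>) becomes one wrapped base class.
template <class Map> struct map_bases;
template <class... Entries>
struct map_bases<type_list<Entries...>> : identity<Entries>... {};

// Deduction succeeds for the single base whose entry starts with Key.
template <class Key, class... Rest>
identity<type_list<Key, Rest...>>
find_entry(identity<type_list<Key, Rest...>> *);
// Fallback when no entry has that key.
template <class Key> identity<void> find_entry(...);

// Yields the whole entry type_list<Key, Value>, or void if Key is absent.
template <class Map, class Key>
using map_find = typename decltype(find_entry<Key>(
    static_cast<map_bases<Map> *>(nullptr)))::type;

struct alignment_key;
struct cache_key;
struct missing_key;
template <int N> struct alignment_value {};
struct cache_value {};

using example_map = type_list<type_list<alignment_key, alignment_value<64>>,
                              type_list<cache_key, cache_value>>;

static_assert(std::is_same_v<map_find<example_map, cache_key>,
                             type_list<cache_key, cache_value>>);
static_assert(std::is_void_v<map_find<example_map, missing_key>>);
```

The lookup needs one class instantiation for the whole map plus one overload resolution per key, so its cost does not depend on where the key sits in the list.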

@rolandschulz rolandschulz requested review from a team as code owners May 13, 2024 23:24
@rolandschulz rolandschulz marked this pull request as draft May 13, 2024 23:24
@v-klochkov v-klochkov left a comment

ESIMD part looks good.
BTW, since the properties class is being changed anyway, this seems like a good moment to share an idea that might be useful for users of the properties class.

In ESIMD we needed utility functions that add a property only if it is not already in the list:
https://github.com/intel/llvm/blob/sycl/sycl/include/sycl/ext/intel/esimd/memory_properties.hpp#L217-L221

and that add or replace a property in the list:
https://github.com/intel/llvm/blob/sycl/sycl/include/sycl/ext/intel/esimd/memory_properties.hpp#L253-L257

In ESIMD this was needed only for alignment. Perhaps a more generic utility could be created that would be useful in the ext::oneapi::experimental::properties class (in property_utils.hpp)?
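
As a rough illustration of what such generic utilities could look like, the sketch below implements add-if-absent and add-or-replace on a plain type list of key/value entries. Everything here (`entry`, `contains_key`, `add_if_absent`, `add_or_replace`) is hypothetical and is neither the linked ESIMD helpers nor an existing API in property_utils.hpp. The sketch also appends new entries at the end, whereas a real property list would presumably need to keep them sorted.

```cpp
#include <type_traits>

// Hypothetical sketch; not the ESIMD helpers or the properties API.
template <class... Ts> struct type_list {};
template <class Key, class Value> struct entry { using key = Key; };

// True if any entry in the list carries the given key.
template <class List, class Key> struct contains_key;
template <class... Es, class Key>
struct contains_key<type_list<Es...>, Key>
    : std::bool_constant<(std::is_same_v<typename Es::key, Key> || ...)> {};

template <class List, class E> struct add_or_replace_impl;
template <class... Es, class E>
struct add_or_replace_impl<type_list<Es...>, E> {
  static constexpr bool present =
      contains_key<type_list<Es...>, typename E::key>::value;
  // Swap E in for the entry that shares its key, keep the others unchanged.
  using replaced = type_list<std::conditional_t<
      std::is_same_v<typename Es::key, typename E::key>, E, Es>...>;
  using appended = type_list<Es..., E>;
  using if_absent = std::conditional_t<present, type_list<Es...>, appended>;
  using or_replace = std::conditional_t<present, replaced, appended>;
};

template <class List, class E>
using add_if_absent = typename add_or_replace_impl<List, E>::if_absent;
template <class List, class E>
using add_or_replace = typename add_or_replace_impl<List, E>::or_replace;

struct alignment_key;
struct cache_key;
template <int N> struct int_value {};

using props = type_list<entry<alignment_key, int_value<4>>>;
using align16 = entry<alignment_key, int_value<16>>;
using cache1 = entry<cache_key, int_value<1>>;

// The existing alignment wins for add_if_absent and loses for add_or_replace.
static_assert(std::is_same_v<add_if_absent<props, align16>, props>);
static_assert(std::is_same_v<add_or_replace<props, align16>, type_list<align16>>);
```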

@rolandschulz
Contributor Author

Moved the trivial changes into #13777 to reduce the size of this PR.

@cperkinsintel cperkinsintel left a comment

LGTM. pinging @steffenlarsen for his take

@aelovikov-intel aelovikov-intel self-requested a review May 22, 2024 05:40
@steffenlarsen steffenlarsen left a comment

This is very interesting! More comments would be good, especially one describing how we use the properties as base-classes for the properties type.

public:
template <class... T>
constexpr properties(T... v)
: T(v)... {} // T might have different ordering than V
Contributor

If I pass something that isn't a property in the properties, will this result in a user-friendly message? At first glance I would think the compiler would scream about base classes, which is an implementation detail that the user should not worry about.

Contributor Author

Correct. And I agree that a message about base classes is less user-friendly. On the other hand, the message names the offending property, which makes it more user-friendly than a static_assert (whose message can't include the name). I'm not sure which of the two considerations is more important. Happy to add a static_assert if you think hiding implementation details matters more than giving the name of the property.

static_assert(has_property<PropertyT>(),
template <class P> static constexpr auto get_property(int = 0) {
using T = detail::mp11::mp_map_find<map, P>;
static_assert(!std::is_same_v<T, void>,
Contributor

Is this actually faster than just using has_property?

Contributor Author

Probably not. has_property calls mp_map_contains, which calls mp_map_find and compares the result against void. The compiler should memoize the mp_map_find template instantiation, so it shouldn't matter if it is requested twice.

Contributor

In that case, I would prefer using has_property as we do so in the other get_property implementation.
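
For reference, the relationship discussed in this thread can be sketched as follows: a containment check defined as "find does not return void", and a `get_property` that asserts through `has_property`. This is a simplified stand-in with hypothetical names (`properties_sketch`, `lookup`, `inherit`), not the PR's class and not Boost.Mp11 itself.

```cpp
#include <type_traits>

// Hypothetical sketch; not the PR's properties class.
template <class... Ts> struct type_list; // declaration only
template <class T> struct identity { using type = T; };
template <class... Entries> struct inherit : identity<Entries>... {};

// Picks the value of the entry whose key matches, or void if there is none.
template <class Key, class Value>
identity<Value> lookup(identity<type_list<Key, Value>> *);
template <class Key> identity<void> lookup(...);

template <class... Entries> struct properties_sketch {
  template <class Key>
  using find = typename decltype(lookup<Key>(
      static_cast<inherit<Entries...> *>(nullptr)))::type;

  // "Contains" is just "find did not return void".
  template <class Key> static constexpr bool has_property() {
    return !std::is_void_v<find<Key>>;
  }

  // Both the assert and the return use find<Key>; the compiler instantiates
  // that alias once per Key and reuses the result.
  template <class Key> static constexpr auto get_property() {
    static_assert(has_property<Key>(),
                  "Property list does not contain the requested property.");
    return find<Key>{};
  }
};

struct key_a;
struct key_b;
struct key_c;
struct value_a {};
struct value_b {};

using example_props =
    properties_sketch<type_list<key_a, value_a>, type_list<key_b, value_b>>;

static_assert(example_props::has_property<key_a>());
static_assert(!example_props::has_property<key_c>());
static_assert(
    std::is_same_v<decltype(example_props::get_property<key_b>()), value_b>);
```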

return (!std::is_same<A, B>::value || ...);
}

template <typename V, typename = void> struct is_property_value {
Contributor

This is still part of the extension, so I don't think it should be removed. Even if it is to be removed, it would be more fitting to do in a separate patch.

Contributor Author

Do you want me to also move the removal of the tests for is_property_value and is_property_key into a follow-up PR?

Contributor

I think that would make sense. It really depends on how ingrained it is. This seems to relate to some of the other changes, as mentioned in my previous comment, so I will leave it up to you whether you want to split it, but it seems like the specification changes related to this are still in review.

@rolandschulz
Contributor Author

> This is very interesting! More comments would be good, especially one describing how we use the properties as base-classes for the properties type.

Let me know what else would benefit from comments.

@aelovikov-intel
Contributor

  1. Do you have any data on the compile time improvements due to this?
  2. Any idea how the benefits are split between parsing (i.e. the feature isn't used by the customer, but still included in SYCL headers) vs when it's instantiated (the customer uses some property)?
  3. Any idea what features/guarantees of std::tuple make it heavier than your approach?
  4. Should we be introducing features/builtins in FE to make std::tuple faster to compile instead?

@rolandschulz
Contributor Author

rolandschulz commented May 23, 2024

> 1. Do you have any data on the compile time improvements due to this?

I haven't done any before/after comparison (yet). Before I started, I looked at the compilation trace of the header and noticed the impact of std::tuple instantiation and the very slow compilation for long property lists (in particular >10 properties) caused by the recursion used for e.g. merge, get, has. While building several prototypes, I made sure that those issues were resolved.
For before/after timing, it matters a lot what one is interested in, e.g. short or long property lists.

> 2. Any idea how the benefits are split between parsing (i.e. the feature isn't used by the customer, but still included in SYCL headers) vs when it's instantiated (the customer uses some property)?

I could imagine that parsing benefits a bit, given that the new implementation is shorter. But I haven't looked at any parsing benchmarks. Note that the instantiation cost does affect users who don't use any properties: properties are used internally by several of our extensions, which instantiate properties, e.g. as part of default arguments. And it's not just the empty property list, because some default properties are used.

> 3. Any idea what features/guarantees of std::tuple make it heavier than your approach?

I didn't invent anything new. std::tuple being slow to compile is well documented. The most obvious problem was that std::tuple was being used in places where a simple type list is sufficient (i.e. no runtime storage is needed). detail::type_list doesn't require any work by the compiler to instantiate (it doesn't even have an implementation). For the runtime storage, the base class storage is more efficient because: 1) tuple requires extra overhead to support get by index (which we don't need), and 2) the std::tuple implementations don't use base class storage but recursion, for ABI/historic reasons. Templated recursion is always slow to instantiate because each level has to be instantiated. Additionally, having our own storage makes the constructor much simpler and faster to compile because we get the sorting of properties for free.
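
The difference between the two storage strategies can be sketched as below. This is an illustration under assumptions, not std::tuple's actual implementation and not the PR's code: `rec_tuple` stands for the recursive layout the comment attributes to std::tuple, and `flat_storage` stands for base-class storage with a constructor in the style of the `T(v)...` snippet quoted earlier in the review. The value types `width` and `depth` are made up.

```cpp
#include <type_traits>

// Hypothetical sketch; not std::tuple's or the PR's implementation.

// Recursive layout: rec_tuple<A, B, C> also instantiates rec_tuple<B, C>,
// rec_tuple<C> and rec_tuple<>, i.e. one class per element.
template <class... Ts> struct rec_tuple {};
template <class Head, class... Tail>
struct rec_tuple<Head, Tail...> : rec_tuple<Tail...> {
  Head head;
};

// Flat layout: every value type is a direct base, so a single class
// instantiation covers any number of elements.
template <class... Values> struct flat_storage : Values... {
  // Args may be any permutation of Values: each argument initializes the
  // base class of its own type, so no reordering logic is needed.
  template <class... Args>
  constexpr flat_storage(Args... args) : Args(args)... {}

  // Access by type is a plain derived-to-base conversion.
  template <class V> constexpr const V &get() const {
    return static_cast<const V &>(*this);
  }
};

struct width { int value; };
struct depth { int value; };

// Bases are declared as <depth, width>; arguments are passed as width, depth.
inline constexpr flat_storage<depth, width> example(width{3}, depth{4});
static_assert(example.get<width>().value == 3);
static_assert(example.get<depth>().value == 4);

// The recursive variant pulls in one extra class per remaining element.
static_assert(std::is_base_of_v<rec_tuple<depth>, rec_tuple<width, depth>>);
```

Some compilers may warn that the base initializers run in declaration order rather than argument order; for independent values like these the order does not affect the result.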

> 4. Should we be introducing features/builtins in FE to make std::tuple faster to compile instead?

I don't think it is possible. You can't address the main issue, the recursive instantiation, with built-ins. Suggestions for compilation speed improvements beyond this PR:

  • Other headers use tuple as a type list. We should replace those uses with detail::type_list. This is low-hanging fruit for improving compilation speed.
  • It is easy to create a lightweight tuple with much faster compilation speed if one doesn't care about ABI compatibility. There are many such implementations on GitHub (e.g. https://github.com/codeinred/tuplet). We could use something like that instead of std::tuple internally in places where we need the runtime storage of a tuple but don't require it to be ABI compatible with std::tuple. Note that even if we add such a lightweight tuple, it would still be beneficial to integrate the storage directly into properties because of the auto-sorting in the constructor.
  • The biggest remaining issue with compilation speed of properties is the sorting. I'm planning to upload a PR which uses insertion merge instead of qsort (used by mp_sort) for small lists, where it is faster. But even insertion merge using template metaprogramming is quite slow. Here, a built-in which does type-list sorting to replace the metaprogramming could provide a significant speedup.
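
To give a feel for why sorting type lists with template metaprogramming is expensive, here is a minimal sketch of a plain insertion sort over a type list. It is an assumption that this resembles what "insertion merge" refers to; it is not mp_sort and not the planned follow-up PR. The `prop` type and its `kind` member are hypothetical stand-ins for a property and its sort key.

```cpp
#include <type_traits>

// Hypothetical sketch; not mp_sort and not the planned follow-up PR.
template <class... Ts> struct type_list {};
template <int Kind> struct prop { static constexpr int kind = Kind; };

template <class T, class List> struct push_front;
template <class T, class... Ts> struct push_front<T, type_list<Ts...>> {
  using type = type_list<T, Ts...>;
};

// insert_sorted<T, Sorted>: place T into an already sorted list.
template <class T, class Sorted> struct insert_sorted;

// The bool selects lazily whether T goes in front or further back.
template <class T, class Sorted, bool TFirst> struct insert_step;
template <class T, class... Ts>
struct insert_step<T, type_list<Ts...>, true> {
  using type = type_list<T, Ts...>;
};
template <class T, class Head, class... Tail>
struct insert_step<T, type_list<Head, Tail...>, false> {
  using type = typename push_front<
      Head, typename insert_sorted<T, type_list<Tail...>>::type>::type;
};

template <class T> struct insert_sorted<T, type_list<>> {
  using type = type_list<T>;
};
template <class T, class Head, class... Tail>
struct insert_sorted<T, type_list<Head, Tail...>>
    : insert_step<T, type_list<Head, Tail...>, (T::kind <= Head::kind)> {};

// Sort the tail first, then insert the head into the sorted tail.
template <class List> struct insertion_sort;
template <> struct insertion_sort<type_list<>> {
  using type = type_list<>;
};
template <class Head, class... Tail>
struct insertion_sort<type_list<Head, Tail...>> {
  using type = typename insert_sorted<
      Head, typename insertion_sort<type_list<Tail...>>::type>::type;
};

using unsorted = type_list<prop<3>, prop<1>, prop<2>>;
using sorted = insertion_sort<unsorted>::type;
static_assert(std::is_same_v<sorted, type_list<prop<1>, prop<2>, prop<3>>>);
```

Every step instantiates new class templates, and the number of instantiations grows quadratically with the list length in the worst case, which is consistent with the remark that a compiler built-in for type-list sorting could help.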

@rolandschulz
Contributor Author

Given this benchmark:

#include <sycl/ext/oneapi/properties/properties.hpp>

using namespace sycl::ext::oneapi::experimental;
namespace mp11 = sycl::detail::boost::mp11;

template<class N> struct K 
#if OLD
: detail::compile_time_property_key_base_tag
#endif
{
    static constexpr detail::PropKind Kind = (detail::PropKind)N::value;
}; 
template<class N> using V = property_value<K<N>, N>;
using IDs = mp11::mp_iota_c<100>;
#if OLD
using P = properties<mp11::mp_apply<std::tuple, mp11::mp_transform<V, IDs>>>;
#else
using P = properties<mp11::mp_transform<V, IDs>>;
#endif

//template<class K> using is_key = std::bool_constant<P::has_property_key<K>()>;
template<class N> using correct_value = std::is_same<decltype(P::get_property<K<N>>()), V<N>>;

static_assert(mp11::mp_all_of<IDs, correct_value>());

I see ~2s for the new version and ~3.4s for the old version.

@aelovikov-intel
Contributor

Hi @rolandschulz, thank you for the detailed reply to my questions. Can you please put most of it into the PR description? I think that reasoning is important to preserve and should be easy to find, and I'd like to see it in-tree rather than on a hosting platform that could change in the future (however unlikely that is).

Comment on lines +49 to +53
struct is_empty_or_incomplete : std::true_type {};
template <class T>
struct is_empty_or_incomplete<
T, std::enable_if_t<(sizeof(T) > 0) && !std::is_empty_v<T>>>
: std::false_type {};
Contributor

Did we do the same thing before? I'm worried about the "incomplete" part and the issues it may cause if the behavior differs when we ask for the same thing both before and after something becomes "complete" (which would also be an ODR violation, I guess).

Contributor Author

No. And it's unrelated to the storage change. It is related to #13669, which suggests that we allow properties to be defined as:

inline constexpr property_value<struct bar_key> bar;

If we don't allow incomplete types, we need to require:

struct bar_key {};
inline constexpr property_value<bar_key> bar;

It has no other advantage besides being less verbose, so if we think this might be a problem we don't need to do this.
Note that the key becoming complete later wouldn't be a concern, because such a key would only ever be forward declared.

Contributor Author

What do you mean by your reaction? Do you prefer to change it or keep it?

Contributor

Keep it, liked the simplification of it.

Contributor

Does it mean that these changes require #13669 to be approved and merged? Note that merging this without #13669 could result in releases where the implementation deviates from the extension specification.

@aelovikov-intel
Contributor

aelovikov-intel commented May 23, 2024

> I see ~2s for the new version and ~3.4s for the old version.

I see similar numbers for your case (3.15s -> 1.7s), but the time to just include sycl/sycl.hpp is essentially unchanged:

# Old
$ time clang++ -fsycl -include 'sycl/sycl.hpp' -x c++ /dev/null -c -o /dev/null

real    0m3.575s
user    0m3.431s
sys     0m0.143s
# New
$ time clang++ -fsycl -include 'sycl/sycl.hpp' -x c++ /dev/null -c -o /dev/null

real    0m3.634s
user    0m3.460s
sys     0m0.173s

To be clear, this is not a request/objection, just an observation.

: std::bool_constant<
ext::oneapi::experimental::is_property_key_of<PropT, SyclT>::value &&
all_props_are_keys_of<SyclT, PropTs...>()> {};
#if __cplusplus > 201402L
Contributor

We require C++17 for the compiler already.

@aelovikov-intel
Contributor

Looks good to me but @steffenlarsen is more fluent with this than I am, so I'm deferring the approval to him.

@steffenlarsen steffenlarsen left a comment

Overall, I like the new strategy, but there seem to be parts that require extension spec changes.

PropKey, std::tuple<Props...>>::type,
DefaultPropVal>;
struct GetPropertyValueFromPropList {
using V =
Contributor

Nit: another name that could hit a macro.

Comment on lines +144 to +145
using A = property_map<PropA>;
using B = property_map<PropB>;
Contributor

Another case of a potential conflict with single-letter names. Maybe PropAMap or PropMapA, etc.?

aelovikov-intel added a commit that referenced this pull request Nov 13, 2024
This is based on @rolandschulz 's
#13776.

The only remaining boost/mp11 usage is to sort/filter properties, I'm
going to remove that in separate PRs.

I've also left multiple utilities (that accept variadic pack) still
using `std::tuple` as *their* implementation detail. That can be cleaned
up separately as well.

Closes #13677.
Contributor

github-actions bot commented Dec 1, 2024

This pull request is stale because it has been open 180 days with no activity. Remove stale label or comment or this will be automatically closed in 30 days.

@github-actions github-actions bot added the Stale label Dec 1, 2024
Contributor

github-actions bot commented Jan 1, 2025

This pull request was closed because it has been stalled for 30 days with no activity.

@github-actions github-actions bot closed this Jan 1, 2025
5 participants