[SYCL] Set marray alignment enabling vectorized loads #9395

Closed
wants to merge 3 commits into from

sycl/include/sycl/marray.hpp: 41 changes (39 additions, 2 deletions)
```diff
@@ -41,11 +41,45 @@ template <typename T, typename... Ts> struct GetMArrayArgsSize<T, Ts...> {

 } // namespace detail

+template <std::size_t N, typename T> constexpr std::size_t vecAlignment() {
+  static_assert(N > 0, "Invalid number of elements.");
+  constexpr size_t SizeOfT = sizeof(T);
+  static_assert(SizeOfT > 0, "Invalid size of T.");
+  // First find the "previous" vector num elements.
+  size_t res = N >= 16 ? 16
+               : N >= 8 ? 8
+               : N >= 4 ? 4
+               : N >= 3 ? 3
+               : N >= 2 ? 2
+                        : 1;
+  // Then calculate the alignment size in bytes, making sure it's power of 2.
+  res *= SizeOfT;
+  res--;
+  res |= res >> 1;
+  res |= res >> 2;
+  res |= res >> 4;
+  res |= res >> 8;
+  res |= res >> 16;
+  res++;
```

Comment on lines +57 to +63

Contributor:
We should not round up to the next power of 2 (either for `marray<T, 3>` or when `sizeof(T)` is not a power of 2); we should round down instead.

Contributor:
Are you suggesting rounding down to the previous power of two purely for the sake of saving memory, or is there a bug in rounding up?

```diff
+  return res;
+}
+
+#if defined(_WIN32) || defined(_WIN64)
+#define MARRAY_WINDOWS_ALIGN_ATTR \
+  __declspec(align(vecAlignment<NumElements, Type>()))
+#define MARRAY_LINUX_ALIGN_ATTR
+#else
+#define MARRAY_WINDOWS_ALIGN_ATTR
+#define MARRAY_LINUX_ALIGN_ATTR \
+  __attribute__((aligned(vecAlignment<NumElements, Type>())))
+#endif
+
 /// Provides a cross-platform math array class template that works on
 /// SYCL devices as well as in host C++ code.
 ///
 /// \ingroup sycl_api
-template <typename Type, std::size_t NumElements> class marray {
+template <typename Type, std::size_t NumElements>
+class MARRAY_WINDOWS_ALIGN_ATTR marray {
   using DataT = Type;

 public:
```

```diff
@@ -361,7 +395,7 @@ template <typename Type, std::size_t NumElements> class marray {
     }
     return Ret;
   }
-};
+} MARRAY_LINUX_ALIGN_ATTR;

 #define __SYCL_MAKE_MARRAY_ALIAS(ALIAS, TYPE, N) \
   using ALIAS##N = sycl::marray<TYPE, N>;
```

```diff
@@ -404,5 +438,8 @@ __SYCL_MAKE_MARRAY_ALIASES_FOR_MARRAY_LENGTH(16)
 #undef __SYCL_MAKE_MARRAY_ALIASES_FOR_SIGNED_AND_UNSIGNED_TYPES
 #undef __SYCL_MAKE_MARRAY_ALIASES_FOR_MARRAY_LENGTH

+#undef MARRAY_LINUX_ALIGN_ATTR
+#undef MARRAY_WINDOWS_ALIGN_ATTR
+
 } // __SYCL_INLINE_VER_NAMESPACE(_V1)
 } // namespace sycl
```