-
Notifications
You must be signed in to change notification settings - Fork 169
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Template recursion-free static_ford #2011
base: develop
Are you sure you want to change the base?
Conversation
…ePlatform/composable_kernel into special-static-for-d
@@ -4,6 +4,7 @@ | |||
#pragma once | |||
|
|||
#include "ck/utility/common_header.hpp" | |||
#include "ck/utility/functional3.hpp" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
static_ford is used but not included explicitly
@@ -547,6 +547,10 @@ if("${CMAKE_CXX_COMPILER_ID}" STREQUAL "GNU" AND CMAKE_CXX_COMPILER_VERSION VERS | |||
add_compile_options(-fdiagnostics-color=always) | |||
endif() | |||
|
|||
# fold expression depth | |||
# device_grouped_conv2d_fwd_xdl_ngchw_gkyxc_ngkhw_f16_comp_instance.cpp | |||
add_compile_options(-fbracket-depth=3136) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
alternative is to implement some sort of paging for the parameter packs, not sure how to do it yet
…ePlatform/composable_kernel into special-static-for-d
…ePlatform/composable_kernel into special-static-for-d
…ePlatform/composable_kernel into special-static-for-d
__host__ __device__ constexpr void operator()(F f) const | ||
{ | ||
constexpr auto ordered_lengths = Lengths::ReorderGivenNew2Old(Orders{}); | ||
detail::static_ford_impl<decltype(ordered_lengths), Orders>{}(f, Sequence<>{}); | ||
base::template operator()<F, IndexTransform>(f); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
possible improvement here is to figure out how to bake index transform into the base class, to do using base::operator()
and avoid one extra template instantiation
include/ck/utility/functional3.hpp
Outdated
using TCumProd = typename make_cumulative_product<Dims...>::type; | ||
|
||
template <ck::index_t flat_idx> | ||
using type = decltype((TCumProd{} * ck::Number<flat_idx>{} / ck::Number<Prod>{}) % SDim{}); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
my initial expectation was this would be fairly cheap but it is not
include/ck/utility/functional3.hpp
Outdated
|
||
// clang-format off | ||
template <int32_t... Idims> | ||
struct make_cumulative_product |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
recursive template definition seems doable but complex
// clang-format off | ||
|
||
template <int32_t IDim0> | ||
constexpr auto make_cumulative_product(ck::Number<IDim0>) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
functions are faster than structs
Proposed changes
The current implementation of static_ford uses recursion over the sequence of dimensions and instantiates lambdas at each internal step. This can be avoided and should improve compilation time. The current most time consuming template is static_for(0,1) from static_ford instantiation internals for both old CK and CK-tile
Checklist
Please put an
x
into the boxes that apply. You can also fill these out after creating the PR. If you're not sure, please don't hesitate to ask.clang-format
on all changed filesDiscussion
Snapshot -