Error injection and fixes to resilient OpenMP execution spaces #71
1914fa8 to 0174855
Can you rebase this on main to get the CI fix?
2579e4a to 47cff8d
…el_for template types correctly in MiniMD
…subscriber naming convention
…ry rebase issues.
47cff8d to 75992b4
Initial part of the review; I will continue with more.
@@ -1,6 +1,9 @@
cmake_minimum_required(VERSION 3.17)
project(kokkos-resilience VERSION 0.1.0)

#OLD bheavior is deprecated by definition and may be removed in future
cmake_policy(SET CMP0144 OLD)
I think we shouldn't add this
@@ -10,7 +13,7 @@ add_library(Kokkos::resilience ALIAS resilience)

option(KR_ALL_WARNINGS "Enable all warnings" ON)
-option(KR_WARNINGS_AS_ERRORS "Enable warnings as errors" ON)
+option(KR_WARNINGS_AS_ERRORS "Enable warnings as errors" OFF)
Let's keep this as ON. If you need to set this to OFF, do so in your own configure step.
@@ -138,6 +142,22 @@ export(TARGETS resilience
FILE resilienceTargets.cmake
)

if (Kokkos_ENABLE_Cuda)
target_compile_definitions(resilience PUBLIC KR_ENABLE_CUDA)
How necessary is this?
@@ -48,6 +48,8 @@
#include <iomanip>
#include <iostream>

#include <iostream>
I think this can be removed
}
}
else{
std::cout << "Error finding error_rate. Were global error settings enabled?\n";
I'm not sure we need this output
KokkosResilience::clear_duplicates_map();
#endif
repeats--;

}// while (!success & repeats left)
We should remove the dead code here.

repeats--;

}// while (!success & repeats left)
Remove the dead code here as well.

// Range policy implementation
template <class CombinedFunctorReducerType, class... Traits>
class ParallelReduce< CombinedFunctorReducerType
Is this guy working properly? If not, let's omit it from the PR.
inline static std::chrono::duration<long int, std::ratio<1, 1000000000>> elapsed_seconds{};
inline static std::chrono::duration<long int, std::ratio<1, 1000000000>> total_error_time{};
-inline static std::chrono::duration<long int, std::ratio<1, 1000000000>> elapsed_seconds{};
-inline static std::chrono::duration<long int, std::ratio<1, 1000000000>> total_error_time{};
+inline static std::chrono::duration<long int, std::nano> elapsed_seconds{};
+inline static std::chrono::duration<long int, std::nano> total_error_time{};
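For context on the suggestion above: `std::nano` is the standard alias for `std::ratio<1, 1000000000>`, so the two spellings should name the same type and the change is purely cosmetic. A minimal sketch that checks this at compile time follows. The alias names `LongHandNs` and `ShortHandNs` and the helper `to_nanoseconds` are hypothetical, introduced only for illustration, and are not part of the PR.

```cpp
#include <chrono>
#include <ratio>
#include <type_traits>

// Long-hand spelling, as written in the diff under review.
using LongHandNs = std::chrono::duration<long int, std::ratio<1, 1000000000>>;

// Short-hand spelling, as proposed in the suggestion.
using ShortHandNs = std::chrono::duration<long int, std::nano>;

// std::nano is a typedef for std::ratio<1, 1000000000>, so both aliases
// refer to exactly the same duration type (requires C++17 for is_same_v).
static_assert(std::is_same_v<LongHandNs, ShortHandNs>,
              "std::nano and std::ratio<1, 1000000000> are the same type");

// Hypothetical helper (not in the PR): converts milliseconds to a
// nanosecond tick count using the short-hand duration type.
constexpr long int to_nanoseconds(std::chrono::milliseconds ms) {
  return std::chrono::duration_cast<ShortHandNs>(ms).count();
}
```

Note that `std::chrono::nanoseconds` is a related but possibly different type, since its representation is only required to be a signed integer of at least 64 bits, which may or may not be `long int` on a given platform.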
inline static size_t global_next_inject = 0;
};

struct ETimer{
-struct ETimer{
+struct ErrorTimerSettings{
An omnibus PR to bring Kokkos Resilience up to date with execution space work.
Apologies for the mega PR, smaller PRs in the future!