SimpleKernelTimer
Simple Kernel Timer reports execution-time statistics for Kokkos kernels (parallel regions). It records the number of calls to each kernel and the total execution time of those calls. If the developer gives the parallel region a string label, that label is used as the identifier; otherwise, the C++ type name of the functor or lambda is used. Note that fencing is always turned on for the simple-kernel-timer's callbacks.
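The bookkeeping described above can be pictured with a minimal sketch. This is not the tool's actual implementation: `KernelStats` and `KernelRegistry` are hypothetical names introduced only for illustration, and the sketch assumes the elapsed time of each invocation has already been measured.

```cpp
#include <cstdint>
#include <map>
#include <string>

// Hypothetical record type; the real tool's internal layout may differ.
struct KernelStats {
  std::uint64_t calls = 0;     // number of times the kernel was launched
  double total_seconds = 0.0;  // accumulated execution time of all launches
};

// Hypothetical registry keyed by the kernel identifier: the user-provided
// label if there is one, otherwise the type name of the functor or lambda.
class KernelRegistry {
 public:
  // Add one timed invocation of the kernel identified by `name`.
  void record(const std::string& name, double seconds) {
    KernelStats& s = stats_[name];  // creates a zeroed entry on first use
    s.calls += 1;
    s.total_seconds += seconds;
  }

  // Read-only view of everything recorded so far.
  const std::map<std::string, KernelStats>& stats() const { return stats_; }

 private:
  std::map<std::string, KernelStats> stats_;
};
```

Calling `record("AXPB", t)` once per launch leaves a single entry for "AXPB" whose `calls` and `total_seconds` correspond to the Calls and Total Time values the tool reports for that kernel.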
The tool is located at: https://github.com/kokkos/kokkos-tools/tree/develop/profiling/simple-kernel-timer
Type "make" inside the source directory. When compiling for a specific platform, modify the Makefile to use the correct compiler and compiler flags.
Alternatively, use the CMake build system: the simple-kernel-timer is one of the tools that the Kokkos Tools CMake build system builds by default.
This is a standard tool that does not yet support tool chaining. In Bash, run:
export KOKKOS_TOOLS_LIBS={PATH_TO_TOOL_DIRECTORY}/kp_kernel_timer.so
./application COMMANDS
This tool uses on the order of 200 bytes per unique kernel.
The SimpleKernelTimer tool generates one file per process containing the list of kernels. The files are named HOSTNAME-PROCESSID.dat. Each file is binary and requires the kp_reader tool from the tool directory to be read. The kp_reader tool can read multiple files at the same time and will combine the results. This is useful, for example, for combining the results of multiple MPI ranks.
Consider the following code:
#include <Kokkos_Core.hpp>
#include <cstdio>

int main(int argc, char* argv[]) {
  Kokkos::initialize(argc, argv);
  {
    int N = 100000000;
    Kokkos::View<double*> a("A", N);
    Kokkos::View<double*> b("B", N);
    Kokkos::View<double*> c("C", N);

    // Unlabeled kernel: the tool identifies it by the lambda's type name.
    Kokkos::parallel_for(N, KOKKOS_LAMBDA (const int& i) {
      a(i) = 1.0*i;
      b(i) = 1.5*i;
      c(i) = 0.0;
    });

    double result = 0.0;
    for (int k = 0; k < 50; k++) {
      // Labeled kernels: reported as "AXPB" and "Dot".
      Kokkos::parallel_for("AXPB", N, KOKKOS_LAMBDA (const int& i) {
        c(i) = 1.0*k*a(i) + b(i);
      });
      double dot;
      Kokkos::parallel_reduce("Dot", N, KOKKOS_LAMBDA (const int& i, double& lsum) {
        lsum += c(i)*c(i);
      }, dot);
      result += dot;
    }
    printf("Result: %lf\n", result);
  }
  Kokkos::finalize();
}
Running kp_reader on the output file of a run produces the following output. The columns are: Name, Total Time, Calls, Time/Call, % of Kokkos Time, % of Total Time. Times are in seconds.
AXPB 9.61008 50 0.19220 63.870 59.312
Dot 2.85686 50 0.05714 18.987 17.632
Z4mainE3$_0 2.57932 1 2.57932 17.143 15.919
-------------------------------------------------------------------------
Summary:
Total Execution Time (incl. Kokkos + Non-Kokkos: 16.20268 seconds
Total Time in Kokkos kernels: 15.04626 seconds
-> Time outside Kokkos kernels: 1.15642 seconds
-> Percentage in Kokkos kernels: 92.86 %
Total Calls to Kokkos Kernels: 101
-------------------------------------------------------------------------
SAND2017-3786