@@ -12,8 +12,8 @@ RAJA Performance Suite
12
12
13
13
[![Build Status](https://travis-ci.org/LLNL/RAJAPerf.svg?branch=develop)](https://travis-ci.org/LLNL/RAJAPerf)
14
14
15
- The RAJA performance suite is designed to explore performance of loop-based
16
- computational kernels found in HPC applications. In particular, it
15
+ The RAJA performance suite is developed to explore performance of loop-based
16
+ computational kernels found in HPC applications. Specifically, it
17
17
is used to assess, monitor, and compare runtime performance of kernels
18
18
implemented using RAJA and variants implemented using standard or
19
19
vendor-supported parallel programming models directly. Each kernel in the
@@ -66,15 +66,18 @@ submodules. For example,
66
66
> cd RAJAPerf
67
67
> git checkout <some branch name>
68
68
> git submodule init
69
- > git submodule update
69
+ > git submodule update --recursive
70
70
```
71
71
72
+ Note that the `--recursive` option updates submodules within submodules, similar
73
+ to its use with `git clone` as described above.
74
+
72
75
RAJA and the Performance Suite are built together using the same CMake
73
76
configuration. For convenience, we include scripts in the `scripts`
74
77
directory that invoke corresponding configuration files (CMake cache files)
75
78
in the RAJA submodule. For example, the `scripts/lc-builds` directory
76
79
contains scripts that show how we build code for testing on platforms in
77
- the Livermore Computing Center. Each build script creates a
80
+ the Lawrence Livermore Computing Center. Each build script creates a
78
81
descriptively-named build space directory in the top-level Performance Suite
79
82
directory and runs CMake with a configuration appropriate for the platform and
80
83
compilers used. After CMake completes, enter the build directory and type
@@ -255,14 +258,22 @@ consistent.
255
258
Each kernel in the suite is implemented in a class whose header and
256
259
implementation files live in the directory named for the group
257
260
in which the kernel lives. The kernel class is responsible for implementing
258
- all operations needed to execute and record execution timing and result
259
- checksum information for each variant of the kernel.
261
+ all operations needed to manage data, execute the kernel, and record timing and
262
+ result checksum information for each variant of the kernel.
263
+ To properly plug in to the Perf Suite framework, the kernel class must
264
+ inherit from the `KernelBase` base class that defines the interface for
265
+ a kernel in the suite.
266
+
267
+ Continuing with our example, we add a 'Foo' class header file 'Foo.hpp',
268
+ and multiple implementation files described in the following sections:
269
+ * 'Foo.cpp' contains the methods to set up and tear down the memory for the
270
+ 'Foo' kernel, and to compute and record a checksum on the result after it
271
+ executes;
272
+ * 'Foo-Seq.cpp' contains CPU variants of the kernel;
273
+ * 'Foo-OMP.cpp' contains OpenMP CPU multithreading variants of the kernel;
274
+ * 'Foo-Cuda.cpp' contains CUDA GPU variants of the kernel; and
275
+ * 'Foo-OMPTarget.cpp' contains OpenMP target offload variants of the kernel.
260
276
261
- Continuing with our example, we add 'Foo' class header and implementation
262
- files 'Foo.hpp', 'Foo.cpp' (CPU variants), ` Foo-Cuda.cpp ` (CUDA variants),
263
- and ` Foo-OMPTarget.cpp ` (OpenMP target variants) to the 'src/bar' directory.
264
- The class must inherit from the 'KernelBase' base class that defines the
265
- interface for kernels in the suite.
266
277
267
278
#### Kernel class header
268
279
@@ -298,10 +309,11 @@ public:
298
309
~ Foo();
299
310
300
311
void setUp(VariantID vid);
301
- void runKernel(VariantID vid);
302
312
void updateChecksum(VariantID vid);
303
313
void tearDown(VariantID vid);
304
314
315
+ void runSeqVariant(VariantID vid);
316
+ void runOpenMPVariant(VariantID vid);
305
317
void runCudaVariant(VariantID vid);
306
318
void runOpenMPTargetVariant(VariantID vid);
307
319
@@ -319,21 +331,21 @@ The kernel object header has a uniquely-named header file include guard and
319
331
the class is nested within the 'rajaperf' and 'bar' namespaces. The
320
332
constructor takes a reference to a 'RunParams' object, which contains the
321
333
input parameters for running the suite -- we'll say more about this later.
322
- The four methods that take a variant ID argument must be provided as they are
334
+ The seven methods that take a variant ID argument must be provided as they are
323
335
pure virtual in the KernelBase class. Their names are descriptive of what they
324
336
do and we'll provide more details when we describe the class implementation
325
337
next.
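
To illustrate what "pure virtual" implies here, the following is a minimal sketch rather than the suite's actual code. `KernelBaseSketch`, `IncompleteFoo`, `FooSketch`, and the reduced `VariantID` enum are hypothetical stand-ins; only the seven method names come from the header shown above. A class that overrides just some of the methods stays abstract and cannot be instantiated.

```cpp
#include <cassert>
#include <type_traits>

// Hypothetical stand-in for the suite's variant ID type.
enum VariantID { Base_Seq, RAJA_Seq, Base_OpenMP, Base_CUDA, Base_OpenMPTarget };

// Hypothetical stand-in for KernelBase: seven pure virtual methods.
class KernelBaseSketch {
public:
  virtual ~KernelBaseSketch() = default;
  virtual void setUp(VariantID vid) = 0;
  virtual void updateChecksum(VariantID vid) = 0;
  virtual void tearDown(VariantID vid) = 0;
  virtual void runSeqVariant(VariantID vid) = 0;
  virtual void runOpenMPVariant(VariantID vid) = 0;
  virtual void runCudaVariant(VariantID vid) = 0;
  virtual void runOpenMPTargetVariant(VariantID vid) = 0;
};

// Overrides only two methods, so the class is still abstract.
class IncompleteFoo : public KernelBaseSketch {
public:
  void setUp(VariantID) override {}
  void runSeqVariant(VariantID) override {}
};

// Overrides all seven methods, so objects can be created.
class FooSketch : public KernelBaseSketch {
public:
  void setUp(VariantID) override {}
  void updateChecksum(VariantID) override {}
  void tearDown(VariantID) override {}
  void runSeqVariant(VariantID) override {}
  void runOpenMPVariant(VariantID) override {}
  void runCudaVariant(VariantID) override {}
  void runOpenMPTargetVariant(VariantID) override {}
};

static_assert(std::is_abstract<IncompleteFoo>::value,
              "missing overrides leave the class abstract");
static_assert(!std::is_abstract<FooSketch>::value,
              "all seven overrides make the class concrete");
```

In this sketch, declaring an `IncompleteFoo` object would be rejected at compile time, which is how a missing method in a new kernel class would typically surface.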
326
338
327
339
#### Kernel class implementation
328
340
329
341
All kernels in the suite follow a similar implementation pattern for
330
- consistency and ease of understanding. Here we describe several steps and
331
- conventions that must be followed to ensure that all kernels interact with
332
- the performance suite machinery in the same way:
342
+ consistency and ease of analysis and understanding. Here, we describe several
343
+ steps and conventions that must be followed to ensure that all kernels
344
+ interact with the performance suite machinery in the same way:
333
345
334
346
1. Initialize the 'KernelBase' class object with KernelID, default size, and default repetition count in the `class constructor`.
335
- 2. Implement data allocation and initialization operation for each kernel variant in the `setUp` method.
336
- 3. Implement kernel execution for each variant in the `RunKernel` method.
347
+ 2. Implement data allocation and initialization operations for each kernel variant in the `setUp` method.
348
+ 3. Implement kernel execution for the associated variants in the `run` methods.
337
349
4. Compute the checksum for each variant in the `updateChecksum` method.
338
350
5. Deallocate and reset any data that will be allocated and/or initialized in subsequent kernel executions in the `tearDown` method.
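
The order in which steps 2 through 5 run can be sketched as follows. This is an illustration under stated assumptions, not the suite's implementation: `FooLifecycleSketch`, its `execute` driver, the call log, and the `std::chrono` timing are all hypothetical; in the suite, timing goes through the base class timer methods.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <numeric>
#include <string>
#include <vector>

// Hypothetical, reduced stand-in for the suite's variant ID type.
enum VariantID { Base_Seq };

// Hypothetical kernel that records the order of its lifecycle calls.
class FooLifecycleSketch {
public:
  // Assumed driver: set up data, run the variant, checksum, tear down.
  void execute(VariantID vid) {
    setUp(vid);
    runSeqVariant(vid);
    updateChecksum(vid);
    tearDown(vid);
  }

  double checksum() const { return m_checksum; }
  double elapsedSeconds() const { return m_elapsed; }
  const std::vector<std::string>& callLog() const { return m_log; }

private:
  void setUp(VariantID) {
    m_log.push_back("setUp");
    m_x.assign(m_size, 1.0);  // allocate and initialize input
    m_y.assign(m_size, 0.0);  // allocate output
  }

  void runSeqVariant(VariantID) {
    m_log.push_back("run");
    const auto start = std::chrono::steady_clock::now();
    for (int irep = 0; irep < m_run_reps; ++irep) {
      for (std::size_t i = 0; i < m_size; ++i) {
        m_y[i] = 2.0 * m_x[i];  // placeholder kernel body
      }
    }
    const auto stop = std::chrono::steady_clock::now();
    m_elapsed = std::chrono::duration<double>(stop - start).count();
  }

  void updateChecksum(VariantID) {
    m_log.push_back("updateChecksum");
    m_checksum += std::accumulate(m_y.begin(), m_y.end(), 0.0);
  }

  void tearDown(VariantID) {
    m_log.push_back("tearDown");
    m_x.clear();  // release data so the next execution starts fresh
    m_y.clear();
  }

  std::size_t m_size = 100;
  int m_run_reps = 10;
  std::vector<double> m_x, m_y;
  double m_checksum = 0.0;
  double m_elapsed = 0.0;
  std::vector<std::string> m_log;
};
```

Calling `execute(Base_Seq)` on a `FooLifecycleSketch` object leaves the call log as `setUp`, `run`, `updateChecksum`, `tearDown`. The checksum is computed before `tearDown` because `tearDown` releases the data it reads.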
339
351
@@ -388,15 +400,17 @@ utility methods to allocate, initialize, deallocate, and copy data, and compute
388
400
checksums defined in the `DataUtils.hpp`, `CudaDataUtils.hpp`, and
389
401
`OpenMPTargetDataUtils.hpp` header files in the 'common' directory.
390
402
391
- ##### runKernel() method
403
+ ##### run methods
392
404
393
- The 'runKernel()' method executes the kernel for the variant defined by the
394
- variant ID argument. The method is also responsible for calling base class
395
- methods to start and stop execution timers for the loop variant. A typical
396
- kernel execution code section may look like:
405
+ Which files contain which 'run' methods and associated variant implementations
406
+ is described above. Each method takes a variant ID argument that identifies
407
+ the variant to be run for each programming model back-end. Each method is also
408
+ responsible for calling base class methods to start and stop execution timers
409
+ when a loop variant is run. A typical kernel execution code section may look
410
+ like:
397
411
398
412
```cpp
399
- void Foo::runKernel(VariantID vid)
413
+ void Foo::runSeqVariant(VariantID vid)
400
414
{
401
415
const Index_type run_reps = getRunReps();
402
416
// ...
@@ -408,8 +422,8 @@ void Foo::runKernel(VariantID vid)
408
422
// Declare data for baseline sequential variant of kernel...
409
423
410
424
startTimer();
411
- for (SampIndex_type irep = 0; irep < run_reps; ++irep) {
412
- // Implementation of kernel variant...
425
+ for (RepIndex_type irep = 0; irep < run_reps; ++irep) {
426
+ // Implementation of Base_Seq kernel variant...
413
427
}
414
428
stopTimer();
415
429
@@ -418,25 +432,29 @@ void Foo::runKernel(VariantID vid)
418
432
break;
419
433
}
420
434
421
- // case statements for other CPU kernel variants....
435
+ #if defined(RUN_RAJA_SEQ)
436
+ case Lambda_Seq : {
437
+
438
+ startTimer();
439
+ for (RepIndex_type irep = 0; irep < run_reps; ++irep) {
440
+ // Implementation of Lambda_Seq kernel variant...
441
+ }
442
+ stopTimer();
422
443
423
- #if defined(RAJA_ENABLE_TARGET_OPENMP)
424
- case Base_OpenMPTarget :
425
- case RAJA_OpenMPTarget :
426
- {
427
- runOpenMPTargetVariant(vid);
428
444
break;
429
445
}
430
- #endif
431
446
432
- #if defined(RAJA_ENABLE_CUDA)
433
- case Base_CUDA :
434
- case RAJA_CUDA :
435
- {
436
- runCudaVariant(vid);
447
+ case RAJA_Seq : {
448
+
449
+ startTimer();
450
+ for (RepIndex_type irep = 0; irep < run_reps; ++irep) {
451
+ // Implementation of RAJA_Seq kernel variant...
452
+ }
453
+ stopTimer();
454
+
437
455
break;
438
456
}
439
- #endif
457
+ #endif // RUN_RAJA_SEQ
440
458
441
459
default : {
442
460
std::cout << "\n <kernel-name> : Unknown variant id = " << vid << std::endl;
@@ -449,18 +467,17 @@ void Foo::runKernel(VariantID vid)
449
467
All kernel implementation files are organized in this way. So following this
450
468
pattern will keep all new additions consistent.
451
469
452
- Note: There are three source files for each kernel: 'Foo.cpp' contains CPU
453
- variants, ` Foo-Cuda.cpp ` contains CUDA variants, and ` Foo-OMPTarget.cpp `
454
- constains OpenMP target variants. The reason for this is that it makes it
455
- easier to apply unique compiler flags to different variants and to manage
456
- compilation and linking issues that arise when some kernel variants are
457
- combined in the same translation unit.
470
+ Note: As described earlier, there are five source files for each kernel.
471
+ The reason for this is that it makes it easier to apply unique compiler flags
472
+ to different variants and to manage compilation and linking issues that arise
473
+ when some kernel variants are combined in the same translation unit.
458
474
459
475
Note: for convenience, we make heavy use of macros to define data
460
476
declarations and kernel bodies in the suite. This significantly reduces
461
477
the amount of redundant code required to implement multiple variants
462
- of each kernel. The kernel class implementation files in the suite
463
- provide many examples of the basic pattern we use.
478
+ of each kernel and helps keep the variants as consistent as possible.
479
+ The kernel class implementation files in the suite provide many examples of
480
+ the basic pattern we use.
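
As a rough illustration of the macro approach, the sketch below shares one data setup and one loop body between two variants. The names `FooData`, `FOO_DATA_SETUP`, `FOO_BODY`, `runBaseVariant`, and `runLambdaVariant` are hypothetical and chosen only for this example; the suite's own macros live in its kernel headers and may look different.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical kernel data; the suite manages its data differently.
struct FooData {
  std::vector<double> x;
  std::vector<double> y;
  double a;
};

// Hypothetical macro: local declarations shared by every variant.
#define FOO_DATA_SETUP(data)            \
  double* x = (data).x.data();          \
  const double* y = (data).y.data();    \
  const double a = (data).a;            \
  const std::size_t n = (data).x.size();

// Hypothetical macro: the loop body, written once.
#define FOO_BODY x[i] += a * y[i];

// Baseline variant: plain for loop around the shared body.
void runBaseVariant(FooData& data) {
  FOO_DATA_SETUP(data)
  for (std::size_t i = 0; i < n; ++i) {
    FOO_BODY
  }
}

// Lambda variant: the same body wrapped in a lambda.
void runLambdaVariant(FooData& data) {
  FOO_DATA_SETUP(data)
  auto body = [=](std::size_t i) { FOO_BODY };
  for (std::size_t i = 0; i < n; ++i) {
    body(i);
  }
}
```

Because both functions expand the same `FOO_BODY`, a change to the kernel arithmetic is made in one place and the variants cannot drift apart.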
464
481
465
482
##### updateChecksum() method
466
483