
Commit 21e476f

Merge pull request #79 from LLNL/develop
Merge develop into master
2 parents c81b50d + e409059 commit 21e476f

292 files changed: +14761 additions, -7828 deletions


README.md (+64, -47)

@@ -12,8 +12,8 @@ RAJA Performance Suite
 
 [![Build Status](https://travis-ci.org/LLNL/RAJAPerf.svg?branch=develop)](https://travis-ci.org/LLNL/RAJAPerf)
 
-The RAJA performance suite is designed to explore performance of loop-based
-computational kernels found in HPC applications. In particular, it
+The RAJA performance suite is developed to explore performance of loop-based
+computational kernels found in HPC applications. Specifically, it
 is used to assess, monitor, and compare runtime performance of kernels
 implemented using RAJA and variants implemented using standard or
 vendor-supported parallel programming models directly. Each kernel in the
@@ -66,15 +66,18 @@ submodules. For example,
 > cd RAJAPerf
 > git checkout <some branch name>
 > git submodule init
-> git submodule update
+> git submodule update --recursive
 ```
 
+Note that the `--recursive` option will update submodules within submodules,
+similar to its usage with `git clone` as described above.
+
 RAJA and the Performance Suite are built together using the same CMake
 configuration. For convenience, we include scripts in the `scripts`
 directory that invoke corresponding configuration files (CMake cache files)
 in the RAJA submodule. For example, the `scripts/lc-builds` directory
 contains scripts that show how we build code for testing on platforms in
-the Livermore Computing Center. Each build script creates a
+the Lawrence Livermore Computing Center. Each build script creates a
 descriptively-named build space directory in the top-level Performance Suite
 directory and runs CMake with a configuration appropriate for the platform and
 compilers used. After CMake completes, enter the build directory and type
@@ -255,14 +258,22 @@ consistent.
 Each kernel in the suite is implemented in a class whose header and
 implementation files live in the directory named for the group
 in which the kernel lives. The kernel class is responsible for implementing
-all operations needed to execute and record execution timing and result
-checksum information for each variant of the kernel.
+all operations needed to manage data, execute and record execution timing and
+result checksum information for each variant of the kernel.
+To properly plug in to the Perf Suite framework, the kernel class must
+inherit from the `KernelBase` base class that defines the interface for
+a kernel in the suite.
+
+Continuing with our example, we add a 'Foo' class header file 'Foo.hpp'
+and multiple implementation files described in the following sections:
+* 'Foo.cpp' contains the methods to set up and tear down the memory for the
+  'Foo' kernel, and to compute and record a checksum on the result after it
+  executes;
+* 'Foo-Seq.cpp' contains sequential CPU variants of the kernel;
+* 'Foo-OMP.cpp' contains OpenMP CPU multithreading variants of the kernel;
+* 'Foo-Cuda.cpp' contains CUDA GPU variants of the kernel; and
+* 'Foo-OMPTarget.cpp' contains OpenMP target offload variants of the kernel.
 
-Continuing with our example, we add 'Foo' class header and implementation
-files 'Foo.hpp', 'Foo.cpp' (CPU variants), `Foo-Cuda.cpp` (CUDA variants),
-and `Foo-OMPTarget.cpp` (OpenMP target variants) to the 'src/bar' directory.
-The class must inherit from the 'KernelBase' base class that defines the
-interface for kernels in the suite.
 
 #### Kernel class header
 
@@ -298,10 +309,11 @@ public:
   ~Foo();
 
   void setUp(VariantID vid);
-  void runKernel(VariantID vid);
   void updateChecksum(VariantID vid);
   void tearDown(VariantID vid);
 
+  void runSeqVariant(VariantID vid);
+  void runOpenMPVariant(VariantID vid);
   void runCudaVariant(VariantID vid);
   void runOpenMPTargetVariant(VariantID vid);
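The parts of 'Foo.hpp' surrounding this fragment are not shown in the diff. For orientation, here is an illustrative sketch of a header laid out according to the conventions described next; the include-guard name, include path, and data members are assumptions, not the repository's actual header:

```cpp
// Illustrative sketch only; names and paths are assumed, not taken from the suite.
#ifndef RAJAPerf_Bar_Foo_HPP
#define RAJAPerf_Bar_Foo_HPP

#include "common/KernelBase.hpp"   // assumed path to the KernelBase interface

namespace rajaperf
{
class RunParams;

namespace bar
{

class Foo : public KernelBase
{
public:
  Foo(const RunParams& params);   // constructor takes a reference to a RunParams object
  ~Foo();

  void setUp(VariantID vid);
  void updateChecksum(VariantID vid);
  void tearDown(VariantID vid);

  void runSeqVariant(VariantID vid);
  void runOpenMPVariant(VariantID vid);
  void runCudaVariant(VariantID vid);
  void runOpenMPTargetVariant(VariantID vid);

private:
  Real_ptr m_x;   // example data members (hypothetical)
  Real_ptr m_y;
};

} // end namespace bar
} // end namespace rajaperf

#endif  // RAJAPerf_Bar_Foo_HPP
```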
@@ -319,21 +331,21 @@ The kernel object header has a uniquely-named header file include guard and
 the class is nested within the 'rajaperf' and 'bar' namespaces. The
 constructor takes a reference to a 'RunParams' object, which contains the
 input parameters for running the suite -- we'll say more about this later.
-The four methods that take a variant ID argument must be provided as they are
+The seven methods that take a variant ID argument must be provided as they are
 pure virtual in the KernelBase class. Their names are descriptive of what they
 do and we'll provide more details when we describe the class implementation
 next.
 
 #### Kernel class implementation
 
 All kernels in the suite follow a similar implementation pattern for
-consistency and ease of understanding. Here we describe several steps and
-conventions that must be followed to ensure that all kernels interact with
-the performance suite machinery in the same way:
+consistency and ease of analysis and understanding. Here, we describe several
+steps and conventions that must be followed to ensure that all kernels
+interact with the performance suite machinery in the same way:
 
 1. Initialize the 'KernelBase' class object with KernelID, default size, and default repetition count in the `class constructor`.
-2. Implement data allocation and initialization operation for each kernel variant in the `setUp` method.
-3. Implement kernel execution for each variant in the `RunKernel` method.
+2. Implement data allocation and initialization operations for each kernel variant in the `setUp` method.
+3. Implement kernel execution for the associated variants in the `run` methods.
 4. Compute the checksum for each variant in the `updateChecksum` method.
 5. Deallocate and reset any data that will be allocated and/or initialized in subsequent kernel executions in the `tearDown` method.
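To make steps 1, 2, and 5 above concrete, the following is a minimal sketch of a constructor, `setUp`, and `tearDown` for our hypothetical 'Foo' kernel. The KernelID name, default values, data members, and helper calls (`allocAndInitData`, `deallocData`, `getRunSize`) are illustrative assumptions, not code copied from the suite:

```cpp
// Hypothetical sketch of steps 1, 2, and 5; all names are illustrative.
Foo::Foo(const RunParams& params)
  : KernelBase(rajaperf::Bar_Foo, params)   // step 1: pass the KernelID (assumed name)
{
  setDefaultSize(100000);   // step 1: default problem size (assumed API)
  setDefaultReps(500);      // step 1: default repetition count (assumed API)
}

void Foo::setUp(VariantID vid)
{
  // step 2: allocate and initialize data for the variant about to run
  allocAndInitData(m_x, getRunSize(), vid);   // assumed helper from DataUtils.hpp
  allocAndInitData(m_y, getRunSize(), vid);
}

void Foo::tearDown(VariantID vid)
{
  // step 5: release anything setUp() allocated so the next variant starts clean
  (void) vid;
  deallocData(m_x);   // assumed helper from DataUtils.hpp
  deallocData(m_y);
}
```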
@@ -388,15 +400,17 @@ utility methods to allocate, initialize, deallocate, and copy data, and compute
 checksums defined in the `DataUtils.hpp`, `CudaDataUtils.hpp`, and
 `OpenMPTargetDataUtils.hpp` header files in the 'common' directory.
 
-##### runKernel() method
+##### run methods
 
-The 'runKernel()' method executes the kernel for the variant defined by the
-variant ID argument. The method is also responsible for calling base class
-methods to start and stop execution timers for the loop variant. A typical
-kernel execution code section may look like:
+Which files contain which 'run' methods and associated variant implementations
+is described above. Each method takes a variant ID argument which identifies
+the variant to be run for each programming model back-end. Each method is also
+responsible for calling base class methods to start and stop execution timers
+when a loop variant is run. A typical kernel execution code section may look
+like:
 
 ```cpp
-void Foo::runKernel(VariantID vid)
+void Foo::runSeqVariant(VariantID vid)
 {
   const Index_type run_reps = getRunReps();
   // ...
@@ -408,8 +422,8 @@ void Foo::runKernel(VariantID vid)
       // Declare data for baseline sequential variant of kernel...
 
       startTimer();
-      for (SampIndex_type irep = 0; irep < run_reps; ++irep) {
-        // Implementation of kernel variant...
+      for (RepIndex_type irep = 0; irep < run_reps; ++irep) {
+        // Implementation of Base_Seq kernel variant...
       }
       stopTimer();
@@ -418,25 +432,29 @@ void Foo::runKernel(VariantID vid)
       break;
     }
 
-    // case statements for other CPU kernel variants....
+#if defined(RUN_RAJA_SEQ)
+    case Lambda_Seq : {
+
+      startTimer();
+      for (RepIndex_type irep = 0; irep < run_reps; ++irep) {
+        // Implementation of Lambda_Seq kernel variant...
+      }
+      stopTimer();
 
-#if defined(RAJA_ENABLE_TARGET_OPENMP)
-    case Base_OpenMPTarget :
-    case RAJA_OpenMPTarget :
-    {
-      runOpenMPTargetVariant(vid);
       break;
     }
-#endif
 
-#if defined(RAJA_ENABLE_CUDA)
-    case Base_CUDA :
-    case RAJA_CUDA :
-    {
-      runCudaVariant(vid);
+    case RAJA_Seq : {
+
+      startTimer();
+      for (RepIndex_type irep = 0; irep < run_reps; ++irep) {
+        // Implementation of RAJA_Seq kernel variant...
+      }
+      stopTimer();
+
       break;
     }
-#endif
+#endif // RUN_RAJA_SEQ
 
     default : {
       std::cout << "\n <kernel-name> : Unknown variant id = " << vid << std::endl;
@@ -449,18 +467,17 @@ void Foo::runKernel(VariantID vid)
 All kernel implementation files are organized in this way. So following this
 pattern will keep all new additions consistent.
 
-Note: There are three source files for each kernel: 'Foo.cpp' contains CPU
-variants, `Foo-Cuda.cpp` contains CUDA variants, and `Foo-OMPTarget.cpp`
-constains OpenMP target variants. The reason for this is that it makes it
-easier to apply unique compiler flags to different variants and to manage
-compilation and linking issues that arise when some kernel variants are
-combined in the same translation unit.
+Note: As described earlier, there are five source files for each kernel.
+The reason for this is that it makes it easier to apply unique compiler flags
+to different variants and to manage compilation and linking issues that arise
+when some kernel variants are combined in the same translation unit.
 
 Note: for convenience, we make heavy use of macros to define data
 declarations and kernel bodies in the suite. This significantly reduces
 the amount of redundant code required to implement multiple variants
-of each kernel. The kernel class implementation files in the suite
-provide many examples of the basic pattern we use.
+of each kernel and helps keep the variants as consistent as possible.
+The kernel class implementation files in the suite provide many examples of
+the basic pattern we use.
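As an illustration of the macro convention mentioned in the note above, a data-declaration/body macro pair for our hypothetical 'Foo' kernel might look like the following; the macro names and loop body are invented for this sketch, and each real kernel defines its own:

```cpp
// Hypothetical example of the macro convention: the data declarations and the
// loop body are written once and reused by every variant implementation file.
#define FOO_DATA_SETUP \
  Real_ptr x = m_x; \
  Real_ptr y = m_y;

#define FOO_BODY(i) \
  y[i] += 2.0 * x[i];

// A variant file such as Foo-Seq.cpp then only supplies the execution pattern:
//
//   FOO_DATA_SETUP;
//   for (Index_type i = ibegin; i < iend; ++i) {
//     FOO_BODY(i);
//   }
```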
##### updateChecksum() method
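The body of this section lies outside the lines shown in this diff. As a minimal sketch of what step 4 above typically amounts to, assuming a result array `m_y`, a `checksum` member inherited from `KernelBase`, and a `calcChecksum` helper like the checksum utilities in `DataUtils.hpp` mentioned earlier (all assumed names):

```cpp
// Hypothetical sketch: fold the result of the variant that just ran into its
// running checksum so different variants can be compared for correctness.
void Foo::updateChecksum(VariantID vid)
{
  checksum[vid] += calcChecksum(m_y, getRunSize());
}
```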

src/CMakeLists.txt (+57, -1)

@@ -32,87 +32,143 @@ blt_add_executable(
 SOURCES RAJAPerfSuiteDriver.cpp
 apps/AppsData.cpp
 apps/DEL_DOT_VEC_2D.cpp
+apps/DEL_DOT_VEC_2D-Seq.cpp
 apps/DEL_DOT_VEC_2D-OMPTarget.cpp
 apps/ENERGY.cpp
+apps/ENERGY-Seq.cpp
 apps/ENERGY-OMPTarget.cpp
 apps/FIR.cpp
+apps/FIR-Seq.cpp
 apps/FIR-OMPTarget.cpp
 apps/PRESSURE.cpp
+apps/PRESSURE-Seq.cpp
 apps/PRESSURE-OMPTarget.cpp
 apps/LTIMES.cpp
+apps/LTIMES-Seq.cpp
 apps/LTIMES-OMPTarget.cpp
 apps/LTIMES_NOVIEW.cpp
+apps/LTIMES_NOVIEW-Seq.cpp
 apps/LTIMES_NOVIEW-OMPTarget.cpp
-apps/WIP-COUPLE.cpp
 apps/VOL3D.cpp
+apps/VOL3D-Seq.cpp
 apps/VOL3D-OMPTarget.cpp
+apps/WIP-COUPLE.cpp
+basic/ATOMIC_PI.cpp
+basic/ATOMIC_PI-Seq.cpp
+basic/ATOMIC_PI-OMPTarget.cpp
 basic/DAXPY.cpp
+basic/DAXPY-Seq.cpp
 basic/DAXPY-OMPTarget.cpp
 basic/IF_QUAD.cpp
+basic/IF_QUAD-Seq.cpp
 basic/IF_QUAD-OMPTarget.cpp
 basic/INIT3.cpp
+basic/INIT3-Seq.cpp
 basic/INIT3-OMPTarget.cpp
 basic/INIT_VIEW1D.cpp
+basic/INIT_VIEW1D-Seq.cpp
 basic/INIT_VIEW1D-OMPTarget.cpp
 basic/INIT_VIEW1D_OFFSET.cpp
+basic/INIT_VIEW1D_OFFSET-Seq.cpp
 basic/INIT_VIEW1D_OFFSET-OMPTarget.cpp
 basic/MULADDSUB.cpp
+basic/MULADDSUB-Seq.cpp
 basic/MULADDSUB-OMPTarget.cpp
 basic/NESTED_INIT.cpp
+basic/NESTED_INIT-Seq.cpp
 basic/NESTED_INIT-OMPTarget.cpp
 basic/REDUCE3_INT.cpp
+basic/REDUCE3_INT-Seq.cpp
 basic/REDUCE3_INT-OMPTarget.cpp
 basic/TRAP_INT.cpp
+basic/TRAP_INT-Seq.cpp
 basic/TRAP_INT-OMPTarget.cpp
 lcals/DIFF_PREDICT.cpp
+lcals/DIFF_PREDICT-Seq.cpp
 lcals/DIFF_PREDICT-OMPTarget.cpp
 lcals/EOS.cpp
+lcals/EOS-Seq.cpp
 lcals/EOS-OMPTarget.cpp
 lcals/FIRST_DIFF.cpp
+lcals/FIRST_DIFF-Seq.cpp
 lcals/FIRST_DIFF-OMPTarget.cpp
+lcals/FIRST_MIN.cpp
+lcals/FIRST_MIN-Seq.cpp
+lcals/FIRST_MIN-OMPTarget.cpp
+lcals/FIRST_SUM.cpp
+lcals/FIRST_SUM-Seq.cpp
+lcals/FIRST_SUM-OMPTarget.cpp
+lcals/GEN_LIN_RECUR.cpp
+lcals/GEN_LIN_RECUR-Seq.cpp
+lcals/GEN_LIN_RECUR-OMPTarget.cpp
 lcals/HYDRO_1D.cpp
+lcals/HYDRO_1D-Seq.cpp
 lcals/HYDRO_1D-OMPTarget.cpp
 lcals/HYDRO_2D.cpp
+lcals/HYDRO_2D-Seq.cpp
 lcals/HYDRO_2D-OMPTarget.cpp
 lcals/INT_PREDICT.cpp
+lcals/INT_PREDICT-Seq.cpp
 lcals/INT_PREDICT-OMPTarget.cpp
 lcals/PLANCKIAN.cpp
+lcals/PLANCKIAN-Seq.cpp
 lcals/PLANCKIAN-OMPTarget.cpp
+lcals/TRIDIAG_ELIM.cpp
+lcals/TRIDIAG_ELIM-Seq.cpp
+lcals/TRIDIAG_ELIM-OMPTarget.cpp
 polybench/POLYBENCH_2MM.cpp
+polybench/POLYBENCH_2MM-Seq.cpp
 polybench/POLYBENCH_2MM-OMPTarget.cpp
 polybench/POLYBENCH_3MM.cpp
+polybench/POLYBENCH_3MM-Seq.cpp
 polybench/POLYBENCH_3MM-OMPTarget.cpp
 polybench/POLYBENCH_ADI.cpp
+polybench/POLYBENCH_ADI-Seq.cpp
 polybench/POLYBENCH_ADI-OMPTarget.cpp
 polybench/POLYBENCH_ATAX.cpp
+polybench/POLYBENCH_ATAX-Seq.cpp
 polybench/POLYBENCH_ATAX-OMPTarget.cpp
 polybench/POLYBENCH_FDTD_2D.cpp
+polybench/POLYBENCH_FDTD_2D-Seq.cpp
 polybench/POLYBENCH_FDTD_2D-OMPTarget.cpp
 polybench/POLYBENCH_FLOYD_WARSHALL.cpp
+polybench/POLYBENCH_FLOYD_WARSHALL-Seq.cpp
 polybench/POLYBENCH_FLOYD_WARSHALL-OMPTarget.cpp
 polybench/POLYBENCH_GEMM.cpp
+polybench/POLYBENCH_GEMM-Seq.cpp
 polybench/POLYBENCH_GEMM-OMPTarget.cpp
 polybench/POLYBENCH_GEMVER.cpp
+polybench/POLYBENCH_GEMVER-Seq.cpp
 polybench/POLYBENCH_GEMVER-OMPTarget.cpp
 polybench/POLYBENCH_GESUMMV.cpp
+polybench/POLYBENCH_GESUMMV-Seq.cpp
 polybench/POLYBENCH_GESUMMV-OMPTarget.cpp
 polybench/POLYBENCH_HEAT_3D.cpp
+polybench/POLYBENCH_HEAT_3D-Seq.cpp
 polybench/POLYBENCH_HEAT_3D-OMPTarget.cpp
 polybench/POLYBENCH_JACOBI_1D.cpp
+polybench/POLYBENCH_JACOBI_1D-Seq.cpp
 polybench/POLYBENCH_JACOBI_1D-OMPTarget.cpp
 polybench/POLYBENCH_JACOBI_2D.cpp
+polybench/POLYBENCH_JACOBI_2D-Seq.cpp
 polybench/POLYBENCH_JACOBI_2D-OMPTarget.cpp
 polybench/POLYBENCH_MVT.cpp
+polybench/POLYBENCH_MVT-Seq.cpp
 polybench/POLYBENCH_MVT-OMPTarget.cpp
 stream/ADD.cpp
+stream/ADD-Seq.cpp
 stream/ADD-OMPTarget.cpp
 stream/COPY.cpp
+stream/COPY-Seq.cpp
 stream/COPY-OMPTarget.cpp
 stream/DOT.cpp
+stream/DOT-Seq.cpp
 stream/DOT-OMPTarget.cpp
 stream/MUL.cpp
+stream/MUL-Seq.cpp
 stream/MUL-OMPTarget.cpp
 stream/TRIAD.cpp
+stream/TRIAD-Seq.cpp
 stream/TRIAD-OMPTarget.cpp
 common/DataUtils.cpp
 common/Executor.cpp
src/apps/CMakeLists.txt (+18, -4)

@@ -9,27 +9,41 @@
 blt_add_library(
 NAME apps
 SOURCES AppsData.cpp
-WIP-COUPLE.cpp
 DEL_DOT_VEC_2D.cpp
+DEL_DOT_VEC_2D-Seq.cpp
 DEL_DOT_VEC_2D-Cuda.cpp
+DEL_DOT_VEC_2D-OMP.cpp
 DEL_DOT_VEC_2D-OMPTarget.cpp
-ENERGY.cpp
+ENERGY.cpp
+ENERGY-Seq.cpp
 ENERGY-Cuda.cpp
+ENERGY-OMP.cpp
 ENERGY-OMPTarget.cpp
 FIR.cpp
+FIR-Seq.cpp
 FIR-Cuda.cpp
+FIR-OMP.cpp
 FIR-OMPTarget.cpp
 LTIMES.cpp
+LTIMES-Seq.cpp
 LTIMES-Cuda.cpp
+LTIMES-OMP.cpp
 LTIMES-OMPTarget.cpp
 LTIMES_NOVIEW.cpp
+LTIMES_NOVIEW-Seq.cpp
 LTIMES_NOVIEW-Cuda.cpp
+LTIMES_NOVIEW-OMP.cpp
 LTIMES_NOVIEW-OMPTarget.cpp
 PRESSURE.cpp
+PRESSURE-Seq.cpp
 PRESSURE-Cuda.cpp
-PRESSURE-OMPTarget.cpp
-VOL3D.cpp
+PRESSURE-OMP.cpp
+PRESSURE-OMP.cpp
+VOL3D.cpp
+VOL3D-Seq.cpp
 VOL3D-Cuda.cpp
+VOL3D-OMP.cpp
 VOL3D-OMPTarget.cpp
+WIP-COUPLE.cpp
 DEPENDS_ON common ${RAJA_PERFSUITE_DEPENDS}
 )