From e5070fca579ca8e667e3cf099a8a6c66f04dca67 Mon Sep 17 00:00:00 2001 From: Tuomas Koskela Date: Fri, 20 Sep 2024 16:25:57 +0100 Subject: [PATCH 01/32] Rename to archer2 tutorial so we can add others --- README.md | 2 +- benchmarks/examples/stream/stream.py | 2 +- docs/tutorial/{tutorial.md => archer2_tutorial.md} | 0 mkdocs.yml | 2 +- 4 files changed, 3 insertions(+), 3 deletions(-) rename docs/tutorial/{tutorial.md => archer2_tutorial.md} (100%) diff --git a/README.md b/README.md index 39212b65..74251f46 100644 --- a/README.md +++ b/README.md @@ -21,7 +21,7 @@ _**Note**: at the moment the ExCALIBUR benchmarks are a work-in-progress._ - [Contributing](https://ukri-excalibur.github.io/excalibur-tests/contributing/) - [Supported benchmarks](https://ukri-excalibur.github.io/excalibur-tests/apps/) - [Supported systems](https://ukri-excalibur.github.io/excalibur-tests/systems/) -- [ARCHER2 tutorial](https://ukri-excalibur.github.io/excalibur-tests/tutorial/tutorial/) +- [ARCHER2 tutorial](https://ukri-excalibur.github.io/excalibur-tests/tutorial/archer2_tutorial/) ## Acknowledgements diff --git a/benchmarks/examples/stream/stream.py b/benchmarks/examples/stream/stream.py index 4786e5e7..1a7bf999 100644 --- a/benchmarks/examples/stream/stream.py +++ b/benchmarks/examples/stream/stream.py @@ -1,5 +1,5 @@ # Demo class for running the stream benchmark -# Used for tutorial +# Used for archer2 tutorial # Import modules from reframe and excalibur-tests import reframe as rfm diff --git a/docs/tutorial/tutorial.md b/docs/tutorial/archer2_tutorial.md similarity index 100% rename from docs/tutorial/tutorial.md rename to docs/tutorial/archer2_tutorial.md diff --git a/mkdocs.yml b/mkdocs.yml index 58da74be..ac122cd0 100644 --- a/mkdocs.yml +++ b/mkdocs.yml @@ -40,7 +40,7 @@ nav: - Myriad: systems#myriad-and-kathleen - Kathleen: systems#myriad-and-kathleen - Tursa: systems#tursa - - ARCHER2 Tutorial: tutorial/tutorial.md + - ARCHER2 Tutorial: tutorial/archer2_tutorial.md theme: 
name: material features: From fd9a277ac71a093472f086ac58dab55051ec09ba Mon Sep 17 00:00:00 2001 From: Tuomas Koskela Date: Fri, 20 Sep 2024 16:28:03 +0100 Subject: [PATCH 02/32] Add raw markdown from durham tutorial --- docs/tutorial/durham_reframe_tutorial.md | 436 +++++++++++++++++++++++ 1 file changed, 436 insertions(+) create mode 100644 docs/tutorial/durham_reframe_tutorial.md diff --git a/docs/tutorial/durham_reframe_tutorial.md b/docs/tutorial/durham_reframe_tutorial.md new file mode 100644 index 00000000..22a24758 --- /dev/null +++ b/docs/tutorial/durham_reframe_tutorial.md @@ -0,0 +1,436 @@ +--- +title: Durham ReFrame Tutorial + +--- + +--- +#type: slide +--- + + + +# ReFrame tutorial notes + +## Outline + +1. How ReFrame executes tests +2. Structure of a ReFrame test -- Hello world example +3. Configuring ReFrame to run tests on Cosma +4. Writing performance tests -- Stream example +5. Working with build systems -- Make, CMake, Autotools, Spack examples +6. Avoiding build systems -- Run-only tests + + +Adapted from [ReFrame 4.5 tutorials](https://reframe-hpc.readthedocs.io/en/v4.5.2/tutorials.html) that cover more features. +There's a [new tutorial](https://reframe-hpc.readthedocs.io/en/v4.6.0/tutorial.html) in ReFrame 4.6 + +--- + +## How ReFrame executes tests + +When ReFrame executes a test it runs a pipeline of the following stages + +![](https://reframe-hpc.readthedocs.io/en/stable/_images/pipeline.svg) + +You can customise the behaviour of each stage or add a hook before or after each of them. For more details, read the [ReFrame pipeline documentation](https://reframe-hpc.readthedocs.io/en/stable/pipeline.html). + +--- + +## Set up environment +Before proceeding to install ReFrame, we recommend creating a python virtual environment to avoid clashes with other installed python packages. First load a newer python module. 
+ +```bash +module swap python/3.10.12 +``` + +Then create an environment and activate it with + +```bash +python3 -m venv reframe_tutorial +source reframe_tutorial/bin/activate +``` + +You will have to activate the environment each time you login. To deactivate the environment run `deactivate`. + +---- + +## Install ReFrame + +Then install ReFrame with `pip`. I am installing version `4.5.2` because we will follow tutorials that have been changed in the latest `4.6.0` version. + +```bash +pip install reframe-hpc==4.5.2 +``` + +Alternatively, you can + +```bash +git clone -q --depth 1 --branch v4.5.2 https://github.com/reframe-hpc/reframe.git +source reframe/bootstrap.sh +``` + +--- + +## Hello world example + +[Hello world example](https://reframe-hpc.readthedocs.io/en/v4.5.2/tutorial_basics.html#the-hello-world-test) + +---- + +### Include ReFrame modules + +The first thing you need is include a few modules from ReFrame. These should be available if the installation step was successful. + +```python +import reframe as rfm +import reframe.utility.sanity as sn +``` + +---- + +### Create a Test Class + +- ReFrame uses decorators to mark classes as tests. +- This marks `class HelloTest` as a `rfm.simple_test`. +- ReFrame tests ultimately derive from `RegressionTest`. There are others derived classes such as `RunOnlyRegressionTest`, we get to those later. + +```python + +@rfm.simple_test +class HelloTest(rfm.RegressionTest): +``` + +- The data members and methods detailed in the following sections should be placed inside this class. + +---- + +### Add mandatory attributes + +- `valid_systems` for where this test can run +- `valid_prog_environs` for what compilers this test can build with. More on it later. +- `sourcepath` for source file in a single source test. More on build systems later. 
+- Could add `sourcesdir` but it defaults to `src/` + +```python + valid_systems = ['*'] + valid_prog_environs = ['*'] + sourcepath = 'hello.c' +``` + +---- + +### Add sanity check + +- ReFrame, by default, makes no assumption about whether a test is successful or not. +- A test must provide a validation function +- ReFrame provides a rich set of utility functions that help matching patterns and extract values from the test’s output +- Here we match a string from stdout + +```python + @sanity_function + def assert_hello(self): + return sn.assert_found(r'Hello, World\!', self.stdout) +``` + +---- + +## Builting programming environment + +- `reframe --show-config` +- Builtin programming environment uses `cc` to compile + +--- + +## Configuring ReFrame for HPC systems +> In ReFrame, all the details of the various interactions of a test with the system environment are handled transparently and are set up in its configuration file. +- [Configuration](https://reframe-hpc.readthedocs.io/en/v4.5.2/tutorial_basics.html#more-of-hello-world) + - Set accounting parameters with + - `'access': ['--partition=bluefield1', '--account=do009'],` + - Create at least one programming environment to set compilers + - `-p` flag filters tests by programming environment + - Scheduler to run on compute nodes + - Add `time_limit = 1m` to ReFrame tests to run on Cosma + - Set from command line with `-S time_limit='1m'` + +---- + +:::spoiler Basic configuration for Cosma +```python +site_configuration = { + 'systems' : [ + { + 'name': 'cosma', + 'descr': 'Cosma for performance workshop', + 'hostnames': ['login[0-9][a-z].pri.cosma[0-9].alces.network'], + 'modules_system': 'tmod4', + 'partitions': [ + { + 'name': 'compute-node', + 'scheduler': 'slurm', + 'launcher': 'mpiexec', + 'environs': ['gnu'], + 'access': ['--partition=bluefield1', '--account=do009'], + } + ] + } + ], + 'environments': [ + { + 'name': 'gnu', + 'modules': ['gnu_comp', 'openmpi'], + 'cc': 'mpicc', + 'cxx': 'mpic++', + 'ftn': 
'mpif90' + }, +# { +# 'name': 'intel', +# 'modules': ['intel_comp', 'intel_mpi'], +# 'cc': 'mpiicc', +# 'cxx': 'mpiicpc', +# 'ftn': 'mpiifort' +# }, + ] +} + +``` +::: + +--- + +## Performance tests + +Performance tests capture data in performance variables. For simplicity, we use the [STREAM benchmark](https://github.com/jeffhammond/STREAM) as an example. It is the de facto memory bandwidth benchmark. + +To record the performance of the benchmark, ReFrame should extract a figure of merit from the output of the test. A function decorated with the `@performance_function` decorator extracts or computes a performance metric from the test’s output. + +---- + +### Boilerplate + +Same as before. We can now specify valid systems and prog environments + +```python +import reframe as rfm +import reframe.utility.sanity as sn + +@rfm.simple_test +class StreamTest(rfm.RegressionTest): + valid_systems = ['cosma'] + valid_prog_environs = ['gnu', 'intel'] +``` + +---- + +### Git Cloning the source + +we can retrieve specifically a Git repository by assigning its URL directly to the sourcesdir attribute: + +```python + sourcesdir='https://github.com/jeffhammond/STREAM' +``` + +---- + +### Environment variables + +We can set environment variables by defining the `env_vars` attribute + +```python + env_vars = { + 'OMP_NUM_THREADS': '4', + 'OMP_PLACES': 'cores' + } +``` + +---- + +### Building + +- Remember the pipeline ReFrame executes. We can run arbitrary functions in the pipeline by decorating them with `@run_before` or `@run_after` +- Here we can insert compiler flags before compiling + +```python + build_system='SingleSource' + sourcepath='stream.c' + arraysize = 2**20 + + @run_before('compile') + def set_compiler_flags(self): + self.build_system.cppflags = [f'-DSTREAM_ARRAY_SIZE={self.arraysize}'] + self.build_system.cflags = ['-fopenmp', '-O3'] +``` + +---- + +### Sanity function + +Similar to before, we can check a line in stdout for validation. 
+ +```python + @sanity_function + def validate_solution(self): + return sn.assert_found(r'Solution Validates', self.stdout) +``` + +---- + +## Add Performance Pattern Check + +To record the performance of the benchmark, ReFrame should extract a figure of merit from the output of the test. A function decorated with the `@performance_function` decorator extracts or computes a performance metric from the test’s output. + +> In this example, we extract four performance variables, namely the memory bandwidth values for each of the “Copy”, “Scale”, “Add” and “Triad” sub-benchmarks of STREAM, where each of the performance functions use the [`extractsingle()`](https://reframe-hpc.readthedocs.io/en/latest/deferrable_functions_reference.html#reframe.utility.sanity.extractsingle) utility function. For each of the sub-benchmarks we extract the “Best Rate MB/s” column of the output (see below) and we convert that to a float. + +---- + +## Performance Pattern Check + +```python +@performance_function('MB/s', perf_key='Copy') +def extract_copy_perf(self): + return sn.extractsingle(r'Copy:\s+(\S+)\s+.*', self.stdout, 1, float) + +@performance_function('MB/s', perf_key='Scale') +def extract_scale_perf(self): + return sn.extractsingle(r'Scale:\s+(\S+)\s+.*', self.stdout, 1, float) + +@performance_function('MB/s', perf_key='Add') +def extract_add_perf(self): + return sn.extractsingle(r'Add:\s+(\S+)\s+.*', self.stdout, 1, float) + +@performance_function('MB/s', perf_key='Triad') +def extract_triad_perf(self): + return sn.extractsingle(r'Triad:\s+(\S+)\s+.*', self.stdout, 1, float) +``` + +---- + +### Perflogs + +- Perflogs are output in `perflogs//` +- By default a lot of information is printed. This can be customized in the configuration file. More on this later. 
+- By default not much information about build step, has to be linked back to build environment +- See `.reframe/reports/` or use `--report-file` + +---- + +## Reference values + +ReFrame can automate checking that the results fall within an expected range. You can set a different reference value for each `perf_key` in the performance function. For example, set the test to fail if it falls outside of +-25% of the values obtained with the previous array size. + +```python +reference = { + 'cosma': { + 'Copy': (40000, -0.25, 0.25, 'MB/s'), + 'Scale': (20000, -0.25, 0.25, 'MB/s'), + 'Add': (20000, -0.25, 0.25, 'MB/s'), + 'Triad': (20000, -0.25, 0.25, 'MB/s') + } +} +``` + +> The performance reference tuple consists of the reference value, the lower and upper thresholds expressed as fractional numbers relative to the reference value, and the unit of measurement. If any of the thresholds is not relevant, None may be used instead. Also, the units in this reference variable are entirely optional, since they were already provided through the @performance_function decorator. + + +---- + +### Parametrized tests + +You can pass a list to the `parameter()` built-in function in the class body to create a parametrized test. You cannot access the individual parameter value within the class body, so any reference to them should be placed in the appropriate function, for example `__init__()` + +For parametrisation you can add for example +```python + arraysize = parameter([5,15,25]) + self.build_system.cppflags = [f'-DSTREAM_ARRAY_SIZE=$((2 ** {self.arraysize}))'] +``` + +You can have multiple parameters. ReFrame will run all parameter combinations by default. 
+ +--- + +## [Build systems](https://reframe-hpc.readthedocs.io/en/v4.5.2/tutorial_advanced.html#more-on-building-tests) +- [Build systems Reference](https://reframe-hpc.readthedocs.io/en/v4.5.2/regression_test_api.html#build-systems) + +---- + +## [Make](https://reframe-hpc.readthedocs.io/en/v4.5.2/tutorial_advanced.html#more-on-building-tests) + +- Tutorial in `tutorials/advanced/makefiles/maketest.py`. + +> you must set the executable attribute of the test, because ReFrame cannot know what is the actual executable to be run. We then set the build system to Make and set the preprocessor flags as we would do with the SingleSource build system. + + +---- + +## [Autotools](https://reframe-hpc.readthedocs.io/en/v4.5.2/tutorial_advanced.html#adding-a-configuration-step-before-compiling-the-code) +> It is often the case that a configuration step is needed before compiling a code with make. To address this kind of projects, ReFrame aims to offer specific abstractions for “configure-make” style of build systems. It supports CMake-based projects through the CMake build system, as well as Autotools-based projects through the Autotools build system. 
+ + +- [Automake Hello example](https://github.com/ntegan/amhello) + +:::spoiler AutoHelloTest class +```python +import reframe as rfm +import reframe.utility.sanity as sn + +@rfm.simple_test +class AutoHelloTest(rfm.RegressionTest): + valid_systems = ['*'] + valid_prog_environs = ['*'] + sourcesdir = 'https://github.com/ntegan/amhello.git' + build_system = 'Autotools' + executable = './src/hello' + prebuild_cmds = ['autoreconf --install .'] + time_limit = '5m' + + @sanity_function + def assert_hello(self): + return sn.assert_found(r'Hello world\!', self.stdout) +``` +::: + +---- + +## [CMake](https://reframe-hpc.readthedocs.io/en/v4.5.2/regression_test_api.html#reframe.core.buildsystems.CMake) +- [CMake Hello example](https://github.com/jameskbride/cmake-hello-world) + +:::spoiler CMakeHelloTest class +```python +import reframe as rfm +import reframe.utility.sanity as sn + + +@rfm.simple_test +class CMakeHelloTest(rfm.RegressionTest): + valid_systems = ['*'] + valid_prog_environs = ['*'] + sourcesdir = 'https://github.com/jameskbride/cmake-hello-world.git' + build_system = 'CMake' + executable = './CMakeHelloWorld' + time_limit = '5m' + + @sanity_function + def assert_hello(self): + return sn.assert_found(r'Hello, world\!', self.stdout) + +``` +::: + +---- + +## [Spack](https://reframe-hpc.readthedocs.io/en/v4.5.2/regression_test_api.html#reframe.core.buildsystems.Spack) + +- ReFrame will use a user-provided Spack environment in order to build and test a set of specs. 
+- Tutorial in `tutorials/build_systems/spack/spack_test.py` +- In `rfm_job.out` you can see that it + - Creates a blank environment + - Builds all dependencies -- takes quite long +- [ ] Try building bzip2 with excalibur cosma environment + +--- + +### Run-only tests +- Tutorial in `tutorials/advanced/runonly/echorand.py` From 50466d7ff4e4948185b7736659b140a0e0dda927 Mon Sep 17 00:00:00 2001 From: Tuomas Koskela Date: Fri, 20 Sep 2024 16:36:00 +0100 Subject: [PATCH 03/32] Add reframe tutorial to navigation --- mkdocs.yml | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/mkdocs.yml b/mkdocs.yml index ac122cd0..b4ca0e2c 100644 --- a/mkdocs.yml +++ b/mkdocs.yml @@ -40,7 +40,9 @@ nav: - Myriad: systems#myriad-and-kathleen - Kathleen: systems#myriad-and-kathleen - Tursa: systems#tursa - - ARCHER2 Tutorial: tutorial/archer2_tutorial.md + - 'Tutorials': + - ARCHER2 Tutorial: tutorial/archer2_tutorial.md + - ReFrame Tutorial: tutorial/durham_reframe_tutorial.md theme: name: material features: From 402833c749d524ee65878df7e205edd628351f84 Mon Sep 17 00:00:00 2001 From: Tuomas Koskela Date: Fri, 20 Sep 2024 16:44:30 +0100 Subject: [PATCH 04/32] Remove hackmd formatting --- docs/tutorial/durham_reframe_tutorial.md | 23 ----------------------- 1 file changed, 23 deletions(-) diff --git a/docs/tutorial/durham_reframe_tutorial.md b/docs/tutorial/durham_reframe_tutorial.md index 22a24758..c4fe1a05 100644 --- a/docs/tutorial/durham_reframe_tutorial.md +++ b/docs/tutorial/durham_reframe_tutorial.md @@ -1,20 +1,3 @@ ---- -title: Durham ReFrame Tutorial - ---- - ---- -#type: slide ---- - - - -# ReFrame tutorial notes - ## Outline 1. How ReFrame executes tests @@ -158,7 +141,6 @@ class HelloTest(rfm.RegressionTest): ---- -:::spoiler Basic configuration for Cosma ```python site_configuration = { 'systems' : [ @@ -197,7 +179,6 @@ site_configuration = { } ``` -::: --- @@ -371,7 +352,6 @@ You can have multiple parameters. 
ReFrame will run all parameter combinations by - [Automake Hello example](https://github.com/ntegan/amhello) -:::spoiler AutoHelloTest class ```python import reframe as rfm import reframe.utility.sanity as sn @@ -390,14 +370,12 @@ class AutoHelloTest(rfm.RegressionTest): def assert_hello(self): return sn.assert_found(r'Hello world\!', self.stdout) ``` -::: ---- ## [CMake](https://reframe-hpc.readthedocs.io/en/v4.5.2/regression_test_api.html#reframe.core.buildsystems.CMake) - [CMake Hello example](https://github.com/jameskbride/cmake-hello-world) -:::spoiler CMakeHelloTest class ```python import reframe as rfm import reframe.utility.sanity as sn @@ -417,7 +395,6 @@ class CMakeHelloTest(rfm.RegressionTest): return sn.assert_found(r'Hello, world\!', self.stdout) ``` -::: ---- From 83f650e29e27ab757762cc8663e747efd574d6c2 Mon Sep 17 00:00:00 2001 From: Tuomas Koskela Date: Fri, 20 Sep 2024 16:44:44 +0100 Subject: [PATCH 05/32] Update tutorial links --- README.md | 1 + 1 file changed, 1 insertion(+) diff --git a/README.md b/README.md index 74251f46..09e3e8ad 100644 --- a/README.md +++ b/README.md @@ -22,6 +22,7 @@ _**Note**: at the moment the ExCALIBUR benchmarks are a work-in-progress._ - [Supported benchmarks](https://ukri-excalibur.github.io/excalibur-tests/apps/) - [Supported systems](https://ukri-excalibur.github.io/excalibur-tests/systems/) - [ARCHER2 tutorial](https://ukri-excalibur.github.io/excalibur-tests/tutorial/archer2_tutorial/) +- [ReFrame tutorial](https://ukri-excalibur.github.io/excalibur-tests/tutorial/durham_reframe_tutorial/) ## Acknowledgements From db85a8a108bb876ccdde9eeaa76ce0f20ba4ce5f Mon Sep 17 00:00:00 2001 From: Tuomas Koskela Date: Fri, 20 Sep 2024 16:57:41 +0100 Subject: [PATCH 06/32] Add some comments --- docs/tutorial/durham_reframe_tutorial.md | 18 +++++++----------- 1 file changed, 7 insertions(+), 11 deletions(-) diff --git a/docs/tutorial/durham_reframe_tutorial.md b/docs/tutorial/durham_reframe_tutorial.md index c4fe1a05..f3131f3d 
100644 --- a/docs/tutorial/durham_reframe_tutorial.md +++ b/docs/tutorial/durham_reframe_tutorial.md @@ -1,3 +1,5 @@ +# Automating benchmarks with ReFrame + ## Outline 1. How ReFrame executes tests @@ -7,7 +9,6 @@ 5. Working with build systems -- Make, CMake, Autotools, Spack examples 6. Avoiding build systems -- Run-only tests - Adapted from [ReFrame 4.5 tutorials](https://reframe-hpc.readthedocs.io/en/v4.5.2/tutorials.html) that cover more features. There's a [new tutorial](https://reframe-hpc.readthedocs.io/en/v4.6.0/tutorial.html) in ReFrame 4.6 @@ -24,6 +25,8 @@ You can customise the behaviour of each stage or add a hook before or after each --- ## Set up environment + +This tutorial was originally run on the [Cosma](https://cosma.readthedocs.io/en/latest/) supercomputer. It should be straightforward to run on a different platform; the requirements are `gcc`, `git` and `python3` (for the later parts you also need `make`, `autotools`, `cmake` and `spack`). Before proceeding to install ReFrame, we recommend creating a python virtual environment to avoid clashes with other installed python packages. First load a newer python module. ```bash @@ -128,7 +131,7 @@ class HelloTest(rfm.RegressionTest): --- -## Configuring ReFrame for HPC systems +## Configuring ReFrame for HPC systems (Cosma) > In ReFrame, all the details of the various interactions of a test with the system environment are handled transparently and are set up in its configuration file.
- [Configuration](https://reframe-hpc.readthedocs.io/en/v4.5.2/tutorial_basics.html#more-of-hello-world) - Set accounting parameters with @@ -168,13 +171,6 @@ site_configuration = { 'cxx': 'mpic++', 'ftn': 'mpif90' }, -# { -# 'name': 'intel', -# 'modules': ['intel_comp', 'intel_mpi'], -# 'cc': 'mpiicc', -# 'cxx': 'mpiicpc', -# 'ftn': 'mpiifort' -# }, ] } @@ -405,9 +401,9 @@ class CMakeHelloTest(rfm.RegressionTest): - In `rfm_job.out` you can see that it - Creates a blank environment - Builds all dependencies -- takes quite long -- [ ] Try building bzip2 with excalibur cosma environment +- `excalibur-tests` provides utilities and settings for Spack builds in ReFrame. See the [Next Tutorial](../archer2_tutorial) for details. --- -### Run-only tests +## Run-only tests - Tutorial in `tutorials/advanced/runonly/echorand.py` From edf276dbe6ea5586613a223d8cff7664ec92bfd2 Mon Sep 17 00:00:00 2001 From: Tuomas Koskela Date: Fri, 20 Sep 2024 16:58:27 +0100 Subject: [PATCH 07/32] Fix headings --- docs/tutorial/durham_reframe_tutorial.md | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/docs/tutorial/durham_reframe_tutorial.md b/docs/tutorial/durham_reframe_tutorial.md index f3131f3d..f4056dc5 100644 --- a/docs/tutorial/durham_reframe_tutorial.md +++ b/docs/tutorial/durham_reframe_tutorial.md @@ -333,7 +333,7 @@ You can have multiple parameters. ReFrame will run all parameter combinations by ---- -## [Make](https://reframe-hpc.readthedocs.io/en/v4.5.2/tutorial_advanced.html#more-on-building-tests) +### [Make](https://reframe-hpc.readthedocs.io/en/v4.5.2/tutorial_advanced.html#more-on-building-tests) - Tutorial in `tutorials/advanced/makefiles/maketest.py`. @@ -342,7 +342,7 @@ You can have multiple parameters. 
ReFrame will run all parameter combinations by ---- -## [Autotools](https://reframe-hpc.readthedocs.io/en/v4.5.2/tutorial_advanced.html#adding-a-configuration-step-before-compiling-the-code) +### [Autotools](https://reframe-hpc.readthedocs.io/en/v4.5.2/tutorial_advanced.html#adding-a-configuration-step-before-compiling-the-code) > It is often the case that a configuration step is needed before compiling a code with make. To address this kind of projects, ReFrame aims to offer specific abstractions for “configure-make” style of build systems. It supports CMake-based projects through the CMake build system, as well as Autotools-based projects through the Autotools build system. @@ -369,7 +369,7 @@ class AutoHelloTest(rfm.RegressionTest): ---- -## [CMake](https://reframe-hpc.readthedocs.io/en/v4.5.2/regression_test_api.html#reframe.core.buildsystems.CMake) +### [CMake](https://reframe-hpc.readthedocs.io/en/v4.5.2/regression_test_api.html#reframe.core.buildsystems.CMake) - [CMake Hello example](https://github.com/jameskbride/cmake-hello-world) ```python @@ -394,7 +394,7 @@ class CMakeHelloTest(rfm.RegressionTest): ---- -## [Spack](https://reframe-hpc.readthedocs.io/en/v4.5.2/regression_test_api.html#reframe.core.buildsystems.Spack) +### [Spack](https://reframe-hpc.readthedocs.io/en/v4.5.2/regression_test_api.html#reframe.core.buildsystems.Spack) - ReFrame will use a user-provided Spack environment in order to build and test a set of specs. 
- Tutorial in `tutorials/build_systems/spack/spack_test.py` From e730c23338aeeea96b052a237a1e3635288adc37 Mon Sep 17 00:00:00 2001 From: Tuomas Koskela Date: Fri, 20 Sep 2024 17:07:29 +0100 Subject: [PATCH 08/32] It's more logical to start with the ReFrame tutorial --- README.md | 2 +- mkdocs.yml | 2 +- 2 files changed, 2 insertions(+), 2 deletions(-) diff --git a/README.md b/README.md index 09e3e8ad..f3440e4a 100644 --- a/README.md +++ b/README.md @@ -21,8 +21,8 @@ _**Note**: at the moment the ExCALIBUR benchmarks are a work-in-progress._ - [Contributing](https://ukri-excalibur.github.io/excalibur-tests/contributing/) - [Supported benchmarks](https://ukri-excalibur.github.io/excalibur-tests/apps/) - [Supported systems](https://ukri-excalibur.github.io/excalibur-tests/systems/) -- [ARCHER2 tutorial](https://ukri-excalibur.github.io/excalibur-tests/tutorial/archer2_tutorial/) - [ReFrame tutorial](https://ukri-excalibur.github.io/excalibur-tests/tutorial/durham_reframe_tutorial/) +- [ARCHER2 tutorial](https://ukri-excalibur.github.io/excalibur-tests/tutorial/archer2_tutorial/) ## Acknowledgements diff --git a/mkdocs.yml b/mkdocs.yml index b4ca0e2c..43871ca5 100644 --- a/mkdocs.yml +++ b/mkdocs.yml @@ -41,8 +41,8 @@ nav: - Kathleen: systems#myriad-and-kathleen - Tursa: systems#tursa - 'Tutorials': - - ARCHER2 Tutorial: tutorial/archer2_tutorial.md - ReFrame Tutorial: tutorial/durham_reframe_tutorial.md + - ARCHER2 Tutorial: tutorial/archer2_tutorial.md theme: name: material features: From 7f961b871312b3ed9e35edaa485729c28ed18043 Mon Sep 17 00:00:00 2001 From: Tuomas Koskela Date: Mon, 23 Sep 2024 11:15:43 +0100 Subject: [PATCH 09/32] Update docs/tutorial/durham_reframe_tutorial.md Co-authored-by: Ilektra Christidi --- docs/tutorial/durham_reframe_tutorial.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/tutorial/durham_reframe_tutorial.md b/docs/tutorial/durham_reframe_tutorial.md index f4056dc5..bc8b5f80 100644 --- 
a/docs/tutorial/durham_reframe_tutorial.md +++ b/docs/tutorial/durham_reframe_tutorial.md @@ -82,7 +82,7 @@ import reframe.utility.sanity as sn - ReFrame uses decorators to mark classes as tests. - This marks `class HelloTest` as a `rfm.simple_test`. -- ReFrame tests ultimately derive from `RegressionTest`. There are others derived classes such as `RunOnlyRegressionTest`, we get to those later. +- ReFrame tests ultimately derive from `RegressionTest`. There are other derived classes such as `RunOnlyRegressionTest`, we get to those later. ```python From 8ed358ef90a8758d97f14ea0d6f400800ddebb7b Mon Sep 17 00:00:00 2001 From: Tuomas Koskela Date: Mon, 23 Sep 2024 11:17:44 +0100 Subject: [PATCH 10/32] Apply suggestions from code review Co-authored-by: Ilektra Christidi --- docs/tutorial/durham_reframe_tutorial.md | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/docs/tutorial/durham_reframe_tutorial.md b/docs/tutorial/durham_reframe_tutorial.md index bc8b5f80..76525cc5 100644 --- a/docs/tutorial/durham_reframe_tutorial.md +++ b/docs/tutorial/durham_reframe_tutorial.md @@ -124,7 +124,7 @@ class HelloTest(rfm.RegressionTest): ---- -## Builting programming environment +## Builtin programming environment - `reframe --show-config` - Builtin programming environment uses `cc` to compile @@ -314,7 +314,7 @@ reference = { ---- -### Parametrized tests +## Parametrized tests You can pass a list to the `parameter()` built-in function in the class body to create a parametrized test. 
You cannot access the individual parameter value within the class body, so any reference to them should be placed in the appropriate function, for example `__init__()` From 6cf5053bbcccc07069ca70e7d588ec1614c75161 Mon Sep 17 00:00:00 2001 From: Tuomas Koskela Date: Mon, 23 Sep 2024 11:36:59 +0100 Subject: [PATCH 11/32] Address review commits --- docs/tutorial/durham_reframe_tutorial.md | 13 ++++++++----- 1 file changed, 8 insertions(+), 5 deletions(-) diff --git a/docs/tutorial/durham_reframe_tutorial.md b/docs/tutorial/durham_reframe_tutorial.md index f4056dc5..456e2c5d 100644 --- a/docs/tutorial/durham_reframe_tutorial.md +++ b/docs/tutorial/durham_reframe_tutorial.md @@ -9,8 +9,11 @@ 5. Working with build systems -- Make, CMake, Autotools, Spack examples 6. Avoiding build systems -- Run-only tests -Adapted from [ReFrame 4.5 tutorials](https://reframe-hpc.readthedocs.io/en/v4.5.2/tutorials.html) that cover more features. -There's a [new tutorial](https://reframe-hpc.readthedocs.io/en/v4.6.0/tutorial.html) in ReFrame 4.6 +This tutorial is adapted from the [ReFrame 4.5 tutorials](https://reframe-hpc.readthedocs.io/en/v4.5.2/tutorials.html), which also cover more ReFrame features. Direct quotes from the tutorials are marked with + +> ReFrame Tutorials + +There's a [new tutorial](https://reframe-hpc.readthedocs.io/en/v4.6.0/tutorial.html) with a slightly different approach in ReFrame 4.6. --- @@ -59,6 +62,8 @@ git clone -q --depth 1 --branch v4.5.2 https://github.com/reframe-hpc/reframe.gi source reframe/bootstrap.sh ``` +The ReFrame git repository also contains the source code of the ReFrame tutorials. It is recommended to run the git clone step, even if you used `pip install` to install ReFrame. We will refer to the tutorial solutions later. + --- ## Hello world example @@ -337,15 +342,13 @@ You can have multiple parameters. ReFrame will run all parameter combinations by - Tutorial in `tutorials/advanced/makefiles/maketest.py`.
-> you must set the executable attribute of the test, because ReFrame cannot know what is the actual executable to be run. We then set the build system to Make and set the preprocessor flags as we would do with the SingleSource build system. - +> First, if you’re using any build system other than SingleSource, you must set the executable attribute of the test, because ReFrame cannot know what is the actual executable to be run. We then set the build system to Make and set the preprocessor flags as we would do with the SingleSource build system. ---- ### [Autotools](https://reframe-hpc.readthedocs.io/en/v4.5.2/tutorial_advanced.html#adding-a-configuration-step-before-compiling-the-code) > It is often the case that a configuration step is needed before compiling a code with make. To address this kind of projects, ReFrame aims to offer specific abstractions for “configure-make” style of build systems. It supports CMake-based projects through the CMake build system, as well as Autotools-based projects through the Autotools build system. - - [Automake Hello example](https://github.com/ntegan/amhello) ```python From 99b24004168a0a528bd86c0317ba92ec7865236c Mon Sep 17 00:00:00 2001 From: Tuomas Koskela Date: Mon, 23 Sep 2024 11:38:41 +0100 Subject: [PATCH 12/32] Try adding another space --- docs/tutorial/durham_reframe_tutorial.md | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/docs/tutorial/durham_reframe_tutorial.md b/docs/tutorial/durham_reframe_tutorial.md index 1e055468..76c680c3 100644 --- a/docs/tutorial/durham_reframe_tutorial.md +++ b/docs/tutorial/durham_reframe_tutorial.md @@ -402,8 +402,8 @@ class CMakeHelloTest(rfm.RegressionTest): - ReFrame will use a user-provided Spack environment in order to build and test a set of specs. 
- Tutorial in `tutorials/build_systems/spack/spack_test.py` - In `rfm_job.out` you can see that it - - Creates a blank environment - - Builds all dependencies -- takes quite long + - Creates a blank environment + - Builds all dependencies -- takes quite long - `excalibur-tests` provides utilities and settings for Spack builds in ReFrame. See the [Next Tutorial](../archer2_tutorial) for details. --- From 869f8e8d539c8aa617acf168259f834e512ad72d Mon Sep 17 00:00:00 2001 From: Tuomas Koskela Date: Mon, 23 Sep 2024 11:41:18 +0100 Subject: [PATCH 13/32] One more space --- docs/tutorial/durham_reframe_tutorial.md | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/docs/tutorial/durham_reframe_tutorial.md b/docs/tutorial/durham_reframe_tutorial.md index 76c680c3..4d22c718 100644 --- a/docs/tutorial/durham_reframe_tutorial.md +++ b/docs/tutorial/durham_reframe_tutorial.md @@ -402,8 +402,8 @@ class CMakeHelloTest(rfm.RegressionTest): - ReFrame will use a user-provided Spack environment in order to build and test a set of specs. - Tutorial in `tutorials/build_systems/spack/spack_test.py` - In `rfm_job.out` you can see that it - - Creates a blank environment - - Builds all dependencies -- takes quite long + - Creates a blank environment + - Builds all dependencies -- takes quite long - `excalibur-tests` provides utilities and settings for Spack builds in ReFrame. See the [Next Tutorial](../archer2_tutorial) for details. 
--- From d669be92059f303d54cec583f36d6a48a40de5e0 Mon Sep 17 00:00:00 2001 From: Tuomas Koskela Date: Mon, 23 Sep 2024 11:47:56 +0100 Subject: [PATCH 14/32] Remove duplicate and add note about platform specific stuff --- docs/tutorial/durham_reframe_tutorial.md | 4 +--- 1 file changed, 1 insertion(+), 3 deletions(-) diff --git a/docs/tutorial/durham_reframe_tutorial.md b/docs/tutorial/durham_reframe_tutorial.md index 4d22c718..39a44846 100644 --- a/docs/tutorial/durham_reframe_tutorial.md +++ b/docs/tutorial/durham_reframe_tutorial.md @@ -187,13 +187,11 @@ site_configuration = { Performance tests capture data in performance variables. For simplicity, we use the [STREAM benchmark](https://github.com/jeffhammond/STREAM) as an example. It is the de facto memory bandwidth benchmark. -To record the performance of the benchmark, ReFrame should extract a figure of merit from the output of the test. A function decorated with the `@performance_function` decorator extracts or computes a performance metric from the test’s output. - ---- ### Boilerplate -Same as before. We can now specify valid systems and prog environments +Same as before. We can now specify valid systems and prog environments on Cosma. You can adapt these to your platform, or use `'*'` to run on any platform. ```python import reframe as rfm From 4ef37cc636167e750ec98fa2ac0e286153b6357d Mon Sep 17 00:00:00 2001 From: Tuomas Koskela Date: Mon, 23 Sep 2024 11:53:14 +0100 Subject: [PATCH 15/32] Rewrite part on perflogs, add link to excalibur-tests. 
--- docs/tutorial/durham_reframe_tutorial.md | 12 ++++-------- 1 file changed, 4 insertions(+), 8 deletions(-) diff --git a/docs/tutorial/durham_reframe_tutorial.md b/docs/tutorial/durham_reframe_tutorial.md index 39a44846..2dd72467 100644 --- a/docs/tutorial/durham_reframe_tutorial.md +++ b/docs/tutorial/durham_reframe_tutorial.md @@ -264,10 +264,6 @@ To record the performance of the benchmark, ReFrame should extract a figure of m > In this example, we extract four performance variables, namely the memory bandwidth values for each of the “Copy”, “Scale”, “Add” and “Triad” sub-benchmarks of STREAM, where each of the performance functions use the [`extractsingle()`](https://reframe-hpc.readthedocs.io/en/latest/deferrable_functions_reference.html#reframe.utility.sanity.extractsingle) utility function. For each of the sub-benchmarks we extract the “Best Rate MB/s” column of the output (see below) and we convert that to a float. ----- - -## Performance Pattern Check - ```python @performance_function('MB/s', perf_key='Copy') def extract_copy_perf(self): @@ -290,10 +286,10 @@ def extract_triad_perf(self): ### Perflogs -- Perflogs are output in `perflogs//` -- By default a lot of information is printed. This can be customized in the configuration file. More on this later. -- By default not much information about build step, has to be linked back to build environment -- See `.reframe/reports/` or use `--report-file` +The output from performance tests is written in perflogs. They are csv files that are appended to each time a test is run. By default the perflogs are output in `perflogs//`. By default a lot of information about the test is stored. This can be customized in the configuration file. +By default there is not much information about build step, but ReFrame will provide a link back to build environment. A more verbose report is written in `.reframe/reports/`, you can use the `--report-file` option to direct the report to a different file.
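
As a rough, ReFrame-free sketch of what each performance function above does, the snippet below pulls the "Best Rate MB/s" column out of STREAM-style output with a regular expression that has one capture group, then converts it to a float. The sample output values and the helper name `best_rate` are made up for illustration, and the pattern is an assumption about the output layout, not necessarily the expression used in the tutorial.

```python
import re

# Hypothetical excerpt of STREAM output; the numbers are invented.
SAMPLE_STDOUT = """\
Function    Best Rate MB/s  Avg time     Min time     Max time
Copy:           24939.4     0.021905     0.021527     0.022382
Scale:          16956.3     0.031957     0.031662     0.032379
Add:            18648.2     0.044277     0.043184     0.046349
Triad:          19133.4     0.042935     0.042089     0.044283
"""


def best_rate(kernel: str, stdout: str) -> float:
    """Return the first column after '<kernel>:' converted to a float."""
    # One capture group: the first non-whitespace token after the label.
    match = re.search(rf'^{re.escape(kernel)}:\s+(\S+)', stdout, re.MULTILINE)
    if match is None:
        raise ValueError(f'no output line found for kernel {kernel!r}')
    return float(match.group(1))


if __name__ == '__main__':
    for name in ('Copy', 'Scale', 'Add', 'Triad'):
        print(name, best_rate(name, SAMPLE_STDOUT), 'MB/s')
```

In a ReFrame test, `extractsingle()` plays this role and defers the extraction until the job output exists.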
+ +`excalibur-tests` provides tools to read and process the perflogs. See the [Next Tutorial](../archer2_tutorial) for details. ---- From 0fd3565034413c703c2443951ca33404c127b536 Mon Sep 17 00:00:00 2001 From: Tuomas Koskela Date: Mon, 23 Sep 2024 11:57:36 +0100 Subject: [PATCH 16/32] Add note on how to run --- docs/tutorial/durham_reframe_tutorial.md | 8 ++++++++ 1 file changed, 8 insertions(+) diff --git a/docs/tutorial/durham_reframe_tutorial.md b/docs/tutorial/durham_reframe_tutorial.md index 2dd72467..77b679c6 100644 --- a/docs/tutorial/durham_reframe_tutorial.md +++ b/docs/tutorial/durham_reframe_tutorial.md @@ -127,6 +127,14 @@ class HelloTest(rfm.RegressionTest): return sn.assert_found(r'Hello, World\!', self.stdout) ``` +### Run the benchmark + +The basic syntax to run ReFrame benchmarks is + +```bash +reframe -c path/to/benchmark -r +``` + ---- ## Builtin programming environment From 1b6d7384c7a4fd7ec5b432df5d0297b9b02c8d09 Mon Sep 17 00:00:00 2001 From: Tuomas Koskela Date: Mon, 16 Dec 2024 16:21:11 +0000 Subject: [PATCH 17/32] Test tabs --- docs/tutorial/durham_reframe_tutorial.md | 25 +++++++++++++++++++----- mkdocs.yml | 1 + 2 files changed, 21 insertions(+), 5 deletions(-) diff --git a/docs/tutorial/durham_reframe_tutorial.md b/docs/tutorial/durham_reframe_tutorial.md index 77b679c6..a95aa98e 100644 --- a/docs/tutorial/durham_reframe_tutorial.md +++ b/docs/tutorial/durham_reframe_tutorial.md @@ -29,12 +29,27 @@ You can customise the behaviour of each stage or add a hook before or after each ## Set up environment -This tutorial was originally run on the [Cosma](https://cosma.readthedocs.io/en/latest/) supercomputer. It should be straightforward to run on a different platform, the requirements are `gcc`, `git` and `python3`. (for the later parts you also need `make`, `autotools`, `cmake` and `spack`). -Before proceeding to install ReFrame, we recommend creating a python virtual environment to avoid clashes with other installed python packages. 
First load a newer python module. +=== "Cosma" + + This tutorial was originally run on the [Cosma](https://cosma.readthedocs.io/en/latest/) supercomputer. + It should be straightforward to run on a different platform, the requirements are `gcc`, `git` and `python3`. (for the later parts you also need `make`, `autotools`, `cmake` and `spack`). + Before proceeding to install ReFrame, we recommend creating a python virtual environment to avoid clashes with other installed python packages. + First load a newer python module. + + ```bash + module swap python/3.10.12 + ``` + +=== "ARCHER2" + + This tutorial is run on ARCHER2, you should have signed up for a training account before starting. + It can be run on other HPC systems with a batch scheduler but will require making some changes to the config. + Before proceeding to install ReFrame, we recommend creating a python virtual environment to avoid clashes with other installed python packages. + First load the system python module.
+ + ```bash + module load cray-python + ``` Then create an environment and activate it with diff --git a/mkdocs.yml b/mkdocs.yml index ab66c625..c1ec6b0c 100644 --- a/mkdocs.yml +++ b/mkdocs.yml @@ -70,3 +70,4 @@ markdown_extensions: - pymdownx.inlinehilite - pymdownx.snippets - pymdownx.superfences + - pymdownx.tabbed From e86c2078c8fc1bad63feae56b4de7e06f0004c90 Mon Sep 17 00:00:00 2001 From: Tuomas Koskela Date: Mon, 16 Dec 2024 16:42:21 +0000 Subject: [PATCH 18/32] Update with ARCHER2 version --- docs/tutorial/durham_reframe_tutorial.md | 189 +++++++++++++++-------- 1 file changed, 122 insertions(+), 67 deletions(-) diff --git a/docs/tutorial/durham_reframe_tutorial.md b/docs/tutorial/durham_reframe_tutorial.md index a95aa98e..e4ac491c 100644 --- a/docs/tutorial/durham_reframe_tutorial.md +++ b/docs/tutorial/durham_reframe_tutorial.md @@ -31,9 +31,9 @@ You can customise the behaviour of each stage or add a hook before or after each === "Cosma" - This tutorial was originally run on the [Cosma](https://cosma.readthedocs.io/en/latest/) supercomputer. - It should be straightforward to run on a different platform, the requirements are `gcc`, `git` and `python3`. (for the later parts you also need `make`, `autotools`, `cmake` and `spack`). - Before proceeding to install ReFrame, we recommend creating a python virtual environment to avoid clashes with other installed python packages. + This tutorial was originally run on the [Cosma](https://cosma.readthedocs.io/en/latest/) supercomputer. + It should be straightforward to run on a different platform, the requirements are `gcc`, `git` and `python3`. (for the later parts you also need `make`, `autotools`, `cmake` and `spack`). + Before proceeding to install ReFrame, we recommend creating a python virtual environment to avoid clashes with other installed python packages. First load a newer python module. 
```bash @@ -42,9 +42,9 @@ You can customise the behaviour of each stage or add a hook before or after each === "ARCHER2" - This tutorial is run on ARCHER2, you should have signed up for a training account before starting. + This tutorial is run on ARCHER2, you should have signed up for a training account before starting. It can be ran on other HPC systems with a batch scheduler but will require making some changes to the config. - Before proceeding to install ReFrame, we recommend creating a python virtual environment to avoid clashes with other installed python packages. + Before proceeding to install ReFrame, we recommend creating a python virtual environment to avoid clashes with other installed python packages. First load the system python module. ```bash @@ -64,10 +64,10 @@ You will have to activate the environment each time you login. To deactivate the ## Install ReFrame -Then install ReFrame with `pip`. I am installing version `4.5.2` because we will follow tutorials that have been changed in the latest `4.6.0` version. +Then install ReFrame with `pip`. ```bash -pip install reframe-hpc==4.5.2 +pip install reframe-hpc ``` Alternatively, you can @@ -77,19 +77,26 @@ git clone -q --depth 1 --branch v4.5.2 https://github.com/reframe-hpc/reframe.gi source reframe/bootstrap.sh ``` -The ReFrame git repository also contains the source code of the ReFrame tutorials. It is recommended to run the git clone step, even if you used `pip install` to install ReFrame. We will refer to the tutorial solutions later. +You can also clone the ReFrame git repository to get the source code of the ReFrame tutorials. +We will refer to some of the tutorial solutions later. +ReFrame rewrote their tutorials in v4.6 and some of the examples we are using are not there anymore, +therefore it's best to clone ReFrame v4.5. 
--- ## Hello world example -[Hello world example](https://reframe-hpc.readthedocs.io/en/v4.5.2/tutorial_basics.html#the-hello-world-test) +There's a [Hello world example](https://reframe-hpc.readthedocs.io/en/v4.5.2/tutorial_basics.html#the-hello-world-test) in the ReFrame 4.5 tutorial that explains how to create a simple ReFrame test. + +ReFrame tests are python classes that describe how a test is run. ---- -### Include ReFrame modules +### Include ReFrame module -The first thing you need is include a few modules from ReFrame. These should be available if the installation step was successful. +The first thing you need is to include ReFrame. +We separately import sanity to simplify the syntax. +These should be available if the installation step was successful. ```python import reframe as rfm @@ -100,9 +107,10 @@ import reframe.utility.sanity as sn ### Create a Test Class -- ReFrame uses decorators to mark classes as tests. -- This marks `class HelloTest` as a `rfm.simple_test`. -- ReFrame tests ultimately derive from `RegressionTest`. There are other derived classes such as `RunOnlyRegressionTest`, we get to those later. +ReFrame uses decorators to mark classes as tests. +This marks `class HelloTest` as a `rfm.simple_test`. +ReFrame tests ultimately derive from `RegressionTest`. +There are other derived classes such as `RunOnlyRegressionTest`, we get to those later. ```python @@ -110,16 +118,16 @@ import reframe.utility.sanity as sn class HelloTest(rfm.RegressionTest): ``` -- The data members and methods detailed in the following sections should be placed inside this class. +The data members and methods detailed in the following sections should be placed inside this class. ---- ### Add mandatory attributes -- `valid_systems` for where this test can run +- `valid_systems` for where this test can run. For now we haven't defined any systems so we leave it as `'*'` (any system) - `valid_prog_environs` for what compilers this test can build with. More on it later. 
-- `sourcepath` for source file in a single source test. More on build systems later. -- Could add `sourcesdir` but it defaults to `src/` +- In a test with a single source file, it is enough to define `sourcepath`. More on build systems later. +- We could add `sourcesdir` to point to the source directory, but it defaults to `src/` ```python valid_systems = ['*'] @@ -132,8 +140,8 @@ class HelloTest(rfm.RegressionTest): ### Add sanity check - ReFrame, by default, makes no assumption about whether a test is successful or not. -- A test must provide a validation function -- ReFrame provides a rich set of utility functions that help matching patterns and extract values from the test’s output +- A test must provide a validation function that asserts whether the test was successful +- ReFrame provides utility functions that help matching patterns and extract values from the test’s output - Here we match a string from stdout ```python @@ -154,54 +162,101 @@ reframe -c path/to/benchmark -r ## Builtin programming environment -- `reframe --show-config` -- Builtin programming environment uses `cc` to compile +We didn't tell reframe anything about how to compile the hello world example. How did it compile? +ReFrame uses a builtin programming environment by default. +You can see this with `reframe --show-config` +The builtin programming environment only contains the `cc` compiler, +compiling a C++ or Fortran code will fail --- -## Configuring ReFrame for HPC systems (Cosma) -> In ReFrame, all the details of the various interactions of a test with the system environment are handled transparently and are set up in its configuration file.
-- [Configuration](https://reframe-hpc.readthedocs.io/en/v4.5.2/tutorial_basics.html#more-of-hello-world) - - Set accounting parameters with - - `'access': ['--partition=bluefield1', '--account=do009'],` - - Create at least one programming environment to set compilers - - `-p` flag filters tests by programming environment - - Scheduler to run on compute nodes - - Add `time_limit = 1m` to ReFrame tests to run on Cosma - - Set from command line with `-S time_limit='1m'` +## Configuring ReFrame for HPC systems +> In ReFrame, all the details of the various interactions of a test with the system environment are handled transparently and are set up in its [configuration file](https://reframe-hpc.readthedocs.io/en/v4.5.2/tutorial_basics.html#more-of-hello-world). + +=== "Cosma" + + To configure ReFrame for Cosma + - Create a system with a name and a description + - Set the module system + - Create a compute node partition + - Set a scheduler and a MPI launcher to run on compute nodes + - On Cosma, the scheduler rejects jobs that don't set a time limit. 
Add `time_limit = 1m` to ReFrame tests to run on Cosma or set from command line with `-S time_limit='1m'` + - Set accounting parameters with `'access': ['--partition=bluefield1', '--account=do009'],` + - Create at least one programming environment to set compilers + +=== "ARCHER2" + + To configure ReFrame for ARCHER2 + - Create a system with a name and a description + - Set the module system + - Create a compute node partition + - Set a scheduler and a MPI launcher to run on compute nodes + - Set accounting parameters with `'access': ['--partition=standard', '--qos=short'],` + - Create at least one programming environment to set compilers ---- -```python -site_configuration = { - 'systems' : [ - { - 'name': 'cosma', - 'descr': 'Cosma for performance workshop', - 'hostnames': ['login[0-9][a-z].pri.cosma[0-9].alces.network'], - 'modules_system': 'tmod4', - 'partitions': [ - { - 'name': 'compute-node', - 'scheduler': 'slurm', - 'launcher': 'mpiexec', - 'environs': ['gnu'], - 'access': ['--partition=bluefield1', '--account=do009'], - } - ] - } - ], - 'environments': [ - { - 'name': 'gnu', - 'modules': ['gnu_comp', 'openmpi'], - 'cc': 'mpicc', - 'cxx': 'mpic++', - 'ftn': 'mpif90' - }, - ] -} +=== "Cosma" + ```python + site_configuration = { + 'systems' : [ + { + 'name': 'cosma', + 'descr': 'Cosma for performance workshop', + 'hostnames': ['login[0-9][a-z].pri.cosma[0-9].alces.network'], + 'modules_system': 'tmod4', + 'partitions': [ + { + 'name': 'compute-node', + 'scheduler': 'slurm', + 'launcher': 'mpiexec', + 'environs': ['gnu'], + 'access': ['--partition=bluefield1', '--account=do009'], + } + ] + } + ], + 'environments': [ + { + 'name': 'gnu', + 'modules': ['gnu_comp', 'openmpi'], + 'cc': 'mpicc', + 'cxx': 'mpic++', + 'ftn': 'mpif90' + }, + ] + } +``` + +=== "ARCHER2" + ```python + site_configuration = { + 'systems' : [ + { + 'name': 'archer2', + 'descr': 'ARCHER2 config for CIUK workshop', + 'hostnames': ['ln[0-9]+'], + 'partitions': [ + { + 'name': 'compute-node', 
+ 'scheduler': 'slurm', + 'launcher': 'srun', + 'access': ['--partition=standard', '--qos=short'], + 'environs': ['cray'], + } + ] + } + ], + 'environments': [ + { + 'name': 'cray', + 'cc': 'mpicc', + 'cxx': 'mpic++', + 'ftn': 'mpif90' + }, + ] + } ``` --- @@ -260,7 +315,7 @@ We can set environment variables by defining the `env_vars` attribute build_system='SingleSource' sourcepath='stream.c' arraysize = 2**20 - + @run_before('compile') def set_compiler_flags(self): self.build_system.cppflags = [f'-DSTREAM_ARRAY_SIZE={self.arraysize}'] @@ -309,7 +364,7 @@ def extract_triad_perf(self): ### Perflogs -The output from performance tests is written in perflogs. They are csv files that are appended each time a test is ran. By default the perflogs are output in `perflogs//`. By default a lot of information about the test is stored. This can be customized in the configuration file. +The output from performance tests is written in perflogs. They are csv files that are appended each time a test is ran. By default the perflogs are output in `perflogs//`. By default a lot of information about the test is stored. This can be customized in the configuration file. By default there is not much information about build step, but ReFrame will provide a link back to build environment. A more verbose report is written in `.reframe/reports/`, you can use the `--report-file` option to direct the report to a different file. `excalibur-tests` provides tools to read and process the perflogs. See the [Next Tutorial](../archer2_tutorial) for details. @@ -355,9 +410,9 @@ You can have multiple parameters. ReFrame will run all parameter combinations by ---- -### [Make](https://reframe-hpc.readthedocs.io/en/v4.5.2/tutorial_advanced.html#more-on-building-tests) +### [Make](https://reframe-hpc.readthedocs.io/en/v4.5.2/tutorial_advanced.html#more-on-building-tests) -- Tutorial in `tutorials/advanced/makefiles/maketest.py`. +- Tutorial in `tutorials/advanced/makefiles/maketest.py`. 
> First, if you’re using any build system other than SingleSource, you must set the executable attribute of the test, because ReFrame cannot know what is the actual executable to be run. We then set the build system to Make and set the preprocessor flags as we would do with the SingleSource build system. @@ -381,7 +436,7 @@ class AutoHelloTest(rfm.RegressionTest): executable = './src/hello' prebuild_cmds = ['autoreconf --install .'] time_limit = '5m' - + @sanity_function def assert_hello(self): return sn.assert_found(r'Hello world\!', self.stdout) @@ -405,7 +460,7 @@ class CMakeHelloTest(rfm.RegressionTest): build_system = 'CMake' executable = './CMakeHelloWorld' time_limit = '5m' - + @sanity_function def assert_hello(self): return sn.assert_found(r'Hello, world\!', self.stdout) From 2d939b7ac34574e5c70d086bbc947fd7423f5ce5 Mon Sep 17 00:00:00 2001 From: Tuomas Koskela Date: Mon, 16 Dec 2024 17:11:08 +0000 Subject: [PATCH 19/32] WIP, renders tabs --- docs/tutorial/durham_reframe_tutorial.md | 110 +++++++++++++---------- mkdocs.yml | 3 +- 2 files changed, 64 insertions(+), 49 deletions(-) diff --git a/docs/tutorial/durham_reframe_tutorial.md b/docs/tutorial/durham_reframe_tutorial.md index e4ac491c..c13ab269 100644 --- a/docs/tutorial/durham_reframe_tutorial.md +++ b/docs/tutorial/durham_reframe_tutorial.md @@ -27,29 +27,27 @@ You can customise the behaviour of each stage or add a hook before or after each --- -## Set up environment +## Set up python environment === "Cosma" - This tutorial was originally run on the [Cosma](https://cosma.readthedocs.io/en/latest/) supercomputer. + This tutorial is run on the [Cosma](https://cosma.readthedocs.io/en/latest/) supercomputer. It should be straightforward to run on a different platform, the requirements are `gcc`, `git` and `python3`. (for the later parts you also need `make`, `autotools`, `cmake` and `spack`). 
- Before proceeding to install ReFrame, we recommend creating a python virtual environment to avoid clashes with other installed python packages. - First load a newer python module. - - ```bash - module swap python/3.10.12 - ``` + Before proceeding to install ReFrame, we recommend creating a python virtual environment to avoid clashes with other installed python packages. + First load a newer python module. + ```bash + module swap python/3.10.12 + ``` === "ARCHER2" - This tutorial is run on ARCHER2, you should have signed up for a training account before starting. - It can be run on other HPC systems with a batch scheduler but will require making some changes to the config. - Before proceeding to install ReFrame, we recommend creating a python virtual environment to avoid clashes with other installed python packages. - First load the system python module. - - ```bash - module load cray-python - ``` + This tutorial is run on ARCHER2, you should have signed up for a training account before starting. + It can be run on other HPC systems with a batch scheduler but will require making some changes to the config. + Before proceeding to install ReFrame, we recommend creating a python virtual environment to avoid clashes with other installed python packages. + First load the system python module. + ```bash + module load cray-python + ``` Then create an environment and activate it with @@ -173,30 +171,21 @@ compiling a C++ or Fortran code will fail ## Configuring ReFrame for HPC systems > In ReFrame, all the details of the various interactions of a test with the system environment are handled transparently and are set up in its [configuration file](https://reframe-hpc.readthedocs.io/en/v4.5.2/tutorial_basics.html#more-of-hello-world).
-=== "Cosma" - - To configure ReFrame for Cosma - - Create a system with a name and a description - - Set the module system - - Create a compute node partition - - Set a scheduler and a MPI launcher to run on compute nodes - - On Cosma, the scheduler rejects jobs that don't set a time limit. Add `time_limit = 1m` to ReFrame tests to run on Cosma or set from command line with `-S time_limit='1m'` - - Set accounting parameters with `'access': ['--partition=bluefield1', '--account=do009'],` - - Create at least one programming environment to set compilers - -=== "ARCHER2" +For the minimum configuration to run jobs on the system we need to - To configure ReFrame for ARCHER2 - - Create a system with a name and a description - - Set the module system - - Create a compute node partition - - Set a scheduler and a MPI launcher to run on compute nodes - - Set accounting parameters with `'access': ['--partition=standard', '--qos=short'],` - - Create at least one programming environment to set compilers +=== "Cosma" ----- + - Create a system with a name and a description + - Set the module system for accessing centrally installed modules + - Create a compute node partition + - Set a scheduler and a MPI launcher to run on compute nodes + - On Cosma, the scheduler rejects jobs that don't set a time limit. 
Add `time_limit = 1m` to ReFrame tests to run on Cosma or set from command line with `-S time_limit='1m'` + - Set access options + ``` + 'access': ['--partition=bluefield1', '--account=do009'], + ``` + - Create at least one programming environment to set compilers -=== "Cosma" ```python site_configuration = { 'systems' : [ @@ -226,10 +215,20 @@ compiling a C++ or Fortran code will fail }, ] } -``` + ``` === "ARCHER2" + - Create a system with a name and a description + - Set the module system for accessing centrally installed modules + - Create a compute node partition + - Set a scheduler and a MPI launcher to run on compute nodes + - Set access options with + ``` + 'access': ['--partition=standard', '--qos=short'], + ``` + - Create at least one programming environment to set compilers + ```python site_configuration = { 'systems' : [ @@ -257,7 +256,7 @@ compiling a C++ or Fortran code will fail }, ] } -``` + ``` --- @@ -269,17 +268,32 @@ Performance tests capture data in performance variables. For simplicity, we use ### Boilerplate -Same as before. We can now specify valid systems and prog environments on Cosma. You can adapt these to your platform, or use `'*'` to run on any platform. +Same as before. We can now specify valid systems and programming environments to run on the system we just configured. +You can adapt these to your system, or keep using `'*'` to run on any platform. 
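
To illustrate how the entries in `valid_systems` relate to the system and partition names defined in the configuration, here is a simplified sketch that does not use ReFrame itself. It covers only plain `system`, `system:partition` and `*` wildcard entries; ReFrame's own matching accepts more syntax than this, and the helper name `matches` is hypothetical.

```python
from typing import Iterable


def matches(valid_systems: Iterable[str], system: str, partition: str) -> bool:
    """Return True if any entry selects the given system/partition pair."""
    for entry in valid_systems:
        # 'name' alone is treated like 'name:*' in this sketch.
        sys_spec, _, part_spec = entry.partition(':')
        part_spec = part_spec or '*'
        if sys_spec in ('*', system) and part_spec in ('*', partition):
            return True
    return False


if __name__ == '__main__':
    # Names taken from the configuration examples in this tutorial.
    print(matches(['*'], 'archer2', 'compute-node'))
    print(matches(['cosma'], 'archer2', 'compute-node'))
    print(matches(['archer2:compute-node'], 'archer2', 'compute-node'))
```

Under these assumptions a test with `valid_systems = ['cosma']` would be skipped on ARCHER2, while `'*'` keeps it eligible everywhere.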
-```python -import reframe as rfm -import reframe.utility.sanity as sn -@rfm.simple_test -class StreamTest(rfm.RegressionTest): - valid_systems = ['cosma'] - valid_prog_environs = ['gnu', 'intel'] -``` +=== "Cosma" + ```python + import reframe as rfm + import reframe.utility.sanity as sn + + @rfm.simple_test + class StreamTest(rfm.RegressionTest): + valid_systems = ['cosma'] + valid_prog_environs = ['gnu'] + ``` + +=== "ARCHER2" + + ```python + import reframe as rfm + import reframe.utility.sanity as sn + + @rfm.simple_test + class StreamTest(rfm.RegressionTest): + valid_systems = ['archer2'] + valid_prog_environs = ['cray'] + ``` ---- diff --git a/mkdocs.yml b/mkdocs.yml index c1ec6b0c..7af2e776 100644 --- a/mkdocs.yml +++ b/mkdocs.yml @@ -70,4 +70,5 @@ markdown_extensions: - pymdownx.inlinehilite - pymdownx.snippets - pymdownx.superfences - - pymdownx.tabbed + - pymdownx.tabbed: + alternate_style: true From b02ceaba94f10cbc214c157236bd8b5c2f798f00 Mon Sep 17 00:00:00 2001 From: Tuomas Koskela Date: Mon, 16 Dec 2024 17:17:15 +0000 Subject: [PATCH 20/32] Update python --- .github/workflows/docs.yml | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/.github/workflows/docs.yml b/.github/workflows/docs.yml index 27f72826..d7c63396 100644 --- a/.github/workflows/docs.yml +++ b/.github/workflows/docs.yml @@ -49,7 +49,7 @@ jobs: - uses: actions/setup-python@v5 with: - python-version: '3.10.6' + python-version: '3.12' cache: 'pip' - name: Install Python dependencies From ca9b88937bb9be66647da83d4485fde114d1da47 Mon Sep 17 00:00:00 2001 From: Tuomas Koskela Date: Tue, 17 Dec 2024 11:01:18 +0000 Subject: [PATCH 21/32] Add more verbose text, tabs for system specifics --- docs/tutorial/durham_reframe_tutorial.md | 60 +++++++++++++++++------- 1 file changed, 44 insertions(+), 16 deletions(-) diff --git a/docs/tutorial/durham_reframe_tutorial.md b/docs/tutorial/durham_reframe_tutorial.md index c13ab269..29e0fc46 100644 --- 
a/docs/tutorial/durham_reframe_tutorial.md +++ b/docs/tutorial/durham_reframe_tutorial.md @@ -4,7 +4,7 @@ 1. How ReFrame executes tests 2. Structure of a ReFrame test -- Hello world example -3. Configuring ReFrame to run tests on Cosma +3. Configuring ReFrame to run tests on HPC systems 4. Writing performance tests -- Stream example 5. Working with build systems -- Make, CMake, Autotools, Spack examples 6. Avoiding build systems -- Run-only tests @@ -262,13 +262,19 @@ For the minimum configuration to run jobs on the system we need to ## Performance tests -Performance tests capture data in performance variables. For simplicity, we use the [STREAM benchmark](https://github.com/jeffhammond/STREAM) as an example. It is the de facto memory bandwidth benchmark. +Performance tests capture data in performance variables. For simplicity, we use the [STREAM benchmark](https://github.com/jeffhammond/STREAM) as an example. It is the de facto memory bandwidth benchmark. It has four kernels that stream arrays from memory and +perform different floating point operations on them. +- Copy: `A = B` +- Scale: `C = a * B` +- Add: `C = A + B` +- Triad: `C = a * A + B` ---- ### Boilerplate -Same as before. We can now specify valid systems and programming environments to run on the system we just configured. +The imports and the class declaration look the same as before. +We can now specify valid systems and programming environments to run on the system we just configured. You can adapt these to your system, or keep using `'*'` to run on any platform. @@ -299,7 +305,7 @@ You can adapt these to your system, or keep using `'*'` to run on any platform. 
### Git Cloning the source -we can retrieve specifically a Git repository by assigning its URL directly to the sourcesdir attribute: +We can retrieve specifically a Git repository by assigning its URL directly to the sourcesdir attribute: ```python sourcesdir='https://github.com/jeffhammond/STREAM' @@ -309,21 +315,22 @@ we can retrieve specifically a Git repository by assigning its URL directly to t ### Environment variables -We can set environment variables by defining the `env_vars` attribute +We can set environment variables in the `env_vars` dictionary. ```python - env_vars = { - 'OMP_NUM_THREADS': '4', - 'OMP_PLACES': 'cores' - } + self.env_vars['OMP_NUM_THREADS'] = 4 + self.env_vars['OMP_PLACES'] = 'cores' ``` ---- ### Building -- Remember the pipeline ReFrame executes. We can run arbitrary functions in the pipeline by decorating them with `@run_before` or `@run_after` -- Here we can insert compiler flags before compiling +Recall the pipeline ReFrame executes when running a test. +We can insert arbitrary functions between any steps in the pipeline by decorating them with `@run_before` or `@run_after`. +Here we can set compiler flags before compiling. +The STREAM benchmark takes the array size as a compile-time argument. +It should be large enough to overflow all levels of cache so that there is no data reuse and we measure the main memory bandwidth. ```python build_system='SingleSource' @@ -350,7 +357,7 @@ Similar to before, we can check a line in stdout for validation. ---- -## Add Performance Pattern Check +### Add Performance Pattern Check To record the performance of the benchmark, ReFrame should extract a figure of merit from the output of the test. A function decorated with the `@performance_function` decorator extracts or computes a performance metric from the test’s output. @@ -389,6 +396,8 @@ By default there is not much information about build step, but ReFrame will prov ReFrame can automate checking that the results fall within an expected range.
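
To make the idea of an expected range concrete, here is a small sketch, independent of ReFrame, of comparing a measured value against a reference given as (value, lower fraction, upper fraction, unit). The function name `within_reference` is hypothetical. Treating `None` as "no bound on that side" and assuming a positive reference value are simplifications of this sketch, and the numbers are only illustrative.

```python
from typing import Optional, Tuple

Reference = Tuple[float, Optional[float], Optional[float], Optional[str]]


def within_reference(value: float, reference: Reference) -> bool:
    """Check value against (ref, lower, upper, unit).

    lower and upper are fractions relative to ref, e.g. -0.25 and 0.25
    allow the value to lie between 75% and 125% of ref.
    """
    ref, lower, upper, _unit = reference
    if lower is not None and value < ref * (1 + lower):
        return False
    if upper is not None and value > ref * (1 + upper):
        return False
    return True


if __name__ == '__main__':
    copy_ref = (260000, -0.25, 0.25, 'MB/s')  # allowed band: 195000..325000
    for measured in (190000, 260000, 330000):
        print(measured, within_reference(measured, copy_ref))
```

With these numbers, 190000 and 330000 MB/s fall outside the band and 260000 MB/s falls inside it.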
You can set a different reference value for each `perf_key` in the performance function. For example, set the test to fail if it falls outside of +-25% of the values obtained with the previous array size. + +=== "Cosma" ```python reference = { 'cosma': { @@ -400,6 +409,18 @@ reference = { } ``` +=== "ARCHER2" +```python +reference = { + 'archer2': { + 'Copy': (260000, -0.25, 0.25, 'MB/s'), + 'Scale': (200000, -0.25, 0.25, 'MB/s'), + 'Add': (200000, -0.25, 0.25, 'MB/s'), + 'Triad': (200000, -0.25, 0.25, 'MB/s') + } +} +``` + > The performance reference tuple consists of the reference value, the lower and upper thresholds expressed as fractional numbers relative to the reference value, and the unit of measurement. If any of the thresholds is not relevant, None may be used instead. Also, the units in this reference variable are entirely optional, since they were already provided through the @performance_function decorator. @@ -420,13 +441,16 @@ You can have multiple parameters. ReFrame will run all parameter combinations by --- ## [Build systems](https://reframe-hpc.readthedocs.io/en/v4.5.2/tutorial_advanced.html#more-on-building-tests) -- [Build systems Reference](https://reframe-hpc.readthedocs.io/en/v4.5.2/regression_test_api.html#build-systems) + +ReFrame supports many commonly used build systems, including CMake, Autotools, Spack and EasyBuild. See the +[Build systems Reference](https://reframe-hpc.readthedocs.io/en/v4.5.2/regression_test_api.html#build-systems) for details. +Here we show a few examples. ---- ### [Make](https://reframe-hpc.readthedocs.io/en/v4.5.2/tutorial_advanced.html#more-on-building-tests) -- Tutorial in `tutorials/advanced/makefiles/maketest.py`.
+- [Tutorial in `tutorials/advanced/makefiles/maketest.py`.](https://reframe-hpc.readthedocs.io/en/v4.5.2/tutorial_advanced.html#more-on-building-tests) > First, if you’re using any build system other than SingleSource, you must set the executable attribute of the test, because ReFrame cannot know what is the actual executable to be run. We then set the build system to Make and set the preprocessor flags as we would do with the SingleSource build system. @@ -486,7 +510,7 @@ class CMakeHelloTest(rfm.RegressionTest): ### [Spack](https://reframe-hpc.readthedocs.io/en/v4.5.2/regression_test_api.html#reframe.core.buildsystems.Spack) - ReFrame will use a user-provided Spack environment in order to build and test a set of specs. -- Tutorial in `tutorials/build_systems/spack/spack_test.py` +- [Tutorial in `tutorials/build_systems/spack/spack_test.py`](https://reframe-hpc.readthedocs.io/en/v4.5.2/tutorial_build_automation.html#using-spack-to-build-the-test-code) - In `rfm_job.out` you can see that it - Creates a blank environment - Builds all dependencies -- takes quite long @@ -495,4 +519,8 @@ class CMakeHelloTest(rfm.RegressionTest): --- ## Run-only tests -- Tutorial in `tutorials/advanced/runonly/echorand.py` + +If you don't wish to build your application in ReFrame (we recommend that you do!), you can define a run-only test. +Run-only tests derive from the `rfm.RunOnlyRegressionTest` class instead of `rfm.RegressionTest`. +Instead of a build system, you define an executable that ReFrame expects to find in `$PATH`.
+See [tutorial in `tutorials/advanced/runonly/echorand.py`](https://reframe-hpc.readthedocs.io/en/v4.5.2/tutorial_advanced.html#writing-a-run-only-regression-test) From a84a01dcf6de798b8a3488183d6b485e5909140e Mon Sep 17 00:00:00 2001 From: Tuomas Koskela Date: Tue, 17 Dec 2024 11:28:17 +0000 Subject: [PATCH 22/32] Rename tutorials with more descriptive names --- .../{archer2_tutorial.md => excalibur-tests_tutorial.md} | 0 .../{durham_reframe_tutorial.md => reframe_tutorial.md} | 4 ++-- mkdocs.yml | 4 ++-- 3 files changed, 4 insertions(+), 4 deletions(-) rename docs/tutorial/{archer2_tutorial.md => excalibur-tests_tutorial.md} (100%) rename docs/tutorial/{durham_reframe_tutorial.md => reframe_tutorial.md} (99%) diff --git a/docs/tutorial/archer2_tutorial.md b/docs/tutorial/excalibur-tests_tutorial.md similarity index 100% rename from docs/tutorial/archer2_tutorial.md rename to docs/tutorial/excalibur-tests_tutorial.md diff --git a/docs/tutorial/durham_reframe_tutorial.md b/docs/tutorial/reframe_tutorial.md similarity index 99% rename from docs/tutorial/durham_reframe_tutorial.md rename to docs/tutorial/reframe_tutorial.md index 29e0fc46..72ced38d 100644 --- a/docs/tutorial/durham_reframe_tutorial.md +++ b/docs/tutorial/reframe_tutorial.md @@ -388,7 +388,7 @@ def extract_triad_perf(self): The output from performance tests is written in perflogs. They are csv files that are appended each time a test is ran. By default the perflogs are output in `perflogs//`. By default a lot of information about the test is stored. This can be customized in the configuration file. By default there is not much information about build step, but ReFrame will provide a link back to build environment. A more verbose report is written in `.reframe/reports/`, you can use the `--report-file` option to direct the report to a different file. -`excalibur-tests` provides tools to read and process the perflogs. See the [Next Tutorial](../archer2_tutorial) for details. 
+`excalibur-tests` provides tools to read and process the perflogs. See the [Next Tutorial](../excalibur-tests_tutorial) for details. ---- @@ -514,7 +514,7 @@ class CMakeHelloTest(rfm.RegressionTest): - In `rfm_job.out` you can see that it - Creates a blank environment - Builds all dependencies -- takes quite long -- `excalibur-tests` provides utilities and settings for Spack builds in ReFrame. See the [Next Tutorial](../archer2_tutorial) for details. +- `excalibur-tests` provides utilities and settings for Spack builds in ReFrame. See the [Next Tutorial](../excalibur-tests_tutorial) for details. --- diff --git a/mkdocs.yml b/mkdocs.yml index 7af2e776..9b2992c5 100644 --- a/mkdocs.yml +++ b/mkdocs.yml @@ -41,8 +41,8 @@ nav: - Kathleen: systems#myriad-and-kathleen - Tursa: systems#tursa - 'Tutorials': - - ReFrame Tutorial: tutorial/durham_reframe_tutorial.md - - ARCHER2 Tutorial: tutorial/archer2_tutorial.md + - ReFrame Tutorial: tutorial/reframe_tutorial.md + - excalibur-tests Tutorial: tutorial/excalibur-tests_tutorial.md theme: name: material features: From e9bf04a10a5ccf1905af9a97edf0f6484645b2c1 Mon Sep 17 00:00:00 2001 From: Tuomas Koskela Date: Tue, 17 Dec 2024 11:32:16 +0000 Subject: [PATCH 23/32] Fix warnings --- docs/tutorial/reframe_tutorial.md | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/docs/tutorial/reframe_tutorial.md b/docs/tutorial/reframe_tutorial.md index 72ced38d..26c0e35c 100644 --- a/docs/tutorial/reframe_tutorial.md +++ b/docs/tutorial/reframe_tutorial.md @@ -388,7 +388,7 @@ def extract_triad_perf(self): The output from performance tests is written in perflogs. They are csv files that are appended each time a test is ran. By default the perflogs are output in `perflogs//`. By default a lot of information about the test is stored. This can be customized in the configuration file. By default there is not much information about build step, but ReFrame will provide a link back to build environment. 
A more verbose report is written in `.reframe/reports/`, you can use the `--report-file` option to direct the report to a different file. -`excalibur-tests` provides tools to read and process the perflogs. See the [Next Tutorial](../excalibur-tests_tutorial) for details. +`excalibur-tests` provides tools to read and process the perflogs. See the [Next Tutorial](excalibur-tests_tutorial.md) for details. ---- @@ -514,7 +514,7 @@ class CMakeHelloTest(rfm.RegressionTest): - In `rfm_job.out` you can see that it - Creates a blank environment - Builds all dependencies -- takes quite long -- `excalibur-tests` provides utilities and settings for Spack builds in ReFrame. See the [Next Tutorial](../excalibur-tests_tutorial) for details. +- `excalibur-tests` provides utilities and settings for Spack builds in ReFrame. See the [Next Tutorial](excalibur-tests_tutorial.md) for details. --- From 185861f5b0211fa8b02b8fc484bbd0fa7af9973d Mon Sep 17 00:00:00 2001 From: Tuomas Koskela Date: Tue, 17 Dec 2024 13:38:04 +0000 Subject: [PATCH 24/32] Fix indentation --- docs/tutorial/reframe_tutorial.md | 36 +++++++++++++++---------------- 1 file changed, 18 insertions(+), 18 deletions(-) diff --git a/docs/tutorial/reframe_tutorial.md b/docs/tutorial/reframe_tutorial.md index 26c0e35c..27c40a76 100644 --- a/docs/tutorial/reframe_tutorial.md +++ b/docs/tutorial/reframe_tutorial.md @@ -398,28 +398,28 @@ ReFrame can automate checking that the results fall within an expected range. 
Yo === "Cosma" -```python -reference = { - 'cosma': { - 'Copy': (40000, -0.25, 0.25, 'MB/s'), - 'Scale': (20000, -0.25, 0.25, 'MB/s'), - 'Add': (20000, -0.25, 0.25, 'MB/s'), - 'Triad': (20000, -0.25, 0.25, 'MB/s') + ```python + reference = { + 'cosma': { + 'Copy': (40000, -0.25, 0.25, 'MB/s'), + 'Scale': (20000, -0.25, 0.25, 'MB/s'), + 'Add': (20000, -0.25, 0.25, 'MB/s'), + 'Triad': (20000, -0.25, 0.25, 'MB/s') + } } -} -``` + ``` === "Archer2" -```python -reference = { - 'archer2': { - 'Copy': (260000, -0.25, 0.25, 'MB/s'), - 'Scale': (200000, -0.25, 0.25, 'MB/s'), - 'Add': (200000, -0.25, 0.25, 'MB/s'), - 'Triad': (200000, -0.25, 0.25, 'MB/s') + ```python + reference = { + 'archer2': { + 'Copy': (260000, -0.25, 0.25, 'MB/s'), + 'Scale': (200000, -0.25, 0.25, 'MB/s'), + 'Add': (200000, -0.25, 0.25, 'MB/s'), + 'Triad': (200000, -0.25, 0.25, 'MB/s') + } } -} -``` + ``` > The performance reference tuple consists of the reference value, the lower and upper thresholds expressed as fractional numbers relative to the reference value, and the unit of measurement. If any of the thresholds is not relevant, None may be used instead. Also, the units in this reference variable are entirely optional, since they were already provided through the @performance_function decorator. 
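
The reference tuple quoted above can be illustrated with a small standalone sketch. This is not ReFrame's own implementation: the helper name `within_reference` is hypothetical, and the sketch assumes a positive reference value. It only shows how the fractional thresholds translate into an acceptance window, using the Cosma `Copy` reference from the tutorial.

```python
from typing import Optional, Tuple

# Hypothetical type alias and helper, for illustration only.
# Neither is part of the ReFrame API.
Reference = Tuple[float, Optional[float], Optional[float], Optional[str]]


def within_reference(value: float, reference: Reference) -> bool:
    """Return True if `value` lies inside the window set by `reference`.

    Assumes a positive reference value. A threshold of None means that
    side of the window is not checked.
    """
    ref, lower, upper, _unit = reference
    if lower is not None and value < ref * (1 + lower):
        return False
    if upper is not None and value > ref * (1 + upper):
        return False
    return True


# 'Copy' reference for Cosma from the tutorial: 40000 MB/s, +-25%.
# The acceptance window is therefore 30000 to 50000 MB/s.
copy_ref = (40000, -0.25, 0.25, 'MB/s')

inside = within_reference(38000, copy_ref)
too_slow = within_reference(29000, copy_ref)
# With the upper threshold set to None, only the lower bound is checked.
no_upper = within_reference(90000, (40000, -0.25, None, 'MB/s'))
```

Under these assumptions, a measured `Copy` bandwidth passes only if it falls between 75% and 125% of the reference value.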
From ab5ebacff2a195b9f8ddecc6d01efe9facec5392 Mon Sep 17 00:00:00 2001 From: Tuomas Koskela Date: Wed, 18 Dec 2024 14:32:37 +0000 Subject: [PATCH 25/32] Restructure tutorials --- docs/tutorial/excalibur-tests_tutorial.md | 593 ++++++---------------- docs/tutorial/getting-started.md | 50 ++ docs/tutorial/postprocessing_tutorial.md | 204 ++++++++ docs/tutorial/profiling_tutorial.md | 1 + docs/tutorial/reframe_tutorial.md | 5 +- mkdocs.yml | 5 +- post-processing/.gitignore | 3 + 7 files changed, 424 insertions(+), 437 deletions(-) create mode 100644 docs/tutorial/getting-started.md create mode 100644 docs/tutorial/postprocessing_tutorial.md create mode 100644 docs/tutorial/profiling_tutorial.md create mode 100644 post-processing/.gitignore diff --git a/docs/tutorial/excalibur-tests_tutorial.md b/docs/tutorial/excalibur-tests_tutorial.md index 8f8752bf..deab5f66 100644 --- a/docs/tutorial/excalibur-tests_tutorial.md +++ b/docs/tutorial/excalibur-tests_tutorial.md @@ -8,62 +8,7 @@ # Using ReFrame for reproducible and portable performance benchmarking -In this tutorial you will set up the benchmarking framework on the [ARCHER2](https://www.archer2.ac.uk) supercomputer, build and run example benchmarks, create a new benchmark and explore benchmark data. - ---- - -## Getting Started - -To complete this tutorial, you need to [connect to ARCHER2 via ssh](https://docs.archer2.ac.uk/user-guide/connecting/). You will need - -1. An ARCHER2 account. You can [request a new account](https://docs.archer2.ac.uk/quick-start/quickstart-users/#request-an-account-on-archer2) if you haven't got one you can use. Use the project code `ta131` to request your account. You can use an existing ARCHER2 account to complete this workshop. -2. A command line terminal with an ssh client. Most Linux and Mac systems come with these preinstalled. Please see [Connecting to ARCHER2](https://docs.archer2.ac.uk/user-guide/connecting/#command-line-terminal) for more information and Windows instructions. 
- ----- - -### ssh - -Once you have the above prerequisites, you have to [generate an ssh key pair](https://docs.archer2.ac.uk/user-guide/connecting/#ssh-key-pairs) and [upload the public key to SAFE](https://docs.archer2.ac.uk/user-guide/connecting/#upload-public-part-of-key-pair-to-safe). - -When you are done, check that you are able to connect to ARCHER2 with - -```bash -ssh username@login.archer2.ac.uk -``` - ----- - -### ARCHER2 MFA - -ARCHER2 is deploying mandatory multi-factor authentication (MFA) **Today**! - -This was deployed between 0900 and 1000 this morning (06/12/2023). - -This means that once the switch has happened, SSH keys will work as before, but instead of your ARCHER2 password, a Time-based One-Time Password (TOTP) code will be requested. - -TOTP is a six digit number, refreshed every 30 seconds, which is generated typically by an app running on your mobile phone or laptop. - -Thus authentication will require two factors: - -1) SSH key and passphrase -2) TOTP - ----- - -### ARCHER2 MFA Docs and Support - -The SAFE documentation which details how to set up MFA on machine accounts (ARCHER2) is available at: -https://epcced.github.io/safe-docs/safe-for-users/#how-to-turn-on-mfa-on-your-machine-account - -The documentation includes how to set this up without the need of a personal smartphone device. - -We have also updated the ARCHER2 documentation with details of the new connection process: -https://docs.archer2.ac.uk/user-guide/connecting-totp/ -https://docs.archer2.ac.uk/quick-start/quickstart-users-totp/ - -If there are any issues or concerns please contact us at: - -support@archer2.ac.uk +In this tutorial you will set up the excalibur-tests benchmarking framework on a HPC system, build and run example benchmarks, create a new benchmark and explore benchmark data. --- @@ -73,29 +18,35 @@ support@archer2.ac.uk ### Set up python -We are going to use `python` and the `pip` package installer to install and run the framework. 
Load the `cray-python` module to get a python version that fills the requirements. -```bash -module load cray-python -``` -You can check with `python3 --version` that your python version is `3.8` or greater. You will have to load this module every time you login. +=== "ARCHER2" -(at the time of writing, the default version was `3.9.13`). + We are going to use `python` and the `pip` package installer to install and run the framework. Load the `cray-python` module to get a python version that fills the requirements. + ```bash + module load cray-python + ``` + You can check with `python3 --version` that your python version is `3.8` or greater. You will have to load this module every time you login. + + (at the time of writing, the default version was `3.9.13`). ---- ### Change to work directory -On ARCHER2, the compute nodes do not have access to your home directory, therefore it is important to install everything in a [work file system](https://docs.archer2.ac.uk/user-guide/data/#work-file-systems). Change to the work directory with +=== "ARCHER2" -```bash -cd /work/ta131/ta131/${USER} -``` + On ARCHER2, the compute nodes do not have access to your home directory, therefore it is important to install everything in a [work file system](https://docs.archer2.ac.uk/user-guide/data/#work-file-systems). + Change to the work directory with + + ```bash + cd /work/ta131/ta131/${USER} + ``` -If you are tempted to use a symlink here, ensure you use `cd -P` when changing directory. Archer2 compute nodes cannot read from `/home`, only `/work`, so not completely following symlinks can result in a broken installation. + If you are tempted to use a symlink here, ensure you use `cd -P` when changing directory. + ARCHER2 compute nodes cannot read from `/home`, only `/work`, so not completely following symlinks can result in a broken installation. 
---- -### Clone the framework repository +### Clone the Git repository In the work directory, clone the [excalibur-tests](https://github.com/ukri-excalibur/excalibur-tests) repository with @@ -184,13 +135,15 @@ $ ls $RFM_CONFIG_FILES If you log out and back in, you will have to run some of the above commands again to recreate your environment. These are (from your `work` directory): -``` -module load cray-python -source excalibur-env/bin/activate -export RFM_CONFIG_FILES="$(pwd)/excalibur-tests/benchmarks/reframe_config.py" -export RFM_USE_LOGIN_SHELL="true" -source ./spack/share/spack/setup-env.sh -``` + +=== "ARCHER2" + ```bash + module load cray-python + source excalibur-env/bin/activate + export RFM_CONFIG_FILES="$(pwd)/excalibur-tests/benchmarks/reframe_config.py" + export RFM_USE_LOGIN_SHELL="true" + source ./spack/share/spack/setup-env.sh + ``` ---- @@ -203,53 +156,56 @@ reframe -c -r ---- -### ARCHER2 specific commands +### System-specific flags -In addition, on ARCHER2, you have to provide the quality of service (QoS) type for your job to ReFrame on the command line with `-J`. Use the "short" QoS to run the sombrero example with -```bash -reframe -c excalibur-tests/benchmarks/examples/sombrero -r -J'--qos=short' -``` -You may notice you actually ran four benchmarks with that single command! That is because the benchmark is parametrized. We will talk about this in the next section. + +=== "ARCHER2" + In addition, on ARCHER2, you have to provide the quality of service (QoS) type for your job to ReFrame on the command line with `-J`. Use the "short" QoS to run the sombrero example with + ```bash + reframe -c excalibur-tests/benchmarks/examples/sombrero -r -J'--qos=short' + ``` + You may notice you actually ran four benchmarks with that single command! That is because the benchmark is parametrized. We will talk about this in the next section.
---- ### Output sample -```bash -$ reframe -c benchmarks/examples/sombrero/ -r -J'--qos=short' --performance-report -[ReFrame Setup] - version: 4.3.0 - command: '/mnt/lustre/a2fs-work3/work/d193/d193/tk-d193/excalibur-env/bin/reframe -c benchmarks/examples/sombrero/ -r -J--qos=short' - launched by: tk-d193@ln03 - working directory: '/mnt/lustre/a2fs-work3/work/d193/d193/tk-d193/excalibur-tests' - settings files: '', '/work/d193/d193/tk-d193/excalibur-tests/benchmarks/reframe_config.py' - check search path: '/mnt/lustre/a2fs-work3/work/d193/d193/tk-d193/excalibur-tests/benchmarks/examples/sombrero' - stage directory: '/mnt/lustre/a2fs-work3/work/d193/d193/tk-d193/excalibur-tests/stage' - output directory: '/mnt/lustre/a2fs-work3/work/d193/d193/tk-d193/excalibur-tests/output' - log files: '/tmp/rfm-u1l6yt7f.log' - -[==========] Running 4 check(s) -[==========] Started on Fri Jul 7 15:47:45 2023 - -[----------] start processing checks -[ RUN ] SombreroBenchmark %tasks=2 %cpus_per_task=2 /de04c10b @archer2:compute-node+default -[ RUN ] SombreroBenchmark %tasks=2 %cpus_per_task=1 /c52a123d @archer2:compute-node+default -[ RUN ] SombreroBenchmark %tasks=1 %cpus_per_task=2 /c1c3a3f1 @archer2:compute-node+default -[ RUN ] SombreroBenchmark %tasks=1 %cpus_per_task=1 /52e1ce98 @archer2:compute-node+default -[ OK ] (1/4) SombreroBenchmark %tasks=1 %cpus_per_task=2 /c1c3a3f1 @archer2:compute-node+default -P: flops: 0.67 Gflops/seconds (r:1.2, l:None, u:None) -[ OK ] (2/4) SombreroBenchmark %tasks=1 %cpus_per_task=1 /52e1ce98 @archer2:compute-node+default -P: flops: 0.67 Gflops/seconds (r:1.2, l:None, u:None) -[ OK ] (3/4) SombreroBenchmark %tasks=2 %cpus_per_task=2 /de04c10b @archer2:compute-node+default -P: flops: 1.27 Gflops/seconds (r:1.2, l:None, u:None) -[ OK ] (4/4) SombreroBenchmark %tasks=2 %cpus_per_task=1 /c52a123d @archer2:compute-node+default -P: flops: 1.24 Gflops/seconds (r:1.2, l:None, u:None) -[----------] all spawned checks have finished - -[ PASSED ] Ran 4/4 
test case(s) from 4 check(s) (0 failure(s), 0 skipped, 0 aborted) -[==========] Finished on Fri Jul 7 15:48:23 2023 -Log file(s) saved in '/tmp/rfm-u1l6yt7f.log' -``` +=== "ARCHER2" + ```bash + $ reframe -c benchmarks/examples/sombrero/ -r -J'--qos=short' --performance-report + [ReFrame Setup] + version: 4.3.0 + command: '/mnt/lustre/a2fs-work3/work/d193/d193/tk-d193/excalibur-env/bin/reframe -c benchmarks/examples/sombrero/ -r -J--qos=short' + launched by: tk-d193@ln03 + working directory: '/mnt/lustre/a2fs-work3/work/d193/d193/tk-d193/excalibur-tests' + settings files: '', '/work/d193/d193/tk-d193/excalibur-tests/benchmarks/reframe_config.py' + check search path: '/mnt/lustre/a2fs-work3/work/d193/d193/tk-d193/excalibur-tests/benchmarks/examples/sombrero' + stage directory: '/mnt/lustre/a2fs-work3/work/d193/d193/tk-d193/excalibur-tests/stage' + output directory: '/mnt/lustre/a2fs-work3/work/d193/d193/tk-d193/excalibur-tests/output' + log files: '/tmp/rfm-u1l6yt7f.log' + + [==========] Running 4 check(s) + [==========] Started on Fri Jul 7 15:47:45 2023 + + [----------] start processing checks + [ RUN ] SombreroBenchmark %tasks=2 %cpus_per_task=2 /de04c10b @archer2:compute-node+default + [ RUN ] SombreroBenchmark %tasks=2 %cpus_per_task=1 /c52a123d @archer2:compute-node+default + [ RUN ] SombreroBenchmark %tasks=1 %cpus_per_task=2 /c1c3a3f1 @archer2:compute-node+default + [ RUN ] SombreroBenchmark %tasks=1 %cpus_per_task=1 /52e1ce98 @archer2:compute-node+default + [ OK ] (1/4) SombreroBenchmark %tasks=1 %cpus_per_task=2 /c1c3a3f1 @archer2:compute-node+default + P: flops: 0.67 Gflops/seconds (r:1.2, l:None, u:None) + [ OK ] (2/4) SombreroBenchmark %tasks=1 %cpus_per_task=1 /52e1ce98 @archer2:compute-node+default + P: flops: 0.67 Gflops/seconds (r:1.2, l:None, u:None) + [ OK ] (3/4) SombreroBenchmark %tasks=2 %cpus_per_task=2 /de04c10b @archer2:compute-node+default + P: flops: 1.27 Gflops/seconds (r:1.2, l:None, u:None) + [ OK ] (4/4) SombreroBenchmark %tasks=2 
%cpus_per_task=1 /c52a123d @archer2:compute-node+default + P: flops: 1.24 Gflops/seconds (r:1.2, l:None, u:None) + [----------] all spawned checks have finished + + [ PASSED ] Ran 4/4 test case(s) from 4 check(s) (0 failure(s), 0 skipped, 0 aborted) + [==========] Finished on Fri Jul 7 15:48:23 2023 + Log file(s) saved in '/tmp/rfm-u1l6yt7f.log' + ``` ---- @@ -261,140 +217,6 @@ While the benchmark is running, the log files are kept in the `stage/` directory You can find the performance log file from the benchmark in `perflogs/`. The perflog records the captured figures of merit, environment variables and metadata about the job. ---- - -## Postprocess Benchmark Results - -Now let's look at the Benchmark performance results, and create a plot to visualise them. - -**NOTE:** The post-processing package is currently under heavy development. Please refer to the latest `post-processing/README.md` and `post-processing/post_processing_config.yaml` to use it. - ----- - -### The perflog - -After running the Sombrero benchmark once you should have a perflog in `perflogs/archer2/compute-node/SombreroBenchmark.log` that looks like this -``` -job_completion_time|version|info|jobid|num_tasks|num_cpus_per_task|num_tasks_per_node|num_gpus_per_node|flops_value|flops_unit|flops_ref|flops_lower_thres|flops_upper_thres|spack_spec|display_name|system|partition|environ|extra_resources|env_vars|tags -2023-08-25T11:23:46|reframe 4.3.2|SombreroBenchmark %tasks=2 %cpus_per_task=2 /de04c10b @archer2:compute-node+default|4323431|2|2|1|null|1.31|Gflops/seconds|1.2|-0.2|None|sombrero@2021-08-16|SombreroBenchmark %tasks=2 %cpus_per_task=2|archer2|compute-node|default|{}|{"OMP_NUM_THREADS": "2"}|example -2023-08-25T11:23:48|reframe 4.3.2|SombreroBenchmark %tasks=1 %cpus_per_task=2 /c1c3a3f1 @archer2:compute-node+default|4323433|1|2|1|null|0.67|Gflops/seconds|1.2|-0.2|None|sombrero@2021-08-16|SombreroBenchmark %tasks=1 %cpus_per_task=2|archer2|compute-node|default|{}|{"OMP_NUM_THREADS": 
"2"}|example -2023-08-25T11:23:48|reframe 4.3.2|SombreroBenchmark %tasks=1 %cpus_per_task=1 /52e1ce98 @archer2:compute-node+default|4323434|1|1|1|null|0.67|Gflops/seconds|1.2|-0.2|None|sombrero@2021-08-16|SombreroBenchmark %tasks=1 %cpus_per_task=1|archer2|compute-node|default|{}|{"OMP_NUM_THREADS": "1"}|example -2023-08-25T11:23:48|reframe 4.3.2|SombreroBenchmark %tasks=2 %cpus_per_task=1 /c52a123d @archer2:compute-node+default|4323432|2|1|1|null|1.29|Gflops/seconds|1.2|-0.2|None|sombrero@2021-08-16|SombreroBenchmark %tasks=2 %cpus_per_task=1|archer2|compute-node|default|{}|{"OMP_NUM_THREADS": "1"}|example -``` -Every time the same benchmark is run, a line is appended in this perflog. - ----- - -The perflog contains -- Some general info about the benchmark run, including system, spack, and environment info. -- The Figure(s) Of Merit (FOM) value, units, reference value, and lower and upper limits (`flops` in this case) -- The `display_name` field, which encodes the benchmark name and parameters (`SombreroBenchmark %tasks=... %cpus_per_task=...` in this case) -- Other quantities the user might want to compare performance for, passed as environment variables and encoded in the `env_vars` field. -- The benchmark `tag` - another way to encode benchmark inputs, defined by the benchmark developers. - ----- - -### The plotting configuration file - - -The framework contains tools to plot the FOMs of benchmarks against any of the other parameters in the perflog. This generic plotting is driven by a configuration YAML file. Let's make one, and save it in `excalibur-tests/post-processing/post_processing_config.yaml`. - -The file needs to include -- Plot title -- Axis information -- Data series -- Filters -- Data types - ----- - -### Title and Axes - -Axes must have a value specified with a perflog column name or a benchmark parameter name, and units specified with either a perflog column name or a custom label (including `null`). 
-```yaml -title: Performance vs number of tasks and CPUs_per_task - -x_axis: - value: "tasks" - units: - custom: null - -y_axis: - value: "flops_value" - units: - column: "flops_unit" -``` - ----- - -### Data series - -Display several data series in the same plot and group x-axis data by specified column values. Specify an empty list if you only want one series plotted. In our sombrero example, we have two parameters. Therefore we need to either filter down to one, or make them separate series. Let's use separate series: - -Format: `[column_name, value]` -```yaml -series: [["cpus_per_task", "1"], ["cpus_per_task", "2"]] -``` -**NOTE:** Currently, only one distinct `column_name` is supported. In the future, a second one will be allowed to be added. But in any case, unlimited number of series can be plotted for the same `column_name` but different `value`. - ----- - -### Filtering - -You can filter data rows based on specified conditions. Specify an empty list for no filters. - -Format: `[column_name, operator, value]`, -Accepted operators: "==", "!=", "<", ">", "<=", ">=" -```yaml -filters: [] -``` - -**NOTE:** After re-running the benchmarks a few times your perflog will get populated with multiple lines and you'll have to filter down to what you want to plot. Feel free to experiment with a dirtier perflog file (eg. the one in `excalibur-tests/tutorial`) or a folder with several perflog files. - ----- - -### Data types - -All columns used in axes, filters, and series must have a user-specified type for the data they contain. This would be the pandas dtype, e.g. `str/string/object`, `int/int64`, `float/float64`, `datetime/datetime64`. -```yaml -column_types: - tasks: "int" - flops_value: "float" - flops_unit: "str" - cpus_per_task: "int" -``` - ----- - -### Run the postprocessing - -The postprocessing package is an optional dependency of the framework. 
Install it with -```bash -pip install -e ./excalibur-tests/[post-processing] -``` - -We can now run the postprocessing with -```bash -python post_processing.py -``` -where -- `` is the path to a perflog file or a directory containing perflog files. -- `` is the path to the configuration YAML file. - -In our case, -```bash -python excalibur-tests/post-processing/post_processing.py perflogs excalibur-tests/post-processing/post_processing_config.yaml -``` - ----- - -### View the Output - -`scp` over the `Performance_vs_number_of_tasks_and_CPUs_per_task.html` file created in `excalibur-tests/post-processing`, and behold! - -![](https://hackmd.io/_uploads/rkgyyJaa3.png) - - --- ## Create a Benchmark @@ -403,6 +225,8 @@ In this section you will create a ReFrame benchmark by writing a python class th For simplicity, we use the [`STREAM`](https://www.cs.virginia.edu/stream/ref.html) benchmark. It is a simple memory bandwidth benchmark with minimal build dependencies. +If you've already gone through the [ReFrame tutorial](reframe_tutorial.md), the only difference you should focus on is the [build system](excalibur-tests_tutorial.md#add-build-recipe). + ---- ### How ReFrame works @@ -467,11 +291,10 @@ Note that we did not specify a compiler. Spack will use a compiler from the spac The ReFrame class tells ReFrame where and how to run the benchmark. We want to run on one task on a full archer2 node using 128 OpenMP threads to use the full node. 
```python -valid_systems = ['archer2'] +valid_systems = ['*'] valid_prog_environs = ['default'] executable = 'stream_c.exe' num_tasks = 1 -num_cpus_per_task = 128 time_limit = '5m' use_multithreading = False ``` @@ -535,48 +358,51 @@ def extract_triad_perf(self): You can now run the benchmark in the same way as the previous sombrero example -```bash -reframe -c excalibur-tests/benchmarks/apps/stream/ -r --system archer2 -J'--qos=short' -``` +=== "ARCHER2" + ```bash + reframe -c excalibur-tests/benchmarks/apps/stream/ -r --system archer2 -J'--qos=short' + ``` ---- ### Sample Output -```bash -$ reframe -c excalibur-tests/benchmarks/examples/stream/ -r -J'--qos=short' -[ReFrame Setup] - version: 4.4.1 - command: '/mnt/lustre/a2fs-work3/work/d193/d193/tk-d193/ciuk-demo/demo-env/bin/reframe -c excalibur-tests/benchmarks/examples/stream/ -r -J--qos=short' - launched by: tk-d193@ln03 - working directory: '/mnt/lustre/a2fs-work3/work/d193/d193/tk-d193/ciuk-demo' - settings files: '', '/work/d193/d193/tk-d193/ciuk-demo/excalibur-tests/benchmarks/reframe_config.py' - check search path: '/mnt/lustre/a2fs-work3/work/d193/d193/tk-d193/ciuk-demo/excalibur-tests/benchmarks/examples/stream' - stage directory: '/mnt/lustre/a2fs-work3/work/d193/d193/tk-d193/ciuk-demo/stage' - output directory: '/mnt/lustre/a2fs-work3/work/d193/d193/tk-d193/ciuk-demo/output' - log files: '/tmp/rfm-z87x4min.log' - -[==========] Running 1 check(s) -[==========] Started on Thu Nov 30 14:50:21 2023 - -[----------] start processing checks -[ RUN ] StreamBenchmark /8aeff853 @archer2:compute-node+default -[ OK ] (1/1) StreamBenchmark /8aeff853 @archer2:compute-node+default -P: Copy: 1380840.8 MB/s (r:0, l:None, u:None) -P: Scale: 1369568.7 MB/s (r:0, l:None, u:None) -P: Add: 1548666.1 MB/s (r:0, l:None, u:None) -P: Triad: 1548666.1 MB/s (r:0, l:None, u:None) -[----------] all spawned checks have finished - -[ PASSED ] Ran 1/1 test case(s) from 1 check(s) (0 failure(s), 0 skipped, 0 aborted) -[==========] 
Finished on Thu Nov 30 14:51:13 2023 -Log file(s) saved in '/tmp/rfm-z87x4min.log' -``` +=== "ARCHER2" + ```bash + $ reframe -c excalibur-tests/benchmarks/examples/stream/ -r -J'--qos=short' + [ReFrame Setup] + version: 4.4.1 + command: '/mnt/lustre/a2fs-work3/work/d193/d193/tk-d193/ciuk-demo/demo-env/bin/reframe -c excalibur-tests/benchmarks/examples/stream/ -r -J--qos=short' + launched by: tk-d193@ln03 + working directory: '/mnt/lustre/a2fs-work3/work/d193/d193/tk-d193/ciuk-demo' + settings files: '', '/work/d193/d193/tk-d193/ciuk-demo/excalibur-tests/benchmarks/reframe_config.py' + check search path: '/mnt/lustre/a2fs-work3/work/d193/d193/tk-d193/ciuk-demo/excalibur-tests/benchmarks/examples/stream' + stage directory: '/mnt/lustre/a2fs-work3/work/d193/d193/tk-d193/ciuk-demo/stage' + output directory: '/mnt/lustre/a2fs-work3/work/d193/d193/tk-d193/ciuk-demo/output' + log files: '/tmp/rfm-z87x4min.log' + + [==========] Running 1 check(s) + [==========] Started on Thu Nov 30 14:50:21 2023 + + [----------] start processing checks + [ RUN ] StreamBenchmark /8aeff853 @archer2:compute-node+default + [ OK ] (1/1) StreamBenchmark /8aeff853 @archer2:compute-node+default + P: Copy: 1380840.8 MB/s (r:0, l:None, u:None) + P: Scale: 1369568.7 MB/s (r:0, l:None, u:None) + P: Add: 1548666.1 MB/s (r:0, l:None, u:None) + P: Triad: 1548666.1 MB/s (r:0, l:None, u:None) + [----------] all spawned checks have finished + + [ PASSED ] Ran 1/1 test case(s) from 1 check(s) (0 failure(s), 0 skipped, 0 aborted) + [==========] Finished on Thu Nov 30 14:51:13 2023 + Log file(s) saved in '/tmp/rfm-z87x4min.log' + ``` ---- ### Interpreting STREAM results -With default compile options, STREAM uses arrays of 10 million elements. On a full ARCHER2 node, the default array size fits into cache, and the benchmark does not report the correct memory bandwidth. Therefore the numbers from this tutorial are not comparable with other, published, results. 
+With default compile options, STREAM uses arrays of 10 million elements. On a full node, the default array size fits into cache, and the benchmark does not report the correct memory bandwidth. +Therefore the numbers from this tutorial are not comparable with other, published, results. To avoid caching, increase the array size during build by adding e.g. `stream_array_size=64000000` to the spack spec. @@ -594,43 +420,44 @@ def __init__(self): self.spack_spec = f"stream@5.10 +openmp stream_array_size={self.array_size}" ``` -```bash -[----------] start processing checks -[ RUN ] StreamBenchmark %array_size=64000000 /bbfd0e71 @archer2:compute-node+default -[ RUN ] StreamBenchmark %array_size=32000000 /e16f9017 @archer2:compute-node+default -[ RUN ] StreamBenchmark %array_size=16000000 /abc01230 @archer2:compute-node+default -[ RUN ] StreamBenchmark %array_size=8000000 /51d83d77 @archer2:compute-node+default -[ RUN ] StreamBenchmark %array_size=4000000 /8399bc0b @archer2:compute-node+default -[ OK ] (1/5) StreamBenchmark %array_size=32000000 /e16f9017 @archer2:compute-node+default -P: Copy: 343432.5 MB/s (r:0, l:None, u:None) -P: Scale: 291065.8 MB/s (r:0, l:None, u:None) -P: Add: 275577.5 MB/s (r:0, l:None, u:None) -P: Triad: 247425.0 MB/s (r:0, l:None, u:None) -[ OK ] (2/5) StreamBenchmark %array_size=16000000 /abc01230 @archer2:compute-node+default -P: Copy: 2538396.7 MB/s (r:0, l:None, u:None) -P: Scale: 2349544.5 MB/s (r:0, l:None, u:None) -P: Add: 2912500.4 MB/s (r:0, l:None, u:None) -P: Triad: 2886402.8 MB/s (r:0, l:None, u:None) -[ OK ] (3/5) StreamBenchmark %array_size=8000000 /51d83d77 @archer2:compute-node+default -P: Copy: 1641807.1 MB/s (r:0, l:None, u:None) -P: Scale: 1362616.5 MB/s (r:0, l:None, u:None) -P: Add: 1959382.9 MB/s (r:0, l:None, u:None) -P: Triad: 1940497.3 MB/s (r:0, l:None, u:None) -[ OK ] (4/5) StreamBenchmark %array_size=64000000 /bbfd0e71 @archer2:compute-node+default -P: Copy: 255622.4 MB/s (r:0, l:None, u:None) -P: Scale: 235186.0 MB/s 
(r:0, l:None, u:None) -P: Add: 204853.9 MB/s (r:0, l:None, u:None) -P: Triad: 213072.2 MB/s (r:0, l:None, u:None) -[ OK ] (5/5) StreamBenchmark %array_size=4000000 /8399bc0b @archer2:compute-node+default -P: Copy: 1231355.3 MB/s (r:0, l:None, u:None) -P: Scale: 1086783.2 MB/s (r:0, l:None, u:None) -P: Add: 1519446.0 MB/s (r:0, l:None, u:None) -P: Triad: 1548666.1 MB/s (r:0, l:None, u:None) -[----------] all spawned checks have finished - -[ PASSED ] Ran 5/5 test case(s) from 5 check(s) (0 failure(s), 0 skipped, 0 aborted) -[==========] Finished on Thu Nov 30 14:34:48 2023 -``` +=== "ARCHER2" + ```bash + [----------] start processing checks + [ RUN ] StreamBenchmark %array_size=64000000 /bbfd0e71 @archer2:compute-node+default + [ RUN ] StreamBenchmark %array_size=32000000 /e16f9017 @archer2:compute-node+default + [ RUN ] StreamBenchmark %array_size=16000000 /abc01230 @archer2:compute-node+default + [ RUN ] StreamBenchmark %array_size=8000000 /51d83d77 @archer2:compute-node+default + [ RUN ] StreamBenchmark %array_size=4000000 /8399bc0b @archer2:compute-node+default + [ OK ] (1/5) StreamBenchmark %array_size=32000000 /e16f9017 @archer2:compute-node+default + P: Copy: 343432.5 MB/s (r:0, l:None, u:None) + P: Scale: 291065.8 MB/s (r:0, l:None, u:None) + P: Add: 275577.5 MB/s (r:0, l:None, u:None) + P: Triad: 247425.0 MB/s (r:0, l:None, u:None) + [ OK ] (2/5) StreamBenchmark %array_size=16000000 /abc01230 @archer2:compute-node+default + P: Copy: 2538396.7 MB/s (r:0, l:None, u:None) + P: Scale: 2349544.5 MB/s (r:0, l:None, u:None) + P: Add: 2912500.4 MB/s (r:0, l:None, u:None) + P: Triad: 2886402.8 MB/s (r:0, l:None, u:None) + [ OK ] (3/5) StreamBenchmark %array_size=8000000 /51d83d77 @archer2:compute-node+default + P: Copy: 1641807.1 MB/s (r:0, l:None, u:None) + P: Scale: 1362616.5 MB/s (r:0, l:None, u:None) + P: Add: 1959382.9 MB/s (r:0, l:None, u:None) + P: Triad: 1940497.3 MB/s (r:0, l:None, u:None) + [ OK ] (4/5) StreamBenchmark %array_size=64000000 /bbfd0e71 
@archer2:compute-node+default + P: Copy: 255622.4 MB/s (r:0, l:None, u:None) + P: Scale: 235186.0 MB/s (r:0, l:None, u:None) + P: Add: 204853.9 MB/s (r:0, l:None, u:None) + P: Triad: 213072.2 MB/s (r:0, l:None, u:None) + [ OK ] (5/5) StreamBenchmark %array_size=4000000 /8399bc0b @archer2:compute-node+default + P: Copy: 1231355.3 MB/s (r:0, l:None, u:None) + P: Scale: 1086783.2 MB/s (r:0, l:None, u:None) + P: Add: 1519446.0 MB/s (r:0, l:None, u:None) + P: Triad: 1548666.1 MB/s (r:0, l:None, u:None) + [----------] all spawned checks have finished + + [ PASSED ] Ran 5/5 test case(s) from 5 check(s) (0 failure(s), 0 skipped, 0 aborted) + [==========] Finished on Thu Nov 30 14:34:48 2023 + ``` ---- @@ -638,124 +465,22 @@ P: Triad: 1548666.1 MB/s (r:0, l:None, u:None) ReFrame can automate checking that the results fall within an expected range. We can use it in our previous example of increasing the array size to avoid caching. You can set a different reference value for each `perf_key` in the performance function. For example, set the test to fail if it falls outside of +-25% of the values obtained with the largest array size. -```python -reference = { - 'archer2': { - 'Copy': (260000, -0.25, 0.25, 'MB/s'), - 'Scale': (230000, -0.25, 0.25, 'MB/s'), - 'Add': (210000, -0.25, 0.25, 'MB/s'), - 'Triad': (210000, -0.25, 0.25, 'MB/s') +=== "ARCHER2" + ```python + reference = { + 'archer2': { + 'Copy': (260000, -0.25, 0.25, 'MB/s'), + 'Scale': (230000, -0.25, 0.25, 'MB/s'), + 'Add': (210000, -0.25, 0.25, 'MB/s'), + 'Triad': (210000, -0.25, 0.25, 'MB/s') + } } -} -``` + ``` > The performance reference tuple consists of the reference value, the lower and upper thresholds expressed as fractional numbers relative to the reference value, and the unit of measurement. If any of the thresholds is not relevant, None may be used instead. Also, the units in this reference variable are entirely optional, since they were already provided through the @performance_function decorator. 
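
As a rough illustration of how such a tuple is interpreted, the sketch below computes the accepted range for a reference and checks the bandwidths reported above for the largest array size against it. The helper `within_reference` is hypothetical and is not a ReFrame function; it only mirrors the rule quoted above, assuming positive reference values.

```python
# Illustrative only: mirrors how a (reference, lower, upper, unit) tuple
# bounds a measured value. `within_reference` is a hypothetical helper,
# not part of ReFrame.

def within_reference(value, ref, lower, upper):
    """Return True if value lies within the fractional thresholds around ref.

    lower and upper are fractions relative to ref (e.g. -0.25 and 0.25);
    None disables the corresponding bound.
    """
    low_ok = lower is None or value >= ref * (1 + lower)
    high_ok = upper is None or value <= ref * (1 + upper)
    return low_ok and high_ok


reference = {
    'Copy': (260000, -0.25, 0.25, 'MB/s'),
    'Scale': (230000, -0.25, 0.25, 'MB/s'),
    'Add': (210000, -0.25, 0.25, 'MB/s'),
    'Triad': (210000, -0.25, 0.25, 'MB/s'),
}

# Bandwidths reported in the output above for array_size=64000000
measured = {
    'Copy': 255622.4,
    'Scale': 235186.0,
    'Add': 204853.9,
    'Triad': 213072.2,
}

for key, (ref, lower, upper, unit) in reference.items():
    ok = within_reference(measured[key], ref, lower, upper)
    print(f"{key}: {measured[key]} {unit} within range: {ok}")
```

With these references, a run that still fits in cache (for example the `Copy` value reported for `array_size=16000000`) falls far above the upper bound and would be flagged.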
---- -### Plotting STREAM benchmark output - -```yaml -title: Stream Triad Bandwidth - -x_axis: - value: "array_size" - units: - custom: null - -y_axis: - value: "Triad_value" - units: - column: "Triad_unit" - -series: [] -filters: [["test_name","==","StreamBenchmark"]] -``` - ---- - -## Portability Demo - -Having gone through the process of setting up the framework on multiple systems enables you to run benchmarks configured in the repository on those systems. As a proof of this concept, this demo shows how to run a benchmark (e.g. `hpgmg`) on a list of systems (ARCHER2, csd3, cosma8, isambard-macs). Note that to run this demo, you will need an account and a CPU time allocation on each of these systems. - ----- - -The commands to set up and run the demo are recorded in [scripts in the exaclibur-tests repository](https://github.com/ukri-excalibur/excalibur-tests/tree/tk-portability-demo/demo). It is not feasible to make the progress completely system-agnostic, in our case we need to manually - -- Load a compatible python module -- Specify the user account for charging CPU time -- Change the working directory and select quality of service (on ARCHER2) - -That is done differently on each system. The framework attempts to automtically identify the system it is being run on, but due to ambiguity in login node names this can fail, and we also recommend specifying the system on the command line. - ----- - -```bash -#!/bin/bash -l - -system=$1 - -# System specific part of setup. 
Mostly load the correct python module -if [ $system == archer2 ] -then - module load cray-python - cd /work/d193/d193/tk-d193 -elif [ $system == csd3 ] -then - module load python/3.8 -elif [ $system == cosma ] -then - module swap python/3.10.7 -elif [ $system == isambard ] -then - module load python37 - export PATH=/home/ri-tkoskela/.local/bin:$PATH -fi - -# Setup -mkdir demo -cd demo -python3 --version -python3 -m venv demo-env -source ./demo-env/bin/activate -git clone git@github.com:ukri-excalibur/excalibur-tests.git -git clone -c feature.manyFiles=true git@github.com:spack/spack.git -source ./spack/share/spack/setup-env.sh -export RFM_CONFIG_FILES="$(pwd)/excalibur-tests/benchmarks/reframe_config.py" -export RFM_USE_LOGIN_SHELL="true" -pip install --upgrade pip -pip install -e ./excalibur-tests -``` - ----- - -```bash -#!/bin/bash - -app=$1 -compiler=$2 -system=$3 -spec=$app\%$compiler - -apps_dir=excalibur-tests/benchmarks/apps - -if [ $system == archer2 ] -then - reframe -c $apps_dir/$app -r -J'--qos=standard' --system archer2 -S spack_spec=$spec --setvar=num_cpus_per_task=8 --setvar=num_tasks_per_node=2 --setvar=num_tasks=8 -elif [ $system == cosma ] -then - reframe -c $apps_dir/$app -r -J'--account=do006' --system cosma8 -S spack_spec=$spec --setvar=num_cpus_per_task=8 --setvar=num_tasks_per_node=2 --setvar=num_tasks=8 -elif [ $system == csd3 ] -then - reframe -c $apps_dir/$app -r -J'--account=DIRAC-DO006-CPU' --system csd3-cascadelake -S spack_spec=$spec --setvar=num_cpus_per_task=8 --setvar=num_tasks_per_node=2 --setvar=num_tasks=8 -elif [ $system == isambard ] -then - reframe -c $apps_dir/$app -r --system isambard-macs:cascadelake -S build_locally=false -S spack_spec=$spec --setvar=num_cpus_per_task=8 --setvar=num_tasks_per_node=2 --setvar=num_tasks=8 -fi -``` - ---- - ## Useful Reading ---- diff --git a/docs/tutorial/getting-started.md b/docs/tutorial/getting-started.md new file mode 100644 index 00000000..37ad5199 --- /dev/null +++ 
b/docs/tutorial/getting-started.md @@ -0,0 +1,50 @@ +## Getting Started on ARCHER2 + +To complete this tutorial, you need to [connect to ARCHER2 via ssh](https://docs.archer2.ac.uk/user-guide/connecting/). You will need + +1. An ARCHER2 account. You can [request a new account](https://docs.archer2.ac.uk/quick-start/quickstart-users/#request-an-account-on-archer2) if you haven't got one you can use. Use the project code `ta131` to request your account. You can use an existing ARCHER2 account to complete this workshop. +2. A command line terminal with an ssh client. Most Linux and Mac systems come with these preinstalled. Please see [Connecting to ARCHER2](https://docs.archer2.ac.uk/user-guide/connecting/#command-line-terminal) for more information and Windows instructions. + +---- + +### ssh + +Once you have the above prerequisites, you have to [generate an ssh key pair](https://docs.archer2.ac.uk/user-guide/connecting/#ssh-key-pairs) and [upload the public key to SAFE](https://docs.archer2.ac.uk/user-guide/connecting/#upload-public-part-of-key-pair-to-safe). + +When you are done, check that you are able to connect to ARCHER2 with + +```bash +ssh username@login.archer2.ac.uk +``` + +---- + +### ARCHER2 MFA + +ARCHER2 has deployed mandatory multi-factor authentication (MFA) + +SSH keys will work as before, but instead of your ARCHER2 password, a Time-based One-Time Password (TOTP) code will be requested. + +TOTP is a six digit number, refreshed every 30 seconds, which is generated typically by an app running on your mobile phone or laptop. + +Thus authentication will require two factors: + +1) SSH key and passphrase +2) TOTP + +---- + +### ARCHER2 MFA Docs and Support + +The SAFE documentation which details how to set up MFA on machine accounts (ARCHER2) is available at: +https://epcced.github.io/safe-docs/safe-for-users/#how-to-turn-on-mfa-on-your-machine-account + +The documentation includes how to set this up without the need of a personal smartphone device. 
+ +We have also updated the ARCHER2 documentation with details of the new connection process: +https://docs.archer2.ac.uk/user-guide/connecting-totp/ +https://docs.archer2.ac.uk/quick-start/quickstart-users-totp/ + +If there are any issues or concerns please contact us at: + +support@archer2.ac.uk diff --git a/docs/tutorial/postprocessing_tutorial.md b/docs/tutorial/postprocessing_tutorial.md new file mode 100644 index 00000000..c84ecc49 --- /dev/null +++ b/docs/tutorial/postprocessing_tutorial.md @@ -0,0 +1,204 @@ +## Postprocess Benchmark Results + +Now let's browse the benchmark performance results, and create plots to visualise them. + +**NOTE:** The post-processing package is still under development. Please refer to the latest [documentation](https://ukri-excalibur.github.io/excalibur-tests/post-processing) to use it. + +---- + +### Postprocessing features + +The postprocessing can be performed either on a GUI or a CLI. It takes as input either a single perflog or a path that contains perflogs, and it is driven by a configuration YAML file (more on this later). Its outputs can be csv files of the whole or filtered perflog contents, as well as plots. + +![Screenshot from 2024-04-25 17-01-41](https://hackmd.io/_uploads/HkWxlWOZ0.png) + +We will explore its functionality features (series, filters, scaling) via the GUI first. 
+ +---- + +### GUI Demo + +We can launch the GUI with + +`streamlit run excalibur-tests/post-processing/streamlit_post_processing.py perflogs///StreamTest.log` + +Demo: +- Start without config, explore unfiltered DataFrame +- Create a plot with axes, series, filters, scaling, and extra columns + - See how the DataFrame gets filtered/modified as values are set for those fields + - See how those fields get modified in the configuration in real time +- Export config + +Optional: +- Modify config outside the GUI +- Load it back in and generate new plot + +---- + +### The plotting configuration file + +We explored all those features in the GUI, just repeating them here for reference. + +The framework contains tools to plot the FOMs of benchmarks against any of the other parameters in the perflog. This generic plotting is driven by a configuration YAML file like the one we exported from the GUI - but can also be written from scratch. + +The file needs to include +- Plot title +- Axis information +- Optional: series, filters +- Optional: scaling +- Optional: extra columns +- Data types + +---- + +### Title and Axes + +Axes must have a value specified with a DataFrame column name, and units specified with either a DataFrame column name or a custom label (including `null`). +```yaml +title: Performance vs number of tasks and CPUs_per_task + +x_axis: + value: "arraysize" + units: + custom: null + +y_axis: + value: "Copy_value" + units: + column: "Copy_unit" +``` + +---- + +### Filters + +Those can be of two types: series and filters. + +#### Data series + +Display several data series in the same plot and group x-axis data by specified column values. Specify an empty list if you only want one series plotted. +In our STREAM example, we have two parameters. Therefore we need to either filter down to one, or make them separate series. 
Let's use separate series: + +Format: `[column_name, value]` +```yaml +series: [["param_cpus_per_task", "4"], ["param_cpus_per_task", "8"]] +``` +**NOTE:** Currently, only one distinct `column_name` is supported. In the future, a second one will be allowed to be added. But in any case, unlimited number of series can be plotted for the same `column_name` but different `value`. + +#### Filtering + +You can filter data rows based on specified conditions. Those can be combined in complex ways, using the "and" and "or" filter categories. Specify an empty list for no filters. + +Format: `[column_name, operator, value]`, +Accepted operators: "==", "!=", "<", ">", "<=", ">=" +```yaml +filters: + and: + - [job_completion_time, '>=', '2024-04-26 11:21:30'] + or: [] +``` + +**NOTE:** After re-running the benchmarks a few times your perflog will get populated with multiple lines and you'll have to filter down to what you want to plot. Feel free to experiment with a dirtier perflog file or a folder with several perflog files. + +---- + +### Scaling + +You can scale the y axis values in various ways. + +By a fixed number: +```yaml +y_axis: + value: "Copy_value" + units: + column: "Copy_unit" + scaling: + custom: 2 +``` + +By another column: +```yaml +y_axis: + value: "Copy_value" + units: + column: "Copy_unit" + scaling: + column: + name: "Add_value" +``` + +By one of the series: +```yaml +y_axis: + value: "Copy_value" + units: + column: "Copy_unit" + scaling: + column: + name: "Copy_value" + series: 0 +``` +where the "series" value is the index of the series (i.e. "0" means the first series, "1" the second, and so on) + +By a specific value in the column: +```yaml +y_axis: + value: "Copy_value" + units: + column: "Copy_unit" + scaling: + column: + name: "Copy_value" + series: 0 + x_value: 5 +``` + +---- + +### Data types + +All columns used in axes, filters, and series must have a user-specified type for the data they contain. This would be the pandas dtype, e.g. 
`str/string/object`, `int/int64`, `float/float64`, `datetime/datetime64`. +```yaml +column_types: + arraysize: "int" + Copy_value: "float" + Copy_unit: "str" + param_cpus_per_task: "int" + job_completion_time: "datetime" +``` + +---- + +### Extra columns + +If you choose to save to a csv file the filtered DataFrame for further analysis, you can include extra columns, in addition to the ones you used for plotting. Those will not affect your plot. +```yaml +extra_columns_to_csv: ["Scale_value", "Add_value", "Triad_value"] +``` + +---- + +### Run the CLI postprocessing + +Now that we have a config file, we can change it as required and run it in an automated way with new data, using the CLI: +```bash +python post_processing.py +``` +where +- `` is the path to a perflog file or a directory containing perflog files. +- `` is the path to the configuration YAML file. +- other useful flags: `-s` to save the filtered DataFrame, `-np` to skip the plotting. + +In our case, +```bash +python excalibur-tests/post-processing/post_processing.py -s perflogs/archer2/compute-node/StreamTest.log ~/Downloads/Plotyplot.yaml +``` + +---- + +### View the Output + +And behold! Inside `excalibur-tests/post-processing`, we've generated the same plot and the csv file with the data it contains, in a reproducible way! + +![bokeh_plot](https://hackmd.io/_uploads/BJbgNxgGA.png) + diff --git a/docs/tutorial/profiling_tutorial.md b/docs/tutorial/profiling_tutorial.md new file mode 100644 index 00000000..15a27290 --- /dev/null +++ b/docs/tutorial/profiling_tutorial.md @@ -0,0 +1 @@ +# Profiling tutorial diff --git a/docs/tutorial/reframe_tutorial.md b/docs/tutorial/reframe_tutorial.md index 27c40a76..c7a7a8a1 100644 --- a/docs/tutorial/reframe_tutorial.md +++ b/docs/tutorial/reframe_tutorial.md @@ -87,6 +87,7 @@ therefore it's best to clone ReFrame v4.5. 
There's a [Hello world example](https://reframe-hpc.readthedocs.io/en/v4.5.2/tutorial_basics.html#the-hello-world-test) in the ReFrame 4.5 tutorial that explains how to create a simple ReFrame test. ReFrame tests are python classes that describe how a test is run. +To get started, open an empty `.py` file where you will write the ReFrame class, e.g. `hello.py`. ---- @@ -271,9 +272,9 @@ perform different floating point operations on them. ---- -### Boilerplate +### Create the Test Class -The imports and the class declaration look the same as before. +The imports and the class declaration look the same as in the hello world example. We can now specify valid systems and programming environments to run on the system we just configured. You can adapt these to your system, or keep using `'*'` to run on any platform. diff --git a/mkdocs.yml b/mkdocs.yml index 9b2992c5..765c8816 100644 --- a/mkdocs.yml +++ b/mkdocs.yml @@ -41,8 +41,11 @@ nav: - Kathleen: systems#myriad-and-kathleen - Tursa: systems#tursa - 'Tutorials': + - Getting started: tutorial/getting-started.md - ReFrame Tutorial: tutorial/reframe_tutorial.md - - excalibur-tests Tutorial: tutorial/excalibur-tests_tutorial.md + - ExCALIBUR-tests Tutorial: tutorial/excalibur-tests_tutorial.md + - Postprocessing Tutorial: tutorial/postprocessing_tutorial.md + - Profiling Tutorial: tutorial/profiling_tutorial.md theme: name: material features: diff --git a/post-processing/.gitignore b/post-processing/.gitignore new file mode 100644 index 00000000..2672b770 --- /dev/null +++ b/post-processing/.gitignore @@ -0,0 +1,3 @@ +*.html +*.png +*.csv \ No newline at end of file From b63883d531909d745221dd21d340cd1ff81b8ad5 Mon Sep 17 00:00:00 2001 From: Tuomas Koskela Date: Wed, 18 Dec 2024 15:01:08 +0000 Subject: [PATCH 26/32] Some refactoring of duplicate content --- .gitignore | 5 +- docs/tutorial/excalibur-tests_tutorial.md | 59 +++------------- docs/tutorial/getting-started.md | 2 +- docs/tutorial/reframe_tutorial.md | 67 
+------------------ docs/tutorial/setup-python.md | 28 ++++++++ .../tutorial/stream-sanity-and-performance.md | 39 +++++++++++ mkdocs.yml | 2 + 7 files changed, 86 insertions(+), 116 deletions(-) create mode 100644 docs/tutorial/setup-python.md create mode 100644 docs/tutorial/stream-sanity-and-performance.md diff --git a/.gitignore b/.gitignore index 78fdda97..39def363 100644 --- a/.gitignore +++ b/.gitignore @@ -9,6 +9,7 @@ reframe.log reframe.out stage/ output/ +perflogs/ apps/castep/downloads/ apps/wrf/downloads/ apps/gromacs/downloads/ @@ -24,7 +25,9 @@ outdated/ #ignore virtual environment myvenv/ -perflogs/ # docs site/ + +#emacs backups +*~ \ No newline at end of file diff --git a/docs/tutorial/excalibur-tests_tutorial.md b/docs/tutorial/excalibur-tests_tutorial.md index deab5f66..3a41e3c7 100644 --- a/docs/tutorial/excalibur-tests_tutorial.md +++ b/docs/tutorial/excalibur-tests_tutorial.md @@ -16,22 +16,19 @@ In this tutorial you will set up the excalibur-tests benchmarking framework on a ---- -### Set up python +### Set up python environment -=== "ARCHER2" +{!tutorial/setup-python.md!} - We are going to use `python` and the `pip` package installer to install and run the framework. Load the `cray-python` module to get a python version that fills the requirements. - ```bash - module load cray-python - ``` - You can check with `python3 --version` that your python version is `3.8` or greater. You will have to load this module every time you login. - - (at the time of writing, the default version was `3.9.13`). +--- ----- ### Change to work directory +=== "Cosma" + + Move on to the next step. + === "ARCHER2" On ARCHER2, the compute nodes do not have access to your home directory, therefore it is important to install everything in a [work file system](https://docs.archer2.ac.uk/user-guide/data/#work-file-systems). 
@@ -225,7 +222,7 @@ In this section you will create a ReFrame benchmark by writing a python class th For simplicity, we use the [`STREAM`](https://www.cs.virginia.edu/stream/ref.html) benchmark. It is a simple memory bandwidth benchmark with minimal build dependencies. -If you've already gone through the [ReFrame tutorial](reframe_tutorial.md), the only difference you should focus on is the [build system](excalibur-tests_tutorial.md#add-build-recipe). +If you've already gone through the [ReFrame tutorial](reframe_tutorial.md) some of the steps in creating the STREAM benchmark are repeated. However, pay attention to the [`Create a Test Class`](excalibur-tests_tutorial.md#create-a-test-class) and [`Add Build Recipe`](excalibur-tests_tutorial.md#add-build-recipe) steps. ---- @@ -312,45 +309,7 @@ env_vars['OMP_PLACES'] = 'cores' ---- -### Add Sanity Check - -The rest of the benchmark follows the [Writing a Performance Test ReFrame Tutorial](https://reframe-hpc.readthedocs.io/en/latest/tutorial_basics.html#writing-a-performance-test). First we need a sanity check that ensures the benchmark ran successfully. A function decorated with the `@sanity_function` decorator is used by ReFrame to check that the test ran successfully. The sanity function can perform a number of checks, in this case we want to match a line of the expected standard output. - -```python -@sanity_function -def validate_solution(self): - return sn.assert_found(r'Solution Validates', self.stdout) -``` - ----- - -### Add Performance Pattern Check - -To record the performance of the benchmark, ReFrame should extract a figure of merit from the output of the test. A function decorated with the `@performance_function` decorator extracts or computes a performance metric from the test’s output. 
- -> In this example, we extract four performance variables, namely the memory bandwidth values for each of the “Copy”, “Scale”, “Add” and “Triad” sub-benchmarks of STREAM, where each of the performance functions use the [`extractsingle()`](https://reframe-hpc.readthedocs.io/en/latest/deferrable_functions_reference.html#reframe.utility.sanity.extractsingle) utility function. For each of the sub-benchmarks we extract the “Best Rate MB/s” column of the output (see below) and we convert that to a float. - ----- - -### Performance Pattern Check - -```python -@performance_function('MB/s', perf_key='Copy') -def extract_copy_perf(self): - return sn.extractsingle(r'Copy:\s+(\S+)\s+.*', self.stdout, 1, float) - -@performance_function('MB/s', perf_key='Scale') -def extract_scale_perf(self): - return sn.extractsingle(r'Scale:\s+(\S+)\s+.*', self.stdout, 1, float) - -@performance_function('MB/s', perf_key='Add') -def extract_add_perf(self): - return sn.extractsingle(r'Add:\s+(\S+)\s+.*', self.stdout, 1, float) - -@performance_function('MB/s', perf_key='Triad') -def extract_triad_perf(self): - return sn.extractsingle(r'Triad:\s+(\S+)\s+.*', self.stdout, 1, float) -``` +{!tutorial/stream-sanity-and-performance.md!} ---- diff --git a/docs/tutorial/getting-started.md b/docs/tutorial/getting-started.md index 37ad5199..b9d2a701 100644 --- a/docs/tutorial/getting-started.md +++ b/docs/tutorial/getting-started.md @@ -1,4 +1,4 @@ -## Getting Started on ARCHER2 +## Connecting to ARCHER2 To complete this tutorial, you need to [connect to ARCHER2 via ssh](https://docs.archer2.ac.uk/user-guide/connecting/). 
You will need diff --git a/docs/tutorial/reframe_tutorial.md b/docs/tutorial/reframe_tutorial.md index c7a7a8a1..54ffdc2b 100644 --- a/docs/tutorial/reframe_tutorial.md +++ b/docs/tutorial/reframe_tutorial.md @@ -29,34 +29,7 @@ You can customise the behaviour of each stage or add a hook before or after each ## Set up python environment -=== "Cosma" - - This tutorial is run on the [Cosma](https://cosma.readthedocs.io/en/latest/) supercomputer. - It should be straightforward to run on a different platform, the requirements are `gcc`, `git` and `python3`. (for the later parts you also need `make`, `autotools`, `cmake` and `spack`). - Before proceeding to install ReFrame, we recommend creating a python virtual environment to avoid clashes with other installed python packages. - First load a newer python module. - ```bash - module swap python/3.10.12 - ``` - -=== "ARCHER2" - - This tutorial is run on ARCHER2, you should have signed up for a training account before starting. - It can be ran on other HPC systems with a batch scheduler but will require making some changes to the config. - Before proceeding to install ReFrame, we recommend creating a python virtual environment to avoid clashes with other installed python packages. - First load the system python module. - ```bash - module load cray-python - ``` - -Then create an environment and activate it with - -```bash -python3 -m venv reframe_tutorial -source reframe_tutorial/bin/activate -``` - -You will have to activate the environment each time you login. To deactivate the environment run `deactivate`. +{!tutorial/setup-python.md!} ---- @@ -325,7 +298,7 @@ We can set environment variables in the `env_vars` dictionary. ---- -### Building +### Building the STREAM benchmark Recall the pipeline ReFrame executes when running a test. 
We can insert arbitrary functions between any steps in in the pipeline by decorating them with `@run_before` or `@run_after` @@ -346,41 +319,7 @@ It should be large enough to overflow all levels of cache so that there is no da ---- -### Sanity function - -Similar to before, we can check a line in stdout for validation. - -```python - @sanity_function - def validate_solution(self): - return sn.assert_found(r'Solution Validates', self.stdout) -``` - ----- - -### Add Performance Pattern Check - -To record the performance of the benchmark, ReFrame should extract a figure of merit from the output of the test. A function decorated with the `@performance_function` decorator extracts or computes a performance metric from the test’s output. - -> In this example, we extract four performance variables, namely the memory bandwidth values for each of the “Copy”, “Scale”, “Add” and “Triad” sub-benchmarks of STREAM, where each of the performance functions use the [`extractsingle()`](https://reframe-hpc.readthedocs.io/en/latest/deferrable_functions_reference.html#reframe.utility.sanity.extractsingle) utility function. For each of the sub-benchmarks we extract the “Best Rate MB/s” column of the output (see below) and we convert that to a float. 
- -```python -@performance_function('MB/s', perf_key='Copy') -def extract_copy_perf(self): - return sn.extractsingle(r'Copy:\s+(\S+)\s+.*', self.stdout, 1, float) - -@performance_function('MB/s', perf_key='Scale') -def extract_scale_perf(self): - return sn.extractsingle(r'Scale:\s+(\S+)\s+.*', self.stdout, 1, float) - -@performance_function('MB/s', perf_key='Add') -def extract_add_perf(self): - return sn.extractsingle(r'Add:\s+(\S+)\s+.*', self.stdout, 1, float) - -@performance_function('MB/s', perf_key='Triad') -def extract_triad_perf(self): - return sn.extractsingle(r'Triad:\s+(\S+)\s+.*', self.stdout, 1, float) -``` +{!tutorial/stream-sanity-and-performance.md!} ---- diff --git a/docs/tutorial/setup-python.md b/docs/tutorial/setup-python.md new file mode 100644 index 00000000..e0c68d29 --- /dev/null +++ b/docs/tutorial/setup-python.md @@ -0,0 +1,28 @@ +=== "Cosma" + + This tutorial is run on the [Cosma](https://cosma.readthedocs.io/en/latest/) supercomputer. + It should be straightforward to run on a different platform; the requirements are `gcc`, `git` and `python3` (for the later parts you also need `make`, `autotools`, `cmake` and `spack`). + Before proceeding to install ReFrame, we recommend creating a python virtual environment to avoid clashes with other installed python packages. + First load a newer python module. + ```bash + module swap python/3.10.12 + ``` + +=== "ARCHER2" + + This tutorial is run on ARCHER2; you should have signed up for a training account before starting. + It can be run on other HPC systems with a batch scheduler but will require making some changes to the config. + Before proceeding to install ReFrame, we recommend creating a python virtual environment to avoid clashes with other installed python packages. + First load the system python module. 
+ ```bash + module load cray-python + ``` + +Then create an environment and activate it with + +```bash +python3 -m venv reframe_tutorial +source reframe_tutorial/bin/activate +``` + +You will have to activate the environment each time you login. To deactivate the environment run `deactivate`. diff --git a/docs/tutorial/stream-sanity-and-performance.md b/docs/tutorial/stream-sanity-and-performance.md new file mode 100644 index 00000000..0522ba88 --- /dev/null +++ b/docs/tutorial/stream-sanity-and-performance.md @@ -0,0 +1,39 @@ +### Add Sanity Check + +The rest of the benchmark follows the [Writing a Performance Test ReFrame Tutorial](https://reframe-hpc.readthedocs.io/en/latest/tutorial_basics.html#writing-a-performance-test). First we need a sanity check that ensures the benchmark ran successfully. A function decorated with the `@sanity_function` decorator is used by ReFrame to check that the test ran successfully. The sanity function can perform a number of checks, in this case we want to match a line of the expected standard output. + +```python +@sanity_function +def validate_solution(self): + return sn.assert_found(r'Solution Validates', self.stdout) +``` + +---- + +### Add Performance Pattern Check + +To record the performance of the benchmark, ReFrame should extract a figure of merit from the output of the test. A function decorated with the `@performance_function` decorator extracts or computes a performance metric from the test’s output. + +> In this example, we extract four performance variables, namely the memory bandwidth values for each of the “Copy”, “Scale”, “Add” and “Triad” sub-benchmarks of STREAM, where each of the performance functions use the [`extractsingle()`](https://reframe-hpc.readthedocs.io/en/latest/deferrable_functions_reference.html#reframe.utility.sanity.extractsingle) utility function. For each of the sub-benchmarks we extract the “Best Rate MB/s” column of the output (see below) and we convert that to a float. 
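
To make the extraction concrete, here is a small standalone sketch that applies the regular expressions used for the `Copy` and `Scale` sub-benchmarks in this tutorial, with Python's `re` module standing in for ReFrame's `extractsingle()`. The sample output and the helper `extract_rate` are made up for illustration; the sketch assumes STREAM prints the best rate as the first column after the kernel name.

```python
import re

# Made-up sample in the shape of STREAM's summary table (illustrative values)
sample_stdout = """\
Function    Best Rate MB/s  Avg time     Min time     Max time
Copy:          255622.4     0.004034     0.004006     0.004080
Scale:         235186.0     0.004389     0.004354     0.004443
"""


def extract_rate(kernel, text):
    """Return the first whitespace-delimited token after '<kernel>:' as a float."""
    match = re.search(rf'{kernel}:\s+(\S+)\s+.*', text)
    if match is None:
        raise ValueError(f'no line for {kernel!r} in output')
    return float(match.group(1))


print(extract_rate('Copy', sample_stdout))
print(extract_rate('Scale', sample_stdout))
```

The capture group `(\S+)` picks up only the first column, which is why the trailing timing columns do not affect the reported figure of merit.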
+ +---- + +### Performance Pattern Check + +```python +@performance_function('MB/s', perf_key='Copy') +def extract_copy_perf(self): + return sn.extractsingle(r'Copy:\s+(\S+)\s+.*', self.stdout, 1, float) + +@performance_function('MB/s', perf_key='Scale') +def extract_scale_perf(self): + return sn.extractsingle(r'Scale:\s+(\S+)\s+.*', self.stdout, 1, float) + +@performance_function('MB/s', perf_key='Add') +def extract_add_perf(self): + return sn.extractsingle(r'Add:\s+(\S+)\s+.*', self.stdout, 1, float) + +@performance_function('MB/s', perf_key='Triad') +def extract_triad_perf(self): + return sn.extractsingle(r'Triad:\s+(\S+)\s+.*', self.stdout, 1, float) +``` diff --git a/mkdocs.yml b/mkdocs.yml index 765c8816..2364d0bc 100644 --- a/mkdocs.yml +++ b/mkdocs.yml @@ -65,6 +65,8 @@ theme: name: Switch to light mode markdown_extensions: - admonition + - markdown_include.include: + base_path: docs - pymdownx.details - pymdownx.highlight: anchor_linenums: true From 25df4516e649ed130b2485ed98001c44fdb5f3cc Mon Sep 17 00:00:00 2001 From: Tuomas Koskela Date: Wed, 18 Dec 2024 15:05:04 +0000 Subject: [PATCH 27/32] Add dependency --- .github/workflows/docs.yml | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/.github/workflows/docs.yml b/.github/workflows/docs.yml index d7c63396..fdc3bb70 100644 --- a/.github/workflows/docs.yml +++ b/.github/workflows/docs.yml @@ -53,7 +53,7 @@ jobs: cache: 'pip' - name: Install Python dependencies - run: pip install mkdocs-material github3.py + run: pip install mkdocs-material github3.py markdown-include - name: Checkout gh-pages # Run only if push is to `main`, or if it's a PR not from a fork. 
From 4775a74e354d420acbd03f24863f7297c181a98f Mon Sep 17 00:00:00 2001 From: Tuomas Koskela Date: Wed, 18 Dec 2024 15:17:43 +0000 Subject: [PATCH 28/32] Add profiling tutorial --- docs/tutorial/profiling_tutorial.md | 71 ++++++++++++++++++- .../tutorial/stream-sanity-and-performance.md | 4 -- 2 files changed, 70 insertions(+), 5 deletions(-) diff --git a/docs/tutorial/profiling_tutorial.md b/docs/tutorial/profiling_tutorial.md index 15a27290..a9e28c46 100644 --- a/docs/tutorial/profiling_tutorial.md +++ b/docs/tutorial/profiling_tutorial.md @@ -1 +1,70 @@ -# Profiling tutorial +# Profiling tutorial -- !Work in progress! + +## Outline + +1. Running a benchmark with a profiler +2. Sampling profilers (Nsight & VTune) +3. Collecting roofline data (advisor-roofline) + +## Profilers in excalibur-tests + +The `excalibur-tests` framework allows you to run a profiler together with a benchmark application. To do this, set the `profiler` attribute on the command line using the `-S profiler=...` syntax. + +We support profilers that can be installed with Spack without a license and that don't require modifying the source code or the build system. Currently supported values for the `profiler` attribute are: + +- `advisor-roofline`: it produces a roofline model of your program using Intel Advisor; +- `nsight`: it runs the code with the NVIDIA Nsight Systems profiler; +- `vtune`: it runs the code with the Intel VTune profiler. + +For more details, see the [User documentation](https://ukri-excalibur.github.io/excalibur-tests/use/) + +## Profiling with Nsight + +- NVIDIA Nsight Systems is a low-overhead sampling profiler that supports both CPU and GPU applications.
+- Supports both x86 and ARM architectures +- We collect `nsys profile --trace=cuda,mpi,nvtx,openmp,osrt,opengl,syscall` + +Run Nsight profiling with +```bash +reframe -c path/to/application -r -S profiler=nsight +``` + +- Spack installs the `nvidia-nsight-systems` package in the background, including the GUI +- The paths to the collected profile data and to the GUI launcher are written into `rfm_job.out` +- To run the GUI remotely, you need to log in with `ssh -X`. It may be slow on a remote system. +- You can (spack) install the GUI locally to view the data. + +## Profiling with VTune + +- Intel VTune is a low-overhead sampling profiler that supports CPU applications +- Only runs on x86 architectures + +Run VTune profiling with +```bash +reframe -c path/to/application -r -S profiler=vtune +``` +- Spack installs the `intel-oneapi-vtune` package in the background, including the GUI + +## Roofline analysis with Advisor + +- Intel Advisor is a tool for on-node performance optimisation. It does analysis for efficient Vectorization, Threading, Memory Usage, and Accelerator Offloading +- Since ~2018 it has had support for automated roofline analysis +- It only supports the x86 CPU architecture +- Won't run on the MPI launcher (because it does on-node analysis). In our benchmarks we have to override it. It can run inside an MPI job on a single rank, but we don't currently support that; hopefully it will be available in the future. + +To run on a single MPI rank without `mpirun`, add the following decorated function to the test class (`getlauncher` is imported from `reframe.core.backends`) +```python + @run_before('run') + def replace_launcher(self): + self.job.launcher = getlauncher('local')() +``` + +- We collect `advisor -collect roofline` + +Run Advisor roofline collection with +```bash +reframe -c path/to/stream -r -S profiler=advisor-roofline +``` + +- Similar to Nsight, the GUI is installed by Spack but is slow to run remotely.
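The three `reframe ... -S profiler=...` invocations in the profiling tutorial differ only in the test path and the profiler name. As a hedged illustration, the sketch below assembles that command line in Python and rejects values the tutorial does not list. The helper `build_reframe_command` is hypothetical and not part of `excalibur-tests` or ReFrame; only the flags shown in the tutorial text are used.

```python
import shlex

# Profiler values listed as supported in the tutorial text.
SUPPORTED_PROFILERS = ('advisor-roofline', 'nsight', 'vtune')


def build_reframe_command(test_path, profiler):
    """Return the argument list for running a benchmark with a profiler.

    Hypothetical helper: it only assembles the command shown in the
    tutorial and does not run ReFrame itself.
    """
    if profiler not in SUPPORTED_PROFILERS:
        raise ValueError(
            f'unsupported profiler {profiler!r}; '
            f'expected one of {SUPPORTED_PROFILERS}'
        )
    return ['reframe', '-c', test_path, '-r', '-S', f'profiler={profiler}']


cmd = build_reframe_command('path/to/application', 'nsight')
# shlex.join renders the list as a shell-safe string for display or logging.
print(shlex.join(cmd))
```

Keeping the command as a list rather than a string means it could be handed to `subprocess.run` without shell quoting concerns.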
+ diff --git a/docs/tutorial/stream-sanity-and-performance.md b/docs/tutorial/stream-sanity-and-performance.md index 0522ba88..bef37ab6 100644 --- a/docs/tutorial/stream-sanity-and-performance.md +++ b/docs/tutorial/stream-sanity-and-performance.md @@ -16,10 +16,6 @@ To record the performance of the benchmark, ReFrame should extract a figure of m > In this example, we extract four performance variables, namely the memory bandwidth values for each of the “Copy”, “Scale”, “Add” and “Triad” sub-benchmarks of STREAM, where each of the performance functions use the [`extractsingle()`](https://reframe-hpc.readthedocs.io/en/latest/deferrable_functions_reference.html#reframe.utility.sanity.extractsingle) utility function. For each of the sub-benchmarks we extract the “Best Rate MB/s” column of the output (see below) and we convert that to a float. ----- - -### Performance Pattern Check - ```python @performance_function('MB/s', perf_key='Copy') def extract_copy_perf(self): From fcc121d6c70ac0954a69db3b278c0ac841556d55 Mon Sep 17 00:00:00 2001 From: Tuomas Koskela Date: Wed, 18 Dec 2024 15:40:09 +0000 Subject: [PATCH 29/32] Add tabs to getting started page --- docs/tutorial/getting-started.md | 35 ++++++++++++++++++++++++-------- mkdocs.yml | 8 ++++---- 2 files changed, 30 insertions(+), 13 deletions(-) diff --git a/docs/tutorial/getting-started.md b/docs/tutorial/getting-started.md index b9d2a701..1130e5bb 100644 --- a/docs/tutorial/getting-started.md +++ b/docs/tutorial/getting-started.md @@ -1,21 +1,38 @@ -## Connecting to ARCHER2 +# Getting Started on HPC systems -To complete this tutorial, you need to [connect to ARCHER2 via ssh](https://docs.archer2.ac.uk/user-guide/connecting/). You will need +These tutorials have been ran in in-person workshops on various HPC systems in the UK. It should be possible to run on any of the [supported systems](../systems.md), or to [set up the tools](../setup.md) on a local machine. 
If you have access to one of the systems we've previously used, this tutorial helps you get set up. Otherwise, please consult the [documentation](../install.md) -1. An ARCHER2 account. You can [request a new account](https://docs.archer2.ac.uk/quick-start/quickstart-users/#request-an-account-on-archer2) if you haven't got one you can use. Use the project code `ta131` to request your account. You can use an existing ARCHER2 account to complete this workshop. -2. A command line terminal with an ssh client. Most Linux and Mac systems come with these preinstalled. Please see [Connecting to ARCHER2](https://docs.archer2.ac.uk/user-guide/connecting/#command-line-terminal) for more information and Windows instructions. +## Connecting + +=== "Cosma" + + To run these tutorials on Cosma, you need to [connect to Cosma via ssh](https://cosma.readthedocs.io/en/latest/ssh.html). You will need + + 1. A Cosma account. You can [request a new account](https://cosma.readthedocs.io/en/latest/account.html) if you haven't got one you can use. You can use an existing Cosma account to complete the tutorials. + 2. A command line terminal with an ssh client. Most Linux and Mac systems come with these preinstalled. Please see [Connecting to ARCHER2](https://docs.archer2.ac.uk/user-guide/connecting/#command-line-terminal) for more information and Windows instructions. + +=== "ARCHER2" + To run these tutorials on ARCHER2, you need to [connect to ARCHER2 via ssh](https://docs.archer2.ac.uk/user-guide/connecting/). You will need + + 1. An ARCHER2 account. You can [request a new account](https://docs.archer2.ac.uk/quick-start/quickstart-users/#request-an-account-on-archer2) if you haven't got one you can use. You can use an existing ARCHER2 account to complete the tutorials. + 2. A command line terminal with an ssh client. Most Linux and Mac systems come with these preinstalled. 
Please see [Connecting to ARCHER2](https://docs.archer2.ac.uk/user-guide/connecting/#command-line-terminal) for more information and Windows instructions. ---- ### ssh -Once you have the above prerequisites, you have to [generate an ssh key pair](https://docs.archer2.ac.uk/user-guide/connecting/#ssh-key-pairs) and [upload the public key to SAFE](https://docs.archer2.ac.uk/user-guide/connecting/#upload-public-part-of-key-pair-to-safe). +=== "Cosma" + + Please see [SSH access to Cosma](https://cosma.readthedocs.io/en/latest/ssh.html) for more information + +=== "ARCHER2" + Once you have the above prerequisites, you have to [generate an ssh key pair](https://docs.archer2.ac.uk/user-guide/connecting/#ssh-key-pairs) and [upload the public key to SAFE](https://docs.archer2.ac.uk/user-guide/connecting/#upload-public-part-of-key-pair-to-safe). -When you are done, check that you are able to connect to ARCHER2 with + When you are done, check that you are able to connect to ARCHER2 with -```bash -ssh username@login.archer2.ac.uk -``` + ```bash + ssh username@login.archer2.ac.uk + ``` ---- diff --git a/mkdocs.yml b/mkdocs.yml index 2364d0bc..abb3fc90 100644 --- a/mkdocs.yml +++ b/mkdocs.yml @@ -42,10 +42,10 @@ nav: - Tursa: systems#tursa - 'Tutorials': - Getting started: tutorial/getting-started.md - - ReFrame Tutorial: tutorial/reframe_tutorial.md - - ExCALIBUR-tests Tutorial: tutorial/excalibur-tests_tutorial.md - - Postprocessing Tutorial: tutorial/postprocessing_tutorial.md - - Profiling Tutorial: tutorial/profiling_tutorial.md + - Writing benchmarks in ReFrame: tutorial/reframe_tutorial.md + - Automation in excalibur-tests: tutorial/excalibur-tests_tutorial.md + - Postprocessing ReFrame output: tutorial/postprocessing_tutorial.md + - Profiling benchmarks: tutorial/profiling_tutorial.md theme: name: material features: From 6f43d065a587a371d6ac4b4f42110d140dd7ab32 Mon Sep 17 00:00:00 2001 From: Tuomas Koskela Date: Wed, 18 Dec 2024 16:34:07 +0000 Subject: [PATCH 30/32] Add 
tabs to MFA section --- docs/tutorial/getting-started.md | 34 ++++++++++++++++---------------- 1 file changed, 17 insertions(+), 17 deletions(-) diff --git a/docs/tutorial/getting-started.md b/docs/tutorial/getting-started.md index 1130e5bb..82c99c4b 100644 --- a/docs/tutorial/getting-started.md +++ b/docs/tutorial/getting-started.md @@ -36,32 +36,32 @@ These tutorials have been ran in in-person workshops on various HPC systems in t ---- -### ARCHER2 MFA +### MFA -ARCHER2 has deployed mandatory multi-factor authentication (MFA) +=== "Cosma" -SSH keys will work as before, but instead of your ARCHER2 password, a Time-based One-Time Password (TOTP) code will be requested. + Cosma does not require MFA at present -TOTP is a six digit number, refreshed every 30 seconds, which is generated typically by an app running on your mobile phone or laptop. +=== "ARCHER2" -Thus authentication will require two factors: + ARCHER2 has deployed mandatory multi-factor authentication (MFA) -1) SSH key and passphrase -2) TOTP + SSH keys will work as before, but instead of your ARCHER2 password, a Time-based One-Time Password (TOTP) code will be requested. ----- + TOTP is a six digit number, refreshed every 30 seconds, which is generated typically by an app running on your mobile phone or laptop. -### ARCHER2 MFA Docs and Support + Thus authentication will require two factors: -The SAFE documentation which details how to set up MFA on machine accounts (ARCHER2) is available at: -https://epcced.github.io/safe-docs/safe-for-users/#how-to-turn-on-mfa-on-your-machine-account + 1) SSH key and passphrase + 2) TOTP -The documentation includes how to set this up without the need of a personal smartphone device. 
+ The SAFE documentation which details how to set up MFA on machine accounts (ARCHER2) is available at: + [https://epcced.github.io/safe-docs/safe-for-users/#how-to-turn-on-mfa-on-your-machine-account](https://epcced.github.io/safe-docs/safe-for-users/#how-to-turn-on-mfa-on-your-machine-account) -We have also updated the ARCHER2 documentation with details of the new connection process: -https://docs.archer2.ac.uk/user-guide/connecting-totp/ -https://docs.archer2.ac.uk/quick-start/quickstart-users-totp/ + The documentation includes how to set this up without the need of a personal smartphone device. -If there are any issues or concerns please contact us at: + We have also updated the ARCHER2 documentation with details of the new connection process: + [https://docs.archer2.ac.uk/user-guide/connecting-totp/](https://docs.archer2.ac.uk/user-guide/connecting-totp/) + [https://docs.archer2.ac.uk/quick-start/quickstart-users-totp/](https://docs.archer2.ac.uk/quick-start/quickstart-users-totp/) -support@archer2.ac.uk + If there are any issues or concerns please contact support@archer2.ac.uk From 78fbeaea7d4be1125ada898e4888b4819e573db9 Mon Sep 17 00:00:00 2001 From: Tuomas Koskela Date: Fri, 31 Jan 2025 13:32:02 +0000 Subject: [PATCH 31/32] Address review comments --- docs/install.md | 5 +++++ docs/setup.md | 3 ++- docs/tutorial/getting-started.md | 8 +++++--- docs/tutorial/postprocessing_tutorial.md | 3 ++- docs/tutorial/setup-python.md | 3 ++- docs/use.md | 7 +++++++ 6 files changed, 23 insertions(+), 6 deletions(-) diff --git a/docs/install.md b/docs/install.md index 42bf10cd..c7695093 100644 --- a/docs/install.md +++ b/docs/install.md @@ -92,3 +92,8 @@ _**Note**: if you have already installed spack locally and you want to upgrade t a newer version, you might first have to clear the cache to avoid conflicts: `spack clean -m`_ +It is recommended to always have the python virtual environment active when working with `excalibur-tests`. 
+However, it should be noted that since `Spack` is not installed via `pip`, it will be installed outside of the +python virtual environment. Also, keep in mind that the Spack environments that are discussed in the next section +are different and independent of the python virtual environment. + diff --git a/docs/setup.md b/docs/setup.md index a90b5b9a..9c4d8963 100644 --- a/docs/setup.md +++ b/docs/setup.md @@ -81,7 +81,8 @@ configuration. The numbers you need to watch out for are: When using Spack as build system, ReFrame needs a [Spack environment](https://spack.readthedocs.io/en/latest/environments.html) to run -its tests. Follow these steps to create a Spack environment for a new system: +its tests. The Spack environment is separate and independent of the python +virtual environment. Follow these steps to create a Spack environment for a new system: #### Create the environment ```sh diff --git a/docs/tutorial/getting-started.md b/docs/tutorial/getting-started.md index 82c99c4b..69a84103 100644 --- a/docs/tutorial/getting-started.md +++ b/docs/tutorial/getting-started.md @@ -1,18 +1,20 @@ # Getting Started on HPC systems -These tutorials have been ran in in-person workshops on various HPC systems in the UK. It should be possible to run on any of the [supported systems](../systems.md), or to [set up the tools](../setup.md) on a local machine. If you have access to one of the systems we've previously used, this tutorial helps you get set up. Otherwise, please consult the [documentation](../install.md) +These tutorials have been run in in-person workshops on various HPC systems in the UK. +It should be possible to run on any of the [supported systems](../systems.md), or to [set up the tools](../setup.md) on a local machine. +If you have access to one of the systems we've previously used, this tutorial helps you get set up. 
Otherwise, please consult the [documentation](../install.md) ## Connecting === "Cosma" - To run these tutorials on Cosma, you need to [connect to Cosma via ssh](https://cosma.readthedocs.io/en/latest/ssh.html). You will need + To run these tutorials on Cosma, you will need to [connect to Cosma via ssh](https://cosma.readthedocs.io/en/latest/ssh.html). You will need 1. A Cosma account. You can [request a new account](https://cosma.readthedocs.io/en/latest/account.html) if you haven't got one you can use. You can use an existing Cosma account to complete the tutorials. 2. A command line terminal with an ssh client. Most Linux and Mac systems come with these preinstalled. Please see [Connecting to ARCHER2](https://docs.archer2.ac.uk/user-guide/connecting/#command-line-terminal) for more information and Windows instructions. === "ARCHER2" - To run these tutorials on ARCHER2, you need to [connect to ARCHER2 via ssh](https://docs.archer2.ac.uk/user-guide/connecting/). You will need + To run these tutorials on ARCHER2, you will need to [connect to ARCHER2 via ssh](https://docs.archer2.ac.uk/user-guide/connecting/). You will need 1. An ARCHER2 account. You can [request a new account](https://docs.archer2.ac.uk/quick-start/quickstart-users/#request-an-account-on-archer2) if you haven't got one you can use. You can use an existing ARCHER2 account to complete the tutorials. 2. A command line terminal with an ssh client. Most Linux and Mac systems come with these preinstalled. Please see [Connecting to ARCHER2](https://docs.archer2.ac.uk/user-guide/connecting/#command-line-terminal) for more information and Windows instructions. 
diff --git a/docs/tutorial/postprocessing_tutorial.md b/docs/tutorial/postprocessing_tutorial.md index c84ecc49..3cb6dfa9 100644 --- a/docs/tutorial/postprocessing_tutorial.md +++ b/docs/tutorial/postprocessing_tutorial.md @@ -8,7 +8,8 @@ Now let's browse the benchmark performance results, and create plots to visualis ### Postprocessing features -The postprocessing can be performed either on a GUI or a CLI. It takes as input either a single perflog or a path that contains perflogs, and it is driven by a configuration YAML file (more on this later). Its outputs can be csv files of the whole or filtered perflog contents, as well as plots. +The postprocessing can be performed either on a GUI or a CLI. It takes as input either a single perflog or a path that contains perflogs, and it is driven by a configuration YAML file (more on this later). +Its outputs can be csv files of the whole or filtered perflog contents, as well as plots. ![Screenshot from 2024-04-25 17-01-41](https://hackmd.io/_uploads/HkWxlWOZ0.png) diff --git a/docs/tutorial/setup-python.md b/docs/tutorial/setup-python.md index e0c68d29..9bf6cfc7 100644 --- a/docs/tutorial/setup-python.md +++ b/docs/tutorial/setup-python.md @@ -1,7 +1,8 @@ === "Cosma" This tutorial is run on the [Cosma](https://cosma.readthedocs.io/en/latest/) supercomputer. - It should be straightforward to run on a different platform, the requirements are `gcc`, `git` and `python3`. (for the later parts you also need `make`, `autotools`, `cmake` and `spack`). + It should be straightforward to run on a different platform, the requirements are `gcc 4.5`, `git 2.39` and `python 3.7` or later. + (for the later parts you also need `make`, `autotools`, `cmake` and `spack` but these can be installed locally). Before proceeding to install ReFrame, we recommend creating a python virtual environment to avoid clashes with other installed python packages. First load a newer python module. 
```bash diff --git a/docs/use.md b/docs/use.md index 4d89ef74..1505d3fd 100644 --- a/docs/use.md +++ b/docs/use.md @@ -75,6 +75,13 @@ reframe -c benchmarks/apps/sombrero -r --performance-report -S env_vars=OMP_PLAC runs the `benchmarks/apps/sombrero` benchmark setting the environment variable `OMP_PLACES` to `threads`. +### Output directories + +By default `reframe` creates three output directories (`stage`, `output` and `perflogs`) in the directory +where it is run. Output can be written to a different base directory using the [`--prefix` command-line option](https://reframe-hpc.readthedocs.io/en/stable/manpage.html#cmdoption-prefix). + +The individual output directories can also be changed using the `--stage`, `--output` and `--perflogdir` options. + ## Usage on unsupported systems The configuration provided in [`reframe_config.py`](https://github.com/ukri-excalibur/excalibur-tests/blob/main/benchmarks/reframe_config.py) lets you run the From ce213f9c6ae777163e982642d4a02add0c8503d6 Mon Sep 17 00:00:00 2001 From: Tuomas Koskela Date: Mon, 10 Feb 2025 14:40:52 +0000 Subject: [PATCH 32/32] Add section on spack cache and local data paths --- docs/setup.md | 12 ++++++++++++ 1 file changed, 12 insertions(+) diff --git a/docs/setup.md b/docs/setup.md index 9c4d8963..bd8f0367 100644 --- a/docs/setup.md +++ b/docs/setup.md @@ -149,3 +149,15 @@ it to the spack environment with ```sh spack -e /path/to/environment repo add /path/to/repo ``` + +#### (optional) Override default spack cache path + +Spack also, by default, keeps various caches and user data in `~/.spack`, but users may want to override these locations. Spack provides environment variables that allow you to override or opt out of configuration locations: + +`SPACK_USER_CONFIG_PATH`: Override the path to use for the user scope (`~/.spack` by default). + +`SPACK_SYSTEM_CONFIG_PATH`: Override the path to use for the system scope (`/etc/spack` by default).
+ +`SPACK_USER_CACHE_PATH`: Override the default path to use for user data (misc_cache, tests, reports, etc.) + +For more details, see [spack docs](https://spack.readthedocs.io/en/latest/configuration.html#overriding-local-configuration).
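As an illustration of the Spack overrides described in the `docs/setup.md` section above, the following sketch builds an environment mapping that redirects the user configuration scope and the user cache away from `~/.spack`. It is only a sketch under stated assumptions: the helper `spack_env`, the base directory `/tmp/spack-scratch` and the `config` and `cache` subdirectory names are hypothetical, and only the two variable names come from the text.

```python
import os
from pathlib import Path


def spack_env(base_dir, environ=None):
    """Return a copy of the environment with Spack's user paths redirected.

    Hypothetical helper: the 'config' and 'cache' subdirectory names are
    arbitrary choices for this example, not Spack conventions.
    """
    env = dict(os.environ if environ is None else environ)
    base = Path(base_dir)
    env['SPACK_USER_CONFIG_PATH'] = str(base / 'config')
    env['SPACK_USER_CACHE_PATH'] = str(base / 'cache')
    return env


# Start from an empty environment here so the printed result is small.
env = spack_env('/tmp/spack-scratch', environ={})
for name in sorted(env):
    print(f'{name}={env[name]}')
```

A mapping like this could be passed as the `env` argument of `subprocess.run` when launching `spack` or `reframe`, which leaves the calling shell's own settings untouched.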