Skip to content

Commit

Permalink
Merge pull request #298 from lanl/jmm/arbitrary-block-numbers
Browse files Browse the repository at this point in the history
Infrastructure Required to Support MeshPacks of arbitrary size.
  • Loading branch information
Yurlungur authored Sep 11, 2020
2 parents 20cf47f + 9a80a76 commit eb77511
Show file tree
Hide file tree
Showing 15 changed files with 446 additions and 138 deletions.
5 changes: 3 additions & 2 deletions CHANGELOG.md
Original file line number Diff line number Diff line change
Expand Up @@ -4,10 +4,11 @@

### Added (new features/APIs/variables/...)
- [[PR 250]](https://github.com/lanl/parthenon/pull/250) Feature::Restart. If output file format 'rst' is specified restart files are written using independent variables and those marked with Restart metadata flag. Simulations can be restarted with a '-r \<restartFile\>' argument to the code.
- [[PR 263]](https://github.com/lanl/parthenon/pull/263) Added MeshPack, a mechanism for looping over the whole mesh at once within a `Kokkos` kernel. See [documentation](docs/mesh/packing.md)
- [[PR 263]](https://github.com/lanl/parthenon/pull/263) Added MeshBlockPack, a mechanism for looping over the whole mesh at once within a `Kokkos` kernel. See [documentation](docs/mesh/packing.md)
- [[PR 267]](https://github.com/lanl/parthenon/pull/267) Introduced TaskRegions and TaskCollections to allow for task launches on multiple blocks.
- [[PR 287]](https://github.com/lanl/parthenon/pull/287) Added machine configuration file for compile options, see [documentation](https://github.com/lanl/parthenon/blob/develop/docs/building.md#default-machine-configurations)
- [[PR 290]](https://github.com/lanl/parthenon/pull/290) Added per cycle performance output diagnostic.
- [[PR 298]](https://github.com/lanl/parthenon/pull/298) Introduced Partition, a tiny utility for partitioning STL containers. Used for MeshBlockPacks, to enable packing over a fraction of the mesh.

### Changed (changing behavior/API/variables/...)
- [\#68](https://github.com/lanl/parthenon/issues/68) Moved default `par_for` wrappers to `MeshBlock`
Expand All @@ -19,7 +20,7 @@
- [[PR 262]](https://github.com/lanl/parthenon/pull/262) Fix setting of "coverage" label in testing. Automatically applies coverage tag to all tests not containing "performance" label.
- [[PR 276]](https://github.com/lanl/parthenon/pull/276) Decrease required Python version from 3.6 to 3.5.
- [[PR 283]](https://github.com/lanl/parthenon/pull/283) Change CI to extended nightly develop tests and short push tests.
- [[PR 282]](https://github.com/lanl/parthenon/pull/282) Integrated meshpack and tasking in pi example
- [[PR 282]](https://github.com/lanl/parthenon/pull/282) Integrated MeshBlockPack and tasking in pi example
- [[PR 294]](https://github.com/lanl/parthenon/pull/294) Fix `IndexShape::GetTotal(IndexDomain)` - previously was returning opposite of expected domain result.

### Removed
Expand Down
64 changes: 58 additions & 6 deletions docs/mesh/packing.md
Original file line number Diff line number Diff line change
Expand Up @@ -4,29 +4,68 @@
kernels that perform little work, this can be a perforamnce bottleneck
when each kernel is launched per meshblock. Parthenon therefore
provides the capability to combine variables into a single data
structure that spans all meshblocks, the `MeshPack`.
structure that spans some number of meshblocks, the `MeshBlockPack`.

## Creating a Mesh Pack
## Creating a MeshBlockPack

There are two methods for creating mesh packs, which are analogous to the `VariablePack` and `VariableFluxPack` available in [containers](../interface/containers.md).
```C++
template <typename... Args>
auto PackVariablesOnMesh(Mesh *pmesh, const std::string &container_name,
template <typename T, typename... Args>
auto PackVariablesOnMesh(T &blocks, const std::string &container_name,
Args &&... args)
```
and
```C++
auto PackVariablesAndFluxesOnMesh(Mesh *pmesh, const std::string &container_name,
template <typename T, typename... Args>
auto PackVariablesAndFluxesOnMesh(T &blocks, const std::string &container_name,
Args &&... args)
```
The former packs only the variables, the latter packs in the variables and associated flux vectors.

Here `T` can be the mesh or any standard template container that contains meshblocks.

The variatic arguments take exactly the same arguments as
`container.PackVariables` and `container.PackVariablesAndFluxes`. You
can pass in a metadata vector, a vector of names, and optionally IDs
and a map from names to indices. See
[here](../interface/containers.md) for more details.

## Packing over a piece of the mesh

Instead of packing over the whole mesh, you can pack over only a piece
of it by using the `Partition` machinery found in
`utils/partition_stl_containers.hpp`. For example, to break the mesh
into four evenly sized meshpacks, do
```C++
using parthenon::MeshBlock;
using parthenon::Partition::Partition_t;
Partition_t<MeshBlock> partitions;
parthenon::Partition::ToNPartitions(mesh->block_list, 4, partitions);
MeshBlockPack<VariablePack<Real>> packs[4];
for (int i = 0; i < partitions.size() {
packs[i] = PackVariablesOnMesh(partitions[i], "base");
}
```
To pack only the variables "var1" and "var2" in the container named "mycontainer", do:
```C++
std::vector<std::string>> vars = {"var1", "var2"};
for (int i = 0; i < partitions.size() {
packs[i] = PackVariablesOnMesh(partitions[i], "myContainer", vars);
}
```

There are two partitioning functions:
```C++
// Splits container into N equally sized partitions
template <typename Container_t, typename T>
void ToNPartitions(Container_t &container, const int N, Partition_t<T> &partitions);

// Splits container into partitions of size N
template <typename Container_t, typename T>
void ToSizeN(Container_t &container, const int N, Partition_t<T> &partitions);
```
Both functions live within the namespace `parthenon::Partition`.
## Data Layout
The Mesh Pack is indexable as a five-dimensional `Kokkos` view. The
Expand All @@ -44,7 +83,7 @@ auto var = meshpack(m,l); // Indexes into the k'th variable on the m'th MB
Real r = meshpack(m,l,k,j,i);
```

For convenience, `MeshPack` also includes the following methods and fields:
For convenience, `MeshBlockPack` also includes the following methods and fields:
```C++
// the IndexShape object describing the bounds of all blocks in the pack
IndexShape bounds = meshpack.cellbounds;
Expand All @@ -63,3 +102,16 @@ int sparse = meshpack.GetSparse(n);
// The size of the n'th dimension of the pack
int dim = meshpack.GetDim(n);
```

## Type

The types for packs are:
```C++
MeshBlockVarPack<T>
```
and
```C++
MeshBlockVarFluxPack<T>
```
which correspond to packs over meshblocks that contain just variables
or contain variables and fluxes.
81 changes: 24 additions & 57 deletions example/calculate_pi/calculate_pi.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -11,9 +11,6 @@
// the public, perform publicly and display publicly, and to permit others to do so.
//========================================================================================

// Self Include
#include "calculate_pi.hpp"

// Standard Includes
#include <iostream>
#include <memory>
Expand All @@ -24,9 +21,11 @@
// Parthenon Includes
#include <coordinates/coordinates.hpp>
#include <kokkos_abstraction.hpp>
#include <mesh/mesh_pack.hpp>
#include <parthenon/package.hpp>

// Local Includes
#include "calculate_pi.hpp"

using namespace parthenon::package::prelude;

// This defines a "physics" package
Expand Down Expand Up @@ -81,62 +80,11 @@ std::shared_ptr<StateDescriptor> Initialize(ParameterInput *pin) {
return package;
}

TaskStatus ComputeArea(MeshBlock *pmb) {
// compute 1/r0^2 \int d^2x in_or_out(x,y) over the block's domain
auto &rc = pmb->real_containers.Get();
const IndexRange ib = pmb->cellbounds.GetBoundsI(IndexDomain::interior);
const IndexRange jb = pmb->cellbounds.GetBoundsJ(IndexDomain::interior);
const IndexRange kb = pmb->cellbounds.GetBoundsK(IndexDomain::interior);
auto &coords = pmb->coords;

ParArrayND<Real> &v = rc->Get("in_or_out").data;
Real area;
Kokkos::parallel_reduce(
"calculate_pi compute area",
Kokkos::MDRangePolicy<Kokkos::Rank<3>>(pmb->exec_space, {kb.s, jb.s, ib.s},
{kb.e + 1, jb.e + 1, ib.e + 1},
{1, 1, ib.e + 1 - ib.s}),
KOKKOS_LAMBDA(int k, int j, int i, Real &larea) {
larea += v(k, j, i) * coords.Area(parthenon::X3DIR, k, j, i);
},
area);
Kokkos::deep_copy(pmb->exec_space, v.Get(0, 0, 0, 0, 0, 0), area);

return TaskStatus::complete;
}

TaskStatus RetrieveAreas(std::vector<MeshBlock *> &blocks,
parthenon::Packages_t &packages) {
const auto &radius = packages["calculate_pi"]->Param<Real>("radius");

Real area = 0.0;
for (auto pmb : blocks) {
auto &rc = pmb->real_containers.Get();
ParArrayND<Real> v = rc->Get("in_or_out").data;
// extract area from device memory
Real block_area;
Kokkos::deep_copy(pmb->exec_space, block_area, v.Get(0, 0, 0, 0, 0, 0));
pmb->exec_space.fence(); // as the deep copy may be async
// area must be reduced by r^2 to get the block's contribution to PI
block_area /= (radius * radius);
// accumulate
area += block_area;
}

packages["calculate_pi"]->AddParam("area", area);
return TaskStatus::complete;
}

TaskStatus ComputeAreaOnMesh(std::vector<MeshBlock *> &blocks,
parthenon::Packages_t &packages) {
auto pack = parthenon::PackVariablesOnMesh(blocks, "base",
std::vector<std::string>{"in_or_out"});
TaskStatus ComputeArea(Pack_t pack, ParArrayHost<Real> areas, int i) {
const IndexRange ib = pack.cellbounds.GetBoundsI(IndexDomain::interior);
const IndexRange jb = pack.cellbounds.GetBoundsJ(IndexDomain::interior);
const IndexRange kb = pack.cellbounds.GetBoundsK(IndexDomain::interior);

const auto &radius = packages["calculate_pi"]->Param<Real>("radius");

Real area = 0.0;
using policy = Kokkos::MDRangePolicy<Kokkos::Rank<5>>;
Kokkos::parallel_reduce(
Expand All @@ -148,9 +96,28 @@ TaskStatus ComputeAreaOnMesh(std::vector<MeshBlock *> &blocks,
larea += pack(b, v, k, j, i) * pack.coords(b).Area(parthenon::X3DIR, k, j, i);
},
area);

areas(i) = area;
return TaskStatus::complete;
}

TaskStatus AccumulateAreas(ParArrayHost<Real> areas, Packages_t &packages) {
const auto &radius = packages["calculate_pi"]->Param<Real>("radius");

Real area = 0.0;
for (int i = 0; i < areas.GetSize(); i++) {
area += areas(i);
}
area /= (radius * radius);

packages["calculate_pi"]->AddParam("area", area);
#ifdef MPI_PARALLEL
Real pi_val;
MPI_Reduce(&area, &pi_val, 1, MPI_PARTHENON_REAL, MPI_SUM, 0, MPI_COMM_WORLD);
#else
Real pi_val = area;
#endif

packages["calculate_pi"]->AddParam("pi_val", pi_val);
return TaskStatus::complete;
}

Expand Down
20 changes: 11 additions & 9 deletions example/calculate_pi/calculate_pi.hpp
Original file line number Diff line number Diff line change
Expand Up @@ -18,25 +18,27 @@
#include <vector>

// Parthenon Includes
#include <interface/state_descriptor.hpp>
#include <parthenon/package.hpp>

namespace calculate_pi {
using namespace parthenon::package::prelude;
using parthenon::Packages_t;
using parthenon::ParArrayHost;
using Pack_t = parthenon::MeshBlockVarPack<Real>;

// Package Callbacks
void SetInOrOut(std::shared_ptr<Container<Real>> &rc);
std::shared_ptr<StateDescriptor> Initialize(ParameterInput *pin);

// Task Implementations
// task per meshblock
parthenon::TaskStatus ComputeArea(parthenon::MeshBlock *pmb);
// Task over whole mesh, no packs
parthenon::TaskStatus RetrieveAreas(std::vector<parthenon::MeshBlock *> &blocks,
parthenon::Packages_t &packages);

// Run task on the entire mesh at once
parthenon::TaskStatus ComputeAreaOnMesh(std::vector<parthenon::MeshBlock *> &blocks,
parthenon::Packages_t &packages);
// Note pass by value here. Required for capture.
// All objects here have reference semantics, so capture by value is ok.
// TODO(JMM) A std::shared_ptr might be better.
// Computes area on a given meshpack
parthenon::TaskStatus ComputeArea(Pack_t pack, ParArrayHost<Real> areas, int i);
// Sums up areas accross packs.
parthenon::TaskStatus AccumulateAreas(ParArrayHost<Real> areas, Packages_t &packages);
} // namespace calculate_pi

#endif // EXAMPLE_CALCULATE_PI_CALCULATE_PI_HPP_
6 changes: 2 additions & 4 deletions example/calculate_pi/parthinput.example
Original file line number Diff line number Diff line change
Expand Up @@ -52,10 +52,8 @@ max_level = 2 # if set, limits refinement level from this crit

<Pi>
radius = 1.5
# Whether or not to use mesh-wide kernel calls
# A value of 0 is default task-based mechanism, where kernels are per meshblock
# A value of 1 makes a single global kernel call over all meshblocks on a rank
use_mesh_pack = 0
# How many meshblocks to use in a kernel. A value of -1 means use the whole mesh.
pack_size = 4

<parthenon/output0>
file_type = hdf5
Expand Down
63 changes: 29 additions & 34 deletions example/calculate_pi/pi_driver.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -93,19 +93,11 @@ parthenon::DriverStatus PiDriver::Execute() {

pouts->MakeOutputs(pmesh, pinput);

// The task lists constructed depends on whether we're doing local tasking
// or a global meshpack.
// The tasks compute pi and store it in the param "pi_val"
ConstructAndExecuteTaskLists<>(this);

// Retrive and MPI reduce the area from mesh params
auto &area = pmesh->packages["calculate_pi"]->Param<Real>("area");

#ifdef MPI_PARALLEL
Real pi_val;
MPI_Reduce(&area, &pi_val, 1, MPI_PARTHENON_REAL, MPI_SUM, 0, MPI_COMM_WORLD);
#else
Real pi_val = area;
#endif
// retrieve "pi_val" and post execute.
auto &pi_val = pmesh->packages["calculate_pi"]->Param<Real>("pi_val");
pmesh->mbcnt = pmesh->nbtotal; // this is how many blocks were processed
PostExecute(pi_val);
return DriverStatus::complete;
Expand All @@ -128,33 +120,36 @@ void PiDriver::PostExecute(Real pi_val) {
Driver::PostExecute();
}

TaskCollection PiDriver::MakeTasks(std::vector<MeshBlock *> blocks) {
template <typename T>
TaskCollection PiDriver::MakeTasks(T &blocks) {
using calculate_pi::AccumulateAreas;
using calculate_pi::ComputeArea;
using calculate_pi::ComputeAreaOnMesh;
using calculate_pi::RetrieveAreas;
TaskCollection tc;
if (pinput->GetOrAddBoolean("Pi", "use_mesh_pack", false)) {
TaskRegion &tr = tc.AddRegion(1);
{
// tasks should be local per region. Be sure to scope them appropriately.
TaskID none(0);
auto get_area = tr[0].AddTask(ComputeAreaOnMesh, none, blocks, pmesh->packages);
}
} else {
// asynchronous region where area is computed per block
TaskRegion &async_region = tc.AddRegion(blocks.size());
for (int i = 0; i < blocks.size(); i++) {
TaskID none(0);
auto get_area = async_region[i].AddTask(ComputeArea, none, blocks[i]);
}
// synchronous region where the area is retrieved and accumulated
// and stored in params
TaskRegion &sync_region = tc.AddRegion(1);
{

// 1 means pack is 1 meshblock, <1 means use entire mesh
int pack_size = pinput->GetOrAddInteger("Pi", "pack_size", 1);
if (pack_size < 1) pack_size = blocks.size();

partition::Partition_t<MeshBlock> partitions;
partition::ToSizeN(blocks, pack_size, partitions);
ParArrayHost<Real> areas("areas", partitions.size());

TaskRegion &async_region = tc.AddRegion(partitions.size());
{
// asynchronous region where area is computed per mesh pack
for (int i = 0; i < partitions.size(); i++) {
TaskID none(0);
auto get_area =
sync_region[0].AddTask(RetrieveAreas, none, blocks, pmesh->packages);
auto pack = PackVariablesOnMesh(partitions[i], "base",
std::vector<std::string>{"in_or_out"});
auto get_area = async_region[i].AddTask(ComputeArea, none, pack, areas, i);
}
}
TaskRegion &sync_region = tc.AddRegion(1);
{
TaskID none(0);
auto accumulate_areas =
sync_region[0].AddTask(AccumulateAreas, none, areas, pmesh->packages);
}

return tc;
}
9 changes: 5 additions & 4 deletions example/calculate_pi/pi_driver.hpp
Original file line number Diff line number Diff line change
Expand Up @@ -32,10 +32,11 @@ class PiDriver : public Driver {
pin->CheckDesired("Pi", "radius");
}

/// MakeTaskList isn't a virtual routine on `Driver`, but each driver is expected to
/// implement it.
TaskList MakeTaskList(MeshBlock *pmb);
TaskCollection MakeTasks(std::vector<MeshBlock *> blocks);
/// MakeTaskList and MakeTasks aren't virtual routines on `Driver`,
// but each driver is expected to implement at least one of them.
/// TaskList MakeTaskList(MeshBlock *pmb);
template <typename T>
TaskCollection MakeTasks(T &blocks);

/// `Execute` cylces until simulation completion.
DriverStatus Execute() override;
Expand Down
Loading

0 comments on commit eb77511

Please sign in to comment.