Version 21.03.0 (March 30, 2021)
- Build
  - CMake can build an embedded copy of GASNet as part of the Legion build
    with `-DLegion_EMBED_GASNet=ON`
- Regent
  - Contains three breaking changes to the Regent calling convention:
    - Reductions are now aggregated into region requirements and
      sorted by the index of the first field in the field space
      among the set of fields for each reduction.
    - Task arguments may be passed through either `args` or
      `local_args` for index-launched tasks. (Previously Regent
      only used `local_args`.)
    - Region values passed via `args` to an index-launched task may
      be bogus. Instead, the region requirement should be used to
      obtain the original region.
  - Support for constant time index launches. These are enabled
    automatically, but can be forced on or off with `__demand` or
    `__forbid` with `__constant_time_launches`. This should
    improve scalability at extreme node counts.
  - Support for `rescape` and `remit` to generate metaprogrammed
    code more easily.
  - Experimental support for separate compilation via `-fseparate 1`
    allows Regent programs to be compiled in parts (potentially in
    parallel). Note that separate compilation currently cannot be
    used with Bishop and requires either parallel or incremental
    compilation if `regentlib.start` is used (does not apply to
    `regentlib.saveobj` or `regentlib.save_tasks`).
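The new ordering of reduction region requirements can be pictured with a small standalone sketch. This is only an illustration of one reading of the rule (fields are grouped by reduction operator, and the groups are ordered by the smallest field index each contains); it is not Regent's code generator, and the `FieldReduction`, `ReductionRequirement`, and `aggregate_reductions` names are hypothetical.

```cpp
#include <algorithm>
#include <map>
#include <string>
#include <vector>

// Hypothetical stand-in for one reduction privilege on one field.
struct FieldReduction {
  std::string op;   // reduction operator, e.g. "+" or "max"
  int field_index;  // position of the field within its field space
};

// Hypothetical stand-in for one aggregated region requirement.
struct ReductionRequirement {
  std::string op;
  std::vector<int> field_indices;  // sorted ascending
};

// Assumption: one requirement per reduction operator, ordered by the
// index of the first (lowest-indexed) field in each group.
std::vector<ReductionRequirement> aggregate_reductions(
    const std::vector<FieldReduction>& reductions) {
  // Step 1: collect the fields that share a reduction operator.
  std::map<std::string, std::vector<int>> by_op;
  for (const FieldReduction& r : reductions) {
    by_op[r.op].push_back(r.field_index);
  }

  // Step 2: build one requirement per operator with sorted fields.
  std::vector<ReductionRequirement> result;
  for (auto& entry : by_op) {
    std::sort(entry.second.begin(), entry.second.end());
    result.push_back({entry.first, entry.second});
  }

  // Step 3: order requirements by their first field index.
  std::sort(result.begin(), result.end(),
            [](const ReductionRequirement& a, const ReductionRequirement& b) {
              return a.field_indices.front() < b.field_indices.front();
            });
  return result;
}
```

Under this reading, a task that reduces with `+` on fields 2 and 3, `max` on field 0, and `min` on field 1 would see its requirements in the order `max`, `min`, `+`. Code that indexes region requirements by position is the kind of code this calling-convention change is likely to affect.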
- Legion
  - In the control replication branch users will find a new implementation
    of Legion's physical analysis that uses heuristics to select which
    sub-trees should be used for performing the analysis. Disjoint and
    complete partitions are especially helpful in aiding the runtime.
  - There is a new implementation of the index space math inside of the
    runtime that now soundly and precisely detects congruences between
    index space math operations. This fixes a long-running class of bugs
    that would cause memory explosions in the physical analysis.
  - In the control replication branch users can now map future values into
    memories the same as they do with regions. This means that future
    payloads can be placed directly on devices like GPUs. Similarly, the
    runtime now also accepts future data from tasks when that data resides
    in any memory in the machine, including device memories.
  - Both the master and control replication branches have support for
    index space attach operations.
  - Expensive transitive reductions on traces are now computed in the
    background, allowing trace replays to begin immediately
    with only partial optimizations.
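For readers unfamiliar with the term, a transitive reduction removes dependence edges that are already implied by a longer path, so a replay has fewer edges to honor. The sketch below shows the idea on a small dependence DAG stored as an adjacency matrix. It is a textbook formulation for illustration only, not Legion's tracing code, and the `Graph` alias and `transitive_reduction` function are hypothetical names.

```cpp
#include <cstddef>
#include <vector>

// Adjacency matrix of a DAG: edge[u][v] is true when operation v
// depends directly on operation u.
using Graph = std::vector<std::vector<bool>>;

// Returns a copy of `edge` with every redundant edge removed. An edge
// u -> v is redundant when some other direct successor w of u can
// already reach v. Assumes the input is acyclic with no self-loops.
Graph transitive_reduction(const Graph& edge) {
  const std::size_t n = edge.size();

  // Transitive closure (Warshall's algorithm): reach[i][j] is true
  // when there is a path of one or more edges from i to j.
  Graph reach = edge;
  for (std::size_t k = 0; k < n; ++k) {
    for (std::size_t i = 0; i < n; ++i) {
      if (!reach[i][k]) continue;
      for (std::size_t j = 0; j < n; ++j) {
        if (reach[k][j]) reach[i][j] = true;
      }
    }
  }

  // Drop each edge that is implied by a path of two or more edges.
  Graph reduced = edge;
  for (std::size_t u = 0; u < n; ++u) {
    for (std::size_t v = 0; v < n; ++v) {
      if (!edge[u][v]) continue;
      for (std::size_t w = 0; w < n; ++w) {
        if (w != v && edge[u][w] && reach[w][v]) {
          reduced[u][v] = false;
          break;
        }
      }
    }
  }
  return reduced;
}
```

With three operations and edges 0 -> 1, 1 -> 2, and 0 -> 2, the edge 0 -> 2 is dropped because it is implied by the other two. The closure step is cubic in the number of operations, which gives a sense of why deferring this work to the background can let replays start sooner.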
- Realm
  - Custom reduction operations (including Legion's built-in ones) can
    provide CUDA implementations, permitting in-place reductions in
    CUDA device memory
  - Support for CUDA managed memory (via `-ll:msize`) that is coherent for
    both host and device access. Includes support for `__managed__`
    variables (only single-GPU if using CUDA runtime hijack mode)
  - `Event::wait` may be called outside of Realm tasks, having the same
    thread-blocking behavior as `Event::external_wait`
  - Experimental support for AMD HIP. Note that testing coverage is
    incomplete, and breakages may occur in between releases. For more
    details, see: #1028