Version 21.03.0 (March 30, 2021)

@streichler streichler released this 30 Mar 20:42
· 10199 commits to stable since this release
  • Build
    • CMake can build an embedded copy of GASNet as part of the Legion build
      with -DLegion_EMBED_GASNet=ON.
  • Regent
    • Contains three breaking changes to the Regent calling convention:
      • Reductions are now aggregated into region requirements and
        sorted by the index of the first field in the field space
        among the set of fields for each reduction.
      • Task arguments may be passed through either args or
        local_args for index launched tasks. (Previously Regent
        only used local_args.)
      • Region values passed via args to an index-launched task may
        be bogus. Instead the region requirement should be used to
        obtain the original region.
    • Support for constant-time index launches. These are enabled
      automatically, but can be forced on or off with __demand or
      __forbid combined with __constant_time_launches. This should
      improve scalability at extreme node counts.
    • Support for rescape and remit to generate metaprogrammed
      code more easily.
    • Experimental support for separate compilation via -fseparate 1
      allows Regent programs to be compiled in parts (potentially in
      parallel). Note that separate compilation currently cannot be
      used with Bishop, and requires either parallel or incremental
      compilation if regentlib.start is used (this does not apply to
      regentlib.saveobj or regentlib.save_tasks).
  • Legion
    • In the control replication branch, users will find a new implementation
      of Legion's physical analysis that uses heuristics to select which
      sub-trees should be used for performing the analysis. Disjoint and
      complete partitions are especially helpful in aiding the runtime.
    • There is a new implementation of the index space math inside of the
      runtime that now soundly and precisely detects congruences between
      index space math operations. This fixes a long-running class of bugs
      that would cause memory explosions in the physical analysis.
    • In the control replication branch, users can now map future values into
      memories the same way they do with regions. This means that future
      payloads can be placed directly on devices like GPUs. Similarly, the
      runtime now accepts future data from tasks even when that data resides
      in any memory in the machine, including device memories.
    • Both the master and control replication branches have support for
      index space attach operations.
    • Expensive transitive reductions on traces are now computed in the
      background, so trace replays can begin immediately with only
      partial optimizations.
  • Realm
    • Custom reduction operations (including Legion's built-in ones) can
      provide CUDA implementations, permitting in-place reductions in
      CUDA device memory.
    • Support for CUDA managed memory (via -ll:msize) that is coherent for
      both host and device access. Includes support for __managed__
      variables (only single-GPU if using CUDA runtime hijack mode).
    • Event::wait may be called outside of Realm tasks, having the same
      thread-blocking behavior as Event::external_wait.
    • Experimental support for AMD HIP. Note that testing coverage is
      incomplete, and breakages may occur in between releases. For more
      details, see: #1028
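
The first Regent calling-convention change above (reductions aggregated into region requirements and sorted by first field index) can be pictured with a small standalone sketch. This is only one reading of that rule, not Regent's implementation: the types FieldReduction and ReductionRequirement and the function aggregate_reductions are hypothetical names introduced here for illustration, and the sketch uses only the C++ standard library, not the Legion or Regent APIs.

```cpp
#include <algorithm>
#include <map>
#include <string>
#include <vector>

// Hypothetical stand-in for a reduction privilege on a single field.
struct FieldReduction {
    std::string op;   // reduction operator, e.g. "+" or "max"
    int field_index;  // position of the field within its field space
};

// Hypothetical stand-in for one aggregated region requirement.
struct ReductionRequirement {
    std::string op;
    std::vector<int> fields;  // ascending; fields.front() is the sort key
};

// Assumed reading of the release note: group fields by reduction operator,
// then order the groups by the smallest field index each one contains.
std::vector<ReductionRequirement> aggregate_reductions(
        const std::vector<FieldReduction>& reductions) {
    // Aggregate: one bucket of field indices per reduction operator.
    std::map<std::string, std::vector<int> > by_op;
    for (const FieldReduction& r : reductions) {
        by_op[r.op].push_back(r.field_index);
    }

    // Build one requirement per operator, with its fields in index order.
    std::vector<ReductionRequirement> reqs;
    for (auto& kv : by_op) {
        std::sort(kv.second.begin(), kv.second.end());
        reqs.push_back(ReductionRequirement{kv.first, kv.second});
    }

    // Sort requirements by the index of their first field.
    std::sort(reqs.begin(), reqs.end(),
              [](const ReductionRequirement& a, const ReductionRequirement& b) {
                  return a.fields.front() < b.fields.front();
              });
    return reqs;
}
```

Under this reading, a task that reduces fields 0 and 3 with max and fields 2 and 5 with + would see the max requirement first, because field 0 precedes field 2 in the field space. Code that calls Regent tasks from C++ should check the actual order against the generated task signature rather than rely on this sketch.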