TransferBench v1.61.00

@gilbertlee-amd gilbertlee-amd released this 28 Feb 23:57
cd80b3a

v1.61.00

Added

  • Added a2a_n preset, which conducts all-to-all GPU-to-GPU transfers over nearest NIC executors
  • Re-implemented GFX_BLOCK_ORDER, which controls how threadblocks of multiple transfers are ordered
    • 0 = sequential, 1 = interleaved, 2 = random
  • Added a2asweep preset, which tries various CU/unroll options for GFX-executed all-to-all
  • Rewrote the main GID index detection logic
  • The topology table now shows the GID index and description, which helps with debugging
  • Added GFX_WORD_SIZE to select the packed float size used by the GFX kernel. Must be 4 (default), 2, or 1
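
The three GFX_BLOCK_ORDER values can be pictured with a small sketch. This is only an illustration of what "sequential", "interleaved", and "random" could mean for threadblocks drawn from several transfers. It is not TransferBench's kernel code, and the function name `block_order`, its parameters, and the seed handling are hypothetical.

```python
import random


def block_order(blocks_per_transfer, mode, seed=0):
    """Return (transfer_id, block_id) pairs in an illustrative launch order.

    blocks_per_transfer: number of threadblocks for each transfer (hypothetical input).
    mode: 0 = sequential, 1 = interleaved, 2 = random (values from the release notes).
    """
    # Enumerate every block of every transfer, transfer by transfer.
    pairs = [(t, b) for t, n in enumerate(blocks_per_transfer) for b in range(n)]

    if mode == 0:
        # Sequential: all blocks of transfer 0, then all blocks of transfer 1, ...
        return pairs
    if mode == 1:
        # Interleaved: round-robin across transfers (block 0 of each, then block 1, ...).
        return sorted(pairs, key=lambda p: (p[1], p[0]))
    if mode == 2:
        # Random: a shuffled copy; the seed is an assumption made for reproducibility.
        shuffled = pairs[:]
        random.Random(seed).shuffle(shuffled)
        return shuffled
    raise ValueError("mode must be 0, 1, or 2")


if __name__ == "__main__":
    for mode in (0, 1, 2):
        print(mode, block_order([2, 3], mode))
```

With two transfers of 2 and 3 blocks, mode 0 lists the blocks transfer by transfer, mode 1 alternates between the transfers until the shorter one runs out, and mode 2 returns the same blocks in a shuffled order.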

Fixed

  • Avoid build errors with CMake and Makefile when the infiniband/verbs.h header is not present; the NIC executor is disabled in that case
  • Use a priority list to choose which GID entry to use, instead of hardcoding choices based on underdocumented user input (such as RoCE version and IP address family)
  • Use link-local when it is the only choice (i.e. when routing information is not available beyond local link)
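
As a rough sketch of the priority-list idea, GID selection can be written as ranking candidate entries and taking the best one. The release notes do not state the actual priority order, so the ordering below (non-link-local first, then RoCE v2 before v1, then IPv4-mapped before IPv6) is an assumption made only for illustration. The entry fields `index`, `roce`, and `addr`, and the functions `gid_rank` and `pick_gid`, are hypothetical.

```python
import ipaddress


def gid_rank(entry):
    """Sort key for a candidate GID entry; lower tuples are preferred.

    The priority order here is an assumption, not TransferBench's documented order.
    """
    addr = ipaddress.IPv6Address(entry["addr"])
    is_link_local = addr.is_link_local          # fe80::/10 addresses rank last
    is_ipv4 = addr.ipv4_mapped is not None      # ::ffff:a.b.c.d style GIDs
    return (is_link_local, entry["roce"] != 2, not is_ipv4, entry["index"])


def pick_gid(entries):
    """Return the index of the preferred GID entry, or None if there are none."""
    if not entries:
        return None
    return min(entries, key=gid_rank)["index"]


if __name__ == "__main__":
    table = [
        {"index": 0, "roce": 1, "addr": "fe80::1"},
        {"index": 1, "roce": 2, "addr": "fe80::1"},
        {"index": 3, "roce": 2, "addr": "::ffff:10.0.0.5"},
    ]
    print(pick_gid(table))
    print(pick_gid(table[:1]))
```

Because link-local entries only rank last rather than being filtered out, a table that contains nothing but link-local GIDs still yields a usable index, which matches the last item in the list above.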