ROCm / TransferBench Public

Releases: ROCm/TransferBench

TransferBench v1.61.00

28 Feb 23:57

This commit was created on GitHub.com and signed with GitHub’s verified signature.

GPG key ID: B5690EEEBB952194

Verified

Added

Added a2a_n preset, which conducts all-to-all GPU-to-GPU transfers over nearest NIC executors
Re-implemented GFX_BLOCK_ORDER which allows for control over how threadblocks of multiple transfers are ordered
- 0 = sequential, 1 = interleaved, 2 = random
Added a2asweep preset which tries various CU/unroll options for GFX-executed all-to-all
Rewrote main GID index detection logic
Show the GID index and description in the topology table, which is helpful for debugging
Added GFX_WORD_SIZE to select the packed float size used by the GFX kernel. Must be 4 (default), 2, or 1
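
The three GFX_BLOCK_ORDER modes listed above can be illustrated with a small Python sketch. This is not TransferBench code: the function name `order_blocks` and the round-robin reading of "interleaved" are assumptions made for illustration only.

```python
import random
from itertools import zip_longest


def order_blocks(blocks_per_transfer, mode, seed=None):
    """Return (transfer, block) pairs in the order a given mode would schedule them.

    Hypothetical helper: 0 = sequential, 1 = interleaved, 2 = random.
    """
    per_transfer = [
        [(t, b) for b in range(n)] for t, n in enumerate(blocks_per_transfer)
    ]
    if mode == 0:
        # Sequential: all blocks of transfer 0, then all blocks of transfer 1, ...
        return [pair for blocks in per_transfer for pair in blocks]
    if mode == 1:
        # Interleaved (assumed round-robin): block 0 of every transfer, then block 1, ...
        return [
            pair
            for group in zip_longest(*per_transfer)
            for pair in group
            if pair is not None
        ]
    if mode == 2:
        # Random: shuffle the sequential order; a seed makes the result repeatable.
        flat = [pair for blocks in per_transfer for pair in blocks]
        random.Random(seed).shuffle(flat)
        return flat
    raise ValueError(f"unsupported block order mode: {mode}")
```

For two transfers with 2 and 3 threadblocks, mode 0 keeps each transfer's blocks together, while mode 1 alternates between the transfers until the shorter one runs out.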

Fixed

Avoid build errors with CMake and Makefile when the infiniband/verbs.h header is not present; the NIC executor is disabled in that case
Use a priority list to choose which GID entry to use, instead of hardcoding choices based on underdocumented user input (such as RoCE version and IP address family)
Use link-local when it is the only choice (i.e. when routing information is not available beyond local link)
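
As a rough illustration of priority-based GID selection with a link-local fallback, here is a minimal Python sketch. The release notes do not state the actual priority order, so the ranking below (IPv4-mapped first, then other routable addresses, link-local last, with RoCE v2 preferred over v1) is purely an assumption, and `GidEntry`, `gid_rank`, and `pick_gid` are hypothetical names.

```python
import ipaddress
from dataclasses import dataclass


@dataclass(frozen=True)
class GidEntry:
    index: int         # position in the GID table
    address: str       # GID rendered as an IPv6 address string
    roce_version: int  # 1 or 2


def gid_rank(entry):
    """Lower tuples are preferred. The ordering is an assumed example, not TransferBench's."""
    addr = ipaddress.IPv6Address(entry.address)
    if addr.is_link_local:
        scope = 2  # least preferred: only reachable on the local link
    elif addr.ipv4_mapped is not None:
        scope = 0  # assumed most preferred: IPv4-mapped address
    else:
        scope = 1  # other routable IPv6 address
    version_penalty = 0 if entry.roce_version == 2 else 1
    return (scope, version_penalty, entry.index)


def pick_gid(entries):
    """Pick the best-ranked entry; link-local wins only when nothing else exists."""
    if not entries:
        raise ValueError("no GID entries available")
    return min(entries, key=gid_rank)
```

Because link-local entries rank last rather than being excluded, a table containing only link-local GIDs still yields a usable choice.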

rocm-6.3.3

19 Feb 17:46

rocm-ci

ROCm release v6.3.3

TransferBench v1.60.00

30 Jan 19:24

Modified

Reverted GFX_SINGLE_TEAM default back to 1

Fixed

Fixed bug where peer memory access was not enabled for DMA transfers, which would break specific DMA engine transfers

rocm-6.3.2

28 Jan 15:43

rocm-ci

ROCm release v6.3.2

TransferBench v1.59.01

24 Jan 20:15

Added

The a2a preset's A2A_MODE variable now allows customizing the number of srcs/dsts to use
This is specified by setting A2A_MODE to numSrcs:numDsts. Destinations past the first are "local" writes (e.g. with A2A_MODE=1:3, transfers follow the pattern Fx Gx FyFxFx), which simulates conditions similar to those in collective algorithms such as ring-based AllReduce
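
The numSrcs:numDsts pattern can be sketched in Python as follows. The helper names are hypothetical, the sketch only handles values of at least 1 on each side, and repeating the local Fx memory for extra sources is an assumption; the release notes only describe the destination side.

```python
def parse_a2a_mode(value):
    """Split an A2A_MODE string of the form 'numSrcs:numDsts' into two integers."""
    num_srcs_text, num_dsts_text = value.split(":")
    num_srcs, num_dsts = int(num_srcs_text), int(num_dsts_text)
    if num_srcs < 1 or num_dsts < 1:
        raise ValueError("this sketch only handles numSrcs >= 1 and numDsts >= 1")
    return num_srcs, num_dsts


def a2a_transfer(x, y, a2a_mode):
    """Build the 'srcs executor dsts' pattern for GPU x sending to GPU y."""
    num_srcs, num_dsts = parse_a2a_mode(a2a_mode)
    srcs = f"F{x}" * num_srcs  # assumption: every source is local memory on GPU x
    # The first destination is remote (Fy); the remaining ones are local writes (Fx).
    dsts = f"F{y}" + f"F{x}" * (num_dsts - 1)
    return f"{srcs} G{x} {dsts}"
```

With A2A_MODE=1:3, GPU 0 sending to GPU 1 yields `F0 G0 F1F0F0`, which matches the Fx Gx FyFxFx pattern described above.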

TransferBench v1.59.00

21 Jan 19:40

Added

Added support for a NIC executor, which allows for RDMA copies on NICs that support IBVerbs
By default, the NIC executor is enabled if IBVerbs is found in the dynamic linker cache
The NIC executor can be indexed in two ways:
- "I" Ix.y will use NIC x as the source and NIC y as the destination.
  E.g. (G0 I0.5 G4)
- "N" Nx.y will use NIC closest to GPU x as source, and NIC closest to GPU y as destination
  E.g. (G0 N0.4 N4)
The closest NIC can be overridden by the environment variable CLOSEST_NIC, which should be a comma-separated list of NIC indices to use for the corresponding GPU
This feature can be explicitly disabled at compile time by specifying DISABLE_NIC_EXEC=1
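
The two indexing forms can be illustrated with a short Python sketch that resolves an executor token to a (source NIC, destination NIC) pair. This is not TransferBench's parser: the function names are hypothetical, and the sketch assumes that position i of the CLOSEST_NIC list is the NIC for GPU i.

```python
import re


def parse_closest_nic(value):
    """Turn a CLOSEST_NIC-style string such as '2,2,3,3' into a list of NIC indices."""
    return [int(item) for item in value.split(",") if item.strip()]


def resolve_nic_executor(token, closest_nic):
    """Map 'Ix.y' or 'Nx.y' to a (source NIC, destination NIC) pair."""
    match = re.fullmatch(r"([IN])(\d+)\.(\d+)", token)
    if match is None:
        raise ValueError(f"not a NIC executor token: {token!r}")
    kind, x, y = match.group(1), int(match.group(2)), int(match.group(3))
    if kind == "I":
        # Ix.y names the NICs directly.
        return x, y
    # Nx.y names GPUs; look up the NIC closest to each one.
    return closest_nic[x], closest_nic[y]
```

For example, with a hypothetical mapping of `2,2,3,3,0,0,1,1`, the token `N0.4` resolves to NIC 2 as the source and NIC 0 as the destination, while `I0.5` always resolves to NICs 0 and 5.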

Modified

Changed default data size from 64M to 256M
Added NUM_QUEUE_PAIRS, which enables NIC traffic in A2A. Each GPU will talk to the next GPU via the closest NIC
Sweep preset now saves the last sweep run configuration to /tmp/lastSweep.cfg; the path can be changed via SWEEP_FILE

Fixed

Fixed bug with reporting when using subiterations
Fixed bug with per-Transfer data size specification
Fixed bug when using XCC preferred table

rocm-6.3.1

20 Dec 16:12

rocm-ci

ROCm release v6.3.1

TransferBench v1.58.00

05 Dec 20:46

Fixed

Fixed broken specific DMA-engine copies

rocm-6.3.0

03 Dec 19:49

rocm-ci

ROCm release v6.3.0

TransferBench v1.57.01

02 Dec 23:22

Added

Re-added "scaling" GPU GFX preset benchmark, which tests copies from GPU to other devices using varying number of CUs.
