ROCm / TransferBench Public

Releases: ROCm/TransferBench


TransferBench v1.37

24 Nov 13:52

This commit was created on GitHub.com and signed with GitHub’s verified signature. The key has expired.

GPG key ID: 4AEE18F83AFDEB23


Changes

USE_SINGLE_STREAM is now enabled by default (disable with USE_SINGLE_STREAM=0)

Fixes

Fix unrecognized token error when XCC_PREF_TABLE is unspecified


TransferBench v1.35

22 Nov 23:38


Additions

USE_FINE_GRAIN now also applies to the a2a preset


TransferBench v1.34

07 Nov 23:37


Added

GPU_KERNEL=3 is now the default for gfx942


TransferBench v1.33

30 Oct 17:42


Added the ALWAYS_VALIDATE environment variable, which validates results after every iteration instead of only once after all iterations have finished
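
The difference between the two validation modes can be sketched in a few lines of Python. This is only an illustration of the behavior described above, not TransferBench's actual code: `run_iterations`, `do_transfer`, and `validate` are hypothetical names introduced for the example.

```python
from typing import Callable


def run_iterations(num_iterations: int,
                   do_transfer: Callable[[int], None],
                   validate: Callable[[], None],
                   always_validate: bool) -> int:
    """Run the transfers and return how many times validation was invoked.

    Hypothetical helper: mirrors the described ALWAYS_VALIDATE behavior only.
    """
    validations = 0
    for iteration in range(num_iterations):
        do_transfer(iteration)
        if always_validate:
            # Validate after every iteration.
            validate()
            validations += 1
    if not always_validate:
        # Default behavior: validate once after all iterations.
        validate()
        validations += 1
    return validations
```

Validating after each iteration costs more time but points at the first iteration that produced bad data, rather than only reporting that the final buffer contents are wrong.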


TransferBench v1.32

19 Oct 22:20


Modified

Increased line limit from 2048 to 32768


TransferBench v1.31

17 Oct 19:40


Modified

SHOW_ITERATIONS now shows XCC:CU instead of just the CU ID
SHOW_ITERATIONS output is now also printed when USE_SINGLE_STREAM=1


TransferBench v1.30

16 Oct 14:22


Added

BLOCK_SIZE added to control threadblock size (must be a multiple of 64, up to 512)
BLOCK_ORDER added to control how work is ordered for GFX-executors running USE_SINGLE_STREAM=1
- 0 - Threadblocks for Transfers are ordered sequentially (Default)
- 1 - Threadblocks for Transfers are interleaved
- 2 - Threadblocks for Transfers are ordered randomly
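
As a rough illustration of these two settings, the Python sketch below checks a BLOCK_SIZE value against the stated constraint and produces one possible threadblock ordering for each BLOCK_ORDER value. The function names are hypothetical, and the round-robin scheme used for "interleaved" is an assumption; the release note does not specify how TransferBench interleaves.

```python
import random
from typing import List, Tuple


def validate_block_size(block_size: int) -> int:
    """Accept only multiples of 64 up to 512, as stated in the release note."""
    if block_size <= 0 or block_size % 64 != 0 or block_size > 512:
        raise ValueError(f"BLOCK_SIZE must be a multiple of 64, up to 512 (got {block_size})")
    return block_size


def order_threadblocks(blocks_per_transfer: List[int],
                       block_order: int,
                       seed: int = 0) -> List[Tuple[int, int]]:
    """Return (transfer index, block index) pairs in launch order.

    Hypothetical model of the three BLOCK_ORDER modes.
    """
    # Mode 0: all blocks of Transfer 0, then all blocks of Transfer 1, ...
    sequential = [(t, b) for t, n in enumerate(blocks_per_transfer) for b in range(n)]
    if block_order == 0:
        return sequential
    if block_order == 1:
        # Assumed interleaving: round-robin across Transfers.
        longest = max(blocks_per_transfer, default=0)
        return [(t, b)
                for b in range(longest)
                for t, n in enumerate(blocks_per_transfer)
                if b < n]
    if block_order == 2:
        # Random permutation of the sequential order (seeded for repeatability).
        shuffled = sequential[:]
        random.Random(seed).shuffle(shuffled)
        return shuffled
    raise ValueError(f"BLOCK_ORDER must be 0, 1, or 2 (got {block_order})")
```

For two Transfers with two threadblocks each, mode 0 yields `(0,0), (0,1), (1,0), (1,1)`, while the assumed round-robin mode 1 yields `(0,0), (1,0), (0,1), (1,1)`.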


TransferBench v1.29

16 Oct 14:18


Added

a2a preset config now responds to USE_REMOTE_READ

Fixed

Race condition during wall-clock initialization that caused "inf" results during single-stream runs
CU numbering in output after CU masking

Modified

Default number of warmups reverted to 3
Default unroll factor for gfx940/941 set to 6


TransferBench v1.28

16 Oct 14:17


Added

Added A2A_DIRECT, which executes all-to-all only between directly connected GPUs (now on by default)
Added average statistics for p2p and a2a benchmarks
Added USE_FINE_GRAIN for p2p benchmark.
- On older devices using the default coarse-grained device memory, p2p timing stops as soon as the request is sent to the data fabric,
  not when the data actually arrives at the remote device. This can artificially inflate bandwidth numbers, especially for small transfers
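
A toy model shows why small transfers are affected most. Assume (hypothetically) that a fixed delivery latency goes untimed and that the link otherwise moves data at a constant rate; neither the model nor the numbers below come from TransferBench measurements.

```python
def inflation_factor(num_bytes: int,
                     link_bytes_per_s: float,
                     untimed_latency_s: float) -> float:
    """Ratio of reported bandwidth to true bandwidth in the toy model.

    true time  = untimed latency + num_bytes / link rate
    timed span = num_bytes / link rate (the latency is never observed)
    """
    timed_s = num_bytes / link_bytes_per_s
    true_s = timed_s + untimed_latency_s
    return true_s / timed_s


# Hypothetical numbers: a 50 GB/s link with 2 microseconds of untimed latency.
small = inflation_factor(64 * 1024, 50e9, 2e-6)   # 64 KiB transfer
large = inflation_factor(1 << 30, 50e9, 2e-6)     # 1 GiB transfer
```

Under these assumptions the 64 KiB transfer is over-reported by roughly 2.5x, while the 1 GiB transfer is over-reported by less than 0.1%, which matches the note that the effect is strongest for small amounts of data.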

Modified

Modified P2P output to help distinguish between CPU / GPU devices

Fixed

Fixed Makefile target to prevent unnecessary re-compilation


TransferBench v1.27

16 Oct 14:16


Added

Adding cmdline preset to allow simple tests to be specified on the command line
E.g. ./TransferBench cmdline 64M "1 4 G0->G0->G1"
Adding environment variable HIDE_ENV, which skips printing of environment variable values
Adding environment variable CU_MASK, which allows selection of which CUs to execute on
CU_MASK is specified in CU indices (0 to #CUs-1), and '-' can be used to denote ranges of values
- E.g.: CU_MASK=3-8,16 would request that the Transfer be executed only on CUs 3,4,5,6,7,8,16
- NOTE: This is somewhat experimental and may not work on all hardware
SHOW_ITERATIONS now shows CU usage for that iteration (experimental)
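
The CU_MASK syntax described above (comma-separated indices with '-' ranges) can be expanded with a short Python sketch. `parse_cu_mask` is a hypothetical helper written for illustration; it follows the format in the release note but is not TransferBench's own parser and does not check indices against the device's CU count.

```python
from typing import List


def parse_cu_mask(mask: str) -> List[int]:
    """Expand a CU_MASK-style string such as "3-8,16" into sorted CU indices."""
    cus = set()
    for part in mask.split(","):
        part = part.strip()
        if not part:
            continue  # tolerate stray commas
        if "-" in part:
            # A range such as "3-8" is inclusive on both ends.
            first_str, last_str = part.split("-", 1)
            first, last = int(first_str), int(last_str)
            if first > last:
                raise ValueError(f"invalid CU range: {part}")
            cus.update(range(first, last + 1))
        else:
            cus.add(int(part))
    return sorted(cus)


print(parse_cu_mask("3-8,16"))  # [3, 4, 5, 6, 7, 8, 16]
```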

Modified

Adding extra comments on commonly missing includes with details on how to install them

Fixed

CUDA compilation should work again (wall_clock64 CUDA alias was not defined)
