Full documentation for rocPRIM is available at https://codedocs.xyz/ROCmSoftwarePlatform/rocPRIM/
- Enabled bfloat16 tests and reduced the threshold for bfloat16
- Fixed the device scan limit_size feature
- Added scan size limit feature
- Added block_load_striped and block_store_striped
- Reverted an old Fiji workaround, because the issue has been fixed on the compiler side
- Updated the minimum CMake version number in the README
- Unit tests may soft hang on MI200 when running in hipMallocManaged mode.
- block_histogram and device_scan unit tests fail for HIP on Windows
- ReduceEmptyInput causes random failures with bfloat16
- Initial HIP on Windows support. See README for instructions on how to build and install.
- bfloat16 support added.
- Packaging split into a runtime package called rocprim and a development package called rocprim-devel. The development package depends on the runtime package. The runtime package suggests the development package for all supported OSes except CentOS 7 to aid in the transition. The suggests feature in packaging is introduced as a deprecated feature and will be removed in a future ROCm release.
- As rocPRIM is a header-only library, the runtime package is an empty placeholder used to aid in the transition. This package is also a deprecated feature and will be removed in a future ROCm release.
- Unit tests may soft hang on MI200 when running in hipMallocManaged mode.
- The warp_size() function is now deprecated; please switch to host_warp_size() and device_warp_size() for host and device references respectively.
- Code coverage tools build option
- Address sanitizer build option
- gfx1030 support added.
- Experimental HIP-CPU support; build using GCC/Clang/MSVC on Windows/Linux. It is a work in progress, and many algorithms are still known to fail.
- Added single tile radix sort for smaller sizes.
- Improved performance for radix sort for larger element sizes.
- The warp_size() function is now deprecated; please switch to host_warp_size() and device_warp_size() for host and device references respectively.
- Bugfix & minor performance improvement for merge_sort when input and output storage are the same.
- gfx90a support added.
- The warp_size() function is now deprecated; please switch to host_warp_size() and device_warp_size() for host and device references respectively.
- Size zero inputs are now properly handled with newer ROCm builds that no longer allow zero-size kernel grid/block dimensions
- Minimum CMake version required is now 3.10.2
- Device scan unit test is currently failing due to an LLVM bug.
- Texture cache iteration support has been re-enabled.
- Benchmark builds have been re-enabled.
- Unique operator no longer called on invalid elements.
- Device scan unit test is currently failing due to an LLVM bug.
- No new features
- Updates to DPP instructions for warp shuffle
- Benchmark builds are disabled due to a compiler bug.
- Added HIP CMake dependency
- Updates to warp shuffle for gfx10
- Disabled DPP functions on gfx10 and later
- Benchmark builds are disabled due to a compiler bug.
- Fix for rocPRIM texture cache iterator
- No new features
- Package dependency corrected to hip-rocclr
- rocPRIM texture cache iterator functionality is broken in the runtime. It will be fixed in the next release. Please use the prior release if calling this function.
- No new features
- Point release with compilation fix.
- Improved tests with fixed and random seeds for test data
- Network interface improvements with API v3
- Switched to hip-clang as default compiler
- CMake searches for rocPRIM locally first and downloads it from GitHub if the local search fails
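
The block_load_striped and block_store_striped entries refer to the striped data arrangement. The following minimal host-side C++ sketch contrasts it with a blocked arrangement. It assumes a block of `block_size` threads, each holding `items_per_thread` items. It does not call rocPRIM, and `ThreadItems`, `load_blocked`, `load_striped` and `store_striped` are hypothetical names used only for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical host-side model, not rocPRIM code: result[t] holds the
// items owned by thread t of a block.
using ThreadItems = std::vector<std::vector<int>>;

// Blocked arrangement: thread t owns a contiguous run, so item i of
// thread t comes from tile[t * items_per_thread + i].
ThreadItems load_blocked(const std::vector<int>& tile,
                         std::size_t block_size,
                         std::size_t items_per_thread) {
    assert(tile.size() == block_size * items_per_thread);
    ThreadItems result(block_size, std::vector<int>(items_per_thread));
    for (std::size_t t = 0; t < block_size; ++t)
        for (std::size_t i = 0; i < items_per_thread; ++i)
            result[t][i] = tile[t * items_per_thread + i];
    return result;
}

// Striped arrangement: item i of thread t comes from tile[i * block_size + t],
// so on step i the threads of the block read block_size consecutive elements.
ThreadItems load_striped(const std::vector<int>& tile,
                         std::size_t block_size,
                         std::size_t items_per_thread) {
    assert(tile.size() == block_size * items_per_thread);
    ThreadItems result(block_size, std::vector<int>(items_per_thread));
    for (std::size_t t = 0; t < block_size; ++t)
        for (std::size_t i = 0; i < items_per_thread; ++i)
            result[t][i] = tile[i * block_size + t];
    return result;
}

// Striped store: the inverse of load_striped, writing item i of thread t
// back to tile[i * block_size + t].
std::vector<int> store_striped(const ThreadItems& items,
                               std::size_t block_size,
                               std::size_t items_per_thread) {
    assert(items.size() == block_size);
    std::vector<int> tile(block_size * items_per_thread);
    for (std::size_t t = 0; t < block_size; ++t) {
        assert(items[t].size() == items_per_thread);
        for (std::size_t i = 0; i < items_per_thread; ++i)
            tile[i * block_size + t] = items[t][i];
    }
    return tile;
}
```

With 4 threads and 2 items per thread, thread 0 receives {0, 1} from the tile {0, ..., 7} in the blocked arrangement but {0, 4} in the striped one. In the striped layout, neighbouring threads touch neighbouring addresses on each step.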
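
The scan size limit entries concern inputs that are too large to handle in a single scan pass. The following is a minimal sketch of the general idea. It assumes the input is split into chunks of at most `size_limit` elements, which are scanned one after another with the running total carried from each chunk into the next. `chunked_inclusive_scan` is a hypothetical name, and this is a sequential host model rather than rocPRIM's device scan implementation.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical sketch: inclusive prefix sum processed in chunks of at most
// size_limit elements, as if each chunk were a separate pass. The last
// prefix of each chunk is carried into the next one.
std::vector<long long> chunked_inclusive_scan(const std::vector<long long>& input,
                                              std::size_t size_limit) {
    assert(size_limit > 0);
    std::vector<long long> output(input.size());
    long long carry = 0;  // total of all elements in previous chunks
    for (std::size_t begin = 0; begin < input.size(); begin += size_limit) {
        const std::size_t end = std::min(begin + size_limit, input.size());
        long long running = carry;  // start this chunk from the carried prefix
        for (std::size_t i = begin; i < end; ++i) {
            running += input[i];
            output[i] = running;
        }
        carry = running;  // hand the chunk's last prefix to the next chunk
    }
    return output;
}
```

Because the carried prefix seeds each chunk, the output does not depend on `size_limit`. For example, {1, 2, 3, 4, 5} yields {1, 3, 6, 10, 15} whether it is processed in one chunk or five.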