hipBLAS-0.32.0 for ROCm 3.7.0

@saadrahim saadrahim released this 15 Aug 04:26
abd7261

New Features

  • Improvements to rocblas_Xgemm_batched performance for small m, n, k.
  • Improvements to rocblas_Xgemv_batched and rocblas_Xgemv_strided_batched performance for small m (QMCPACK use).
  • Improvements to rocblas_Xdot (batched and non-batched) performance when both incx and incy are 1.
  • FP32 ONNX BERT performance on MI50 improved by 28%.
  • FP32 BDAS performance on MI50/MI60 improved significantly.
  • Added a substitution method for small trsm sizes (m <= 64 and n <= 64), which drastically increases performance for small batched trsm.
  • Add Fortran interface for BLAS 1, BLAS 2, BLAS 3
  • Add tbsv, tbsv_batched, and tbsv_strided_batched
  • Add hemm, hemm_batched, and hemm_strided_batched
  • Add symm, symm_batched, and symm_strided_batched
  • Add complex versions of geam, along with geam_batched and geam_strided_batched
  • Add gemm_batched_ex and gemm_strided_batched_ex
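
For readers unfamiliar with the term, the "substitution method" mentioned for small trsm sizes can be illustrated with a plain CPU reference. The sketch below is not the library's GPU kernel and does not use the hipBLAS or rocBLAS API; the function name `trsm_lower_left` and its list-of-rows layout are hypothetical, chosen only to show forward substitution for the left-side, lower-triangular case (solve L * X = B).

```python
def trsm_lower_left(L, B, unit_diag=False):
    """Solve L * X = B by forward substitution (illustrative reference only).

    L is an m x m lower-triangular matrix and B is an m x n matrix,
    both given as lists of rows. Returns X as a new list of rows.
    """
    m = len(L)
    n = len(B[0]) if m else 0
    X = [row[:] for row in B]  # copy so the caller's B is not modified
    for i in range(m):
        for j in range(n):
            s = X[i][j]
            # Subtract the contributions of rows that are already solved.
            for k in range(i):
                s -= L[i][k] * X[k][j]
            # With a unit diagonal, L[i][i] is taken to be 1 and not read.
            X[i][j] = s if unit_diag else s / L[i][i]
    return X


def trsm_lower_left_batched(Ls, Bs, unit_diag=False):
    """Apply the same solve independently to each (L, B) pair in a batch."""
    return [trsm_lower_left(L, B, unit_diag) for L, B in zip(Ls, Bs)]
```

For example, `trsm_lower_left([[2, 0], [1, 4]], [[2, 4], [9, 10]])` returns `[[1.0, 2.0], [2.0, 2.0]]`. Each row of X depends only on the rows above it, so for small m and n the whole solve is a short chain of multiply-subtract steps, and each matrix in a batch can be solved independently.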

Known Issues

  • None