Skip to content

Conversation

@diptorupd
Copy link
Collaborator

@diptorupd diptorupd commented Oct 2, 2025

The PR changes how the HIP C++ tests in libflashinfer/tests/hip are executed.

  • The CMake infrastructure for the testing is now fully integrated with the project-level CMake.
  • Since upstream has removed CUDA C++ unit tests, I went ahead and removed the CUDA configuration path from libflashinfer/tests/CMakeLists.txt.
  • Replaced FetchContent with find_package for discovering the gtest CMake package.
  • Made AOT kernel source generation contingent on FLASHINFER_BUILD_KERNELS CMake flag and removed the kernel generation dependency on C++ unit tests. The HIP unit tests do not require AOT kernels to be generated.
  • Now all new C++ HIP unit tests can be added using the configure_flashinfer_target function. Tests now are automatically configured once added to the tests/hip directory and require no changes to the CMakeLists.txt
  • The PR also makes FLASHINFER_ENABLE_HIP the default from FLASHINFER_ENABLE_CUDA.

Usage after the changes.

mkdir build && cd build
cmake -DFLASHINFER_ENABLE_HIP=ON -DFLASHINFER_UNITTESTS=ON  -GNinja ..
ninja build_tests
ctest --progress --output-on-failure -j12

Copilot AI review requested due to automatic review settings October 2, 2025 21:20

This comment was marked as outdated.

@demandal25 demandal25 self-requested a review October 2, 2025 22:49
@diptorupd diptorupd requested a review from rtmadduri October 3, 2025 02:16
@diptorupd diptorupd marked this pull request as draft October 3, 2025 02:39
@diptorupd diptorupd force-pushed the feature/integrate_hip_cxx_tests branch from c93c75a to a7666ec Compare October 3, 2025 03:29
@diptorupd diptorupd requested a review from demandal25 October 3, 2025 03:30
@diptorupd diptorupd marked this pull request as ready for review October 3, 2025 03:30
Copilot AI review requested due to automatic review settings October 3, 2025 03:30
Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

Copilot reviewed 7 out of 7 changed files in this pull request and generated 1 comment.

Comments suppressed due to low confidence (1)

cmake/utils/ConfigureTargets.cmake:1

  • Removing the dispatch_inc dependency without replacement may cause build failures if tests depend on generated dispatch headers.
# cmake-format: off

Tip: Customize your code reviews with copilot-instructions.md. Create the file or learn how to get started.

@diptorupd diptorupd requested a review from Copilot October 3, 2025 03:32
Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

Copilot reviewed 7 out of 7 changed files in this pull request and generated 1 comment.


Tip: Customize your code reviews with copilot-instructions.md. Create the file or learn how to get started.

Copilot AI review requested due to automatic review settings October 3, 2025 03:33
Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

Copilot reviewed 7 out of 7 changed files in this pull request and generated no new comments.

Comments suppressed due to low confidence (3)

cmake/utils/ConfigureTargets.cmake:1

  • Removing the dispatch_inc dependency may cause build failures if targets require generated dispatch headers. Verify that this dependency is handled elsewhere or add conditional logic to preserve it when needed.
# cmake-format: off

libflashinfer/tests/CMakeLists.txt:1

  • Removing all test configurations eliminates test coverage for core functionality including decode, prefill, cascade, and other critical components. The HIP tests alone may not provide adequate coverage.
# Set global paths and initialize test list

libflashinfer/tests/CMakeLists.txt:1

  • Removing all test configurations eliminates test coverage for core functionality including decode, prefill, cascade, and other critical components. The HIP tests alone may not provide adequate coverage.
# Set global paths and initialize test list

Tip: Customize your code reviews with copilot-instructions.md. Create the file or learn how to get started.

@diptorupd diptorupd force-pushed the feature/integrate_hip_cxx_tests branch from 268ac40 to a33cad1 Compare October 3, 2025 03:42
@diptorupd diptorupd requested a review from Copilot October 3, 2025 03:49
Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

Copilot reviewed 7 out of 7 changed files in this pull request and generated 2 comments.

Comments suppressed due to low confidence (1)

cmake/utils/ConfigureTargets.cmake:1

  • Removing the dispatch_inc dependency for all targets may cause build failures if some targets still require it. Consider making this conditional based on whether kernels are being built.
# cmake-format: off

Tip: Customize your code reviews with copilot-instructions.md. Create the file or learn how to get started.


# === HIP/ROCm OPTIONS ===
flashinfer_option(FLASHINFER_ENABLE_HIP "Enable AMD HIP/ROCm backend" OFF)
flashinfer_option(FLASHINFER_ENABLE_HIP "Enable AMD HIP/ROCm backend" ON)
Copy link

Copilot AI Oct 3, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changing the default HIP backend from OFF to ON may break existing builds that don't have HIP dependencies installed. Consider keeping the original default or documenting this breaking change.

Suggested change
flashinfer_option(FLASHINFER_ENABLE_HIP "Enable AMD HIP/ROCm backend" ON)
flashinfer_option(FLASHINFER_ENABLE_HIP "Enable AMD HIP/ROCm backend" OFF)

Copilot uses AI. Check for mistakes.
Copilot AI review requested due to automatic review settings October 3, 2025 15:00
@diptorupd diptorupd force-pushed the feature/integrate_hip_cxx_tests branch from 60aeed3 to 747b12e Compare October 3, 2025 15:00
Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

Copilot reviewed 7 out of 7 changed files in this pull request and generated 2 comments.


Tip: Customize your code reviews with copilot-instructions.md. Create the file or learn how to get started.

# cmake-format: off
# Compiler flags - defined as lists for cleaner management
set(WARNING_FLAGS
"-Wall"
Copy link

Copilot AI Oct 3, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removing -Wextra reduces compiler warning coverage. This change appears unrelated to the HIP CMake infrastructure update and should be justified or reverted.

Suggested change
"-Wall"
"-Wall"
"-Wextra"

Copilot uses AI. Check for mistakes.
@rtmadduri
Copy link
Collaborator

Currently the platform team runs tests in a certain way. Even our Dockerfile runs CMake based on the old infra.

If we make changes to the C++ test infra, we need to coordinate the changes with platform team. Right now this Dockerfile is how they release artifacts.

@rtmadduri
Copy link
Collaborator

I have approved the PR. Lets make a note to change the Dockerfile.rocm_ci

@demandal25
Copy link
Collaborator

The following tests are failing

image

@diptorupd diptorupd merged commit 63291be into amd-integration Oct 3, 2025
1 check passed
@diptorupd diptorupd deleted the feature/integrate_hip_cxx_tests branch October 3, 2025 18:32
diptorupd pushed a commit that referenced this pull request Oct 27, 2025
This PR adds chunking logic and enables the shared memory optimization
feature for Decode for the CDNA3 architecture.

The major addition of the PR is rewriting the shared memory calculation
and chunking to better suit the CDNA3 architecture which only allows
64KiB of shared memory per CU.

The PR makes corresponding changes to `test_batch_decode_kernels_hip.py`
and `examples/test_batch_decode_example.py`

`examples/test_batch_decode_example.py`
```
JIT: Using prebuilt ops
PASS
```

`test_batch_decode_kernels_hip.py`
```
================================= 720 passed in 74.37s (0:01:14) ================================= 
```
Complete HIP PyTest suite
```
=================================  16388 passed, 18 skipped in 148.54s (0:02:28) ================================= 
```
C++ test suite
```
89% tests passed, 3 tests failed out of 27

Total Test time (real) = 259.46 sec

The following tests FAILED:
          3 - FlashInferCorrectnessTest.VariableLengthMergeKernelCorrectnessTestFP16 (Failed)
         20 - MfmaRowSumTest.CorrectResults (Failed)
         27 - test_rowsum_hip (Failed)
```

Note: See
[here](#10 (comment))
for more info about the above known failures

Improvement over the existing implementation:
```
num_qo_heads = 32
kv_len = 8196
num_kv_heads = 32
head_dim = 128
num_iter = 500

Average time per iteration in seconds:

Current Flashinfer + ROCm Decode MI325: 6.3011
Shared memory Optimization Decode MI325 (This PR): 0.11113595962524414
Upstream Flashinfer v0.2.5 Decode H100: 0.09272098541259766
```
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants