Update CMake infra to run HIP CXX tests using top-level cmake #10
c93c75a to a7666ec
Pull Request Overview
Copilot reviewed 7 out of 7 changed files in this pull request and generated 1 comment.
Comments suppressed due to low confidence (1)
cmake/utils/ConfigureTargets.cmake:1
- Removing the dispatch_inc dependency without replacement may cause build failures if tests depend on generated dispatch headers.
 
# cmake-format: off
Pull Request Overview
Copilot reviewed 7 out of 7 changed files in this pull request and generated 1 comment.
Pull Request Overview
Copilot reviewed 7 out of 7 changed files in this pull request and generated no new comments.
Comments suppressed due to low confidence (3)
cmake/utils/ConfigureTargets.cmake:1
- Removing the dispatch_inc dependency may cause build failures if targets require generated dispatch headers. Verify that this dependency is handled elsewhere or add conditional logic to preserve it when needed.
 
# cmake-format: off
libflashinfer/tests/CMakeLists.txt:1
- Removing all test configurations eliminates test coverage for core functionality including decode, prefill, cascade, and other critical components. The HIP tests alone may not provide adequate coverage.
 
# Set global paths and initialize test list
libflashinfer/tests/CMakeLists.txt:1
- Removing all test configurations eliminates test coverage for core functionality including decode, prefill, cascade, and other critical components. The HIP tests alone may not provide adequate coverage.
 
# Set global paths and initialize test list
268ac40 to a33cad1
Pull Request Overview
Copilot reviewed 7 out of 7 changed files in this pull request and generated 2 comments.
Comments suppressed due to low confidence (1)
cmake/utils/ConfigureTargets.cmake:1
- Removing the dispatch_inc dependency for all targets may cause build failures if some targets still require it. Consider making this conditional based on whether kernels are being built.
 
# cmake-format: off
# === HIP/ROCm OPTIONS ===
- flashinfer_option(FLASHINFER_ENABLE_HIP "Enable AMD HIP/ROCm backend" OFF)
+ flashinfer_option(FLASHINFER_ENABLE_HIP "Enable AMD HIP/ROCm backend" ON)
Copilot AI, Oct 3, 2025
Changing the default HIP backend from OFF to ON may break existing builds that don't have HIP dependencies installed. Consider keeping the original default or documenting this breaking change.
- flashinfer_option(FLASHINFER_ENABLE_HIP "Enable AMD HIP/ROCm backend" ON)
+ flashinfer_option(FLASHINFER_ENABLE_HIP "Enable AMD HIP/ROCm backend" OFF)
60aeed3 to 747b12e
Pull Request Overview
Copilot reviewed 7 out of 7 changed files in this pull request and generated 2 comments.
# cmake-format: off
# Compiler flags - defined as lists for cleaner management
set(WARNING_FLAGS
"-Wall"
Copilot AI, Oct 3, 2025
Removing -Wextra reduces compiler warning coverage. This change appears unrelated to the HIP CMake infrastructure update and should be justified or reverted.
- "-Wall"
+ "-Wall"
+ "-Wextra"
Currently the platform team runs tests in a certain way. Even our Dockerfile runs CMake based on the old infra. If we make changes to the C++ test infra, we need to coordinate the changes with the platform team. Right now this Dockerfile is how they release artifacts.

I have approved the PR. Let's make a note to change the Dockerfile.rocm_ci.
This PR adds chunking logic and enables the shared memory optimization
feature for Decode for the CDNA3 architecture.
The major addition of the PR is rewriting the shared memory calculation
and chunking to better suit the CDNA3 architecture which only allows
64KiB of shared memory per CU.
The PR makes corresponding changes to `test_batch_decode_kernels_hip.py`
and `examples/test_batch_decode_example.py`
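
To illustrate the kind of calculation involved, here is a minimal Python sketch of sizing KV tiles against a 64 KiB shared memory budget and splitting a sequence into chunks. It is not the PR's implementation: the function names, the assumption that one K row and one V row per token are staged in shared memory, and the single-buffer layout are all hypothetical. Only the 64 KiB limit, `head_dim = 128`, and `kv_len = 8196` come from this PR.

```python
# Hypothetical sketch: none of these names come from the FlashInfer code base.
SHARED_MEM_LIMIT_BYTES = 64 * 1024  # 64 KiB per CU on CDNA3, as stated in the PR


def max_tokens_per_tile(head_dim: int, dtype_bytes: int,
                        smem_limit: int = SHARED_MEM_LIMIT_BYTES) -> int:
    """Largest number of KV tokens whose K and V rows fit in shared memory.

    Assumes (for illustration only) one K row and one V row per token,
    with no extra buffers or pipeline stages.
    """
    bytes_per_token = 2 * head_dim * dtype_bytes  # one K row plus one V row
    return smem_limit // bytes_per_token


def split_kv_into_chunks(kv_len: int, chunk_size: int) -> list[tuple[int, int]]:
    """Return half-open [start, end) ranges covering kv_len, each at most chunk_size long."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    return [(start, min(start + chunk_size, kv_len))
            for start in range(0, kv_len, chunk_size)]


tile = max_tokens_per_tile(head_dim=128, dtype_bytes=2)  # fp16 assumed
chunks = split_kv_into_chunks(kv_len=8196, chunk_size=tile)
print(tile, len(chunks), chunks[-1])
```

A real kernel would also have to budget for multi-stage buffering and any per-block scratch space, so the usable tile size would likely be smaller than this upper bound.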
`examples/test_batch_decode_example.py`
```
JIT: Using prebuilt ops
PASS
```
`test_batch_decode_kernels_hip.py`
```
================================= 720 passed in 74.37s (0:01:14) ================================= 
```
Complete HIP PyTest suite
```
=================================  16388 passed, 18 skipped in 148.54s (0:02:28) ================================= 
```
C++ test suite
```
89% tests passed, 3 tests failed out of 27
Total Test time (real) = 259.46 sec
The following tests FAILED:
          3 - FlashInferCorrectnessTest.VariableLengthMergeKernelCorrectnessTestFP16 (Failed)
         20 - MfmaRowSumTest.CorrectResults (Failed)
         27 - test_rowsum_hip (Failed)
```
Note: See
[here](#10 (comment))
for more info about the above known failures
Improvement over the existing implementation:
```
num_qo_heads = 32
kv_len = 8196
num_kv_heads = 32
head_dim = 128
num_iter = 500
Average time per iteration in seconds:
Current Flashinfer + ROCm Decode MI325: 6.3011
Shared memory Optimization Decode MI325 (This PR): 0.11113595962524414
Upstream Flashinfer v0.2.5 Decode H100: 0.09272098541259766
```
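
The relative improvement can be derived from the timings above. The short Python sketch below uses only the three reported numbers; the variable names are illustrative. It shows this PR running roughly 57 times faster than the existing ROCm decode path and taking about 1.2 times as long as the upstream H100 run.

```python
# Timings copied from the benchmark output above (seconds per iteration).
baseline_mi325 = 6.3011
this_pr_mi325 = 0.11113595962524414
upstream_h100 = 0.09272098541259766

# How many times faster this PR is than the existing ROCm decode path.
speedup_over_baseline = baseline_mi325 / this_pr_mi325

# How much longer this PR takes on MI325 than upstream v0.2.5 on H100.
time_relative_to_h100 = this_pr_mi325 / upstream_h100

print(f"Speedup over existing ROCm decode: {speedup_over_baseline:.1f}x")
print(f"Time relative to upstream on H100: {time_relative_to_h100:.2f}x")
```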
    
The PR changes how the HIP C++ tests in `libflashinfer/tests/hip` are executed.
- `libflashinfer/tests/CMakeLists.txt`.
- `FetchContent` with `find_package` for discovering the `gtest` CMake package.
- `FLASHINFER_BUILD_KERNELS` CMake flag and removed the kernel generation dependency on C++ unit tests. The HIP unit tests do not require AOT kernels to be generated.
- Now all new C++ HIP unit tests can be added using the `configure_flashinfer_target` function. Tests now are automatically configured once added to the `tests/hip` directory and require no changes to the CMakeLists.txt.
- `FLASHINFER_ENABLE_HIP` the default from `FLASHINFER_ENABLE_CUDA`.

Usage after the changes.