
Conversation


@bkryu bkryu commented Oct 24, 2025

📌 Description

The current microbenchmark code does not provide instantiated block_tables to all backends. The omission has had no impact on correctness or performance, because block tables are instantiated linearly when not provided, but it would manifest as output mismatches if the block table were shuffled.

This PR simply calls the FlashInfer APIs in their intended way.

No changes to library code
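To make the failure mode concrete, here is a small self-contained NumPy sketch of a paged KV-cache lookup through a block table. The names (`kv_cache`, `gather_kv`) and shapes are illustrative only, not FlashInfer code: a linearly instantiated default block table happens to read pages in the right order, but a shuffled allocation does not.

```python
import numpy as np

# Paged KV cache: 4 pages of 2 tokens each, filled with distinct values.
num_pages, page_size, head_dim = 4, 2, 8
kv_cache = np.arange(num_pages * page_size * head_dim, dtype=np.float32).reshape(
    num_pages, page_size, head_dim
)

def gather_kv(block_table):
    # Gather one sequence's KV entries in logical order via its block table.
    return kv_cache[block_table].reshape(-1, head_dim)

linear = np.arange(num_pages)      # implicit default when no block table is passed
shuffled = np.array([2, 0, 3, 1])  # pages allocated out of order

# With linearly allocated pages, the implicit default happens to be correct...
assert np.array_equal(gather_kv(linear), kv_cache.reshape(-1, head_dim))
# ...but once pages are shuffled, the two gathers disagree.
assert not np.array_equal(gather_kv(shuffled), gather_kv(linear))
```

This is why the omission was invisible in the benchmark: the results only diverge once pages are allocated out of order, at which point the actual block table must be passed explicitly.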

🔍 Related Issues

🚀 Pull Request Checklist

Thank you for contributing to FlashInfer! Before we review your pull request, please make sure the following items are complete.

✅ Pre-commit Checks

  • I have installed pre-commit by running pip install pre-commit (or used your preferred method).
  • I have installed the hooks with pre-commit install.
  • I have run the hooks manually with pre-commit run --all-files and fixed any reported issues.

If you are unsure about how to set up pre-commit, see the pre-commit documentation.

🧪 Tests

  • Tests have been added or updated as needed.
  • All tests are passing (unittest, etc.).

Reviewer Notes

Summary by CodeRabbit

  • Refactor
    • Enhanced consistency in attention computation by aligning page-table parameter handling across different inference backend implementations for improved paged key-value cache operations.

@bkryu bkryu marked this pull request as ready for review October 24, 2025 17:26

coderabbitai bot commented Oct 24, 2025

Walkthrough

The pull request extends paged attention wrappers to accept and propagate block table parameters throughout the execution pipeline. BatchDecodeWithPagedKVCacheWrapper and BatchPrefillWithPagedKVCacheWrapper now receive block_tables in their planning stages, while BatchMLAPagedAttentionWrapper passes page_table during kernel execution, unifying page table handling across attention backends.

Changes

  • Paged Attention Wrapper Parameter Propagation — benchmarks/routines/attention.py: Added the block_tables parameter to the BatchDecodeWithPagedKVCacheWrapper.plan() and BatchPrefillWithPagedKVCacheWrapper.plan() calls; extended the BatchMLAPagedAttentionWrapper.run() invocation to pass page_table=block_tables for kernel execution.

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~8 minutes

Suggested reviewers

  • yzh119
  • nvmbreughe
  • PerkzZheng
  • aleozlx
  • joker-eph

Poem

🐰 Hop, hop, through the pages we go,
Block tables flow where the attention winds blow,
Paged kernels aligned, from decode to prefill,
A rabbit's delight—unified thrills!

Pre-merge checks and finishing touches

❌ Failed checks (1 warning, 1 inconclusive)
  • Docstring Coverage — ⚠️ Warning: Docstring coverage is 75.00%, which is below the required threshold of 80.00%. Resolution: run @coderabbitai generate docstrings to improve docstring coverage.
  • Title Check — ❓ Inconclusive: The title "fix: Make attention microbenchmark correctly use page table in" is clearly related to the main changeset, which adds block_tables parameters to multiple wrapper methods to fix the attention microbenchmark's page table usage. However, the title is incomplete — it ends awkwardly with "in", making it grammatically incorrect and ambiguous as written. Resolution: complete the title, e.g. "fix: Make attention microbenchmark correctly use page table in all backends" or "fix: Pass block_tables to attention microbenchmark wrappers".
✅ Passed checks (1 passed)
  • Description Check — ✅ Passed: The pull request description follows the provided template with all major sections present and appropriately completed. The 📌 Description section clearly explains what the PR does and why it was needed, noting that the microbenchmark was not providing instantiated block_tables and that this PR corrects the API usage without modifying library code. The 🔍 Related Issues section is present (though empty, which is acceptable when no related issues exist). The 🚀 Pull Request Checklist is complete, and the optional Reviewer Notes section is included.

@bkryu bkryu changed the title fix: Make attention microbenchmark correctly use page table in fix: Make attention microbenchmark correctly use page table Oct 24, 2025
@yzh119 yzh119 merged commit a565294 into flashinfer-ai:main Oct 24, 2025
4 checks passed