
Conversation

@kensclin (Contributor) commented Dec 3, 2025

Proposed changes

This commit introduces support for input (A) and weight (B) quantization within the Blockscale GEMM kernel pipeline.

Motivation:
This feature is essential for high-performance inference of large language models (LLMs), as it allows us to use 8-bit or 4-bit data types for both activation and weight tensors. Quantizing both A and B reduces the memory footprint and bandwidth required for both operands.

Checklist

Please put an x into the boxes that apply. You can also fill these out after creating the PR. If you're not sure, please don't hesitate to ask.

  • I have added tests relevant to the introduced functionality, and the unit tests are passing locally
  • I have added the test to REGRESSION_TESTS list defined at the top of CMakeLists.txt in tests/CMakeLists.txt, IF the test takes more than 30 seconds to run.
  • I have added inline documentation which enables the maintainers to understand the motivation
  • I have removed the stale documentation which is no longer relevant after this pull request
  • (If this change is user-facing) I have added release notes which provide the end users with a brief summary of the improvement from this pull request
  • I have run clang-format on all changed files
  • Any dependent changes have been merged

Discussion

If this is a relatively large or complex change, feel free to start a discussion by explaining why you chose the solution you did and what alternatives you considered

ck_tile::pk_int4_t,
ck_tile::half_t,
ck_tile::bf8_t>{});
using QuantGroupSize = ck_tile::QuantGroupShape<ck_tile::sequence<1, 1, 128>>;
Review comment (Contributor):

For the quant group size, we need two quant group sizes: one for A and one for B.

Review comment (Contributor):

Also, this example is intended to use 1D quantization for A and 2D quantization for B.
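A minimal sketch of what those two suggestions could look like, reusing the ck_tile::QuantGroupShape alias style from the snippet above; the alias names AQuantGroupSize/BQuantGroupSize and the concrete group sizes (128) are illustrative, not the PR's actual code:

    using AQuantGroupSize = ck_tile::QuantGroupShape<ck_tile::sequence<1, 1, 128>>;   // A: 1D groups along K
    using BQuantGroupSize = ck_tile::QuantGroupShape<ck_tile::sequence<1, 128, 128>>; // B: 2D N x K blocks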

};

template <typename PrecType>
struct GemmConfig_ABQuant_Prefill : public GemmConfigBase
Review comment (Contributor):

We could directly reuse the existing quant prefill config; there is no need to add another one.

has_hot_loop_v,
tail_number_v>,
ck_tile::GemmABQuantPipelineProblem<typename TypeConfig::ADataType,
typename TypeConfig::QDataType, // For AQ
Review comment (Contributor):

We should have two separate types here: one for AQuant and one for BQuant.
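A hedged sketch of a type config carrying separate scale types, so the pipeline problem could read TypeConfig::AQDataType and TypeConfig::BQDataType instead of a single QDataType; the member names and the float scale type are assumptions for illustration, not the PR's actual API:

    struct ExampleABQuantTypeConfig
    {
        using ADataType  = ck_tile::pk_int4_t; // quantized activations (as in the example above)
        using BDataType  = ck_tile::pk_int4_t; // quantized weights
        using AQDataType = float;              // per-group scales for A
        using BQDataType = float;              // per-group scales for B
        using CDataType  = ck_tile::half_t;    // output
    };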

GemmConfig::PreshuffleB)
{
throw std::runtime_error(
"Preshuffling weight matrix is not supported for AQuant or RowColQuant");
Review comment (Contributor):

Should ABQuant also be added here?
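A self-contained sketch of the extension being asked about; the QuantMode enum and the helper name are hypothetical and only illustrate adding ABQuant to the unsupported-preshuffle check:

    #include <stdexcept>

    enum class QuantMode { AQuant, BQuant, ABQuant, RowColQuant };

    inline void check_preshuffle_support(bool preshuffle_b, QuantMode mode)
    {
        const bool unsupported = mode == QuantMode::AQuant || mode == QuantMode::RowColQuant ||
                                 mode == QuantMode::ABQuant;
        if(preshuffle_b && unsupported)
        {
            throw std::runtime_error(
                "Preshuffling weight matrix is not supported for AQuant, RowColQuant or ABQuant");
        }
    }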


static constexpr index_t NQPerBlock = NPerBlock / QuantGroupSize::kN;
static constexpr index_t KQPerBlock = KPerBlock / QuantGroupSize::kK;
static constexpr index_t AQPerBlock = KPerBlock / QuantGroupSize::kK;
Review comment (Contributor):

Since we are using two different quant group sizes, we should have one AQKPerBlock and one BQKPerBlock. When AQKPerBlock matches BQKPerBlock, we can merge the loop.
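A sketch mirroring the snippet above; it assumes the hypothetical per-operand group shapes AQuantGroupSize/BQuantGroupSize from the earlier sketch, and the names AQKPerBlock/BQKPerBlock/MergeScaleLoops are illustrative:

    static constexpr index_t AQKPerBlock = KPerBlock / AQuantGroupSize::kK;
    static constexpr index_t BQKPerBlock = KPerBlock / BQuantGroupSize::kK;
    // The A and B scale loops can be fused only when their K-direction group counts coincide.
    static constexpr bool MergeScaleLoops = (AQKPerBlock == BQKPerBlock);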

if constexpr(Traits::TransposeC) // transposed C
{
index_t reg_offset =
Traits::PreshuffleQuant ? mIter : mIter * Traits::AQPerBlock + kQScale;
Review comment (Contributor):

We will not provide PreshuffleQuant for the A matrix, since A is not the weight matrix.
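If that holds, the offset computation in the snippet above would lose its preshuffle branch; a minimal sketch under that assumption:

    index_t reg_offset = mIter * Traits::AQPerBlock + kQScale; // no Traits::PreshuffleQuant case for A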

struct AQPicker
{
CK_TILE_DEVICE
AQPicker(AQBlockTensor& aq_block_tensor_) : aq_block_tensor(aq_block_tensor_)
Review comment (Contributor):

For this part, we could also share a common function with include/ck_tile/ops/gemm_quant/block/block_universal_gemm_as_aquant_bs_cr.hpp.

// Create DRAM tile window for AQ
template <typename AQDramBlockWindowTmp>
CK_TILE_DEVICE constexpr auto
GetAQDramLoadWindow(const AQDramBlockWindowTmp& aq_dram_block_window_tmp) const
Review comment (Contributor):

This is the same as in the a_quant pipeline, so the two could be merged.

// Create DRAM tile window for BQ
template <typename BQDramBlockWindowTmp>
CK_TILE_DEVICE constexpr auto
GetBQDramLoadWindow(const BQDramBlockWindowTmp& bq_dram_block_window_tmp) const
Review comment (Contributor):

This is the same as in the BQuant pipeline, so the two could be merged.


namespace ck_tile {

struct GemmABQuantPipelineAgBgCrDefaultPolicy : public UniversalGemmPipelineAgBgCrPolicy
Review comment (Contributor):

We could share the policy with AQuant and BQuant.
