Skip to content
Open
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
35 changes: 31 additions & 4 deletions src/quantization.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -30,6 +30,8 @@
#include <migraphx/simplify_reshapes.hpp>
#include <migraphx/simplify_qdq.hpp>
#include <migraphx/eliminate_common_subexpression.hpp>
#include <migraphx/split_single_dyn_dim.hpp>
#include <migraphx/simplify_dyn_ops.hpp>
#include <migraphx/optimize_module.hpp>
#include <migraphx/dead_code_elimination.hpp>
#include <migraphx/program.hpp>
Expand Down Expand Up @@ -68,7 +70,11 @@ static tracer quant_tracer()
void quantize_fp16(program& prog, const std::vector<std::string>& ins_names)
{
run_passes(prog,
{normalize_ops{},
{split_single_dyn_dim{},
dead_code_elimination{},
simplify_dyn_ops{},
dead_code_elimination{},
normalize_ops{},
Comment on lines +73 to +77
Copy link

Copilot AI Nov 25, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The pass sequence {split_single_dyn_dim{}, dead_code_elimination{}, simplify_dyn_ops{}, dead_code_elimination{}} is repeated four times in this file (lines 73-76, 87-90, 107-110, 211-214). Consider extracting this into a named constant or helper function to improve maintainability:

static std::vector<pass> get_dyn_shape_preprocessing_passes() {
    return {split_single_dyn_dim{},
            dead_code_elimination{},
            simplify_dyn_ops{},
            dead_code_elimination{}};
}

This would make the code DRYer and ensure consistent preprocessing across all quantization functions.

Copilot uses AI. Check for mistakes.
optimize_module{{"quantizelinear", "dequantizelinear"}},
truncate_float_pass{ins_names, shape::half_type},
optimize_module{{"quantizelinear", "dequantizelinear"}}},
Comment on lines +73 to 80
Copy link

Copilot AI Nov 25, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The changes add support for dynamic shapes with single dynamic dimensions in quantization passes, but there are no tests verifying this new functionality. Consider adding tests that combine dynamic shapes with quantization to ensure the new behavior works correctly, such as:

  • Test quantize_fp16 with a dynamic shape input
  • Test quantize_bf16 with a dynamic shape input
  • Test quantize_int8 with a dynamic shape input
  • Test quantize_int4_weights with a dynamic shape input

This is especially important given that the PR description mentions "The matchers will error out otherwise" without this change.

Copilot uses AI. Check for mistakes.
Expand All @@ -78,7 +84,11 @@ void quantize_fp16(program& prog, const std::vector<std::string>& ins_names)
void quantize_bf16(program& prog, const std::vector<std::string>& ins_names)
{
run_passes(prog,
{normalize_ops{},
{split_single_dyn_dim{},
dead_code_elimination{},
simplify_dyn_ops{},
dead_code_elimination{},
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do these passes need to be ran?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

split_single_dyn_dim will create the submodules but won't simplify other ops further. So simplify_dyn_ops is used to get rid of things like 2-input multibroadcasts and slices. Awhile back I've found from testing models that this was needed.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems we should be able to quantize without needing to convert dynamic shapes which is a decision that should be made by the backend target.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree that should be the case, but currently optimize_module will fail as soon as a matcher tries to interact with a dynamic shape.

normalize_ops{},
optimize_module{{"quantizelinear", "dequantizelinear"}},
truncate_float_pass{ins_names, shape::bf16_type},
optimize_module{{"quantizelinear", "dequantizelinear"}}},
Expand All @@ -93,7 +103,16 @@ static void quantize_8bits(program& prog,
{
// Run optimize_module() before converting to int8/fp8 to const eval and fold in FP32 to
// avoid loss of precision.
run_passes(prog, {rewrite_rnn{}, normalize_ops{}, optimize_module{}}, quant_tracer());
run_passes(prog,
{split_single_dyn_dim{},
dead_code_elimination{},
simplify_dyn_ops{},
dead_code_elimination{},
rewrite_rnn{},
dead_code_elimination{},
normalize_ops{},
optimize_module{}},
quant_tracer());

std::shared_ptr<std::vector<std::pair<float, float>>> quant_8bit_params =
std::make_shared<std::vector<std::pair<float, float>>>();
Expand Down Expand Up @@ -188,7 +207,15 @@ void quantize_int8(program& prog,

void quantize_int4_weights(program& prog)
{
run_passes(prog, {normalize_ops{}, optimize_module{}, quantize_int4_pass{}}, quant_tracer());
run_passes(prog,
{split_single_dyn_dim{},
dead_code_elimination{},
simplify_dyn_ops{},
dead_code_elimination{},
normalize_ops{},
optimize_module{},
quantize_int4_pass{}},
quant_tracer());
}

void quantize_fp8(program& prog, const target& t, const std::vector<parameter_map>& calibration)
Expand Down
10 changes: 10 additions & 0 deletions src/quantize_8bits.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -51,6 +51,11 @@

void quantize_8bits_pass::apply(module& m) const // NOLINT
{
// skip main module that contains select_module
if(any_of(m.begin(), m.end(), [](auto ins) { return ins.name() == "select_module"; }))

Check warning on line 55 in src/quantize_8bits.cpp

View workflow job for this annotation

GitHub Actions / tidy

the parameter 'ins' is copied for each invocation but only used as a const reference; consider making it a const reference [performance-unnecessary-value-param,-warnings-as-errors]
{
return;
}
const auto& quantizable_types = get_quantizable_type();
for(auto ins : iterator_for(m))
{
Expand Down Expand Up @@ -97,6 +102,11 @@

void capture_arguments_pass::apply(module& m) const // NOLINT
{
// skip main module that contains select_module
if(any_of(m.begin(), m.end(), [](auto ins) { return ins.name() == "select_module"; }))

Check warning on line 106 in src/quantize_8bits.cpp

View workflow job for this annotation

GitHub Actions / tidy

the parameter 'ins' is copied for each invocation but only used as a const reference; consider making it a const reference [performance-unnecessary-value-param,-warnings-as-errors]
{
return;
}
assert(param_index != nullptr);
const auto& quantizable_types = get_quantizable_type();

Expand Down
5 changes: 5 additions & 0 deletions src/split_single_dyn_dim.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -95,6 +95,11 @@
*/
static bool any_sm_next(const_module_ref mm, const std::vector<dynamic_dimensions_check>& ddcs)
{
// skip main module that contains select_module (meaning this pass already ran)
if(any_of(mm->begin(), mm->end(), [](auto ins) { return ins.name() == "select_module"; }))

Check warning on line 99 in src/split_single_dyn_dim.cpp

View workflow job for this annotation

GitHub Actions / tidy

the parameter 'ins' is copied for each invocation but only used as a const reference; consider making it a const reference [performance-unnecessary-value-param,-warnings-as-errors]
{
return true;
}
Comment on lines +98 to +102
Copy link

Copilot AI Nov 25, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The logic change here modifies the behavior of any_sm_next to return true when the module contains select_module, effectively treating it the same as if a parameter outputs to select_module. This seems inconsistent with the function's documented purpose which is to check if "any of the parameters outputs to a select_module operator".

Consider either:

  1. Renaming this function to better reflect its new behavior (e.g., should_skip_module), or
  2. Moving this check out of the function and into the caller, to keep the function focused on its original purpose.

Copilot uses AI. Check for mistakes.
for(const auto& ddc : ddcs)
{
auto p_outputs = mm->get_parameter(ddc.dyn_param_str)->outputs();
Expand Down
7 changes: 6 additions & 1 deletion src/truncate_float.cpp
Original file line number Diff line number Diff line change
@@ -1,7 +1,7 @@
/*
* The MIT License (MIT)
*
* Copyright (c) 2015-2024 Advanced Micro Devices, Inc. All rights reserved.
* Copyright (c) 2015-2025 Advanced Micro Devices, Inc. All rights reserved.
*
* Permission is hereby granted, free of charge, to any person obtaining a copy
* of this software and associated documentation files (the "Software"), to deal
Expand Down Expand Up @@ -38,6 +38,11 @@
static void
quantize_module(module& m, const std::vector<std::string>& ins_names, shape::type_t float_type)
{
// skip main module that contains select_module
if(any_of(m.begin(), m.end(), [](auto ins) { return ins.name() == "select_module"; }))

Check warning on line 42 in src/truncate_float.cpp

View workflow job for this annotation

GitHub Actions / tidy

the parameter 'ins' is copied for each invocation but only used as a const reference; consider making it a const reference [performance-unnecessary-value-param,-warnings-as-errors]
Comment on lines 38 to +42
Copy link

Copilot AI Nov 25, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This check for select_module is duplicated across multiple files (truncate_float.cpp, quantize_8bits.cpp twice, and split_single_dyn_dim.cpp). Consider extracting this into a helper function to improve maintainability and ensure consistent behavior. For example:

// In a common utility header
inline bool module_has_select_module(const module& m) {
    return any_of(m.begin(), m.end(), [](auto ins) { return ins.name() == "select_module"; });
}

This would reduce code duplication and make future changes easier.

Suggested change
static void
quantize_module(module& m, const std::vector<std::string>& ins_names, shape::type_t float_type)
{
// skip main module that contains select_module
if(any_of(m.begin(), m.end(), [](auto ins) { return ins.name() == "select_module"; }))
// Helper function to check for select_module
static inline bool module_has_select_module(const module& m)
{
return any_of(m.begin(), m.end(), [](auto ins) { return ins.name() == "select_module"; });
}
static void
quantize_module(module& m, const std::vector<std::string>& ins_names, shape::type_t float_type)
{
// skip main module that contains select_module
if(module_has_select_module(m))

Copilot uses AI. Check for mistakes.
{
return;
}
for(auto ins : iterator_for(m))
{
// instructions are not in the set to be quantized
Expand Down
Loading