
Commit 1f88ff4

kimishpatel authored and facebook-github-bot committed
Update doc and code to run quantized model (#157)
Summary:
Pull Request resolved: #157

- Fix doc to separate 1) generating a quantized model and 2) running it with executor_runner
- Include <tuple> in choose_qparams
- Include quantized ops by default in executor_runner

Reviewed By: larryliu0820, guangy10

Differential Revision: D48752106

fbshipit-source-id: 30f4e7ba121abeb01b7b97020c2fef0f5d2ac891
1 parent b9f37cc commit 1f88ff4

File tree

4 files changed

+18
-6
lines changed


examples/README.md

Lines changed: 13 additions & 1 deletion

@@ -58,7 +58,9 @@ buck2 run examples/executor_runner:executor_runner -- --model_path mv2.pte
 ## Quantization
 Here is the [Quantization Flow Docs](/docs/website/docs/tutorials/quantization_flow.md).
 
-You can run quantization test with the following command:
+### Generating quantized model
+
+You can generate a quantized model with the following command (the example below is for mv2, aka MobileNetV2):
 ```bash
 python3 -m examples.quantization.example --model_name "mv2" --so-library "<path/to/so/lib>" # for MobileNetv2
 ```
@@ -80,6 +82,16 @@ you can also find the valid quantized example models by running:
 buck2 run executorch/examples/quantization:example -- --help
 ```
 
+### Running quantized model
+
+A quantized model can be run via executor_runner, just like a floating-point model, as shown above:
+
+```bash
+buck2 run examples/executor_runner:executor_runner -- --model_path mv2.pte
+```
+
+Note that running a quantized model requires the various quantize/dequantize operators available in the [quantized kernel lib](/kernels/quantized).
+
 ## XNNPACK Backend
 Please see [Backend README](backend/README) for XNNPACK quantization, export, and run workflow.
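For background on the note above: the quantize/dequantize operators live in /kernels/quantized and implement standard affine quantization. The C++ sketch below illustrates only the per-element math, assuming int8 affine quantization; it is not the actual kernel code, and the function names are hypothetical:

```cpp
#include <algorithm>
#include <cmath>
#include <cstdint>

// Illustrative int8 affine quantization math only; the real kernels in
// /kernels/quantized operate on whole tensors and have different signatures.
int8_t quantize_affine(float x, double scale, int32_t zero_point) {
  // q = clamp(round(x / scale) + zero_point, -128, 127)
  int32_t q = static_cast<int32_t>(std::nearbyint(x / scale)) + zero_point;
  return static_cast<int8_t>(std::min(127, std::max(-128, q)));
}

float dequantize_affine(int8_t q, double scale, int32_t zero_point) {
  // x ~= (q - zero_point) * scale
  return static_cast<float>((static_cast<int32_t>(q) - zero_point) * scale);
}
```

Linking //executorch/kernels/quantized:generated_lib into executor_runner, as the targets.bzl change below does, is what makes the real operator implementations available at runtime.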

examples/executor_runner/targets.bzl

Lines changed: 3 additions & 3 deletions

@@ -28,13 +28,13 @@ def define_common_targets():
 
     register_custom_op = native.read_config("executorch", "register_custom_op", "0")
     register_quantized_ops = native.read_config("executorch", "register_quantized_ops", "0")
-    custom_ops_lib = []
+
+    # Include quantized ops to be able to run quantized model with portable ops
+    custom_ops_lib = ["//executorch/kernels/quantized:generated_lib"]
     if register_custom_op == "1":
         custom_ops_lib.append("//executorch/examples/custom_ops:lib_1")
     elif register_custom_op == "2":
         custom_ops_lib.append("//executorch/examples/custom_ops:lib_2")
-    if register_quantized_ops == "1":
-        custom_ops_lib.append("//executorch/kernels/quantized:generated_lib")
 
     # Test driver for models, uses all portable kernels and a demo backend. This
     # is intended to have minimal dependencies. If you want a runner that links

examples/quantization/test_quantize.sh

Lines changed: 1 addition & 2 deletions

@@ -32,8 +32,7 @@ test_buck2_quantization() {
   ${PYTHON_EXECUTABLE} -m "examples.quantization.example" --so_library="$SO_LIB" --model_name="$1"
 
   echo 'Running executor_runner'
-  buck2 run //examples/executor_runner:executor_runner \
-    --config=executorch.register_quantized_ops=1 -- --model_path="./$1.pte"
+  buck2 run //examples/executor_runner:executor_runner -- --model_path="./$1.pte"
   # should give correct result
 
   echo "Removing $1.pte"

kernels/quantized/cpu/op_choose_qparams.cpp

Lines changed: 1 addition & 0 deletions

@@ -11,6 +11,7 @@
 #include <algorithm>
 #include <cinttypes>
 #include <cmath>
+#include <tuple>
 /**
  * For an input tensor, use the scale and zero_point arguments to quantize it.
  */
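For background on this include: choose_qparams-style kernels produce two outputs, a scale and a zero point, which C++ code commonly returns together as a std::tuple; hence the new header. The sketch below shows that pattern with hypothetical names and simplified logic, not the file's actual signatures:

```cpp
#include <algorithm>
#include <cmath>
#include <cstdint>
#include <tuple>

// Hypothetical sketch, not the actual kernel: derive a (scale, zero_point)
// pair for affine quantization from an observed [min_val, max_val] range.
std::tuple<double, int64_t> choose_qparams_range(
    float min_val, float max_val, int64_t qmin, int64_t qmax) {
  // Extend the range to include zero so that 0.0f is exactly representable.
  min_val = std::min(min_val, 0.0f);
  max_val = std::max(max_val, 0.0f);
  double scale = (static_cast<double>(max_val) - min_val) /
      static_cast<double>(qmax - qmin);
  if (scale == 0.0) {
    scale = 0.1;  // degenerate all-zero range; fall back to a nonzero scale
  }
  // Map real 0.0 to an integer zero point, clamped into [qmin, qmax].
  int64_t zero_point =
      qmin - static_cast<int64_t>(std::nearbyint(min_val / scale));
  zero_point = std::min(qmax, std::max(qmin, zero_point));
  return std::make_tuple(scale, zero_point);
}
```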
