
[GenAI] Use BitsAndBytes for 4bit quantization. #7406

Open · wants to merge 8 commits into main

Conversation

@LittleLittleCloud (Contributor)

We are excited to review your PR.

So we can do the best job, please check:

  • There's a descriptive title that will make sense to other developers some time from now.
  • There are associated issues. All PRs should have issue(s) associated, unless the change is trivial and self-evident, such as fixing a typo. You can use the format Fixes #nnnn in your description so GitHub automatically closes the issue(s) when your PR is merged.
  • Your change description explains what the change does, why you chose your approach, and anything else that reviewers should know.
  • You have included any necessary tests in the same PR.

This PR uses the 4-bit quantization method from the bitsandbytes library to quantize linear layers to 4 bits.

What is bitsandbytes?

bitsandbytes is the library used by Hugging Face Transformers to provide 4-bit and 8-bit quantization and the corresponding quantized operations.

bitsandbytes is written in CUDA, and we provide a C# binding library, LittleLittleCloud.TorchSharp.BitsAndBytes, so it can be used easily together with TorchSharp.
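As a rough sketch of how the new API is meant to be called (the model type below is a placeholder; only ToQuantize4BitModule and the "fp4"/"nf4" dtype values come from this PR's description and diff):

```csharp
// Illustrative usage sketch, not code from this PR; the model type is a
// placeholder, while ToQuantize4BitModule and the "fp4"/"nf4" options
// come from this PR's description and diff.
static void QuantizeExample(Phi3ForCausalLM model)
{
    // Replace the model's linear layers with 4-bit quantized equivalents.
    // quantizedDType selects the bitsandbytes 4-bit format: "fp4" or "nf4".
    model.ToQuantize4BitModule(quantizedDType: "nf4");
}
```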

Copilot review requested due to automatic review settings · March 2, 2025 01:06

Copilot AI left a comment

PR Overview

This PR introduces support for 4‑bit quantization using the BitsAndBytes library by renaming and replacing the old Int4 API with a new Quantize4Bit approach. Key changes include updating the IQuantizeModule interface and its configuration record, propagating these changes across module extension methods and model loading routines, and adjusting documentation and sample code accordingly.

Reviewed Changes

| File | Description |
| --- | --- |
| src/Microsoft.ML.GenAI.Core/Module/IQuantizeModule.cs | Added Quantize4Bit method and Quantize4BitConfig record with updated XML comments. |
| src/Microsoft.ML.GenAI.Core/Extension/ModuleExtension.cs | Replaced ToInt4QuantizeModule calls with the new ToQuantize4BitModule API. |
| src/Microsoft.ML.GenAI.[Phi\|LLaMA] | Propagated the new 4-bit quantization API through the model loading routines. |
| Test files | Removed tests for the deprecated Int4 quantize functionality. |
| Docs/Samples | Updated sample code to reflect the new API usage. |
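For reference, the shape of the new surface could look roughly like the sketch below; only the names IQuantizeModule, Quantize4Bit, Quantize4BitConfig, and quantizedDType appear in this PR, the rest is illustrative:

```csharp
// Sketch inferred from the reviewed-changes table, not the PR's literal code.
public record Quantize4BitConfig(string QuantizedDType = "nf4"); // "fp4" or "nf4"

public interface IQuantizeModule
{
    // Quantize the module's linear layers to 4 bits via bitsandbytes.
    void Quantize4Bit(Quantize4BitConfig config);
}
```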

Copilot reviewed 16 out of 16 changed files in this pull request and generated 2 comments.

Comments suppressed due to low confidence (1)

test/Microsoft.ML.GenAI.Phi.Tests/Phi3Tests.cs:51

  • Consider adding new tests for the 4-bit quantization functionality to ensure the new Quantize4BitModule behaves as expected, since tests for the old Int4 quantization were removed.
(diff context: removal of the [Fact] test Phi3Mini4KInt4QuantizeShapeTest)

@LittleLittleCloud (Contributor Author)

/azp run

Azure Pipelines successfully started running 2 pipeline(s).

@LittleLittleCloud (Contributor Author)

/azp run

Azure Pipelines successfully started running 2 pipeline(s).

@LittleLittleCloud (Contributor Author)

/azp run

Azure Pipelines successfully started running 2 pipeline(s).

<PackageReference Include="Microsoft.SemanticKernel" Version="$(SemanticKernelVersion)" />
<PackageReference Include="AutoGen.SourceGenerator" Version="$(AutoGenVersion)" />
<PackageReference Include="Microsoft.Extensions.Logging.Console" Version="8.0.0" />
<PackageReference Include="LittleLittleCloud.TorchSharp.BitsAndBytes" Version="0.0.4" />
Member

Is this not something we can directly get into torchsharp itself?

Contributor Author

Probably not, bitsandbytes is not part of libtorch...

Member

Is the code in a repo owned by you? Or by Microsoft?

Member

And we are going to want the version in a central location since you use it in more than one place.
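Concretely, and matching the $(SemanticKernelVersion)/$(AutoGenVersion) pattern in the surrounding PackageReference lines, this could mean a single version property in a shared props file; the file location and property name below are assumptions:

```xml
<!-- In a shared props file, e.g. eng/Versions.props (location assumed);
     the property name here is illustrative, not from this PR. -->
<PropertyGroup>
  <TorchSharpBitsAndBytesVersion>0.0.4</TorchSharpBitsAndBytesVersion>
</PropertyGroup>

<!-- Each project then references it the same way as the other packages: -->
<PackageReference Include="LittleLittleCloud.TorchSharp.BitsAndBytes"
                  Version="$(TorchSharpBitsAndBytesVersion)" />
```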

Contributor Author

> Is the code in a repo owned by you? Or by Microsoft?

The wrapping code is in a repo owned by me. The CUDA code is owned by Hugging Face, I believe. Both are under the MIT license.

@@ -90,13 +90,18 @@ public static void ToInt8QuantizeModule<T>(
 /// </summary>
 /// <typeparam name="T"></typeparam>
 /// <param name="model"></param>
-public static void ToInt4QuantizeModule<T>(
-    this T model)
+/// <param name="quantizedDType">Quantized data type, can be "fp4" or "nf4".</param>
Member

Maybe add a note, either here or in the summary, on the difference between "fp4" and "nf4".
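For context: in bitsandbytes, "fp4" is a plain 4-bit floating-point format, while "nf4" (NormalFloat4, introduced in the QLoRA paper) uses quantization levels tuned for normally distributed weights and is the usual default. A doc note along those lines might read (an editorial sketch, not text from this PR):

```csharp
/// <param name="quantizedDType">
/// Quantized data type: "fp4" (a plain 4-bit float format) or "nf4"
/// (NormalFloat4, with quantization levels tuned for normally
/// distributed weights; typically the better default).
/// </param>
```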
