A novel post-training quantization framework that enhances GPTQ by integrating KL divergence for better accuracy preservation when deploying Large Language Models on edge devices.
KLAWQ extends GPTQ by adding a KL divergence term to align quantized model outputs with the original model's distribution:
L(Q) = L_MSE(Q) + β * L_KL(Q)
The algorithm modifies the Hessian computation as H_tot = H + βA, where A is the KL Hessian matrix.
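The following is a minimal PyTorch sketch of this combination step: the standard GPTQ (MSE) Hessian H is blended with a KL-derived Hessian A weighted by β. How A itself is estimated is handled by the KLAWQ modules and is not shown here; the tensors, shapes, and the `total_hessian` helper are illustrative assumptions.

```python
import torch

def total_hessian(H: torch.Tensor, A: torch.Tensor, beta: float) -> torch.Tensor:
    """Combine the MSE Hessian H with the KL Hessian A: H_tot = H + beta * A."""
    return H + beta * A

# Illustrative shapes only: d is the input dimension of the layer being quantized.
d = 768
X = torch.randn(d, 128)   # stand-in calibration activations (features x samples)
H = 2.0 * X @ X.T         # GPTQ-style MSE Hessian accumulated from layer inputs
A = torch.eye(d)          # placeholder for the KL Hessian computed by KLAWQ
H_tot = total_hessian(H, A, beta=0.1)
```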
- Configuration: Hyperparameters (β, τ) in `KLAWQ/gptqmodel/quantization/config.py` (a hypothetical sketch of these hyperparameters follows this list)
- Core Algorithm: KL Hessian computation in `KLAWQ/kl-aware-quant/quantization/gptq.py`
- Quantization Engine: Low-level operations in `KLAWQ/kl-aware-quant/quantization/quantizer.py`
- Analysis Notebooks: Experimental validation in the `kl-hessian-gptq-*.ipynb` files
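Below is a hypothetical illustration of how the two hyperparameters could be grouped in a configuration object; the actual fields and defaults live in `KLAWQ/gptqmodel/quantization/config.py` and may differ. The `KLAWQConfig` name, the default values, and the reading of τ as a softmax temperature are assumptions.

```python
from dataclasses import dataclass

@dataclass
class KLAWQConfig:
    """Hypothetical grouping of the KLAWQ quantization hyperparameters."""
    bits: int = 8      # target weight bit-width
    beta: float = 0.1  # weight of the KL-divergence term in L(Q)
    tau: float = 1.0   # assumed: softmax temperature used when computing the KL term

cfg = KLAWQConfig(bits=8, beta=0.1, tau=2.0)
```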
- Clone and Setup: `git clone https://github.com/ha405/Compression-Framework-for-EdgeAI` and `cd Compression-Framework-for-EdgeAI`
- Install Dependencies: Install PyTorch, transformers, and the other requirements from `requirements.txt`
- Run Quantization: Use the Jupyter notebooks for experimentation or integrate the KLAWQ modules directly (see the sketch after this list)
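A minimal end-to-end sketch of the third step, assuming a hypothetical `quantize_model` entry point exposed by the KLAWQ quantization modules; the real import path, function name, and signature may differ.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

# A few sentences stand in for a real calibration set.
calib_texts = [
    "Quantization reduces model size and memory traffic.",
    "Edge devices impose tight compute and memory budgets.",
]
calib_batches = [tokenizer(t, return_tensors="pt") for t in calib_texts]

# Hypothetical KLAWQ call: 8-bit weights, KL weight beta, temperature tau.
# from klawq.quantization import quantize_model  # assumed import path
# quantized_model = quantize_model(model, calib_batches, bits=8, beta=0.1, tau=2.0)
```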
Experiments on GPT-2 at 8-bit precision demonstrate improved perplexity compared to vanilla GPTQ while retaining the efficiency of post-training quantization.
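For reference, perplexity (the metric behind this comparison) can be measured with a few lines of transformers code; this sketch evaluates a stock FP32 GPT-2 on a single sentence, and a quantized model would simply be swapped in. The example text is arbitrary.

```python
import math
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2").eval()

text = "Post-training quantization compresses a model after training is complete."
enc = tokenizer(text, return_tensors="pt")

with torch.no_grad():
    # Passing labels equal to input_ids yields the mean token cross-entropy.
    loss = model(**enc, labels=enc["input_ids"]).loss

print(f"perplexity: {math.exp(loss.item()):.2f}")
```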
The framework builds on PyTorch >=2.4.1 and transformers >=4.51.2, with FastAPI used for model serving. The project follows a modular design with separate components for adapter functionality, model definitions, and processing loops, though the core KLAWQ innovation is concentrated in the quantization modules.
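As an illustration of the serving side, the following is a minimal FastAPI sketch that exposes a text-generation endpoint; the endpoint name, request schema, and use of the transformers `pipeline` are assumptions and not taken from the repository's serving code.

```python
from fastapi import FastAPI
from pydantic import BaseModel
from transformers import pipeline

app = FastAPI()
generator = pipeline("text-generation", model="gpt2")  # swap in the quantized model

class GenerateRequest(BaseModel):
    prompt: str
    max_new_tokens: int = 32

@app.post("/generate")
def generate(req: GenerateRequest):
    out = generator(req.prompt, max_new_tokens=req.max_new_tokens)
    return {"completion": out[0]["generated_text"]}

# Run with: uvicorn serve:app --reload  (assuming this file is saved as serve.py)
```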