KLAWQ: KL-Aware Weight Quantization for Edge AI

KLAWQ is a post-training quantization framework that extends GPTQ with a KL-divergence term to better preserve accuracy when deploying large language models on edge devices.

Core Innovation

KLAWQ extends GPTQ by adding a KL divergence term that aligns the quantized model's outputs with the original model's output distribution:

L(Q) = L_MSE(Q) + β * L_KL(Q)

The algorithm modifies the Hessian used in GPTQ's weight update as H_tot = H + βA, where H is the standard GPTQ Hessian and A is the Hessian of the KL term.
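
A minimal sketch of that combination, assuming H is the usual GPTQ Hessian accumulated from layer inputs and A is the KL-derived curvature term; the function name and the damping step are illustrative, not the repository's actual API:

    import torch

    def combine_hessians(H: torch.Tensor, A: torch.Tensor, beta: float) -> torch.Tensor:
        # H_tot = H + beta * A, per the objective above.
        # H and A are assumed to be (d, d) symmetric matrices for one layer.
        H_tot = H + beta * A
        # Light diagonal damping, as is common in GPTQ-style solvers, keeps the
        # Cholesky factorization used for error compensation numerically stable.
        damp = 0.01 * torch.mean(torch.diag(H_tot))
        H_tot = H_tot + damp * torch.eye(H_tot.shape[0], device=H_tot.device)
        return H_tot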

Key Components

  • Configuration: Hyperparameters (β, τ) in KLAWQ/gptqmodel/quantization/config.py (a sketch of these settings follows this list)
  • Core Algorithm: KL Hessian computation in KLAWQ/kl-aware-quant/quantization/gptq.py
  • Quantization Engine: Low-level quantization operations in KLAWQ/kl-aware-quant/quantization/quantizer.py
  • Analysis Notebooks: Experimental validation in the kl-hessian-gptq-*.ipynb files
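
A hypothetical sketch of how the (β, τ) hyperparameters might be grouped; the actual fields and defaults live in KLAWQ/gptqmodel/quantization/config.py and may differ:

    from dataclasses import dataclass

    @dataclass
    class KLAWQConfig:
        bits: int = 8        # weight precision used in the GPT-2 experiments below
        beta: float = 0.1    # weight on the KL term in L(Q) (illustrative default)
        tau: float = 1.0     # softmax temperature for the KL divergence (illustrative default)

    config = KLAWQConfig(beta=0.5, tau=2.0)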

Quick Start

  1. Clone and Setup:

    git clone https://github.com/ha405/Compression-Framework-for-EdgeAI
    cd Compression-Framework-for-EdgeAI
  2. Install Dependencies: install PyTorch, transformers, and the remaining packages listed in requirements.txt

  3. Run Quantization: use the Jupyter notebooks for experimentation, or integrate the KLAWQ modules directly (a usage sketch follows these steps)
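
A hypothetical integration sketch: load GPT-2 and a small calibration set with transformers, then hand both to the KLAWQ quantizer. The quantize_model call is illustrative; the real entry points are in KLAWQ/kl-aware-quant/quantization/ and their signatures may differ.

    from transformers import AutoModelForCausalLM, AutoTokenizer

    model = AutoModelForCausalLM.from_pretrained("gpt2")
    tokenizer = AutoTokenizer.from_pretrained("gpt2")

    # A handful of tokenized samples is typical calibration data for PTQ.
    calib_texts = ["Edge AI pushes inference onto resource-constrained devices."]
    calib = [tokenizer(t, return_tensors="pt") for t in calib_texts]

    # Hypothetical call into the KLAWQ modules; replace with the actual API
    # exposed by KLAWQ/kl-aware-quant/quantization/gptq.py.
    # quantized = quantize_model(model, calib, bits=8, beta=0.5, tau=2.0)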

Results

Experiments quantizing GPT-2 to 8-bit weights show improved perplexity over vanilla GPTQ while retaining the efficiency of post-training quantization.
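
A minimal perplexity sketch (an assumed evaluation recipe, not the repository's script): exponentiate the mean token-level cross-entropy on held-out text.

    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    def perplexity(model, tokenizer, text: str) -> float:
        enc = tokenizer(text, return_tensors="pt")
        with torch.no_grad():
            # For causal LMs, passing labels returns the mean cross-entropy loss.
            loss = model(**enc, labels=enc["input_ids"]).loss
        return torch.exp(loss).item()

    model = AutoModelForCausalLM.from_pretrained("gpt2")
    tokenizer = AutoTokenizer.from_pretrained("gpt2")
    print(perplexity(model, tokenizer, "Quantization trades precision for memory."))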

Notes

The framework builds on PyTorch >=2.4.1, transformers >=4.51.2, and FastAPI for model serving. The project is organized modularly, with separate components for adapter functionality, model definitions, and processing loops, while the core KLAWQ changes are concentrated in the quantization modules.
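
A minimal serving sketch, assuming a FastAPI endpoint wraps the quantized model; the repository's actual serving layer may look different.

    from fastapi import FastAPI
    from pydantic import BaseModel
    from transformers import AutoModelForCausalLM, AutoTokenizer

    app = FastAPI()
    # Loaded once at startup; in the real stack this would be the KLAWQ-quantized model.
    tokenizer = AutoTokenizer.from_pretrained("gpt2")
    model = AutoModelForCausalLM.from_pretrained("gpt2")

    class Prompt(BaseModel):
        text: str
        max_new_tokens: int = 32

    @app.post("/generate")
    def generate(prompt: Prompt):
        inputs = tokenizer(prompt.text, return_tensors="pt")
        out = model.generate(**inputs, max_new_tokens=prompt.max_new_tokens)
        return {"completion": tokenizer.decode(out[0], skip_special_tokens=True)}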

