Skip to content

feat: TurboQuant 4-bit vector quantization#46

Draft
damahua wants to merge 1 commit into
chroma-core:masterfrom
damahua:turboquant
Draft

feat: TurboQuant 4-bit vector quantization#46
damahua wants to merge 1 commit into
chroma-core:masterfrom
damahua:turboquant

Conversation

@damahua
Copy link
Copy Markdown

@damahua damahua commented Apr 1, 2026

Summary

Changes (5 files)

  • hnswlib/turbo_quant.h (new): TurboQuantizer class — random sign rotation + 4-bit Lloyd-Max codebook + symmetric distance; TurboQuantL2Space and TurboQuantIPSpace wrappers compatible with SpaceInterface
  • hnswlib/hnswalg.h: quantizer_ field, set_quantizer(), quantized storage in addPoint, quantized query in searchKnn, dequantize in getDataByLabel
  • hnswlib/hnswlib.h: Include turbo_quant.h
  • src/bindings.cpp: create_index_quantized() FFI entry point, set_quantizer() wiring after index init
  • src/hnsw.rs: quantization_bits field in HnswIndexInitConfig, conditional create_index_quantized call

How it works

  1. Random sign rotation (diagonal Rademacher matrix) decorrelates dimensions, making independent scalar quantization near-optimal
  2. 4-bit Lloyd-Max codebook (16 centroids for standard normal) encodes each dimension
  3. Symmetric distance: both vectors dequantized to float32 for exact L2/IP/cosine — used in graph construction (addPoint) and search (searchKnn)
  4. getDataByLabel: dequantizes stored codes back to float32 for Chroma's brute-force fallback path

A/B Test Results (N=3, 50K × 768-dim, Chroma single-node)

Run 1 Run 2 Run 3 Mean
Baseline (float32) 317 MB 315 MB 318 MB 316.7 ± 1.5 MB
TurboQuant (4-bit) 207 MB 176 MB 190 MB 191.0 ± 15.6 MB

39.7% peak RSS reduction. Zero errors across all API requests.

Test plan

  • Existing hnswlib test suite passes with quantization_bits=0
  • Existing hnswlib test suite passes with quantization_bits=4
  • Measure recall@10 accuracy vs float32 baseline
  • Test persistent index save/load with quantized data

🤖 Generated with Claude Code

Integrate TurboQuant (Zandieh et al. 2025) 4-bit scalar quantization
to reduce per-vector memory from 3072 to 384 bytes (8x for 768-dim).

How it works:
- Random sign rotation decorrelates dimensions
- 4-bit Lloyd-Max codebook (16 centroids) encodes each dimension
- Symmetric distance: dequantize both vectors for exact L2/IP/cosine

Files:
- turbo_quant.h: TurboQuantizer, TurboQuantL2Space, TurboQuantIPSpace
- hnswalg.h: quantizer_ field, quantized addPoint/searchKnn/getDataByLabel
- bindings.cpp: create_index_quantized() FFI, set_quantizer() wiring
- hnsw.rs: quantization_bits config field

Enabled via quantization_bits=4 parameter. Default (0) = unchanged.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant