feat: TurboQuant 4-bit vector quantization by damahua · Pull Request #46 · chroma-core/hnswlib

damahua · 2026-04-01T20:21:02Z

Summary

Add TurboQuant (Zandieh et al. 2025) 4-bit scalar quantization to reduce per-vector memory from 3072 to 384 bytes (8x compression for 768-dim vectors)
Enabled via quantization_bits=4 parameter in HnswIndexInitConfig (default 0 = unchanged)
Companion to chroma-core/chroma#6794 (feat: TurboQuant 4-bit vector quantization for HNSW index)
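The 3072 to 384 byte figure follows from storing one 4-bit code per dimension, two codes per byte (768 × 4 bytes = 3072 as float32; 768 × 0.5 bytes = 384 as packed nibbles). A minimal sketch of such packing is below. The helper names `pack_codes` and `unpack_codes` and the nibble order are hypothetical and chosen for illustration; the PR's actual layout in turbo_quant.h is not shown here and may differ.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper, not the PR's code: pack one 4-bit code per dimension,
// two codes per byte (even index -> low nibble, odd index -> high nibble).
std::vector<std::uint8_t> pack_codes(const std::vector<std::uint8_t>& codes) {
    std::vector<std::uint8_t> packed((codes.size() + 1) / 2, 0);
    for (std::size_t i = 0; i < codes.size(); ++i) {
        const std::uint8_t nibble = codes[i] & 0x0F;
        if (i % 2 == 0) {
            packed[i / 2] |= nibble;
        } else {
            packed[i / 2] |= static_cast<std::uint8_t>(nibble << 4);
        }
    }
    return packed;
}

// Hypothetical inverse: recover `dim` 4-bit codes from the packed bytes.
std::vector<std::uint8_t> unpack_codes(const std::vector<std::uint8_t>& packed,
                                       std::size_t dim) {
    std::vector<std::uint8_t> codes(dim);
    for (std::size_t i = 0; i < dim; ++i) {
        const std::uint8_t byte = packed[i / 2];
        codes[i] = (i % 2 == 0) ? (byte & 0x0F) : (byte >> 4);
    }
    return codes;
}
```

For a 768-dim vector, `pack_codes` returns 384 bytes, versus `768 * sizeof(float)` = 3072 bytes for the float32 original.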

Changes (5 files)

hnswlib/turbo_quant.h (new): TurboQuantizer class — random sign rotation + 4-bit Lloyd-Max codebook + symmetric distance; TurboQuantL2Space and TurboQuantIPSpace wrappers compatible with SpaceInterface
hnswlib/hnswalg.h: quantizer_ field, set_quantizer(), quantized storage in addPoint, quantized query in searchKnn, dequantize in getDataByLabel
hnswlib/hnswlib.h: Include turbo_quant.h
src/bindings.cpp: create_index_quantized() FFI entry point, set_quantizer() wiring after index init
src/hnsw.rs: quantization_bits field in HnswIndexInitConfig, conditional create_index_quantized call

How it works

Random sign rotation (diagonal Rademacher matrix) decorrelates dimensions, making independent scalar quantization near-optimal
4-bit Lloyd-Max codebook (16 centroids for standard normal) encodes each dimension
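Symmetric distance: both vectors are dequantized to float32 and compared with the regular L2/IP/cosine kernels, so the distance is exact on the reconstructed vectors but approximate relative to the original inputs; used in graph construction (addPoint) and search (searchKnn)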
getDataByLabel: dequantizes stored codes back to float32 for Chroma's brute-force fallback path
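The steps above can be sketched roughly as follows. This is an illustration under stated assumptions, not the code in turbo_quant.h: the function names are hypothetical, the codebook values are the approximate textbook 16-level Lloyd-Max levels for a standard normal (the PR's table may differ), and the input is assumed to be already scaled so that coordinates are roughly unit-variance (any normalization step the PR performs is omitted).

```cpp
#include <array>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <random>
#include <vector>

// Approximate 16-level Lloyd-Max reconstruction values for N(0, 1), ascending.
// Assumption: the PR's actual table is not shown and may differ.
constexpr std::array<float, 16> kCodebook = {
    -2.733f, -2.069f, -1.618f, -1.256f, -0.9424f, -0.6568f, -0.3881f, -0.1284f,
     0.1284f, 0.3881f, 0.6568f, 0.9424f, 1.256f, 1.618f, 2.069f, 2.733f};

// Diagonal Rademacher matrix, stored as one +1/-1 sign per dimension.
std::vector<float> make_signs(std::size_t dim, std::uint32_t seed) {
    std::mt19937 rng(seed);
    std::bernoulli_distribution coin(0.5);
    std::vector<float> signs(dim);
    for (auto& s : signs) s = coin(rng) ? 1.0f : -1.0f;
    return signs;
}

// Flip signs, then map each coordinate to the index of its nearest centroid.
std::vector<std::uint8_t> quantize(const std::vector<float>& x,
                                   const std::vector<float>& signs) {
    std::vector<std::uint8_t> codes(x.size());
    for (std::size_t i = 0; i < x.size(); ++i) {
        const float v = x[i] * signs[i];
        std::uint8_t best = 0;
        for (std::uint8_t c = 1; c < kCodebook.size(); ++c) {
            if (std::fabs(v - kCodebook[c]) < std::fabs(v - kCodebook[best])) best = c;
        }
        codes[i] = best;
    }
    return codes;
}

// Look up centroids and undo the sign flip (a sign flip is its own inverse).
std::vector<float> dequantize(const std::vector<std::uint8_t>& codes,
                              const std::vector<float>& signs) {
    std::vector<float> x(codes.size());
    for (std::size_t i = 0; i < codes.size(); ++i) {
        x[i] = kCodebook[codes[i]] * signs[i];
    }
    return x;
}

// Symmetric squared L2: both sides are reconstructed to float32 before comparing.
float symmetric_l2(const std::vector<std::uint8_t>& a,
                   const std::vector<std::uint8_t>& b,
                   const std::vector<float>& signs) {
    const std::vector<float> fa = dequantize(a, signs);
    const std::vector<float> fb = dequantize(b, signs);
    float sum = 0.0f;
    for (std::size_t i = 0; i < fa.size(); ++i) {
        sum += (fa[i] - fb[i]) * (fa[i] - fb[i]);
    }
    return sum;
}
```

In this sketch `dequantize` plays the role described for getDataByLabel (codes back to float32), and `symmetric_l2` mirrors the symmetric distance: the result is exact for the reconstructed vectors, while the quantization error relative to the original inputs is what the recall@10 item in the test plan would measure.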

A/B Test Results (N=3, 50K × 768-dim, Chroma single-node)

Peak RSS	Run 1	Run 2	Run 3	Mean
Baseline (float32)	317 MB	315 MB	318 MB	316.7 ± 1.5 MB
TurboQuant (4-bit)	207 MB	176 MB	190 MB	191.0 ± 15.6 MB

39.7% peak RSS reduction. Zero errors across all API requests.
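The 39.7% figure can be reproduced from the table, and a back-of-envelope payload estimate helps put it next to the 8x per-vector compression. The sketch below uses hypothetical helper names and assumes 1 MB = 10^6 bytes; the payload estimate counts only raw vector bytes and ignores graph links, allocator overhead, and the rest of the process, which is presumably why the RSS reduction is well below 8x.

```cpp
#include <array>
#include <cstddef>

// Mean of three peak-RSS samples (MB), as reported in the table above.
double mean3(const std::array<double, 3>& runs) {
    return (runs[0] + runs[1] + runs[2]) / 3.0;
}

// Relative reduction of `after` versus `before`, in percent.
double reduction_pct(double before, double after) {
    return 100.0 * (before - after) / before;
}

// Raw vector payload in MB (assumed 10^6 bytes) for n vectors.
double payload_mb(std::size_t n, std::size_t bytes_per_vector) {
    return static_cast<double>(n) * static_cast<double>(bytes_per_vector) / 1e6;
}
```

With the table's values, `reduction_pct(mean3({317, 315, 318}), mean3({207, 176, 190}))` gives about 39.7. For 50K vectors, `payload_mb(50000, 3072)` is 153.6 MB and `payload_mb(50000, 384)` is 19.2 MB, an expected payload saving of about 134 MB against a measured mean RSS drop of about 126 MB.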

Test plan

Existing hnswlib test suite passes with quantization_bits=0
Existing hnswlib test suite passes with quantization_bits=4
Measure recall@10 accuracy vs float32 baseline
Test persistent index save/load with quantized data

🤖 Generated with Claude Code

Integrate TurboQuant (Zandieh et al. 2025) 4-bit scalar quantization to reduce per-vector memory from 3072 to 384 bytes (8x for 768-dim).

How it works:
- Random sign rotation decorrelates dimensions
- 4-bit Lloyd-Max codebook (16 centroids) encodes each dimension
- Symmetric distance: dequantize both vectors for exact L2/IP/cosine

Files:
- turbo_quant.h: TurboQuantizer, TurboQuantL2Space, TurboQuantIPSpace
- hnswalg.h: quantizer_ field, quantized addPoint/searchKnn/getDataByLabel
- bindings.cpp: create_index_quantized() FFI, set_quantizer() wiring
- hnsw.rs: quantization_bits config field

Enabled via quantization_bits=4 parameter. Default (0) = unchanged.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

damahua marked this pull request as draft April 1, 2026 20:26

damahua mentioned this pull request Apr 1, 2026

feat: TurboQuant 4-bit vector quantization for HNSW index chroma-core/chroma#6794 (Draft, 5 tasks)

damahua wants to merge 1 commit into chroma-core:master from damahua:turboquant
