feat: TurboQuant 4-bit vector quantization for HNSW index by damahua · Pull Request #6794 · chroma-core/chroma

damahua · 2026-04-01T19:57:09Z

Summary

Integrate TurboQuant (Zandieh et al. 2025) 4-bit scalar quantization into hnswlib, reducing per-vector memory from 3072 bytes to 384 bytes (8x compression for 768-dim vectors)
Reduce peak RSS by ~40% for single-node Chroma with 50K vectors (316 MB → 191 MB)
Fully backwards compatible — enabled via CHROMA_QUANTIZATION_BITS=4 env var (default 0 = standard float32)
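
The per-vector figures above follow from the element width alone. A minimal sketch of that arithmetic, assuming only the packed components are counted (no codebook, rotation state, or graph links); `bytes_per_vector` and `payload_bytes` are hypothetical helpers, not functions from this PR:

```rust
/// Bytes needed to store one vector's components at `bits` per dimension.
/// Hypothetical helper for illustration; rounds up to whole bytes.
pub fn bytes_per_vector(dim: usize, bits: usize) -> usize {
    (dim * bits + 7) / 8
}

/// Raw component payload for `count` vectors, ignoring all index overhead.
pub fn payload_bytes(count: usize, dim: usize, bits: usize) -> usize {
    count * bytes_per_vector(dim, bits)
}
```

For 768 dimensions this gives 3072 bytes at 32 bits and 384 bytes at 4 bits. For 50K vectors the raw component payload is 153.6 MB versus 19.2 MB, before graph links and other process overhead.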

Dependencies

⚠️ Depends on chroma-core/hnswlib#46 — the hnswlib C++ implementation.
Cargo.toml currently points to the fork (damahua/hnswlib@turboquant) for review.
After hnswlib#46 merges, update to chroma-core/hnswlib@master before merging this PR.

Changes

This PR (2 files):

Cargo.toml: Point hnswlib dep to fork (temporary, see note above)
rust/index/src/hnsw.rs: Read CHROMA_QUANTIZATION_BITS env var, pass to hnswlib init

hnswlib PR (chroma-core/hnswlib#46, 5 files):

turbo_quant.h (new): TurboQuantizer, TurboQuantL2Space, TurboQuantIPSpace
hnswalg.h: quantizer_ field, quantized storage in addPoint/searchKnn/getDataByLabel
hnswlib.h: Include turbo_quant.h
bindings.cpp: create_index_quantized() FFI, set_quantizer() wiring
hnsw.rs: quantization_bits config field

How it works

Random sign rotation (diagonal Rademacher matrix) decorrelates dimensions, making independent scalar quantization near-optimal
4-bit Lloyd-Max codebook (16 centroids for standard normal) encodes each dimension into 4 bits
Symmetric distance: both vectors are dequantized to float32, and L2/IP/cosine is then computed exactly on the dequantized values during graph construction and search
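
The three steps can be sketched roughly as follows. This is an illustration under stated assumptions, not the code in turbo_quant.h: all function names are hypothetical, the 16-level codebook is an evenly spaced stand-in rather than the Lloyd-Max centroids, the signs come from a simple xorshift generator, the low-nibble-first packing layout is a guess, and inputs are assumed to be already scaled to roughly unit variance per dimension.

```rust
/// Stand-in 16-level codebook: evenly spaced, NOT the Lloyd-Max centroids.
pub fn codebook() -> [f32; 16] {
    let mut c = [0.0f32; 16];
    for (i, v) in c.iter_mut().enumerate() {
        *v = -2.8125 + 0.375 * i as f32;
    }
    c
}

/// Deterministic +1/-1 signs (diagonal Rademacher matrix) from a seed.
/// Uses xorshift64 purely for illustration.
pub fn rademacher_signs(dim: usize, seed: u64) -> Vec<f32> {
    let mut s = seed.max(1); // xorshift state must be non-zero
    (0..dim)
        .map(|_| {
            s ^= s << 13;
            s ^= s >> 7;
            s ^= s << 17;
            if s & 1 == 0 { 1.0 } else { -1.0 }
        })
        .collect()
}

/// Index of the codebook entry closest to `x`.
fn nearest_code(x: f32, cb: &[f32; 16]) -> u8 {
    let mut best = 0u8;
    let mut best_err = f32::INFINITY;
    for (i, &c) in cb.iter().enumerate() {
        let err = (x - c).abs();
        if err < best_err {
            best_err = err;
            best = i as u8;
        }
    }
    best
}

/// Sign-flip each dimension, quantize to 4 bits, pack two codes per byte.
pub fn quantize(v: &[f32], signs: &[f32], cb: &[f32; 16]) -> Vec<u8> {
    let mut out = vec![0u8; (v.len() + 1) / 2];
    for (i, (&x, &s)) in v.iter().zip(signs).enumerate() {
        let code = nearest_code(x * s, cb);
        out[i / 2] |= code << (4 * (i % 2)); // low nibble first (assumed layout)
    }
    out
}

/// Unpack codes, look up centroids, and undo the sign flip.
pub fn dequantize(packed: &[u8], dim: usize, signs: &[f32], cb: &[f32; 16]) -> Vec<f32> {
    (0..dim)
        .map(|i| {
            let code = (packed[i / 2] >> (4 * (i % 2))) & 0x0F;
            cb[code as usize] * signs[i]
        })
        .collect()
}

/// Squared L2 distance on float32 vectors; in the symmetric scheme both
/// arguments would be dequantized vectors.
pub fn l2_squared(a: &[f32], b: &[f32]) -> f32 {
    a.iter().zip(b).map(|(x, y)| (x - y) * (x - y)).sum()
}
```

A 768-dim vector packs into 384 bytes here, matching the figure in the summary. The distance function itself is unchanged; only its inputs are approximations of the original vectors.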

A/B Test Results (N=3, 50K × 768-dim vectors, same binary)

Configuration	Run 1	Run 2	Run 3	Mean
Baseline (float32)	317 MB	315 MB	318 MB	316.7 ± 1.5 MB
TurboQuant (4-bit)	207 MB	176 MB	190 MB	191.0 ± 15.6 MB

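39.7% reduction in mean peak RSS (316.7 MB → 191.0 MB). The two sets of runs don't overlap: the highest TurboQuant run (207 MB) is below the lowest baseline run (315 MB). Zero errors across all 351 API requests per run.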
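
The summary numbers can be re-derived from the per-run values in the table. A small sketch, assuming the sample (n − 1) standard deviation; the helper names and constants are hypothetical and only restate the table:

```rust
/// Per-run peak RSS in MB, copied from the table above.
pub const BASELINE_MB: [f64; 3] = [317.0, 315.0, 318.0];
pub const TURBOQUANT_MB: [f64; 3] = [207.0, 176.0, 190.0];

/// Arithmetic mean of a non-empty slice.
pub fn mean(xs: &[f64]) -> f64 {
    xs.iter().sum::<f64>() / xs.len() as f64
}

/// Sample standard deviation (n - 1 denominator); needs at least two values.
pub fn sample_std(xs: &[f64]) -> f64 {
    let m = mean(xs);
    let ss: f64 = xs.iter().map(|x| (x - m).powi(2)).sum();
    (ss / (xs.len() as f64 - 1.0)).sqrt()
}

/// Relative reduction of `after` versus `before`, in percent.
pub fn percent_reduction(before: f64, after: f64) -> f64 {
    (before - after) / before * 100.0
}

/// True when every value in `low` is below every value in `high`.
pub fn strictly_below(low: &[f64], high: &[f64]) -> bool {
    let max_low = low.iter().cloned().fold(f64::NEG_INFINITY, f64::max);
    let min_high = high.iter().cloned().fold(f64::INFINITY, f64::min);
    max_low < min_high
}
```

From the rounded per-run values this reproduces the means of 316.7 MB and 191.0 MB and a reduction of about 39.7%. Standard deviations computed this way can differ in the last digit from the table if the table was produced from unrounded measurements. With N=3 the non-overlap is a statement about observed ranges, not a significance test.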

Test plan

Verify CHROMA_QUANTIZATION_BITS=0 (default) has no behavior change
Verify CHROMA_QUANTIZATION_BITS=4 reduces memory with 50K+ vectors
Run existing hnswlib test suite with quantization enabled
Measure recall@10 accuracy degradation vs float32 baseline
Test with persistent index (save/load cycle)
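
For the recall@10 item, one possible way to score quantized results against the float32 baseline is sketched below. It assumes each query yields a ranked list of integer IDs and treats the float32 results as ground truth; the function names and the `u64` ID type are hypothetical, not part of Chroma or hnswlib.

```rust
use std::collections::HashSet;

/// Fraction of the top-`k` ground-truth IDs that also appear in the top-`k`
/// approximate results. Returns 0.0 when `k` is 0 or `truth` is empty.
pub fn recall_at_k(truth: &[u64], approx: &[u64], k: usize) -> f64 {
    let k = k.min(truth.len());
    if k == 0 {
        return 0.0;
    }
    let truth_set: HashSet<u64> = truth[..k].iter().copied().collect();
    let hits = approx
        .iter()
        .take(k)
        .filter(|id| truth_set.contains(*id))
        .count();
    hits as f64 / k as f64
}

/// Mean recall@k over a batch of queries (one result list per query).
pub fn mean_recall_at_k(truth: &[Vec<u64>], approx: &[Vec<u64>], k: usize) -> f64 {
    assert_eq!(truth.len(), approx.len(), "one result list per query");
    if truth.is_empty() {
        return 0.0;
    }
    let total: f64 = truth
        .iter()
        .zip(approx)
        .map(|(t, a)| recall_at_k(t, a, k))
        .sum();
    total / truth.len() as f64
}
```

Using the float32 index as ground truth measures degradation relative to the current behavior; scoring both configurations against brute-force neighbors would additionally separate quantization loss from HNSW's own approximation error.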

🤖 Generated with Claude Code

github-actions · 2026-04-01T19:57:25Z

propel-code-bot · 2026-04-01T19:58:01Z

Add TurboQuant 4-bit quantization, persistence, and expanded FFI bindings for HNSW

This PR introduces TurboQuant 4-bit scalar quantization for HNSW indices, enabling substantial memory reduction. The feature is opt-in via CHROMA_QUANTIZATION_BITS=4, so the default behavior (0 = float32) is unchanged. It integrates quantized storage and symmetric dequantized distance computation into the HNSW core, adds quantization-aware space wrappers, and extends bindings to configure quantization through C++/Rust FFI and environment variables.

In parallel, it expands persistence support with in-memory serialization buffers, multi-stream file persistence, and “dirty” tracking, along with a new Rust crate wrapper and broader C++/Python/Rust examples and tests. Build and packaging metadata are updated, and CI workflows are expanded to cover Python/C++/Rust builds and tests.

This summary was automatically generated by @propel-code-bot

Point hnswlib dependency to damahua/hnswlib@turboquant which adds TurboQuant (Zandieh et al. 2025) 4-bit scalar quantization. This reduces per-vector memory from 3072 to 384 bytes (8x for 768-dim), cutting peak RSS by ~40% for 50K vectors.

Enable via CHROMA_QUANTIZATION_BITS=4 env var (default 0 = unchanged).

A/B Test (N=3, 50K × 768-dim, same binary):
Baseline: 317, 315, 318 MB (mean 316.7)
TurboQuant: 207, 176, 190 MB (mean 191.0)
Improvement: 39.7%; distributions don't overlap

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

propel-code-bot

Changes suggested for supply-chain safety and configuration validation reliability.

Status: Changes Suggested | Risk: Medium

Issues Identified & Suggestions

Pin git dependency to commit/tag to reduce supply-chain risk: Cargo.toml
Validate quantization env var values; error/warn on invalid: rust/index/src/hnsw.rs

Review Details

📁 2 files reviewed | 💬 2 comments

propel-code-bot · 2026-04-01T20:07:19Z

 validator = { version = "0.19", features = ["derive"] }
 rust-embed = { version = "8.5.0", features = ["include-exclude", "debug-embed"] }
-hnswlib = { version = "0.8.2", git = "https://github.com/chroma-core/hnswlib.git", branch = "master" }
+hnswlib = { version = "0.8.2", git = "https://github.com/damahua/hnswlib.git", branch = "turboquant" }


[Security] This switches the hnswlib dependency from the organization's own repo (chroma-core/hnswlib.git) to a personal fork (damahua/hnswlib.git) using a moving branch reference without a pinned revision. This worsens the supply-chain risk since a personal fork is under a single individual's control, and a force-push to the turboquant branch would silently change what gets built. Additionally, there is no Cargo.lock in the repository to pin the exact commit.

Fix: pin to a specific commit SHA (rev) or a tag in the fork, e.g.
hnswlib = { git = "https://github.com/damahua/hnswlib.git", rev = "<sha>" } so builds are deterministic. Ideally, also consider merging the changes into the org repo (chroma-core/hnswlib) so ownership stays with the organization.

File: Cargo.toml Line: 75

propel-code-bot · 2026-04-01T20:07:19Z

+                let quantization_bits: i32 = std::env::var("CHROMA_QUANTIZATION_BITS")
+                    .ok()
+                    .and_then(|v| v.parse().ok())
+                    .unwrap_or(0);


[Reliability] CHROMA_QUANTIZATION_BITS is parsed with parse().ok() and then defaulted to 0. That silently disables quantization on typos or unsupported values (e.g., "four", "-1", "5") and makes misconfiguration hard to detect. Can you validate allowed values and return an error (or at least log/warn) when the env var is set but invalid?

Note: Common HNSW quantization implementations support 8-bit (int8) for 4:1 memory compression with minimal recall impact, as documented in vector search literature. Verify which specific quantization bit depths are supported by the damahua/hnswlib fork being used (likely 0 for disabled, 4-bit, or 8-bit). The value 4 mentioned may be fork-specific.

Example fix:

Parse once into Result<i32, _>

If Ok(bits) and bits not in the supported set (verify: {0, 4} or {0, 4, 8}), return Err(WrappedHnswInitError::Other(...)) or a new InvalidArgument variant with a clear message listing valid values

If Err(_), return a config error instead of silently defaulting
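
A minimal sketch of what that validation could look like. The error enum and function names are hypothetical (the PR would more likely map these cases onto `WrappedHnswInitError` or a new `InvalidArgument` variant), and the supported set `{0, 4}` is an assumption that should be checked against the fork:

```rust
use std::env::{self, VarError};
use std::fmt;

/// Bit widths this sketch accepts. Assumption: only 0 (disabled) and 4 are
/// supported; verify against the hnswlib fork before relying on this set.
const SUPPORTED_BITS: [i32; 2] = [0, 4];

/// Hypothetical error type for this sketch only.
#[derive(Debug, PartialEq, Eq)]
pub enum QuantizationConfigError {
    NotAnInteger(String),
    Unsupported(i32),
}

impl fmt::Display for QuantizationConfigError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            Self::NotAnInteger(raw) => write!(
                f,
                "CHROMA_QUANTIZATION_BITS={:?} is not an integer; valid values: {:?}",
                raw, SUPPORTED_BITS
            ),
            Self::Unsupported(bits) => write!(
                f,
                "CHROMA_QUANTIZATION_BITS={} is not supported; valid values: {:?}",
                bits, SUPPORTED_BITS
            ),
        }
    }
}

impl std::error::Error for QuantizationConfigError {}

/// Validate a raw setting. `None` means the variable is unset, which keeps
/// the float32 default (0). Anything set but invalid is an error.
pub fn parse_quantization_bits(raw: Option<&str>) -> Result<i32, QuantizationConfigError> {
    let raw = match raw {
        Some(raw) => raw.trim(),
        None => return Ok(0),
    };
    let bits: i32 = raw
        .parse()
        .map_err(|_| QuantizationConfigError::NotAnInteger(raw.to_string()))?;
    if SUPPORTED_BITS.contains(&bits) {
        Ok(bits)
    } else {
        Err(QuantizationConfigError::Unsupported(bits))
    }
}

/// Read and validate CHROMA_QUANTIZATION_BITS from the environment.
pub fn quantization_bits_from_env() -> Result<i32, QuantizationConfigError> {
    match env::var("CHROMA_QUANTIZATION_BITS") {
        Ok(raw) => parse_quantization_bits(Some(&raw)),
        Err(VarError::NotPresent) => Ok(0),
        Err(VarError::NotUnicode(_)) => Err(QuantizationConfigError::NotAnInteger(
            "<non-UTF-8 value>".to_string(),
        )),
    }
}
```

Splitting the pure parser from the environment read keeps the validation testable without mutating process-wide state. With this shape, "four", "-1", and "5" all surface as errors instead of silently falling back to 0.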

File: rust/index/src/hnsw.rs Line: 159

damahua marked this pull request as draft April 1, 2026 19:58

damahua force-pushed the autoopt/turboquant branch from a4cd728 to b3885e5 Compare April 1, 2026 20:03

propel-code-bot Bot reviewed Apr 1, 2026

damahua mentioned this pull request Apr 1, 2026

feat: TurboQuant 4-bit vector quantization chroma-core/hnswlib#46

Draft

4 tasks

damahua wants to merge 1 commit into chroma-core:main from damahua:autoopt/turboquant

github-actions Bot commented Apr 1, 2026

Reviewer Checklist

Testing, Bugs, Errors, Logs, Documentation

System Compatibility

Quality
