feat: TurboQuant 4-bit vector quantization for HNSW index #6794

Draft: damahua wants to merge 1 commit into chroma-core:main from damahua:autoopt/turboquant

@damahua damahua commented Apr 1, 2026

Summary

  • Integrate TurboQuant (Zandieh et al. 2025) 4-bit scalar quantization into hnswlib, reducing per-vector memory from 3072 bytes to 384 bytes (8x compression for 768-dim vectors)
  • Reduce peak RSS by ~40% for single-node Chroma with 50K vectors (mean over 3 runs: 316.7 MB → 191.0 MB)
  • Fully backwards compatible — enabled via CHROMA_QUANTIZATION_BITS=4 env var (default 0 = standard float32)
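
The per-vector figures in the first bullet follow from simple arithmetic. A minimal sketch, assuming only the raw coordinate payload is counted (HNSW graph links and any per-vector metadata are ignored); the helper names are hypothetical and not part of this PR:

```rust
/// Bytes needed to store one vector's coordinates at a given bit width,
/// rounded up to a whole byte. Hypothetical helper for illustration only.
fn bytes_per_vector(dims: usize, bits_per_dim: usize) -> usize {
    (dims * bits_per_dim + 7) / 8
}

/// Raw coordinate payload for `count` vectors.
/// Excludes HNSW links and any other per-element overhead.
fn payload_bytes(count: usize, dims: usize, bits_per_dim: usize) -> usize {
    count * bytes_per_vector(dims, bits_per_dim)
}
```

For 768 dimensions, `bytes_per_vector(768, 32)` gives 3072 and `bytes_per_vector(768, 4)` gives 384, which is the 8x ratio quoted above. For 50K vectors the payload alone is 153,600,000 bytes at float32 versus 19,200,000 bytes at 4 bits.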

Dependencies

⚠️ Depends on chroma-core/hnswlib#46 — the hnswlib C++ implementation.
Cargo.toml currently points to the fork (damahua/hnswlib@turboquant) for review.
After hnswlib#46 merges, update to chroma-core/hnswlib@master before merging this PR.

Changes

This PR (2 files):

  • Cargo.toml: Point hnswlib dep to fork (temporary, see note above)
  • rust/index/src/hnsw.rs: Read CHROMA_QUANTIZATION_BITS env var, pass to hnswlib init

hnswlib PR (chroma-core/hnswlib#46, 5 files):

  • turbo_quant.h (new): TurboQuantizer, TurboQuantL2Space, TurboQuantIPSpace
  • hnswalg.h: quantizer_ field, quantized storage in addPoint/searchKnn/getDataByLabel
  • hnswlib.h: Include turbo_quant.h
  • bindings.cpp: create_index_quantized() FFI, set_quantizer() wiring
  • hnsw.rs: quantization_bits config field

How it works

  • Random sign rotation (diagonal Rademacher matrix) decorrelates dimensions, making independent scalar quantization near-optimal
  • 4-bit Lloyd-Max codebook (16 centroids for standard normal) encodes each dimension into 4 bits
  • Symmetric distance: both vectors are dequantized to float32, and L2/IP/cosine is then computed exactly on the reconstructed (approximate) values during graph construction and search
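
The three steps above can be sketched end to end as follows. This is an illustrative sketch, not the code in turbo_quant.h: the function names are hypothetical, the codebook holds approximate textbook Lloyd-Max levels for a standard normal (the fork's table may differ), inputs are assumed to be already scaled to roughly unit variance per coordinate, and two codes are packed per byte, low nibble first.

```rust
/// Approximate 16-level Lloyd-Max reconstruction points for N(0, 1).
/// Assumption: textbook values, not necessarily the fork's table.
const CODEBOOK: [f32; 16] = [
    -2.733, -2.069, -1.618, -1.256, -0.9424, -0.6568, -0.3881, -0.1284,
    0.1284, 0.3881, 0.6568, 0.9424, 1.256, 1.618, 2.069, 2.733,
];

/// Index of the codebook entry closest to `x` (first match wins on ties).
fn nearest_code(x: f32) -> u8 {
    let mut best = 0usize;
    let mut best_dist = f32::INFINITY;
    for (i, &c) in CODEBOOK.iter().enumerate() {
        let d = (x - c).abs();
        if d < best_dist {
            best_dist = d;
            best = i;
        }
    }
    best as u8
}

/// Apply the random sign flip (`signs` holds +1.0 or -1.0 per dimension),
/// quantize each coordinate to 4 bits, and pack two codes per byte.
fn quantize(v: &[f32], signs: &[f32]) -> Vec<u8> {
    assert_eq!(v.len(), signs.len());
    let mut packed = vec![0u8; (v.len() + 1) / 2];
    for (i, (&x, &s)) in v.iter().zip(signs).enumerate() {
        let code = nearest_code(x * s);
        if i % 2 == 0 {
            packed[i / 2] |= code; // low nibble
        } else {
            packed[i / 2] |= code << 4; // high nibble
        }
    }
    packed
}

/// Look up each 4-bit code and undo the sign flip (a sign flip is its own inverse).
fn dequantize(packed: &[u8], signs: &[f32]) -> Vec<f32> {
    signs
        .iter()
        .enumerate()
        .map(|(i, &s)| {
            let byte = packed[i / 2];
            let code = if i % 2 == 0 { byte & 0x0F } else { byte >> 4 };
            CODEBOOK[code as usize] * s
        })
        .collect()
}

/// Squared L2 distance computed in float32 on two dequantized vectors.
fn l2_sq(a: &[f32], b: &[f32]) -> f32 {
    a.iter().zip(b).map(|(x, y)| (x - y) * (x - y)).sum()
}
```

With this packing a 768-dimensional vector occupies 384 bytes. The distance is exact only with respect to the reconstructed values, so recall against the float32 baseline still has to be measured, as the test plan notes.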

A/B Test Results (N=3, 50K × 768-dim vectors, same binary)

| Configuration | Run 1 | Run 2 | Run 3 | Mean |
|---|---|---|---|---|
| Baseline (float32) | 317 MB | 315 MB | 318 MB | 316.7 ± 1.5 MB |
| TurboQuant (4-bit) | 207 MB | 176 MB | 190 MB | 191.0 ± 15.6 MB |

39.7% reduction in mean peak RSS. The run ranges don't overlap (315–318 MB baseline vs. 176–207 MB with TurboQuant). Zero errors across all 351 API requests per run.
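
The headline number can be reproduced from the six per-run values. A small sketch with hypothetical helper names, comparing means and checking that the two sets of runs do not overlap:

```rust
/// Arithmetic mean of a non-empty slice.
fn mean(xs: &[f64]) -> f64 {
    xs.iter().sum::<f64>() / xs.len() as f64
}

/// Percentage by which the candidate mean is lower than the baseline mean.
fn percent_reduction(baseline: &[f64], candidate: &[f64]) -> f64 {
    let b = mean(baseline);
    (b - mean(candidate)) / b * 100.0
}

/// True when every value in `lower` is below every value in `upper`.
fn ranges_disjoint(lower: &[f64], upper: &[f64]) -> bool {
    let max_lower = lower.iter().cloned().fold(f64::NEG_INFINITY, f64::max);
    let min_upper = upper.iter().cloned().fold(f64::INFINITY, f64::min);
    max_lower < min_upper
}
```

Calling `percent_reduction(&[317.0, 315.0, 318.0], &[207.0, 176.0, 190.0])` yields roughly 39.7. With only three runs per arm, disjoint ranges are suggestive rather than a formal significance test.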

Test plan

  • Verify CHROMA_QUANTIZATION_BITS=0 (default) has no behavior change
  • Verify CHROMA_QUANTIZATION_BITS=4 reduces memory with 50K+ vectors
  • Run existing hnswlib test suite with quantization enabled
  • Measure recall@10 accuracy degradation vs float32 baseline
  • Test with persistent index (save/load cycle)

🤖 Generated with Claude Code

github-actions Bot commented Apr 1, 2026

Reviewer Checklist

Please leverage this checklist to ensure your code review is thorough before approving

Testing, Bugs, Errors, Logs, Documentation

  • Can you think of any use case in which the code does not behave as intended? Have they been tested?
  • Can you think of any inputs or external events that could break the code? Is user input validated and safe? Have they been tested?
  • If appropriate, are there adequate property based tests?
  • If appropriate, are there adequate unit tests?
  • Should any logging, debugging, tracing information be added or removed?
  • Are error messages user-friendly?
  • Have all documentation changes needed been made?
  • Have all non-obvious changes been commented?

System Compatibility

  • Are there any potential impacts on other parts of the system or backward compatibility?
  • Does this change intersect with any items on our roadmap, and if so, is there a plan for fitting them together?

Quality

  • Is this code of an unexpectedly high quality (Readability, Modularity, Intuitiveness)?

@propel-code-bot
Contributor

Add TurboQuant 4-bit quantization, persistence, and expanded FFI bindings for HNSW

This PR introduces TurboQuant 4-bit scalar quantization for HNSW indices, enabling substantial memory reduction while preserving backward compatibility via CHROMA_QUANTIZATION_BITS=4. It integrates quantized storage and symmetric dequantized distance computation into the HNSW core, adds quantization-aware space wrappers, and extends bindings to configure quantization through C++/Rust FFI and environment variables.

In parallel, it expands persistence support with in-memory serialization buffers, multi-stream file persistence, and “dirty” tracking, along with a new Rust crate wrapper and broader C++/Python/Rust examples and tests. Build and packaging metadata are updated, and CI workflows are expanded to cover Python/C++/Rust builds and tests.

This summary was automatically generated by @propel-code-bot

@damahua damahua marked this pull request as draft April 1, 2026 19:58
Point hnswlib dependency to damahua/hnswlib@turboquant which adds
TurboQuant (Zandieh et al. 2025) 4-bit scalar quantization. This
reduces per-vector memory from 3072 to 384 bytes (8x for 768-dim),
cutting peak RSS by ~40% for 50K vectors.

Enable via CHROMA_QUANTIZATION_BITS=4 env var (default 0 = unchanged).

A/B Test (N=3, 50K × 768-dim, same binary):
  Baseline: 317, 315, 318 MB (mean 316.7)
  TurboQuant: 207, 176, 190 MB (mean 191.0)
  Improvement: 39.7% — distributions don't overlap

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@damahua damahua force-pushed the autoopt/turboquant branch from a4cd728 to b3885e5 Compare April 1, 2026 20:03
Contributor

@propel-code-bot propel-code-bot Bot left a comment

Changes suggested for supply-chain safety and configuration validation reliability.

Status: Changes Suggested | Risk: Medium

Issues Identified & Suggestions
  • Pin git dependency to commit/tag to reduce supply-chain risk: Cargo.toml
  • Validate quantization env var values; error/warn on invalid: rust/index/src/hnsw.rs
Review Details

📁 2 files reviewed | 💬 2 comments

Comment thread Cargo.toml
validator = { version = "0.19", features = ["derive"] }
rust-embed = { version = "8.5.0", features = ["include-exclude", "debug-embed"] }
- hnswlib = { version = "0.8.2", git = "https://github.com/chroma-core/hnswlib.git", branch = "master" }
+ hnswlib = { version = "0.8.2", git = "https://github.com/damahua/hnswlib.git", branch = "turboquant" }

Important

[Security] This switches the hnswlib dependency from the organization's own repo (chroma-core/hnswlib.git) to a personal fork (damahua/hnswlib.git) using a moving branch reference without a pinned revision. This worsens the supply-chain risk since a personal fork is under a single individual's control, and a force-push to the turboquant branch would silently change what gets built. Additionally, there is no Cargo.lock in the repository to pin the exact commit.

Fix: pin to a specific commit SHA (rev) or a tag in the fork, e.g.
hnswlib = { git = "https://github.com/damahua/hnswlib.git", rev = "<sha>" } so builds are deterministic. Ideally, also consider merging the changes into the org repo (chroma-core/hnswlib) so ownership stays with the organization.

File: Cargo.toml
Line: 75

Comment thread rust/index/src/hnsw.rs
let quantization_bits: i32 = std::env::var("CHROMA_QUANTIZATION_BITS")
.ok()
.and_then(|v| v.parse().ok())
.unwrap_or(0);

Important

[Reliability] CHROMA_QUANTIZATION_BITS is parsed with parse().ok() and then defaulted to 0. That silently disables quantization on typos or unsupported values (e.g., "four", "-1", "5") and makes misconfiguration hard to detect. Can you validate allowed values and return an error (or at least log/warn) when the env var is set but invalid?

Note: Common HNSW quantization implementations support 8-bit (int8) for 4:1 memory compression with minimal recall impact, as documented in vector search literature. Verify which specific quantization bit depths are supported by the damahua/hnswlib fork being used (likely 0 for disabled, 4-bit, or 8-bit). The value 4 mentioned may be fork-specific.

Example fix:

  • Parse once into Result<i32, _>
  • If Ok(bits) and bits not in the supported set (verify: {0, 4} or {0, 4, 8}), return Err(WrappedHnswInitError::Other(...)) or a new InvalidArgument variant with a clear message listing valid values
  • If Err(_), return a config error instead of silently defaulting
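
The steps in this example fix could look roughly like the sketch below. It is written under stated assumptions rather than against the real code: the error type `QuantizationConfigError` and both function names are hypothetical stand-ins (the actual change would map into `WrappedHnswInitError` or a new variant), and the supported set {0, 4} is assumed and still needs to be verified against the fork.

```rust
use std::fmt;

/// Assumed set of supported bit depths; verify against the hnswlib fork.
const SUPPORTED_BITS: [i32; 2] = [0, 4];

/// Hypothetical error type standing in for the crate's real init error.
#[derive(Debug, PartialEq)]
enum QuantizationConfigError {
    NotAnInteger(String),
    Unsupported(i32),
}

impl fmt::Display for QuantizationConfigError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            Self::NotAnInteger(raw) => write!(
                f,
                "CHROMA_QUANTIZATION_BITS={:?} is not an integer; supported values: {:?}",
                raw, SUPPORTED_BITS
            ),
            Self::Unsupported(bits) => write!(
                f,
                "CHROMA_QUANTIZATION_BITS={} is not supported; supported values: {:?}",
                bits, SUPPORTED_BITS
            ),
        }
    }
}

/// Validate the raw setting. An unset variable keeps the float32 default (0);
/// a set but invalid value is an error instead of a silent fallback.
fn parse_quantization_bits(raw: Option<&str>) -> Result<i32, QuantizationConfigError> {
    let raw = match raw {
        Some(raw) => raw,
        None => return Ok(0),
    };
    let bits: i32 = raw
        .trim()
        .parse()
        .map_err(|_| QuantizationConfigError::NotAnInteger(raw.to_string()))?;
    if SUPPORTED_BITS.contains(&bits) {
        Ok(bits)
    } else {
        Err(QuantizationConfigError::Unsupported(bits))
    }
}

/// Read the environment variable once and validate it.
fn quantization_bits_from_env() -> Result<i32, QuantizationConfigError> {
    let raw = std::env::var("CHROMA_QUANTIZATION_BITS").ok();
    parse_quantization_bits(raw.as_deref())
}
```

Keeping the parsing in a function that takes `Option<&str>` makes the validation testable without mutating process-wide environment state; only the thin `quantization_bits_from_env` wrapper touches `std::env`.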

File: rust/index/src/hnsw.rs
Line: 159
