feat: TurboQuant 4-bit vector quantization for HNSW index #6794
Reviewer Checklist
Please leverage this checklist to ensure your code review is thorough before approving:
- Testing, Bugs, Errors, Logs, Documentation
- System Compatibility
- Quality
Add TurboQuant 4-bit quantization, persistence, and expanded FFI bindings for HNSW

This PR introduces TurboQuant 4-bit scalar quantization for HNSW indices, enabling substantial memory reduction while preserving backward compatibility via In parallel, it expands persistence support with in-memory serialization buffers, multi-stream file persistence, and “dirty” tracking, along with a new Rust crate wrapper and broader C++/Python/Rust examples and tests. Build and packaging metadata are updated, and CI workflows are expanded to cover Python/C++/Rust builds and tests.

This summary was automatically generated by @propel-code-bot
Point hnswlib dependency to damahua/hnswlib@turboquant, which adds TurboQuant (Zandieh et al. 2025) 4-bit scalar quantization. This reduces per-vector memory from 3072 to 384 bytes (8x for 768-dim), cutting peak RSS by ~40% for 50K vectors. Enable via CHROMA_QUANTIZATION_BITS=4 env var (default 0 = unchanged).

A/B Test (N=3, 50K × 768-dim, same binary):
  Baseline: 317, 315, 318 MB (mean 316.7)
  TurboQuant: 207, 176, 190 MB (mean 191.0)
  Improvement: 39.7%; distributions don't overlap

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
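The per-vector figures in the commit message follow from simple arithmetic: 768 float32 coordinates take 768 × 4 bytes, while 768 4-bit codes pack into 768 × 4 / 8 bytes. The following is a minimal Rust sketch of that calculation. It assumes coordinates are stored back to back and ignores any per-vector metadata (such as a norm or scale) that the fork may keep; `bytes_per_vector` and `bytes_for_vectors` are hypothetical helpers for illustration, not part of hnswlib.

```rust
/// Bytes needed to store one vector's coordinates at `bits_per_dim` bits per
/// dimension, rounded up to a whole byte. Hypothetical helper for illustration;
/// it ignores any per-vector metadata the real index may keep.
fn bytes_per_vector(dim: usize, bits_per_dim: usize) -> usize {
    (dim * bits_per_dim + 7) / 8
}

/// Raw coordinate storage for `n` vectors, in bytes, under the same assumption.
fn bytes_for_vectors(n: usize, dim: usize, bits_per_dim: usize) -> usize {
    n * bytes_per_vector(dim, bits_per_dim)
}
```

For 768 dimensions this gives 3072 bytes at 32 bits and 384 bytes at 4 bits, an 8x ratio. Under the same assumptions, raw coordinate storage for 50,000 vectors drops from 153,600,000 bytes to 19,200,000 bytes.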
a4cd728 to
b3885e5
Changes suggested for supply-chain safety and configuration validation reliability.
Status: Changes Suggested | Risk: Medium
Issues Identified & Suggestions
- Pin git dependency to commit/tag to reduce supply-chain risk: Cargo.toml
- Validate quantization env var values; error/warn on invalid: rust/index/src/hnsw.rs
Review Details
📁 2 files reviewed | 💬 2 comments
  validator = { version = "0.19", features = ["derive"] }
  rust-embed = { version = "8.5.0", features = ["include-exclude", "debug-embed"] }
- hnswlib = { version = "0.8.2", git = "https://github.com/chroma-core/hnswlib.git", branch = "master" }
+ hnswlib = { version = "0.8.2", git = "https://github.com/damahua/hnswlib.git", branch = "turboquant" }
[Security] This switches the hnswlib dependency from the organization's own repo (chroma-core/hnswlib.git) to a personal fork (damahua/hnswlib.git) using a moving branch reference without a pinned revision. This worsens the supply-chain risk since a personal fork is under a single individual's control, and a force-push to the turboquant branch would silently change what gets built. Additionally, there is no Cargo.lock in the repository to pin the exact commit.
Fix: pin to a specific commit SHA (rev) or a tag in the fork, e.g. `hnswlib = { git = "https://github.com/damahua/hnswlib.git", rev = "<sha>" }`, so builds are deterministic. Ideally, also consider merging the changes into the org repo (chroma-core/hnswlib) so ownership stays with the organization.
File: Cargo.toml
Line: 75

let quantization_bits: i32 = std::env::var("CHROMA_QUANTIZATION_BITS")
    .ok()
    .and_then(|v| v.parse().ok())
    .unwrap_or(0);
[Reliability] CHROMA_QUANTIZATION_BITS is parsed with parse().ok() and then defaulted to 0. That silently disables quantization on typos or unsupported values (e.g., "four", "-1", "5") and makes misconfiguration hard to detect. Can you validate allowed values and return an error (or at least log/warn) when the env var is set but invalid?
Note: Common HNSW quantization implementations support 8-bit (int8) for 4:1 memory compression with minimal recall impact, as documented in vector search literature. Verify which specific quantization bit depths are supported by the damahua/hnswlib fork being used (likely 0 for disabled, 4-bit, or 8-bit). The value 4 mentioned may be fork-specific.
Example fix:
- Parse once into `Result<i32, _>`
- If `Ok(bits)` and bits not in the supported set (verify: {0, 4} or {0, 4, 8}), return `Err(WrappedHnswInitError::Other(...))` or a new `InvalidArgument` variant with a clear message listing valid values
- If `Err(_)`, return a config error instead of silently defaulting
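The validation steps above could be sketched roughly as follows. This is an illustration under stated assumptions, not the PR's code: the supported set `{0, 4}` is assumed and should be checked against the fork, and `QuantizationConfigError`, `parse_quantization_bits`, and `quantization_bits_from_env` are hypothetical names rather than existing Chroma types. In the real code the error would presumably be mapped into `WrappedHnswInitError`.

```rust
use std::fmt;

/// Bit depths assumed to be supported: 0 (disabled) and 4 (TurboQuant).
/// This set is an assumption; verify it against the hnswlib fork in use.
const SUPPORTED_BITS: [i32; 2] = [0, 4];

/// Hypothetical error type for an invalid CHROMA_QUANTIZATION_BITS value.
#[derive(Debug, PartialEq)]
enum QuantizationConfigError {
    /// The value could not be parsed as an integer.
    NotANumber(String),
    /// The value parsed, but is not a supported bit depth.
    Unsupported(i32),
}

impl fmt::Display for QuantizationConfigError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            Self::NotANumber(raw) => write!(
                f,
                "CHROMA_QUANTIZATION_BITS={:?} is not an integer; valid values: {:?}",
                raw, SUPPORTED_BITS
            ),
            Self::Unsupported(bits) => write!(
                f,
                "CHROMA_QUANTIZATION_BITS={} is not supported; valid values: {:?}",
                bits, SUPPORTED_BITS
            ),
        }
    }
}

/// Validates the raw value. `None` (variable unset) keeps the default of 0;
/// a value that is set but invalid is an error instead of a silent fallback.
fn parse_quantization_bits(raw: Option<&str>) -> Result<i32, QuantizationConfigError> {
    let raw = match raw {
        Some(raw) => raw,
        None => return Ok(0),
    };
    let bits: i32 = raw
        .trim()
        .parse()
        .map_err(|_| QuantizationConfigError::NotANumber(raw.to_string()))?;
    if SUPPORTED_BITS.contains(&bits) {
        Ok(bits)
    } else {
        Err(QuantizationConfigError::Unsupported(bits))
    }
}

/// Reads the environment variable and validates it.
/// Note: a non-Unicode value is treated as unset here because of `.ok()`.
fn quantization_bits_from_env() -> Result<i32, QuantizationConfigError> {
    let raw = std::env::var("CHROMA_QUANTIZATION_BITS").ok();
    parse_quantization_bits(raw.as_deref())
}
```

Separating the pure parsing function from the environment read keeps the validation testable without mutating process state. With this shape, `"four"`, `"-1"`, and `"5"` all surface as errors, while an unset variable still yields 0.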
File: rust/index/src/hnsw.rs
Line: 159
Summary
`CHROMA_QUANTIZATION_BITS=4` env var (default 0 = standard float32)
Dependencies
Changes
This PR (2 files):
- `Cargo.toml`: Point hnswlib dep to fork (temporary, see note above)
- `rust/index/src/hnsw.rs`: Read `CHROMA_QUANTIZATION_BITS` env var, pass to hnswlib init

hnswlib PR (chroma-core/hnswlib#46, 5 files):
- `turbo_quant.h` (new): `TurboQuantizer`, `TurboQuantL2Space`, `TurboQuantIPSpace`
- `hnswalg.h`: `quantizer_` field, quantized storage in addPoint/searchKnn/getDataByLabel
- `hnswlib.h`: Include turbo_quant.h
- `bindings.cpp`: `create_index_quantized()` FFI, `set_quantizer()` wiring
- `hnsw.rs`: `quantization_bits` config field

How it works
A/B Test Results (N=3, 50K × 768-dim vectors, same binary)
39.7% peak RSS reduction. Distributions don't overlap. Zero errors across all 351 API requests per run.
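The headline numbers can be re-derived from the per-run peak RSS samples listed in the commit message. The sketch below is only a check of that arithmetic; the helper names are hypothetical and the samples are copied from the commit message, not measured here.

```rust
/// Peak RSS samples in MB, copied from the commit message.
const BASELINE_MB: [f64; 3] = [317.0, 315.0, 318.0];
const TURBOQUANT_MB: [f64; 3] = [207.0, 176.0, 190.0];

/// Arithmetic mean of a non-empty slice.
fn mean(xs: &[f64]) -> f64 {
    xs.iter().sum::<f64>() / xs.len() as f64
}

/// Percentage reduction going from `baseline` to `treated`.
fn reduction_pct(baseline: f64, treated: f64) -> f64 {
    (baseline - treated) / baseline * 100.0
}

/// True when every value in `low` is strictly below every value in `high`,
/// i.e. the two sample ranges do not overlap.
fn strictly_below(low: &[f64], high: &[f64]) -> bool {
    let max_low = low.iter().cloned().fold(f64::NEG_INFINITY, f64::max);
    let min_high = high.iter().cloned().fold(f64::INFINITY, f64::min);
    max_low < min_high
}
```

Calling `reduction_pct(mean(&BASELINE_MB), mean(&TURBOQUANT_MB))` yields roughly 39.7, and `strictly_below(&TURBOQUANT_MB, &BASELINE_MB)` holds because the largest TurboQuant sample (207 MB) is below the smallest baseline sample (315 MB).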
Test plan
- `CHROMA_QUANTIZATION_BITS=0` (default) has no behavior change
- `CHROMA_QUANTIZATION_BITS=4` reduces memory with 50K+ vectors

🤖 Generated with Claude Code