Add Mixedbread (mxbai) model support #89

regenrek · 2025-11-23T20:48:05Z

Add Mixedbread (mxbai) model support

Why

Adds first-class support for Mixedbread models as an alternative embedding option optimized for local inference. Compared with bge-small, mxbai-xsmall indexes faster and accepts a larger context window (see the benchmark below) while keeping high-quality semantic understanding, which makes it a good fit for code search workloads.

What

- Embedding model: `mxbai-xsmall` (mixedbread-ai/mxbai-embed-xsmall-v1)
  - 384 dimensions, 4K context window, quantized ONNX model
  - Fully local inference using ONNX Runtime
- Reranker: `mxbai` (mixedbread-ai/mxbai-rerank-xsmall-v1)
  - Neural cross-encoder reranker for improved result ranking
- Provider abstraction: clean architecture supporting multiple model providers (FastEmbed, Mixedbread)
- CLI integration: `--model mxbai-xsmall` and `--rerank-model mxbai` flags
- MCP support: Mixedbread models available in semantic/hybrid search tools
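
The provider abstraction and registry-based model resolution listed above could look roughly like the following. This is a minimal sketch and not the code from this PR: `Provider`, `ModelSpec`, `REGISTRY`, and `resolve_model` are hypothetical names, and the Hugging Face repository id shown for `bge-small` is an assumption.

```rust
// Hypothetical sketch: these names are illustrative and not taken from the ck codebase.

/// Backend that runs a given model.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Provider {
    FastEmbed,
    Mixedbread,
}

/// Static description of one embedding model.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct ModelSpec {
    /// Short name, as passed to `--model`.
    pub alias: &'static str,
    /// Hugging Face repository id.
    pub hf_repo: &'static str,
    pub provider: Provider,
    pub dimensions: usize,
}

/// Single source of truth for known embedding models.
const REGISTRY: &[ModelSpec] = &[
    ModelSpec {
        alias: "mxbai-xsmall",
        hf_repo: "mixedbread-ai/mxbai-embed-xsmall-v1",
        provider: Provider::Mixedbread,
        dimensions: 384,
    },
    ModelSpec {
        alias: "bge-small",
        // Assumption: the exact repository id used by ck is not stated in this PR.
        hf_repo: "BAAI/bge-small-en-v1.5",
        provider: Provider::FastEmbed,
        dimensions: 384,
    },
];

/// Resolve a user-supplied name (alias or repository id) to a model spec.
/// Unknown names produce an error that lists the known aliases.
pub fn resolve_model(name: &str) -> Result<&'static ModelSpec, String> {
    REGISTRY
        .iter()
        .find(|spec| spec.alias == name || spec.hf_repo == name)
        .ok_or_else(|| {
            let known: Vec<&str> = REGISTRY.iter().map(|s| s.alias).collect();
            format!("unknown model '{name}'; known models: {}", known.join(", "))
        })
}
```

With a table like this, the CLI and the MCP tools can share one lookup path, and adding a provider means adding an enum variant plus registry entries rather than touching each call site.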

Performance Benchmarks

Note: The benchmark was run on a small repo (11 files), so treat these numbers as indicative rather than conclusive.

| Model (ck2) | real (s) | user (s) | sys (s) | Notes |
| --- | ---: | ---: | ---: | --- |
| mxbai-xsmall (Mixedbread) | 1.61 | 11.10 | 0.10 | 11 files indexed; FastEmbed 8192 token limit; chunk target 1024/overlap 200. |
| bge-small | 6.52 | 47.54 | 0.29 | 11 files indexed; FastEmbed 512 token limit; chunk target 400/overlap 80. |

Takeaways

Mixedbread/mxbai-xsmall finished ~4× faster in wall-clock time (1.61 s vs 6.52 s) and used ~4× less CPU time (11.10 s vs 47.54 s user) for this repo, likely due to the larger FastEmbed token window (8192 vs 512) and bigger chunk size, which reduce chunk count and encoding calls.
bge-small uses a shorter token window and smaller chunks, so it performs more encode calls, trading speed for potentially higher-quality embeddings (depending on workload).
If you stick with Mixedbread, keep the .ck directory cached to avoid reindexing. Switching between models requires either --switch-model or cleaning .ck, as was done for this benchmark.
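
To make the chunk-count argument concrete, here is a rough back-of-the-envelope sketch. It assumes a plain sliding window over tokens, which is a simplification and not necessarily how ck chunks files; `chunk_count` is a hypothetical helper, not a function from this PR.

```rust
/// Number of chunks a simple sliding window produces for `total_tokens`,
/// where each chunk holds up to `target` tokens and consecutive chunks
/// share `overlap` tokens.
///
/// Assumption: a naive token window; a real chunker may split on code
/// structure instead.
pub fn chunk_count(total_tokens: usize, target: usize, overlap: usize) -> usize {
    assert!(overlap < target, "overlap must be smaller than the chunk target");
    if total_tokens == 0 {
        return 0;
    }
    if total_tokens <= target {
        return 1;
    }
    // The first chunk covers `target` tokens; every further chunk
    // contributes `stride` new tokens.
    let stride = target - overlap;
    let remaining = total_tokens - target;
    1 + (remaining + stride - 1) / stride // ceiling division
}
```

For a 10,000-token input, `chunk_count(10_000, 1024, 200)` gives 12 chunks and `chunk_count(10_000, 400, 80)` gives 31, so under this simplified model the Mixedbread settings need roughly 2.6× fewer encode inputs than the bge-small settings.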

Testing

✅ Integration tests for Mixedbread indexing and search
✅ Model switching tests
✅ Example test program (test_mixedbread.rs)
✅ Manual validation on real codebases

Documentation

Updated README.md with Mixedbread model information
Updated CHANGELOG.md
Added testing guides (TESTING_MIXEDBREAD.md, LOCAL_TESTING.md)
Updated docs-site with model comparison and usage examples

- Added new dependencies: `ck-models`, `hf-hub`, `tokenizers`, `ort`, `once_cell`, `ndarray`, and `num_cpus` to `ck-embed`.
- Updated `Cargo.lock` to reflect new versions and dependencies.
- Enhanced model resolution logic in `ck-engine` and `ck-embed` to support new models and improve error handling.
- Refactored embedding and reranking model selection to utilize a unified registry approach.
- Updated CLI help messages to include new model options.

This commit lays the groundwork for better model management and integration across the project.

- Introduced `test_mixedbread_index_and_search` to validate indexing and semantic search functionality using the Mixedbread model.
- Added `test_switch_model_to_mixedbread` to ensure model switching works correctly and updates the manifest as expected.
- Both tests require the Mixedbread models to be downloaded, and are marked as ignored unless the environment variable `CK_MIXEDBREAD_MODELS_READY` is set.
- Created temporary files with semantic content for testing purposes.
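
One common way to gate tests on an environment variable such as `CK_MIXEDBREAD_MODELS_READY` is an early return at the top of the test. The sketch below illustrates that pattern only; it is not the code from this PR. `is_ready`, `mixedbread_models_ready`, and `mixedbread_smoke_test` are hypothetical names, and treating an empty value as "not set" is an assumption.

```rust
use std::env;
use std::ffi::OsString;

/// Pure decision logic: run only when the variable is present and non-empty.
/// Assumption: an empty value counts as "not set".
pub fn is_ready(value: Option<OsString>) -> bool {
    value.map_or(false, |v| !v.is_empty())
}

/// Reads the gate variable named in this PR from the process environment.
pub fn mixedbread_models_ready() -> bool {
    is_ready(env::var_os("CK_MIXEDBREAD_MODELS_READY"))
}

#[test]
fn mixedbread_smoke_test() {
    if !mixedbread_models_ready() {
        // Skip quietly so machines without the downloaded models stay green.
        eprintln!("skipping: set CK_MIXEDBREAD_MODELS_READY=1 to run");
        return;
    }
    // Indexing and search assertions against the downloaded models would go here.
}
```

Keeping the decision in a pure function (`is_ready`) lets the gate itself be unit-tested without mutating the process environment.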

…documentation

- Introduced first-class support for Mixedbread embedding and reranking models, including `mxbai-xsmall` and `mxbai`.
- Updated README with usage instructions and model comparisons for Mixedbread.
- Enhanced documentation to include specifications, pros, and cons of the Mixedbread model.
- Improved FAQ and limitations sections to reflect new model options and requirements.

regenrek added 3 commits November 23, 2025 15:15

regenrek marked this pull request as ready for review November 23, 2025 20:54

regenrek mentioned this pull request Nov 27, 2025

Add mixedbread support #90

Open
