add AMX INT8 distance kernels with multi-query batch search API #1316
xtangxtang wants to merge 7 commits into volcengine:main
# Conflicts:
#   openviking/models/embedder/openai_embedders.py
#   openviking/server/config.py
#   openviking/server/routers/sessions.py
#   openviking/storage/collection_schemas.py
#   openviking/telemetry/resource_summary.py
#   openviking_cli/utils/config/embedding_config.py
#   tests/server/test_auth.py
… API
- Add 3 new INT8 inner-product kernels: AVX-512 VNNI, AMX single-query, AMX multi-query batch
- Implement search_knn_batch across C++/pybind/Python layers (6-layer API)
- Add MultiAspectRetriever for multi-prompt embedding with RRF fusion
- Add avx512_vnni and amx x86 build profiles
- AMX batch achieves ~3.5x speedup over serial at N=16 across all scenarios

…ation
- Remove avx512_vnni build variant and inner_product_int8_avx512_vnni kernel
- Single-query INT8 dispatch now uses batch_inner_product_int8_amx with num_vecs=1
- Remove _x86_avx512_vnni from Python engine variant mappings
- Build variants: sse3, avx2, avx512, amx
Thank you for your code contribution. I'll review it.
Summary
Add Intel AMX (Advanced Matrix Extensions) INT8 acceleration for vector distance computation in OpenViking's native engine backend. This includes two new AMX kernels for inner-product distance, a full-stack multi-query batch search API, and a new MultiAspectRetriever module that leverages batch search for multi-perspective recall.
Motivation
INT8 quantization reduces memory footprint by 4× compared to FP32 while maintaining search quality. Intel AMX provides hardware-accelerated matrix multiply for INT8 data via dedicated tile registers, enabling significant throughput improvements — especially when processing multiple queries simultaneously against the same database vectors.
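The 4× figure follows from component size: an FP32 component takes 4 bytes and an INT8 component takes 1, so a 256-dimensional vector shrinks from 1024 to 256 bytes. The sketch below illustrates this with symmetric per-vector quantization. That scheme is an assumption, since the PR does not say how OpenViking quantizes, and `quantize_int8` is a hypothetical helper, not part of the codebase.

```python
import array


def quantize_int8(vec):
    """Symmetric per-vector quantization (assumed scheme, not taken from the PR)."""
    max_abs = max((abs(x) for x in vec), default=0.0)
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    # Map each float to the nearest integer step and clamp to [-127, 127].
    codes = array.array(
        "b", (max(-127, min(127, round(x / scale))) for x in vec)
    )
    return codes, scale


dim = 256  # same dimension as the benchmark in this PR
fp32 = array.array("f", ((i % 7 - 3) / 3.0 for i in range(dim)))
codes, scale = quantize_int8(fp32)

fp32_bytes = len(fp32) * fp32.itemsize    # 4 bytes per component
int8_bytes = len(codes) * codes.itemsize  # 1 byte per component
print(fp32_bytes, int8_bytes, fp32_bytes // int8_bytes)
```

The per-vector scale adds a few bytes on top of the INT8 payload, which is negligible at this dimension.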
Changes
1. AMX INT8 Distance Kernels (space_int8.h)
- batch_inner_product_int8_amx: Single-query AMX kernel using TDPBSSD (signed int8 × signed int8 → int32). Processes up to 16 database vectors simultaneously against one query vector using tile registers.
- batch_inner_product_int8_amx_multi_query: Multi-query AMX kernel. Computes dot(db[i], query[q]) for i=0..15, q=0..15 in a single tile operation per 64-dim chunk. Both DB vectors and query vectors share the AMX tile pipeline within each chunk iteration.
2. Multi-Query Batch Search API (6-layer stack)
Full search_knn_batch API propagated across all abstraction layers:
- bruteforce.h: brute_force_knn_batch_int8() with 16-vector tile blocking
- search_knn_batch() virtual interface
- vector_index_adapter.h
- index_manager_impl.{h,cpp}
- index_engine.{h,cpp}
- abi3_engine_backend.cpp: _index_engine_search_batch
- IndexEngine.search_batch()
3. AMX Build Variant
- Add amx to x86 build profiles (build_support/x86_profiles.py, CMakeLists.txt)
- Build flags: -mamx-tile -mamx-int8 -mavx512vnni
- OV_ENGINE_VARIANT=amx or auto-detected
4. MultiAspectRetriever (openviking/retrieve/multi_aspect_retriever.py)
New production module for multi-perspective recall that naturally utilizes search_batch:
- AspectPrompt: frozen dataclass defining instruction prompt per aspect
- embed_multi_aspect(): generates N embedding vectors by prepending different instruction prompts to the same query
- reciprocal_rank_fusion(): merges N ranked result lists into a single diverse ranking
- MultiAspectRetriever.retrieve(): end-to-end retrieve with mode="batch"|"serial"
5. Python Engine Loader Updates
- Add x86_amx variant to engine module mappings, priority order, and display order
Performance
Benchmarked on Intel Xeon 6983P-C (Granite Rapids), dim=256, nb=100K, single-thread:
AMX batch achieves a consistent ~3.5× speedup at N=16 across all scenarios.
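One way to read the N=16 result: in the multi-query kernel, each block of 16 database vectors is loaded once per 64-dim chunk and reused for up to 16 queries, instead of being reloaded for every query. The pure-Python reference below mirrors that blocking (16 DB rows × 16 queries × 64-dim chunks, integer accumulation) to show what the kernel computes. It is not the AMX code, and the function name and argument layout are hypothetical.

```python
TILE_ROWS = 16  # DB vectors (and queries) per block, per the kernel description
CHUNK = 64      # int8 dimensions consumed per tile operation


def batch_inner_product_reference(db, queries):
    """Return out[q][i] = dot(db[i], queries[q]) using 16x16x64 blocking."""
    dim = len(db[0]) if db else 0
    out = [[0] * len(db) for _ in queries]
    for i0 in range(0, len(db), TILE_ROWS):
        for q0 in range(0, len(queries), TILE_ROWS):
            for k0 in range(0, dim, CHUNK):
                # One "tile operation": every (query, db) pair in this block
                # accumulates the partial dot product of the current chunk.
                for q in range(q0, min(q0 + TILE_ROWS, len(queries))):
                    qv = queries[q][k0:k0 + CHUNK]
                    for i in range(i0, min(i0 + TILE_ROWS, len(db))):
                        dv = db[i][k0:k0 + CHUNK]
                        out[q][i] += sum(a * b for a, b in zip(qv, dv))
    return out


# Tiny check: one DB vector against one query.
print(batch_inner_product_reference([[1, 2, 3]], [[4, 5, 6]]))  # [[32]]
```

A reference like this is mainly useful as a correctness oracle when testing the native kernels on shapes that are not multiples of 16 or 64.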
Platform Requirements
XFEATURE_XTILEDATA permission via arch_prctl)
Files Changed (18 files, +965/-24)
- build_support/x86_profiles.py: add amx variant
- src/CMakeLists.txt: AMX build flags
- src/abi3_x86_caps.cpp: CPUID AMX detection
- src/abi3_engine_backend.cpp: pybind11 batch binding
- src/index/detail/vector/common/space_int8.h: AMX kernels
- src/index/detail/vector/common/bruteforce.h: batch brute-force search
- src/index/detail/vector/common/vector_base.h: AMX SIMD define
- src/index/detail/vector/vector_index_adapter.h: adapter forwarding
- src/index/detail/index_manager_impl.{h,cpp}: manager batch API
- src/index/index_engine.{h,cpp}: engine batch API
- src/index/index_manager.h: abstract batch interface
- openviking/storage/vectordb/engine/__init__.py: variant mappings
- openviking/storage/vectordb/engine/_python_api.py: Python batch API
- openviking/retrieve/multi_aspect_retriever.py: new module
- setup.py, pyproject.toml: build config
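For reviewers unfamiliar with the fusion step listed under MultiAspectRetriever, here is a minimal sketch of standard reciprocal rank fusion, where each item scores the sum of 1 / (k + rank) over the lists it appears in. The name `rrf_sketch`, the default k=60, and the tie-break rule are assumptions for illustration. The actual reciprocal_rank_fusion() in multi_aspect_retriever.py may differ in signature and details.

```python
from collections import defaultdict


def rrf_sketch(ranked_lists, k=60, top_n=None):
    """Fuse ranked lists of ids into one ranking of (id, score) pairs."""
    scores = defaultdict(float)
    for ranking in ranked_lists:
        # Ranks are 1-based; items missing from a list contribute nothing.
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by id so output is deterministic
    # (this assumes the ids are mutually comparable, e.g. all strings).
    fused = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return fused if top_n is None else fused[:top_n]


# One ranked list per aspect prompt, as a batch search would return them.
per_aspect = [["a", "b", "c"], ["b", "a", "d"], ["b", "c", "a"]]
fused = rrf_sketch(per_aspect)
print([doc_id for doc_id, _ in fused])  # ['b', 'a', 'c', 'd']
```

Because the score depends only on rank, lists from different aspect prompts can be fused without normalizing their raw distance values.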