add embedding timing instrumentation and fix dimension auto-detection by xtangxtang · Pull Request #1299 · volcengine/OpenViking

xtangxtang · 2026-04-08T07:18:00Z

Summary

This PR adds fine-grained embedding telemetry instrumentation across the request pipeline and fixes a dimension auto-detection bug when the embedding model dimension is not explicitly configured.

Changes

1. Embedding Telemetry Instrumentation (`openai_embedders.py`)

Added _record_embedding_duration() helper to track embedding.duration_ms, embedding.requests, and embedding.error_count via the telemetry system.
Instrumented both _embed_single() and embed_batch() with time.perf_counter() timing, including error paths.

2. Session Extract with Synchronous Wait (`sessions.py`)

Added ExtractSessionRequest model with wait: bool and timeout: Optional[float] parameters.
When wait=True, the /extract endpoint blocks until the embedding queue drains, then reports queue.wait.duration_ms and session.extract.request.duration_ms in telemetry.

3. Queue Stats Enhancement (`collection_schemas.py`)

Extended RequestQueueStats with duration_ms, wall_start_ms, and wall_end_ms fields for accurate wall-clock duration tracking with concurrent embedding requests.

4. Telemetry Summary Enrichment (`operation.py`, `resource_summary.py`)

Added embedding section with duration_ms, wall_duration_ms, requests, error_count, avg_duration_ms, and share_of_total_pct.
Queue stats now include duration_ms and wall_duration_ms.

5. Server Config: Relaxed Non-Localhost Check (`config.py`, `app.py`)

Changed validate_server_config() from sys.exit(1) to logger.warning when no root_api_key is configured with a non-localhost bind address.
Allows flexible deployment scenarios (Docker, internal networks) while still warning about security implications.

6. Embedding Dimension Auto-Detection (`embedding_config.py`)

Bug fix: get_dimension() previously returned hardcoded 2048 when embedding.dense.dimension was omitted. This caused dimension mismatch with models like Qwen3-Embedding-0.6B (1024-dim) served via SGLang.
Now uses PrivateAttr _resolved_dimension for lazy detection via get_embedder().get_dimension() with caching. The config's dense.dimension field is not mutated.

7. Tests

Updated test_validate_no_key_non_localhost_warns to match new warning behavior.
Added test_embedding_config_dimension.py with 2 regression tests for dimension auto-detection.

Motivation

When benchmarking OpenViking with different embedding backends (SGLang, Ollama) on different hardware (Intel GNR with AMX, AMD Turin), we needed fine-grained telemetry to identify performance bottlenecks. The existing telemetry only tracked token counts but not embedding latency. Additionally, the hardcoded 2048 dimension fallback caused silent failures with non-standard embedding models.

Testing

All existing tests pass.
New unit tests added for dimension auto-detection.
Auth test updated to match new behavior.
Manually verified telemetry output in production-like benchmarks.

CLAassistant · 2026-04-08T07:18:30Z

All committers have signed the CLA.

github-actions · 2026-04-08T07:19:34Z

PR Reviewer Guide 🔍

Here are some key observations to aid the review process:

⏱️ Estimated effort to review: 3 🔵🔵🔵⚪⚪
🏅 Score: 85
🧪 PR contains tests
🔒 Security concerns Security Configuration Change: The server no longer prevents binding to non-loopback addresses without authentication, exposing an unauthenticated ROOT endpoint to the network.
✅ No TODO sections
🔀 Multiple PR themes Sub-PR theme: Update server security warning behavior for non-localhost binds Relevant files: openviking/server/config.py openviking/server/app.py tests/server/test_auth.py Sub-PR theme: Fix embedding dimension auto-detection and caching Relevant files: openviking_cli/utils/config/embedding_config.py tests/unit/test_embedding_config_dimension.py Sub-PR theme: Add embedding timing instrumentation and queue wait telemetry Relevant files: openviking/models/embedder/openai_embedders.py openviking/storage/collection_schemas.py openviking/telemetry/operation.py openviking/telemetry/resource_summary.py openviking/server/routers/sessions.py
⚡ Recommended focus areas for review Security Configuration Change The server will now start when bound to non-loopback addresses without a root_api_key (removed sys.exit(1)). This exposes an unauthenticated ROOT endpoint to the network, which was previously blocked. The change is not mentioned in the PR title/description. if not _is_localhost(config.host): logger.warning( "SECURITY WARNING: server.root_api_key is not configured and " "server.host is '%s' (non-localhost). This exposes an unauthenticated " "ROOT endpoint to the network.", config.host, ) logger.warning( "Set server.root_api_key in ov.conf if you need authentication for " "network-accessible deployments." ) Division by Zero Guard The embedding avg_duration_ms calculation uses embedding_duration_ms / embedding_requests. While there's a check for embedding_requests > 0, if embedding_requests is zero but embedding_duration_ms is non-zero, avg_duration_ms will be 0.0 which might be misleading, though not a crash. "avg_duration_ms": round(embedding_duration_ms / embedding_requests, 3) if embedding_requests else 0.0, Telemetry Metric Usage The `_record_embedding_duration` function uses `telemetry.count("embedding.duration_ms", duration_ms)`. If `count` is intended for incrementing by integer counts (not accumulating durations), this could misuse the telemetry API. Verify telemetry.count supports float values for duration accumulation. def _record_embedding_duration(duration_ms: float, request_count: int = 1) -> None: telemetry = get_current_telemetry() telemetry.count("embedding.duration_ms", duration_ms) telemetry.count("embedding.requests", request_count)

github-actions · 2026-04-08T07:20:34Z

PR Code Suggestions ✨

No code suggestions found for the PR.

# Conflicts: # openviking/models/embedder/openai_embedders.py # openviking/server/config.py # openviking/server/routers/sessions.py # openviking/storage/collection_schemas.py # openviking/telemetry/resource_summary.py # openviking_cli/utils/config/embedding_config.py # tests/server/test_auth.py

… API - Add 3 new INT8 inner-product kernels: AVX-512 VNNI, AMX single-query, AMX multi-query batch - Implement search_knn_batch across C++/pybind/Python layers (6-layer API) - Add MultiAspectRetriever for multi-prompt embedding with RRF fusion - Add avx512_vnni and amx x86 build profiles - AMX batch achieves ~3.5x speedup over serial at N=16 across all scenarios

xtangxtang added 2 commits April 3, 2026 15:49

update

824e9e1

update

f3bb18f

github-project-automation bot added this to OpenViking project Apr 8, 2026

github-project-automation bot moved this to Backlog in OpenViking project Apr 8, 2026

github-actions bot added the Review effort 3/5 label Apr 8, 2026

xtangxtang and others added 2 commits April 8, 2026 15:24

Merge branch 'volcengine:main' into main

4f7ff15

qin-ctx requested a review from zhoujh01 April 8, 2026 11:17

qin-ctx assigned zhoujh01 Apr 8, 2026

xtangxtang and others added 3 commits April 9, 2026 10:23

Merge branch 'volcengine:main' into main

e62caaf

Merge branch 'volcengine:main' into main

b8274e0

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

add embedding timing instrumentation and fix dimension auto-detection#1299

add embedding timing instrumentation and fix dimension auto-detection#1299
xtangxtang wants to merge 7 commits intovolcengine:mainfrom
epeshared:main

xtangxtang commented Apr 8, 2026 •

edited

Loading

Uh oh!

CLAassistant commented Apr 8, 2026 •

edited

Loading

Uh oh!

github-actions bot commented Apr 8, 2026

Uh oh!

github-actions bot commented Apr 8, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Conversation

xtangxtang commented Apr 8, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Changes

1. Embedding Telemetry Instrumentation (openai_embedders.py)

2. Session Extract with Synchronous Wait (sessions.py)

3. Queue Stats Enhancement (collection_schemas.py)

4. Telemetry Summary Enrichment (operation.py, resource_summary.py)

5. Server Config: Relaxed Non-Localhost Check (config.py, app.py)

6. Embedding Dimension Auto-Detection (embedding_config.py)

7. Tests

Motivation

Testing

Uh oh!

CLAassistant commented Apr 8, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

github-actions bot commented Apr 8, 2026

PR Reviewer Guide 🔍

Uh oh!

github-actions bot commented Apr 8, 2026

PR Code Suggestions ✨

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

xtangxtang commented Apr 8, 2026 •

edited

Loading

1. Embedding Telemetry Instrumentation (`openai_embedders.py`)

2. Session Extract with Synchronous Wait (`sessions.py`)

3. Queue Stats Enhancement (`collection_schemas.py`)

4. Telemetry Summary Enrichment (`operation.py`, `resource_summary.py`)

5. Server Config: Relaxed Non-Localhost Check (`config.py`, `app.py`)

6. Embedding Dimension Auto-Detection (`embedding_config.py`)

CLAassistant commented Apr 8, 2026 •

edited

Loading