Summary of Changes (Gemini Code Assist): This pull request introduces API key model affinity routing, which routes requests to preferred inference providers based on the API key and model being used. This can improve performance by ensuring that requests reach the most appropriate provider for a given user and model. The changes include adding a new database table to store the affinity bindings, implementing advisory locks to prevent race conditions, and modifying the inference provider pool to prioritize the preferred provider URL.
PR Review: API Key Model Affinity Routing. The overall design is sound. A few critical issues:

1. No FK constraint on `api_key_id` in `V0049__add_api_key_model_affinity.sql` - add `REFERENCES api_keys(id) ON DELETE CASCADE`.
2. Raw pointer identity via `Arc::as_ptr` in `inference_provider_pool/mod.rs` is fragile due to allocator address reuse - use an explicit stable `u64` ID instead.
3. `model_name` logged at warn level alongside `api_key_id` in `completions/mod.rs` violates privacy policy.

Minor: raise `API_KEY_MODEL_AFFINITY_REFRESH_TIMEOUT_SECS` from 2s to 5s. The double-checked locking and health-partitioning logic are correct. Please address items 1 and 3 as blockers.
Full review detail follows (split due to formatting constraints):
Issue 1 - No FK constraint and no expired-row cleanup. File: `crates/database/src/migrations/sql/V0049__add_api_key_model_affinity.sql`. `api_key_id` has no FK constraint, so if an API key is deleted, its affinity rows are never cleaned up and persist as dead weight. There is also no periodic cleanup of expired rows, so over time this table will accumulate stale entries. Suggested fix: add `REFERENCES api_keys(id) ON DELETE CASCADE` to `api_key_id`. Even without a separate cleanup job, the FK + cascade ensures orphaned rows do not accumulate after key deletion.
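A sketch of what the amended migration might look like. The column and table names other than `api_key_id`, `canonical_model_name`, and `provider_url` are assumptions (in particular the types, the `expires_at` TTL column, and the primary key), since the original migration is not shown here:

```sql
-- Hypothetical revised V0049 with the suggested FK + cascade.
-- Types, expires_at, and the primary key are assumed, not taken from the PR.
CREATE TABLE api_key_model_affinity (
    api_key_id           BIGINT      NOT NULL
                         REFERENCES api_keys(id) ON DELETE CASCADE,
    canonical_model_name TEXT        NOT NULL,
    provider_url         TEXT        NOT NULL,
    expires_at           TIMESTAMPTZ NOT NULL,
    PRIMARY KEY (api_key_id, canonical_model_name)
);
```

With `ON DELETE CASCADE`, deleting a row from `api_keys` removes its affinity rows in the same transaction, so no separate cleanup job is needed for the key-deletion case (expired-but-live keys would still need TTL-based filtering or cleanup).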
Code Review
This pull request introduces API key model affinity, allowing specific API keys to be consistently routed to the same inference provider for a given model. Key changes include adding a new `api_key_model_affinity` database table and a `PgApiKeyModelAffinityRepository` to manage these bindings, which uses advisory locks for concurrent creation. The `CompletionServiceImpl` now integrates this affinity logic, determining when to use preferred provider URLs and scheduling background refreshes for active bindings. The `InferenceProviderPool` was updated to store provider URLs and prioritize a preferred provider during selection. Review comments suggest improving the robustness of background affinity refresh tasks by considering `tokio::task::spawn_blocking` with timeouts for critical operations, adding comments to explain the `advisory_lock_key` generation logic, and enhancing tracing messages with specific error details for better debugging.
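On the `advisory_lock_key` comment-explanation point: a common way to derive the signed 64-bit key Postgres advisory locks expect is to hash the `(api_key_id, model)` pair. This is a hypothetical helper, not the PR's actual derivation, and the function name and types are assumptions:

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Derives a signed 64-bit key for pg_advisory_xact_lock from the
/// (api_key_id, canonical_model_name) pair.
///
/// Caveat worth a comment in real code: DefaultHasher uses fixed SipHash
/// keys, so the result is stable across processes built with the same
/// toolchain, but the Rust std docs do not guarantee stability across
/// compiler versions. Since advisory locks only need all *live* instances
/// to agree, that is usually acceptable; a pinned hash (e.g. FNV-1a) would
/// remove the caveat entirely.
fn advisory_lock_key(api_key_id: i64, canonical_model_name: &str) -> i64 {
    let mut hasher = DefaultHasher::new();
    api_key_id.hash(&mut hasher);
    canonical_model_name.hash(&mut hasher);
    // Postgres advisory lock keys are signed; reinterpret the u64 bits.
    hasher.finish() as i64
}
```

Two instances computing the key for the same `(api_key_id, model)` pair get the same lock, which is exactly what serializes the concurrent-miss path.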
Issue 2 - Raw pointer identity for the `provider_urls` map is fragile. File: `crates/services/src/inference_provider_pool/mod.rs`. The `provider_ptr` function uses raw allocation addresses (`Arc::as_ptr`) as stable identifiers. If a provider `Arc` is dropped during a discovery cycle and a new one is allocated at the same address (allocator reuse), the new provider would inherit the old URL in `provider_urls` before the next `provider_url_updates` write. The failure mode is brief incorrect routing (corrected on the next discovery cycle), but it is a subtle invariant to rely on. Consider using an explicit stable ID (a `u64` counter assigned at provider creation, stored in the provider struct and accessible via a trait method). This eliminates the aliasing risk entirely.
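A minimal sketch of the suggested stable-ID approach, assuming a simplified `Provider` struct (the real pool's provider type and trait differ):

```rust
use std::sync::atomic::{AtomicU64, Ordering};
use std::sync::Arc;

/// Process-wide counter: IDs increase monotonically and are never reused,
/// unlike allocation addresses, which the allocator may hand out again.
static NEXT_PROVIDER_ID: AtomicU64 = AtomicU64::new(0);

/// Hypothetical stand-in for the pool's provider type.
struct Provider {
    stable_id: u64,
    url: String,
}

impl Provider {
    fn new(url: impl Into<String>) -> Arc<Self> {
        Arc::new(Self {
            // Assigned once at creation; a new provider allocated at a
            // reused address still gets a fresh, distinct ID.
            stable_id: NEXT_PROVIDER_ID.fetch_add(1, Ordering::Relaxed),
            url: url.into(),
        })
    }

    /// Replaces `provider_ptr`: key `provider_urls` by this instead of
    /// `Arc::as_ptr(provider)`.
    fn stable_id(&self) -> u64 {
        self.stable_id
    }
}
```

Keying the `provider_urls` map by `stable_id()` rather than the `Arc` address makes stale entries unreachable the moment the old provider is dropped, rather than merely unlikely to be hit.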
Issue 3 - `model_name` logged at warn level (production-visible) may violate privacy policy. File: `crates/services/src/completions/mod.rs`, `schedule_provider_affinity_refresh`. The warn-level log includes `model_name` alongside `api_key_id`. Per project policy, logging metadata that reveals customer activity is forbidden. The pairing of `api_key_id` + `model_name` reveals which models a specific customer is using; while model names are public, the association with an API key ID is customer activity metadata. Suggest removing `model_name` from warn-level logs (it can remain at debug level for troubleshooting).
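One way to make the policy hard to violate by accident is to centralize field construction per log level. This is a hypothetical std-only sketch (the actual code presumably logs through `tracing` macros, where the same split applies to the fields passed to `warn!` vs `debug!`):

```rust
/// Fields safe for production-visible (warn-level) logs: api_key_id alone
/// does not reveal which models the customer uses.
/// Hypothetical helper; field names are assumptions.
fn warn_log_fields(api_key_id: &str) -> Vec<(&'static str, String)> {
    vec![("api_key_id", api_key_id.to_string())]
}

/// Debug-level fields may include model_name for local troubleshooting,
/// since debug logs are not production-visible under the stated policy.
fn debug_log_fields(api_key_id: &str, model_name: &str) -> Vec<(&'static str, String)> {
    vec![
        ("api_key_id", api_key_id.to_string()),
        ("model_name", model_name.to_string()),
    ]
}
```

The point of the split is that the `api_key_id` + `model_name` *pairing* is the sensitive datum, so the warn-level builder simply has no parameter through which `model_name` can leak.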
Minor note: `API_KEY_MODEL_AFFINITY_REFRESH_TIMEOUT_SECS = 2` may cause noisy logs. If the DB is under load, the 2-second refresh timeout will fire frequently, generating warn-level log spam. Since the refresh is best-effort and has no user-visible impact on failure, consider raising the timeout to 5s, or downgrading the timeout branch to debug level. The double-checked locking pattern in `get_or_create_active_provider_url` is correct. Health-based partitioning correctly overrides stale affinity for demoted providers (partitioning runs after the preferred reorder, so a failed preferred provider ends up at the back). The e2e tests for concurrent lock behavior are well-designed. Issues found - please address the FK constraint (1) and privacy logging (3), and consider the pointer identity concern (2).
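For readers unfamiliar with the double-checked pattern praised above, here is the shape of it in miniature, using an in-process `RwLock`-guarded map. This is a sketch of the pattern only; the real `get_or_create_active_provider_url` checks Postgres and serializes misses under `pg_advisory_xact_lock` rather than an in-process lock:

```rust
use std::collections::HashMap;
use std::sync::RwLock;

/// Double-checked lookup: take the cheap read lock first; only on a miss,
/// acquire the write lock and check AGAIN before inserting, so a concurrent
/// writer that won the race is observed rather than overwritten.
fn get_or_create(
    cache: &RwLock<HashMap<String, String>>,
    key: &str,
    create: impl FnOnce() -> String,
) -> String {
    // First check (shared lock): the hit path never blocks writers out.
    if let Some(url) = cache.read().unwrap().get(key) {
        return url.clone();
    }
    // Miss path (exclusive lock).
    let mut guard = cache.write().unwrap();
    // Second check via the entry API: another thread may have inserted
    // between our read-lock release and write-lock acquisition.
    guard.entry(key.to_string()).or_insert_with(create).clone()
}
```

The same two properties the review calls out hold here: hits stay lock-free in the expensive sense (shared lock only), and concurrent misses are serialized so `create` runs at most once per key.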
Pull request overview
Adds shared, Postgres-backed API key ↔ model affinity routing as a best-effort hint to improve cache locality for vLLM chat completions across multiple cloud-api instances (while preserving the existing routing strategy as fallback).
Changes:
- Introduces `api_key_model_affinity` table + `PgApiKeyModelAffinityRepository` with advisory-lock serialization for concurrent misses.
- Extends `InferenceProviderPool` to track discovered provider URLs and to prioritize a preferred URL during provider ordering.
- Wires affinity selection + async refresh into chat completion (streaming and non-streaming) paths, and adds E2E repository coverage for hit/miss locking behavior.
Reviewed changes
Copilot reviewed 9 out of 9 changed files in this pull request and generated 6 comments.
Summary per file:
| File | Description |
|---|---|
| crates/services/src/inference_provider_pool/mod.rs | Tracks provider URL per provider instance and allows preferred URL prioritization; exposes provider URL selection for affinity creation. |
| crates/services/src/completions/ports.rs | Adds new ports for affinity repository and provider URL selection. |
| crates/services/src/completions/mod.rs | Uses affinity (when eligible) for chat completions and schedules best-effort async refresh after success. |
| crates/database/src/repositories/mod.rs | Registers the new repository module/export. |
| crates/database/src/repositories/api_key_model_affinity.rs | Implements Postgres repository with TTL, upsert, and pg_advisory_xact_lock for concurrent misses. |
| crates/database/src/migrations/sql/V0049__add_api_key_model_affinity.sql | Adds the api_key_model_affinity table. |
| crates/database/src/lib.rs | Re-exports the new Postgres repository type. |
| crates/api/tests/e2e_repositories.rs | Adds E2E coverage for concurrent miss lock serialization and hit-path lock avoidance. |
| crates/api/src/lib.rs | Wires the new affinity repository into CompletionServiceImpl initialization. |
# Conflicts:
#   crates/services/src/completions/mod.rs
#   crates/services/src/inference_provider_pool/mod.rs
Implement shared API key/model affinity routing for vLLM-backed chat completions using Postgres.
This change adds a short-lived `(api_key_id, canonical_model_name) -> provider_url` binding so requests from different `cloud-api` instances can reuse the same model node and improve backend cache locality without introducing Redis. The existing routing strategy remains the fallback selector; affinity is a higher-level shared hint.

Key changes:
- Keep `model_pub_key` requests on the existing path
- Prefer the bound `provider_url` when it is available locally, otherwise fall back to the current routing strategy
- Serialize concurrent misses with `pg_advisory_xact_lock`, while keeping cache hits lock-free
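In SQL terms, the miss path described above might look roughly like the following. Table and column names beyond `api_key_model_affinity`, `api_key_id`, `canonical_model_name`, and `provider_url` are assumptions, as are the example values, the `expires_at` column, and the 60-second TTL:

```sql
-- Hypothetical miss-path transaction.
BEGIN;

-- Serialize concurrent misses for this (api_key_id, model) pair.
-- hashtext returns int4, which widens to the bigint key the lock expects.
SELECT pg_advisory_xact_lock(hashtext('42:llama-3'));

-- Re-check after acquiring the lock: another instance may have already
-- created the binding while we waited.
SELECT provider_url
  FROM api_key_model_affinity
 WHERE api_key_id = 42
   AND canonical_model_name = 'llama-3'
   AND expires_at > now();

-- If still missing, pick a provider and upsert the binding.
INSERT INTO api_key_model_affinity
       (api_key_id, canonical_model_name, provider_url, expires_at)
VALUES (42, 'llama-3', 'http://provider-a:8000', now() + interval '60 seconds')
ON CONFLICT (api_key_id, canonical_model_name)
DO UPDATE SET provider_url = EXCLUDED.provider_url,
              expires_at   = EXCLUDED.expires_at;

COMMIT;  -- pg_advisory_xact_lock is released automatically at commit/rollback
```

The hit path reads the table without ever touching the advisory lock, which is what keeps cache hits lock-free.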