Skip to content

DAR-345: enforce public model pools#432

Open
Gajesh2007 wants to merge 5 commits into
masterfrom
dar-345-model-pools
Open

DAR-345: enforce public model pools#432
Gajesh2007 wants to merge 5 commits into
masterfrom
dar-345-model-pools

Conversation

@Gajesh2007

@Gajesh2007 Gajesh2007 commented Jun 20, 2026

Copy link
Copy Markdown
Member

Summary

  • add coordinator-managed public model pool assignment and enforce it in routing, preflight, cold dispatch, warm pool load_model, capacity snapshots, alias routability, and desired-model surfaces
  • return uptime-neutral 429 pool_exhausted when a requested model's assigned pool has no capacity, instead of spilling into another model's machines
  • add pool observability to admin utilization, Datadog gauges, /v1/me/providers, and update provider default max_model_slots to 1

Before/After

Before: a Mac could advertise both GPT-OSS and Gemma 4, so the coordinator could route either model to the same machine and opportunistically cold-load the other model. That made GPT-OSS and Gemma compete for one GPU/unified-memory/KV/thermal budget.

flowchart LR
    openrouter[OpenRouter request] --> coordinator[Coordinator routing]
    coordinator --> advertised{Provider advertises requested model?}
    advertised -->|GPT-OSS request| mac[Same Mac: GPT-OSS + Gemma advertised]
    advertised -->|Gemma 4 request| mac
    mac --> coldLoad[May cold-load / keep both public models warm]
    coldLoad --> coresident[GPT-OSS and Gemma 4 co-resident]
    coresident --> pressure[Shared GPU, unified memory, KV cache, thermal budget]
    pressure --> failures[503 / cancel / TTFT risk leaks across models]
Loading

After: each public provider is assigned to one active model pool at a time. A provider can serve many concurrent requests, but only for that assigned model. If it is idle/unloaded, the coordinator can safely reassign it to another pool on demand; if it is resident or busy, cross-pool requests do not spill into it.

flowchart LR
    openrouter[OpenRouter request] --> model{Requested model}
    model -->|GPT-OSS| gptPool[GPT-OSS pool]
    model -->|Gemma 4| gemmaPool[Gemma 4 pool]

    gptPool --> gptCheck{GPT-OSS pool has capacity?}
    gemmaPool --> gemmaCheck{Gemma 4 pool has capacity?}

    gptCheck -->|yes| gptRoute[Route to GPT-OSS-assigned provider]
    gemmaCheck -->|yes| gemmaRoute[Route to Gemma-assigned provider]

    gptCheck -->|no| gptIdle{Idle/unloaded provider can safely reassign?}
    gemmaCheck -->|no| gemmaIdle{Idle/unloaded provider can safely reassign?}

    gptIdle -->|yes| reassignGPT[Assign provider to GPT-OSS, load/warm GPT-OSS]
    gemmaIdle -->|yes| reassignGemma[Assign provider to Gemma 4, load/warm Gemma]

    gptIdle -->|no| gpt429[429 pool_exhausted + Retry-After]
    gemmaIdle -->|no| gemma429[429 pool_exhausted + Retry-After]

    gpt429 --> openrouterRetry[OpenRouter retries/fails over without uptime penalty]
    gemma429 --> openrouterRetry
Loading

Tests

  • go test ./... (coordinator)
  • swift test --filter ConfigTests (provider-swift)
  • git diff --check

Notes

  • Pool changes are safe only when a provider is idle/unloaded in this PR. Draining an actively resident provider from one pool into another remains a follow-up.
  • Existing provider configs with explicit max_model_slots > 1 remain opt-in until migrated.

Linear: https://linear.app/darkbloom-dev/issue/DAR-345/coordinatorprovider-enforce-one-active-public-model-per-machine-via


View with Codesmith Autofix with Codesmith
Need help on this PR? Tag /codesmith with what you need. Autofix is disabled.

@vercel

vercel Bot commented Jun 20, 2026

Copy link
Copy Markdown

The latest updates on your projects. Learn more about Vercel for GitHub.

Project Deployment Actions Updated (UTC)
d-inference Ready Ready Preview Jun 20, 2026 10:49pm
d-inference-console-ui-dev Ready Ready Preview Jun 20, 2026 10:49pm
d-inference-landing Ready Ready Preview Jun 20, 2026 10:49pm

Request Review

@github-actions

github-actions Bot commented Jun 20, 2026

Copy link
Copy Markdown

This PR introduces coordinator-side model pool enforcement and a maxModelSlots default reduction; neither change directly weakens any mitigations, but the pool enforcement code touches the T-034 routing gate in ways that deserve a close second look, and one new attack surface is introduced.


Trust Boundaries Touched

  • TB-002 — Coordinator ↔ Provider WebSocket (registry routing gate, SendLoadModel, MarkModelWarm)
  • TB-003 — Provider operator vs. process (ProviderConfig.swift default change)
  • TB-006 — Admin → Coordinator (new SetModelPoolEnforcement config path in main.go)
  • TB-007 — Provider inference engine (model slot / capacity gating)

Threat Assessment

T-034 — Provider runs modified code while advertising a trusted identity
coordinator/registry/registry.go

⚠️ Partial concern — routing gate now has an additional bypass path.

providerSupportsPrivateTextLocked (the CodeAttested gate) is the single chokepoint for private-text routing per the threat model. The new providerAssignedOrIdleReassignableToModelPoolLocked checks are inserted into HasVisionProviderForModel (line ~2187), modelLoadCandidatePendingLocked (line ~3078), MarkModelWarm (line ~3175), and DesiredModelsForProvider (line ~2860), but critically also into SendLoadModel (line ~2694). None of these are the privacy gate, so the CodeAttested invariant is not directly broken. However:

  • enforceModelPools defaults to true in New() (line ~1169) but is toggled by SetModelPoolEnforcement(cfg.RegistryCfg.ModelPoolsEnabled) in main.go. If ModelPoolsEnabled is false (e.g. a misconfigured deploy or a feature-flag rollback), the pool gate is entirely bypassed for all routing calls. There is no visible fallback enforcement. Confirm that ModelPoolsEnabled=false is not the production default and cannot be reached without an explicit config change.
  • The assignProviderModelPool call inserted at the end of Heartbeat() (line ~2671) modifies p.AssignedPool asynchronously — the same field read under p.mu.Lock() in the routing functions. Verify there is no TOCTOU window where a provider is re-assigned a pool between the eligibility read and the actual dispatch decision.

T-011 — Provider operator reads X25519 private key from config file
provider-swift/Sources/ProviderCore/Config/ProviderConfig.swift lines 80, 106, 189

ℹ️ Neutral. maxModelSlots default reduced from 3 → 1. This is a resource/scheduling change. The config file still contains no cryptographic secrets (per TB-003 description), so T-011 posture is unchanged. The comment update at line ~3196 ("simultaneously when explicitly configured") is consistent.

T-032 — Solana wallet mnemonic exfiltration
coordinator/cmd/coordinator/main.go

ℹ️ Neutral. The only change in main.go is wiring SetModelPoolEnforcement. No new environment variable reads, no new secret handling. T-032 posture unchanged.

T-038 — Slow-header / server config
coordinator/cmd/coordinator/main.go

ℹ️ Neutral. HTTP server config block not touched in this diff.

T-041 — Profile signing identity (also main.go)
ℹ️ Neutral. KMS injection / profile signing code not touched.


New Attack Surface Not Covered by Existing Threats

Model pool assignment as a denial-of-routing primitive

assignProviderModelPool runs on every Heartbeat() and can change p.AssignedPool. There is currently no threat in the model covering a malicious provider that sends crafted heartbeat messages to influence its own pool assignment and thereby steer itself into (or out of) serving specific models. For example:

  • If pool assignment is derived from provider-supplied fields (model list, hardware info, etc.), a provider that lies about its capabilities might land in a pool it would not normally be assigned to, circumventing model-level routing isolation.
  • If AssignedPool == "" is a valid state that causes providerAssignedOrIdleReassignableToModelPoolLocked to return true by default (idle-reassignable path), a provider that never triggers pool assignment could remain eligible for any model.

Please confirm: (1) what inputs drive assignProviderModelPool / modelPoolKeyLocked, and (2) what the return value of providerAssignedOrIdleReassignableToModelPoolLocked is when AssignedPool is empty and enforceModelPools=true. If empty pool → eligible for everything, that's a routing isolation hole.


Lock-ordering note (not a threat model item, but a correctness risk that could affect security invariants)

In MarkModelWarm the diff changes the lock acquisition order: r.mu.RLock() is now held while p.mu.Lock() is acquired (lines ~3167–3180), and r.mu.RUnlock() is deferred until after p.mu.Unlock() is called. The registry-level lock is released inside the provider lock scope (r.mu.RUnlock() at line ~3181, before defer p.mu.Unlock() fires). Cross-check the rest of the codebase to confirm this is the established lock ordering (r.mup.mu) and that no other path takes p.mu before r.mu, which would introduce a deadlock that could be triggered by a malicious provider sending rapid heartbeats and load-model responses concurrently.


Open Findings Resolved

None — this PR does not address any SEC-* finding in the threat model.


🔐 Threat model: docs/threat-model.yaml · Updates on each push to this PR

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: c636354643

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread coordinator/registry/model_pools.go Outdated
Comment on lines +152 to +156
singleAdvertised := ""
for _, model := range p.Models {
if r.modelCanSeedPoolLocked(model.ID) && r.modelAllowedByCatalogLocked(model) {
if singleAdvertised != "" {
return ""

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Seed pools by pool key, not raw model count

When a cold provider advertises more than one catalog model, this returns an empty seed even if those models are the desired/previous builds of the same public alias (the case the new pool model says should be one pool). With no CurrentModel, warm models, or loaded backend slot at startup, providerAssignedToModelPoolLocked then rejects the provider for both builds, so it cannot be selected or receive load_model for that alias until something else warms it. Compare advertised models by modelPoolKeyLocked(model.ID) and allow the seed when all advertised models collapse to the same pool.

Useful? React with 👍 / 👎.

Comment on lines +57 to +62
for alias, target := range r.modelAliases {
if target.Desired == model || target.Previous == model {
if best == "" || alias < best {
best = alias
}
continue

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Keep shared alias builds routable in each owning pool

When the same concrete build belongs to multiple aliases, this collapses it to the lexicographically first alias only. A provider assigned to another alias that legitimately uses that build as its desired/previous target will fail providerAssignedToModelPoolLocked, so rollout fallback for that alias can report pool exhaustion/no provider even though its own providers advertise the build. The pool key needs alias context or to allow all owning aliases for shared builds.

Useful? React with 👍 / 👎.

Comment thread coordinator/registry/registry.go Outdated
Comment on lines +1811 to +1812
if !r.providerAssignedToModelPoolLocked(p, buildID, allowPrivate) {
return false

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Let pinned aliases reach pool-exhausted handling

For a serial-allowlisted alias request, this pool check makes ResolveModelConstrained return ok=false when the matching provider advertises the build but is assigned to another pool. The handler then exits during model resolution with 503 model_unavailable, so the new shedIfPoolExhausted path never gets a chance to produce the intended uptime-neutral 429. Treat pool mismatch separately from constrained build resolution and let preflight classify it.

Useful? React with 👍 / 👎.

Comment thread coordinator/registry/registry.go Outdated
Comment on lines +2703 to +2707
allowed := r.providerAssignedToModelPoolLocked(p, modelID, false)
p.mu.Unlock()
if !allowed {
r.mu.RUnlock()
return fmt.Errorf("provider %q is not assigned to model pool for %q", providerID, modelID)

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Filter warm-pool candidates before load_model

This late rejection protects SendLoadModel, but warmPoolCandidateLocked still adds off-pool providers to eligibleCold before scoring. When a high-scored off-pool provider advertises the model, the warm-pool planner can reserve it, spend that tick's load budget, then hit this error and repeat on later ticks instead of warming a lower-scored provider that is actually assigned to the pool. Apply the same pool gate in the warm-pool candidate snapshot before selecting actions.

Useful? React with 👍 / 👎.

return false
}
traits := registry.RequestTraits{HasTools: hasTools}
if s.registry.PoolRejectedProviderCount(model, traits, requiresVision, allowedProviderSerials...) == 0 {

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Preserve tool and vision capability errors within pools

Because HasToolCapableProviderForModel and HasVisionProviderForModel only check whether any provider advertises the capability, an off-pool provider can make the earlier fail-fast pass even when the assigned pool has no tool/vision-capable provider. This new pool-exhausted check then returns a retryable 429 for a deterministic capability gap instead of the existing model_unavailable response; include pool assignment in those capability probes or avoid counting off-pool capability-only providers here.

Useful? React with 👍 / 👎.

@Gajesh2007

Copy link
Copy Markdown
Member Author

Addressed the Codex review feedback in 1a5822c:

  • seed multi-advertised providers by shared pool keys, so desired/previous builds in the same alias can seed one pool
  • allow shared concrete builds to match any owning alias pool
  • keep constrained alias resolution structural so pool mismatches reach the pool_exhausted 429 classifier
  • gate warm-pool candidates before reserving load_model actions
  • make tool/vision capability probes ignore off-pool providers

Verification:

  • go test ./registry
  • go test ./api
  • go test ./... (coordinator)
  • swift test --filter ConfigTests
  • git diff --check

@codex review

@Gajesh2007

Copy link
Copy Markdown
Member Author

Follow-up pushed in d678140:

  • removed assigned_pool/assigned_pool_providers from the unauthenticated /v1/models/capacity ModelCapacity JSON surface; pool topology remains available through admin utilization and authenticated /v1/me/providers
  • added tests pinning EIGENINFERENCE_MODEL_POOLS_ENABLED default-on and explicit-disable behavior

Verification:

  • go test ./registry
  • go test ./api
  • go test ./... (coordinator)
  • swift test --filter ConfigTests
  • git diff --check

@codex review

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: d678140aba

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread coordinator/registry/model_pools.go Outdated
Comment on lines +180 to +181
if len(commonPools) == 0 {
return ""

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Assign a pool when advertised models are unrelated

When a provider reconnects with no warm/current model and advertises multiple unrelated catalog models, this path leaves AssignedPool empty. The cold-load paths then call providerAssignedToModelPoolLocked and reject that provider for every requested model, so warm-pool/queued-demand planning never sends the first load_model and the machine stays unroutable until something else manually loads a model. Pick a deterministic seed instead of leaving multi-model idle providers unassigned.

Useful? React with 👍 / 👎.

Comment thread coordinator/registry/registry.go Outdated
Comment on lines +2859 to +2860
if assignedPool != "" && assignedPool != r.modelPoolKeyLocked(t.Desired) {
continue

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Honor the model-pool kill switch for desired builds

When EIGENINFERENCE_MODEL_POOLS_ENABLED=false, routing/load gates bypass pool checks, but this desired-model filter still drops aliases outside the provider's assigned pool. In a rollback/disable scenario, a multi-slot provider already assigned to pool A but advertising the previous build for alias B is allowed to serve B, yet it will not receive B's desired_build/prefetch instruction and can remain stranded on retired builds. Gate this filter on pool enforcement too.

Useful? React with 👍 / 👎.

self.enabledModels = try container.decodeIfPresent([String].self, forKey: .enabledModels) ?? []
self.idleTimeoutMins = try container.decodeIfPresent(UInt64.self, forKey: .idleTimeoutMins) ?? 60
self.maxModelSlots = try container.decodeIfPresent(UInt64.self, forKey: .maxModelSlots) ?? 3
self.maxModelSlots = try container.decodeIfPresent(UInt64.self, forKey: .maxModelSlots) ?? 1

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Keep slot defaults aligned with cold dispatch

Existing configs that omit max_model_slots now become single-slot providers, but the coordinator still treats an advertised cold model as dispatchable while the provider is already running a different in-pool build whenever memory admits. During alias rollouts, a provider can advertise the desired build while the previous build is still active; routing may send desired-build traffic to it, and ensureModelLoaded rejects with All 1 model slot(s) are active instead of waiting or falling back. Either keep the old default until routing understands the slot cap, or add a coordinator-side busy/full-slot gate for cold loads.

Useful? React with 👍 / 👎.

if len(keys) == 0 {
return ""
}
return keys[0]

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Avoid pinning shared builds to the first alias

When a concrete build belongs to multiple aliases, collapsing its pool list to keys[0] permanently assigns a provider seeded only by that shared build to whichever alias sorts first. A returning provider that advertises the shared build as alias B's previous/retired member can therefore be assigned to alias A, causing DesiredModelsForProvider to skip alias B's desired build and leaving that provider unable to converge or serve B's pool. Preserve the relevant alias membership instead of using the lexicographic first key for assignment decisions.

Useful? React with 👍 / 👎.

@Gajesh2007

Copy link
Copy Markdown
Member Author

Addressed the latest Codex findings in 25fdd11:

  • assign deterministic pools for unrelated multi-advertised idle providers instead of leaving them unassigned
  • preserve all owning alias memberships for shared concrete builds in assignment decisions
  • honor EIGENINFERENCE_MODEL_POOLS_ENABLED=false in DesiredModelsForProvider filtering
  • add a coordinator-side busy-other-slot capacity gate for cold dispatch to avoid single-slot provider load rejects during alias rollout

Verification:

  • go test ./registry
  • go test ./api
  • go test ./... (coordinator)
  • swift test --filter ConfigTests
  • git diff --check

@codex review

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 25fdd1138b

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

if len(commonPools) > 1 {
return firstAdvertised
}
return firstSortedMapKey(candidatePools)

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Distribute idle multi-model providers across candidate pools

When an idle provider advertises unrelated pools and has no current/warm slot, this always picks the lexicographically first pool. If the fleet has several identical idle providers advertising models A and B, every connection is assigned to A, so requests for B see only pool-rejected providers and get pool_exhausted even though cold-capable machines exist. Use a stable provider-specific choice or another balancing signal instead of a global first key.

Useful? React with 👍 / 👎.

Comment on lines +2852 to +2854
if r.enforceModelPools {
r.assignProviderModelPoolLocked(p)
assignedPoolKeys = r.modelPoolKeysLocked(p.AssignedPool)

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Defer advertised-only pool seeding until after heartbeat

When aliases are configured, the existing post-register path sends desired_models before the server processes any queued heartbeat, and this call now assigns AssignedPool from registration inventory alone. Because assignProviderModelPoolLocked returns early once AssignedPool is non-empty, a reconnecting provider already serving model B can be frozen to model A before its first heartbeat reports the active slot, making it unroutable for the model it is actually serving. Avoid mutating the pool here unless live state is present, or allow the first heartbeat to override advertised-only seeds.

Useful? React with 👍 / 👎.

Comment on lines +1015 to +1017
if !snap.modelLoaded && snap.coldLoadBlockedByBusySlot {
return nil, rejectCapacity, false
}

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Don't block cold loads on multi-slot providers with spare slots

This rejects any cold request whenever a different slot is busy, but providers explicitly configured with backend.max_model_slots > 1 can load another model without evicting that running slot until their slot cap is full. ProviderLoop.ensureModelLoaded only rejects when modelSlots.count >= maxModelSlots and no slot is evictable, so opt-in multi-slot providers now lose cold-dispatch capacity for a second advertised model while one slot is running. Gate this on an actual full slot cap instead of any busy other slot.

Useful? React with 👍 / 👎.

Comment on lines +1997 to +1999
if candidateCount == 0 && capacityRejections == 0 && modelTooLarge == 0 &&
s.shedIfPoolExhausted(w, r, parsed, publicModel, model, stream, estimatedPromptTokens, requestedMaxTokens, requiresVision, hasTools, allowedProviderSerials, refundReservation) {
return

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Try alias fallback before shedding pool-exhausted aliases

When an alias resolves to Desired because some provider advertises that build, but every Desired provider is rejected only by the new pool gate, this branch returns pool_exhausted before checking whether the alias's Previous build has routable capacity. During a rollout or shared-build pool transition, that makes the public alias 429 even though Previous could still serve the request. Run the existing previous-build fallback for this zero-candidate/pool-rejected case before shedding.

Useful? React with 👍 / 👎.

Comment on lines +113 to +115
if p.AssignedPool != "" {
return r.modelPoolKeyLocked(p.AssignedPool)
}

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Recompute alias-seeded pools after alias removal

If a provider is seeded to an alias name, this early return keeps using that alias forever. When an admin deletes or deactivates that alias while the provider stays connected, modelPoolKeyLocked(oldAlias) resolves to the stale alias string, so none of the concrete models the provider still advertises overlap its assigned pool and the provider is rejected until it reconnects. Invalidate or recompute AssignedPool when the alias topology no longer owns the seed.

Useful? React with 👍 / 👎.

@Gajesh2007

Copy link
Copy Markdown
Member Author

Updated behavior per review/expectations in afe8135:

  • safe idle/unloaded providers can now be reassigned to the requested pool on demand
  • providers with active or resident slots stay pinned, so they only serve inference for one public model at a time
  • cold dispatch now treats busy other slots as capacity pressure instead of sending a request the single-slot provider will reject
  • PR diagrams now show both GPT-OSS and Gemma 4 and the OpenRouter 429 pool_exhausted path

Verification:

  • go test ./registry
  • go test ./api
  • go test ./... (coordinator)
  • swift test --filter ConfigTests
  • git diff --check

@codex review

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: afe8135361

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment on lines +247 to +250
func (r *Registry) providerPoolReassignableLocked(p *Provider) bool {
if p.pendingCount() != 0 {
return false
}

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Keep pending loads pinned to their pool

When a load_model is in flight, the provider is still unloaded and has no coordinator pending requests, so this predicate returns reassignable and a later preflight/request for another advertised model can rewrite AssignedPool. If the original load then succeeds, MarkModelWarm drops the success because the pool no longer matches, and the next heartbeat leaves a resident slot pinned to the wrong pool, stranding queued requests until reconnect/timeout. Treat providerHasPendingLoad(p.ID) as non-reassignable before allowing pool changes.

Useful? React with 👍 / 👎.

Comment on lines +1015 to +1016
if !snap.modelLoaded && snap.coldLoadBlockedByBusySlot {
return nil, rejectCapacity, false

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Don't block cold loads into free multi-slot capacity

For operators who explicitly keep backend.max_model_slots > 1, the provider can load another build without evicting the busy slot when it still has slot/memory headroom (ProviderLoop.ensureModelLoaded only rejects once modelSlots.count >= maxModelSlots and nothing is evictable). This unconditional rejection treats any busy other slot as capacity pressure, so a provider running an alias Previous build cannot cold-load Desired into its second slot and the coordinator queues/429s until the previous request drains despite available configured capacity. Gate this on a full/single-slot condition or a provider-reported slot cap instead of any busy slot.

Useful? React with 👍 / 👎.

Comment on lines +2851 to +2854
var assignedPoolKeys []string
if r.enforceModelPools {
r.assignProviderModelPoolLocked(p)
assignedPoolKeys = r.modelPoolKeysLocked(p.AssignedPool)

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Don't assign pools before the first heartbeat

The provider registration path immediately calls DesiredModelsForProvider for current Swift providers, before any heartbeat has reported active_model or backend slots. With this assignment here, a reconnecting provider that is already resident for a later-advertised model can be pinned from advertised order instead; the following heartbeat won't correct a non-empty AssignedPool, and because the slot is resident it is no longer reassignable, so the loaded model is treated as off-pool and can return pool_exhausted until unload/reconnect. Defer seeding here or allow the first heartbeat's resident state to replace an advertised-order seed.

Useful? React with 👍 / 👎.

@ethenotethan ethenotethan left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Automated Code Review — Layr-Labs/d-inference#

Verdict: REQUEST_CHANGES

Security — ✅ No issues found

Performance — 8 finding(s) (5 blocking)

  • 🟡 [MEDIUM] coordinator/registry/model_pools.go:334 — Map allocation without pre-sizing in hot path function
    • Suggestion: Pre-allocate candidatePools with estimated capacity: candidatePools := make(map[string]struct{}, len(p.Models))
  • 🟡 [MEDIUM] coordinator/registry/model_pools.go:340 — Map allocation without pre-sizing in hot path function
    • Suggestion: Pre-allocate commonPools with estimated capacity: commonPools := make(map[string]struct{}, len(keys))
  • 🟡 [MEDIUM] coordinator/registry/model_pools.go:355 — Map allocation without pre-sizing in hot path function
    • Suggestion: Pre-allocate next with estimated capacity: next := make(map[string]struct{}, len(commonPools))
  • 🔵 [INFO] coordinator/api/me_handlers.go:637 — Map allocation without pre-sizing
    • Suggestion: Pre-allocate seen map: seen := make(map[string]struct{}, len(capacity.Slots))
  • 🔵 [INFO] coordinator/api/me_handlers.go:665 — Map allocation without pre-sizing
    • Suggestion: Pre-allocate seen map: seen := make(map[string]struct{}, len(models))
  • 🔵 [INFO] coordinator/registry/model_pools.go:257 — Map allocation without pre-sizing
    • Suggestion: Pre-allocate seen map: seen := make(map[string]struct{}, len(a))
  • 🔴 [CRITICAL] coordinator/registry/model_pools.go:325 — Potential N+1 pattern calling modelPoolKeysLocked for each model in loop
    • Suggestion: Consider batching model pool key lookups or caching results to avoid repeated calls
  • 🔴 [CRITICAL] coordinator/registry/model_pools.go:346 — Potential N+1 pattern calling modelPoolKeysLocked for each model in loop
    • Suggestion: Consider batching model pool key lookups or caching results to avoid repeated calls

Type_diligence — 1 finding(s)

  • 🔵 [INFO] coordinator/api/me_handlers.go:94-99 — map[string]any used in function parameter for parsed JSON data
    • Suggestion: Consider defining a typed struct for the parsed request data instead of using map[string]any, or add a comment explaining why the generic map is necessary for JSON unmarshaling

Additive_complexity — 5 finding(s) (2 blocking)

  • 🔴 [CRITICAL] coordinator/registry/model_pools.go:1-457 — 457-line model pool enforcement system adds significant complexity for a feature that could be simpler
    • Suggestion: Consider if this could be implemented with a simpler approach - perhaps a single function that checks if a provider can serve a model based on pool assignment, rather than this extensive state management system
  • 🟡 [MEDIUM] coordinator/api/me_handlers.go:531-680 — Complex provider building logic with multiple helper functions that could be consolidated
    • Suggestion: The activePoolSeedsFromLive, modelPoolKeys, and buildMyProvider functions have overlapping responsibilities and could be simplified into fewer, clearer functions
  • 🔵 [INFO] coordinator/registry/scheduler.go:779-788 — Routing gate logic split between providerPassesRoutingGatesBeforePoolLockedEx and providerPassesRoutingGatesLockedEx
    • Suggestion: Consolidate the routing gate checks into a single function rather than splitting the logic across two functions with similar names
  • 🔵 [INFO] coordinator/registry/scheduler_test.go:1518-1880 — 362 lines of repetitive test setup code for model pool scenarios
    • Suggestion: Extract common test setup patterns into helper functions to reduce duplication across the 15+ similar test cases
  • 🔵 [INFO] coordinator/api/servability_gate.go:95-147 — shedIfPoolExhausted duplicates much of the structure and logic from shedIfUnservable
    • Suggestion: Extract common error response building logic into a shared helper function to reduce duplication between these two similar functions

14 finding(s) total, 7 blocking. Verdict: REQUEST_CHANGES.

🤖 Automated review by Centaur · DAR-186

@ethenotethan ethenotethan left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Automated Code Review — Layr-Labs/d-inference#

Verdict: REQUEST_CHANGES

Security — 1 finding(s) (1 blocking)

  • 🟡 [MEDIUM] coordinator/registry/model_pools.go:201 — Model pool assignment bypassed for private routes
    • Suggestion: Consider if private routes should also respect pool assignments for consistency, or document this intentional bypass

Performance — 5 finding(s) (3 blocking)

  • 🟡 [MEDIUM] coordinator/registry/model_pools.go:139-197 — O(n²) nested loops in providerModelPoolSeedLocked for pool key matching
    • Suggestion: Pre-compute pool key mappings or use a map-based lookup instead of nested iteration over model pools
  • 🔵 [INFO] coordinator/registry/model_pools.go:330-350 — PoolRejectedProviderCount iterates all providers without early termination
    • Suggestion: Add early termination when count reaches a reasonable threshold, or cache results for frequently queried models
  • 🟡 [MEDIUM] coordinator/registry/model_pools.go:354-430 — ModelPoolMetricsSnapshot recomputes pool keys for every provider
    • Suggestion: Cache modelPoolKeyLocked results or batch the pool key computations to avoid repeated map lookups
  • 🔵 [INFO] coordinator/api/me_handlers.go:626-635 — buildMyProvider calls registry methods multiple times for same provider
    • Suggestion: Batch the pool key lookups or cache intermediate results within the function scope
  • 🟡 [MEDIUM] coordinator/registry/scheduler.go:1620-1687 — quickCapacityCheck iterates providers and calls multiple registry methods per provider
    • Suggestion: Batch provider filtering or pre-compute eligibility checks to reduce per-provider method calls

Type_diligence — 2 finding(s)

  • 🔵 [INFO] coordinator/api/me_handlers.go:94-99 — myProvider struct uses map[string]any for parsed field
    • Suggestion: Replace map[string]any with a typed struct for the parsed field to improve type safety and prevent runtime type assertion errors
  • 🔵 [INFO] coordinator/api/servability_gate.go:96-99 — shedIfPoolExhausted function parameter uses map[string]any
    • Suggestion: Replace map[string]any parameter with a typed struct to improve type safety and prevent potential type confusion

Additive_complexity — 6 finding(s) (4 blocking)

  • 🔴 [CRITICAL] coordinator/registry/model_pools.go:1-457 — 457-line model pool enforcement system with complex assignment logic
    • Suggestion: Consider if this could be simplified to a basic provider-to-model mapping with fewer edge cases and assignment rules
  • 🟡 [MEDIUM] coordinator/registry/model_pools.go:94-105 — modelCanSeedPoolLocked duplicates catalog lookup pattern
    • Suggestion: Extract shared catalog validation logic used across multiple functions
  • 🟡 [MEDIUM] coordinator/registry/model_pools.go:134-198 — Complex pool seed selection with multiple fallback strategies
    • Suggestion: Simplify to one clear assignment strategy instead of cascading through CurrentModel, BackendCapacity slots, WarmModels, and advertised models
  • 🔵 [INFO] coordinator/api/me_handlers.go:612-680 — activePoolSeedsFromLive and modelPoolKeys repeat similar map/slice operations
    • Suggestion: Extract shared helper for deduplicating string slices from model lists
  • 🔵 [INFO] coordinator/registry/scheduler.go:779-788 — providerPassesRoutingGatesLockedEx wraps providerPassesRoutingGatesBeforePoolLockedEx
    • Suggestion: Consider inlining the wrapper or clarifying why both functions are needed
  • 🟡 [MEDIUM] coordinator/registry/model_pools.go:200-245 — Pool assignment logic mixed with routing eligibility checks
    • Suggestion: Separate pool assignment (data concern) from routing eligibility (business logic concern) into distinct functions

14 finding(s) total, 8 blocking. Verdict: REQUEST_CHANGES.

🤖 Automated review by Centaur · DAR-186

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants