This is the runtime-behavior deep dive for queue execution.
Use it with agent-queue-framework.md when you need exact semantics for worker provisioning, retries, timeouts, completion settlement, and websocket coupling.
Queue execution is generic and extension-driven.
- Core queue runtime: `apps/api/src/orchestrator-queue.ts`
- Typed job contracts/processors: `apps/api/src/orchestrator-types.ts`, `apps/api/src/orchestrator-processors.ts`
- API integration and endpoints: `apps/api/src/index.ts`
Key property: extension handlers define workflow meaning; queue runtime defines execution guarantees.
- foreground user turns do not block on background queue work.
- every job reaches terminal state (`completed|failed|canceled`).
- retries are classification-based (`retryable` vs `fatal`).
- persisted recovery failures are explicit (not silently dropped).
- dedupe semantics are deterministic per job contract.
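The terminal-state and retry-classification guarantees above can be sketched as a small decision function. This is a minimal illustration only: `AttemptOutcome` and `settleAttempt` are hypothetical names, and the real contracts in `apps/api/src/orchestrator-types.ts` may be shaped differently.

```typescript
// Hypothetical types for illustration; not the repository's actual contracts.
type TerminalState = "completed" | "failed" | "canceled";
type FailureClass = "retryable" | "fatal";

interface AttemptOutcome {
  ok: boolean;
  failureClass?: FailureClass; // only meaningful when ok === false
}

// Decide what follows one attempt: a terminal state or another try.
function settleAttempt(
  outcome: AttemptOutcome,
  attempt: number, // 1-based attempt counter
  maxAttempts: number,
): TerminalState | "retry" {
  if (outcome.ok) return "completed";
  if (outcome.failureClass === "retryable" && attempt < maxAttempts) {
    return "retry";
  }
  // Fatal, unclassified, or out of attempts: the job still terminalizes.
  return "failed";
}
```

In this sketch a failure without a classification is treated as non-retryable, so every path ends in either a retry or a terminal state.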
Worker sessions are owner+agent scoped.
- mapping key: `${ownerId}::${agent}`
- `ownerId` is project id or `session:<sessionId>` for unassigned-chat ownership.
- worker sessions are hidden from default user listing.
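As a sketch of the scoping rule above, the mapping key could be built like this. Only the key format and the `session:<sessionId>` convention come from the text; the `Owner` type and both helper names are assumptions for illustration.

```typescript
// Hypothetical owner model; only the resulting key format is from the doc.
type Owner =
  | { kind: "project"; projectId: string }
  | { kind: "session"; sessionId: string }; // unassigned-chat ownership

function resolveOwnerId(owner: Owner): string {
  return owner.kind === "project"
    ? owner.projectId
    : `session:${owner.sessionId}`;
}

// One worker session per owner+agent pair.
function workerSessionKey(owner: Owner, agent: string): string {
  return `${resolveOwnerId(owner)}::${agent}`;
}
```

For example, `workerSessionKey({ kind: "session", sessionId: "abc" }, "reviewer")` yields `session:abc::reviewer`.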
Lifecycle:
- resolve/create worker session lazily on first job.
- run one-time startup preflight.
- execute one instruction turn per queued job.
Startup preflight includes:
- one mandatory core queue-runner orientation turn.
- optional one-time extension bootstrap turn keyed by `bootstrapInstruction.key`.
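The lifecycle and preflight rules above can be modeled in memory as follows. This is a sketch under stated assumptions: real session creation and turn execution are runtime calls, the `turns` log exists only for illustration, and tracking bootstrap keys per worker session is an assumed reading of "keyed by `bootstrapInstruction.key`".

```typescript
// Hypothetical in-memory model; names and shapes are assumptions.
interface BootstrapInstruction {
  key: string;
  text: string;
}

interface WorkerSession {
  preflightDone: boolean;
  bootstrappedKeys: Set<string>;
  turns: string[]; // turn log, kept only for illustration
}

const workerSessions = new Map<string, WorkerSession>();

function runQueuedJob(
  sessionKey: string,
  instruction: string,
  bootstrap?: BootstrapInstruction,
): WorkerSession {
  // Resolve or create the worker session lazily on first job.
  let session = workerSessions.get(sessionKey);
  if (!session) {
    session = {
      preflightDone: false,
      bootstrappedKeys: new Set<string>(),
      turns: [],
    };
    workerSessions.set(sessionKey, session);
  }
  // Mandatory core orientation turn, run once per session.
  if (!session.preflightDone) {
    session.turns.push("orientation");
    session.preflightDone = true;
  }
  // Optional extension bootstrap turn, run once per bootstrap key.
  if (bootstrap && !session.bootstrappedKeys.has(bootstrap.key)) {
    session.turns.push(`bootstrap:${bootstrap.key}`);
    session.bootstrappedKeys.add(bootstrap.key);
  }
  // Exactly one instruction turn per queued job.
  session.turns.push(`job:${instruction}`);
  return session;
}
```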
Queue worker completion prefers runtime notification streams first, with read-path fallback when needed.
Primary signals:
- runtime turn/item notifications for system-owned worker chats.
- supplemental read fallback (`thread/read(includeTurns)`) in bounded windows.
Important behavior:
- include-turns materialization waits are grace-bounded.
- untrusted terminal snapshots require a stable no-progress window before self-heal.
- empty running turns that exceed the grace threshold fail as retryable to avoid phantom stalls.
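The empty-running-turn rule can be illustrated with a pure check over a turn snapshot. This is a rough sketch, not the runtime's actual logic: the snapshot shape, the verdict names, and the idea of measuring the grace window from turn start are all assumptions.

```typescript
// Hypothetical snapshot shape; the real read-path payload may differ.
interface RunningTurnSnapshot {
  status: "running";
  itemCount: number; // items materialized so far
  startedAtMs: number;
}

type EmptyTurnVerdict = "has-progress" | "within-grace" | "fail-retryable";

function checkEmptyRunningTurn(
  turn: RunningTurnSnapshot,
  nowMs: number,
  graceMs: number,
): EmptyTurnVerdict {
  if (turn.itemCount > 0) return "has-progress";
  // Still empty: tolerate it only until the grace threshold is exceeded.
  return nowMs - turn.startedAtMs > graceMs ? "fail-retryable" : "within-grace";
}
```

Failing as retryable rather than fatal means the job gets another attempt instead of stalling behind a turn that never produces output.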
Queue runtime supports:
- bounded attempts (`ORCHESTRATOR_QUEUE_MAX_ATTEMPTS`)
- per-job timeout (`ORCHESTRATOR_QUEUE_DEFAULT_TIMEOUT_MS` or job override)
- retry delay strategies, including immediate-first linear backoff for agent jobs.
Repository-default pattern for agent instruction jobs:
- attempt 1 retry delay: `0ms`
- later attempts: `+60ms` linear increments
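Read literally, the pattern above is a linear function of the attempt number. The sketch below assumes that the retry after attempt 1 is immediate and that each later attempt adds one 60ms step; `agentRetryDelayMs` is a hypothetical name, not a function from the repository.

```typescript
const LINEAR_STEP_MS = 60; // the "+60ms" increment stated above

// Delay before retrying after the given failed attempt (1-based).
function agentRetryDelayMs(failedAttempt: number): number {
  if (!Number.isInteger(failedAttempt) || failedAttempt < 1) {
    throw new RangeError("failedAttempt must be a positive integer");
  }
  // Attempt 1 retries immediately; each later attempt adds one step.
  return (failedAttempt - 1) * LINEAR_STEP_MS;
}
```

Under this reading, attempts 1, 2, and 3 wait 0ms, 60ms, and 120ms respectively before the next try.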
Suggest-request queue execution is completion-signal based and deadline bounded.
If no completion signal arrives within ORCHESTRATOR_SUGGEST_REQUEST_WAIT_MS:
- core writes deterministic fallback suggestion state
- interrupts worker turn best-effort
- terminalizes queue job without waiting indefinitely
This prevents hanging request UX when worker output cannot be observed in time.
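The deadline rule can be sketched as a pure settlement decision. Everything here is illustrative: the `SuggestSettlement` shape, the function name, and the fallback text are assumptions, and `waitMs` stands in for the value of `ORCHESTRATOR_SUGGEST_REQUEST_WAIT_MS`.

```typescript
// Hypothetical settlement model; not the runtime's actual types.
type SuggestSettlement =
  | { kind: "pending" }
  | { kind: "completed"; suggestion: string }
  | { kind: "fallback"; suggestion: string; interruptWorker: true };

const FALLBACK_SUGGESTION = "(no suggestion available)"; // placeholder text

function settleSuggestRequest(
  startedAtMs: number,
  nowMs: number,
  waitMs: number,
  completedSuggestion?: string, // set once a completion signal arrived
): SuggestSettlement {
  if (completedSuggestion !== undefined) {
    return { kind: "completed", suggestion: completedSuggestion };
  }
  if (nowMs - startedAtMs >= waitMs) {
    // Deadline hit: write fallback state and ask for a best-effort interrupt.
    return {
      kind: "fallback",
      suggestion: FALLBACK_SUGGESTION,
      interruptWorker: true,
    };
  }
  return { kind: "pending" };
}
```

Because the fallback is deterministic, the job reaches a terminal state at the deadline whether or not the interrupt succeeds.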
Recovery behavior includes:
- stale mapped worker session reprovision (single reset+retry path)
- invalid/unknown persisted job payload terminalization with explicit failure reason
- in-flight request map cleanup on turn/session cleanup and runtime exit
Goal: no silent state drift and no unbounded queue growth due to malformed recovery input.
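As an illustration of the payload rule, recovery of a persisted job could terminalize bad input with an explicit reason instead of dropping it. The payload fields (`id`, `type`), the result shape, and the reason strings are hypothetical.

```typescript
// Hypothetical recovery result; the real persisted format may differ.
type RecoveryResult =
  | { kind: "requeue"; id: string; type: string }
  | { kind: "terminal"; state: "failed"; reason: string };

function recoverPersistedJob(
  raw: unknown,
  knownTypes: ReadonlySet<string>,
): RecoveryResult {
  if (typeof raw !== "object" || raw === null) {
    return { kind: "terminal", state: "failed", reason: "invalid payload: not an object" };
  }
  const { id, type } = raw as { id?: unknown; type?: unknown };
  if (typeof id !== "string" || typeof type !== "string") {
    return { kind: "terminal", state: "failed", reason: "invalid payload: missing id or type" };
  }
  if (!knownTypes.has(type)) {
    return { kind: "terminal", state: "failed", reason: `unknown job type: ${type}` };
  }
  return { kind: "requeue", id, type };
}
```

Every malformed record ends in a recorded failure, so it cannot linger in the queue or disappear without a trace.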
Queue lifecycle emits websocket events:
- `orchestrator_job_queued`
- `orchestrator_job_started`
- `orchestrator_job_progress`
- `orchestrator_job_completed`
- `orchestrator_job_failed`
- `orchestrator_job_canceled`
Transcript side effects emit:
- `transcript_updated`
UX design assumption:
- websocket drives live UI updates
- REST/read-path reconciliation handles missed-event recovery windows
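On the client side, the event names listed above can be captured as a typed union so that a handler recognizes queue events and knows when a job has settled. Only the event names come from this document; the helper names are assumptions, and the event payload shape is deliberately left out.

```typescript
const JOB_EVENT_TYPES = [
  "orchestrator_job_queued",
  "orchestrator_job_started",
  "orchestrator_job_progress",
  "orchestrator_job_completed",
  "orchestrator_job_failed",
  "orchestrator_job_canceled",
] as const;

type JobEventType = (typeof JOB_EVENT_TYPES)[number];

// Type guard for raw websocket message types.
function isJobEventType(value: string): value is JobEventType {
  return JOB_EVENT_TYPES.some((known) => known === value);
}

// True for events that correspond to a terminal job state.
function isTerminalJobEvent(type: JobEventType): boolean {
  return (
    type === "orchestrator_job_completed" ||
    type === "orchestrator_job_failed" ||
    type === "orchestrator_job_canceled"
  );
}
```

A client that reconnects after a gap would still need the REST/read-path reconciliation, since it may never see a terminal event that was emitted while it was disconnected.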
Primary knobs:
- concurrency/capacity: `ORCHESTRATOR_QUEUE_GLOBAL_CONCURRENCY`, `ORCHESTRATOR_QUEUE_MAX_PER_PROJECT`, `ORCHESTRATOR_QUEUE_MAX_GLOBAL`
- retries/timeouts: `ORCHESTRATOR_QUEUE_MAX_ATTEMPTS`, `ORCHESTRATOR_QUEUE_DEFAULT_TIMEOUT_MS`
- worker settlement grace: `ORCHESTRATOR_AGENT_*`
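A minimal sketch of reading a few of these knobs with safe fallbacks follows. The variable names are the ones listed above, but the parsing helper, the `QueueKnobs` shape, and the numeric defaults in the usage lines are placeholders, not the repository defaults.

```typescript
type Env = Record<string, string | undefined>;

// Parse a positive integer, falling back when unset or malformed.
function readPositiveInt(env: Env, name: string, fallback: number): number {
  const raw = env[name];
  if (raw === undefined || raw.trim() === "") return fallback;
  const parsed = Number(raw);
  return Number.isInteger(parsed) && parsed > 0 ? parsed : fallback;
}

// Hypothetical grouping of three knobs.
interface QueueKnobs {
  globalConcurrency: number;
  maxAttempts: number;
  defaultTimeoutMs: number;
}

function loadQueueKnobs(env: Env, defaults: QueueKnobs): QueueKnobs {
  return {
    globalConcurrency: readPositiveInt(
      env, "ORCHESTRATOR_QUEUE_GLOBAL_CONCURRENCY", defaults.globalConcurrency),
    maxAttempts: readPositiveInt(
      env, "ORCHESTRATOR_QUEUE_MAX_ATTEMPTS", defaults.maxAttempts),
    defaultTimeoutMs: readPositiveInt(
      env, "ORCHESTRATOR_QUEUE_DEFAULT_TIMEOUT_MS", defaults.defaultTimeoutMs),
  };
}

// Usage with placeholder defaults (not the real ones).
const placeholderDefaults: QueueKnobs = {
  globalConcurrency: 2,
  maxAttempts: 3,
  defaultTimeoutMs: 60000,
};
const knobs = loadQueueKnobs(
  { ORCHESTRATOR_QUEUE_MAX_ATTEMPTS: "5" },
  placeholderDefaults,
);
```

Falling back on malformed values keeps a typo in one variable from disabling the queue, at the cost of hiding the typo unless it is also logged.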
Tuning sequence:
- baseline with defaults.
- measure queue depth + job latency + memory.
- adjust one variable group at a time.
- validate with runtime smoke + queue-specific scenarios.
- confirm queue enabled in health state.
- inspect job state transitions (`queued/running/terminal`).
- inspect worker session transcript for instruction progress.
- verify extension handler emitted actionable output (enqueue/action result).
- verify settings/trust/RBAC policy did not block expected action path.
Use agent-queue-troubleshooting.md for concrete symptom-led playbooks.
- Queue framework foundation: `agent-queue-framework.md`
- Event/job payload contracts: `agent-queue-event-and-job-contracts.md`
- Queue troubleshooting playbook: `agent-queue-troubleshooting.md`
- Extension lifecycle controls: `agent-extension-lifecycle-and-conformance.md`