4 changes: 4 additions & 0 deletions README.md
@@ -394,9 +394,13 @@ Scheduled recurring tasks. Each cron job gets a fresh short-lived channel with f
- Multiple cron jobs run independently on wall-clock schedules (or legacy intervals)
- Stored in the database, created via config, conversation, or programmatically
- Cron expressions execute against the resolved cron timezone for predictable local-time firing
- Persisted `next_run_at` cursor for deterministic restart behavior and missed-run fast-forwarding
- Claim-before-run scheduling so multi-process or restarted schedulers do not double-fire recurring jobs
- Run-once jobs use at-most-once claiming semantics and disable before execution starts
- Per-job `timeout_secs` to cap execution time
- Circuit breaker auto-disables after 3 consecutive failures
- Active hours support with midnight wrapping
- Execution and delivery outcomes are logged separately, with bounded retry/backoff for proactive sends

### Multi-Agent

49 changes: 32 additions & 17 deletions docs/content/docs/(features)/cron.mdx
@@ -20,17 +20,18 @@ Cron job "check-email" fires (every 30m, active 09:00-17:00)
→ Scheduler creates a fresh Channel
→ Channel receives the cron job prompt as a synthetic message
→ Channel runs the LLM loop (can branch, spawn workers, use tools)
→ Channel produces OutboundResponse::Text
→ Scheduler collects the response
→ Scheduler delivers it via MessagingManager::broadcast("discord", "123456789")
→ Channel produces the first user-visible OutboundResponse
→ Scheduler treats that response as the terminal delivery payload
→ Scheduler delivers the OutboundResponse via MessagingManager::broadcast_proactive(...)
→ Scheduler records execution status and delivery outcome in the log
→ Channel shuts down

Cron job "daily-summary" fires (every 24h)
→ Same flow, different prompt, different target
→ Runs independently even if "check-email" is still in-flight
```

If the channel produces no text output, nothing is delivered. No magic tokens, no special markers — if there's nothing to say, the cron job is silent.
If the channel produces no delivery response, nothing is delivered. Cron records that as execution success with delivery skipped.

## Storage

@@ -50,6 +51,9 @@ CREATE TABLE cron_jobs (
active_start_hour INTEGER,
active_end_hour INTEGER,
enabled INTEGER NOT NULL DEFAULT 1,
run_once INTEGER NOT NULL DEFAULT 0,
next_run_at TIMESTAMP,
timeout_secs INTEGER,
created_at TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP
);
```
@@ -64,7 +68,9 @@ CREATE TABLE cron_jobs (
| `active_start_hour` | Optional start of active window (0-23, 24h local time) |
| `active_end_hour` | Optional end of active window (0-23, 24h local time) |
| `enabled` | Flipped to 0 by the circuit breaker after consecutive failures |
| `run_once` | If 1, the job auto-disables after its first execution attempt |
| `run_once` | If 1, the job is claimed by disabling it before execution starts so the fire is at-most-once |
| `next_run_at` | Persisted scheduler cursor used for deterministic restart/claim behavior |
| `timeout_secs` | Optional per-job wall-clock timeout for the cron run |

### cron_executions

@@ -77,10 +83,17 @@ CREATE TABLE cron_executions (
executed_at TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP,
success INTEGER NOT NULL,
result_summary TEXT,
execution_succeeded INTEGER,
delivery_attempted INTEGER,
delivery_succeeded INTEGER,
execution_error TEXT,
delivery_error TEXT,
FOREIGN KEY (cron_id) REFERENCES cron_jobs(id) ON DELETE CASCADE
);
```

`success` remains as a backward-compatible aggregate flag. New rows also record whether the agent run succeeded, whether delivery was attempted, and whether proactive delivery actually succeeded.

## Delivery Targets

The `delivery_target` field uses the format `adapter:target`:
@@ -91,15 +104,15 @@ The `delivery_target` field uses the format `adapter:target`:
| `discord:987654321` | Send to a different Discord channel |
| `webhook:some-endpoint` | Send via webhook adapter |

The adapter name maps to a registered messaging adapter. The target string is adapter-specific — for Discord, it's a channel ID parsed to u64. Delivery goes through `MessagingManager::broadcast()`, which is the proactive (non-reply) message path.
The adapter name maps to a registered messaging adapter. The target string is adapter-specific — for Discord, it's a channel ID parsed to u64. Delivery goes through `MessagingManager::broadcast_proactive()`, which applies bounded retry/backoff for transient proactive-send failures.
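The `adapter:target` format can be illustrated with a small parser. This is a minimal sketch, not the project's implementation: `parseDeliveryTarget` and `DeliveryTarget` are hypothetical names, and splitting at the first colon (so the target itself may contain colons) is an assumption.

```typescript
// Hypothetical shape for a parsed delivery target (not taken from the codebase).
interface DeliveryTarget {
  adapter: string; // e.g. "discord" or "webhook"
  target: string;  // adapter-specific, e.g. a Discord channel ID
}

// Split "adapter:target" at the FIRST colon (an assumption).
// Returns null when either side would be empty or no colon is present.
function parseDeliveryTarget(raw: string): DeliveryTarget | null {
  const idx = raw.indexOf(":");
  if (idx <= 0 || idx === raw.length - 1) {
    return null;
  }
  return { adapter: raw.slice(0, idx), target: raw.slice(idx + 1) };
}
```

The target is kept as a string here. An adapter that needs a numeric ID, as Discord does, would validate and convert it on its own side.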

## Creation Paths

Cron jobs enter the system three ways.

### 1. Config File

Defined in `config.toml` under an agent. Seeded into the database on startup (upsert — won't overwrite runtime changes to existing IDs).
Defined in `config.toml` under an agent. Jobs are seeded into the database on startup and existing IDs are upserted, so `config.toml` remains the source of truth for prompt, schedule, target, `enabled`, `run_once`, and timeout settings while preserving compatible persisted cursor state. In practice that means the `enabled` value from `config.toml` overrides any previously persisted disabled state on startup.

```toml
[[agents]]
@@ -150,7 +163,7 @@ The tool persists to the database and registers with the running scheduler immed

The tool also supports `list` (show all active cron jobs) and `delete` (remove by ID).

For one-time reminders, set `run_once: true` on create. The scheduler disables the job after the first execution attempt.
For one-time reminders, set `run_once: true` on create. The scheduler claims the fire by disabling the job before execution starts and clearing its persisted cursor, which gives at-most-once ownership across processes.

### 3. Programmatic

@@ -173,7 +186,7 @@ If active hours are not set, the cron job runs at all hours.

If a configured timezone is invalid, Spacebot logs a warning and falls back to server local timezone.

For cron-expression jobs, active hours are evaluated at fire time and can further gate delivery. For legacy interval jobs, active hours don't change tick cadence — ticks outside the window are skipped.
For cron-expression jobs, active hours are evaluated at fire time and can further gate execution. For legacy interval jobs, active hours don't change tick cadence — ticks outside the window are skipped.
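The active-hours gate, including a window that wraps past midnight, could look roughly like the check below. It is a sketch under stated assumptions: `isWithinActiveHours` is a hypothetical name, the start hour is treated as inclusive and the end hour as exclusive, and a window with equal start and end is treated as always active. None of these boundary rules is confirmed by the docs.

```typescript
// hour, start and end are local-time hours in the range 0-23.
// Assumptions: start is inclusive, end is exclusive, start === end means "always active".
function isWithinActiveHours(
  hour: number,
  start: number | null,
  end: number | null,
): boolean {
  // No window configured: the job runs at all hours.
  if (start === null || end === null) {
    return true;
  }
  if (start === end) {
    return true;
  }
  if (start < end) {
    // Ordinary same-day window, e.g. 09:00-17:00.
    return hour >= start && hour < end;
  }
  // Window wraps midnight, e.g. 22:00-06:00.
  return hour >= start || hour < end;
}
```

With a 22-to-6 window, hours 23 and 3 pass the check and hour 12 does not.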

## Circuit Breaker

@@ -184,9 +197,9 @@ If a cron job fails 3 consecutive times, it's automatically disabled:
3. The timer loop exits
4. A warning is logged

A "failure" is any error from `run_cron_job()` — LLM errors, channel failures, delivery failures. A successful execution (even one that produces no output) resets the failure counter to 0.
A "failure" for breaker purposes is still any terminal error from `run_cron_job()` — prompt dispatch failures, channel failures, timeouts, or delivery failures after retries. A successful execution, including a run that produces no delivery response, resets the failure counter to 0.
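The counter behaviour described above can be sketched as a pure state transition. `BreakerState` and `recordOutcome` are hypothetical names used only for illustration. The threshold of 3 comes from the text, and everything else is an assumption about how such a breaker might be modelled.

```typescript
// Threshold stated in the docs: 3 consecutive failures disable the job.
const FAILURE_THRESHOLD = 3;

// Hypothetical in-memory view of a job's breaker state.
interface BreakerState {
  consecutiveFailures: number;
  enabled: boolean;
}

// Apply one run outcome. A success (including a run with no delivery response)
// resets the counter; a failure increments it and trips the breaker at the threshold.
function recordOutcome(state: BreakerState, succeeded: boolean): BreakerState {
  if (succeeded) {
    return { ...state, consecutiveFailures: 0 };
  }
  const consecutiveFailures = state.consecutiveFailures + 1;
  return {
    consecutiveFailures,
    enabled: state.enabled && consecutiveFailures < FAILURE_THRESHOLD,
  };
}
```

Because a success resets the counter, two failures followed by a success and two more failures leave the job enabled. Only an unbroken run of three failures disables it.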

Disabled cron jobs are not loaded on restart (the store query filters `WHERE enabled = 1`). To re-enable a disabled cron job, update the database row directly or re-seed it from config with `enabled = true`.
Disabled cron jobs are filtered out by the store query on restart (`WHERE enabled = 1`), but config-defined jobs are seeded first. If `config.toml` sets a job's `enabled = true`, that upsert restores the persisted row before the scheduler reloads enabled jobs. For database-only jobs, re-enable by updating the row directly.

## Execution Flow

@@ -198,15 +211,17 @@ When the scheduler fires a cron job:

3. **Run** — The channel processes the message through its normal LLM loop. It can use all channel tools (reply, branch, spawn_worker, memory_save, etc).

4. **Collect** — The scheduler reads from the channel's `response_tx`. Text responses are collected. Status updates and stream events are ignored.
4. **Collect** — The scheduler reads from the channel's `response_tx`. The first user-visible delivery response becomes the terminal payload. Status updates and stream events are ignored.

5. **Timeout** — If the channel doesn't finish within `timeout_secs` (default 120s), it's aborted and logged as execution failure.

5. **Timeout** — If the channel doesn't finish within 120 seconds, it's aborted.
6. **Claim and cursor advance** — Recurring jobs advance `next_run_at` before execution starts. Run-once jobs disable themselves before execution starts. This gives deterministic ownership and prevents replay bursts after restart.

6. **Log** — The execution is recorded in `cron_executions` with success status and a summary of the output.
7. **Deliver** — If there is a delivery response, the Scheduler sends the `OutboundResponse` to the target via `MessagingManager::broadcast_proactive()`. Transient send failures retry with bounded backoff; permanent failures fail immediately. Unsupported proactive variants are treated as delivery failures, not silent skips.

7. **Deliver** — If there's non-empty text, it's sent to the delivery target via `MessagingManager::broadcast()`. If the output is empty, delivery is skipped.
8. **Log** — The execution is recorded in `cron_executions` with split execution and delivery outcomes plus any execution/delivery error text.

8. **Teardown** — The channel's sender is dropped after sending the prompt, so the channel's event loop exits naturally after processing the single message.
9. **Teardown** — The channel sender is dropped after sending the prompt, so the channel behaves as a one-shot conversation and exits naturally once processing completes.
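The claim-before-run step can be sketched with an in-memory stand-in for the store. This is an illustration only: `InMemoryCronStore`, `claimRecurring`, and `claimRunOnce` are hypothetical names, and modelling the claim as a compare-and-set on `next_run_at` is an assumption about how "claiming" might work. In a real database the same idea would typically be a conditional `UPDATE`.

```typescript
// Hypothetical row shape; timestamps are epoch seconds for simplicity.
interface JobRow {
  id: string;
  enabled: boolean;
  runOnce: boolean;
  nextRunAt: number | null;
}

class InMemoryCronStore {
  private rows = new Map<string, JobRow>();

  insert(row: JobRow): void {
    this.rows.set(row.id, { ...row });
  }

  // Recurring job: advance the cursor only if it still holds the value this
  // scheduler observed. A second scheduler with the same stale value loses.
  claimRecurring(id: string, expectedNextRunAt: number, newNextRunAt: number): boolean {
    const row = this.rows.get(id);
    if (!row || !row.enabled || row.nextRunAt !== expectedNextRunAt) {
      return false;
    }
    row.nextRunAt = newNextRunAt;
    return true;
  }

  // Run-once job: disable and clear the cursor before execution starts,
  // so at most one caller ever owns the fire.
  claimRunOnce(id: string): boolean {
    const row = this.rows.get(id);
    if (!row || !row.enabled || !row.runOnce) {
      return false;
    }
    row.enabled = false;
    row.nextRunAt = null;
    return true;
  }
}
```

A scheduler that loses a claim would skip the run and reload its state from the store rather than execute. That is what keeps restarted or parallel schedulers from double-firing.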

## Scheduler Lifecycle

@@ -219,7 +234,7 @@ The scheduler is created per-agent after messaging adapters are initialized (it
5. Each cron job is registered, starting its timer loop
6. The `cron` tool is registered on the agent's `ToolServerHandle`

Timer loops skip the first tick — cron jobs wait one full interval before their first execution. This prevents a burst of activity on startup.
Timer loops now run off the persisted `next_run_at` cursor instead of relying only on in-memory interval state. On startup the scheduler initializes missing cursors, reloads stale state from the store when it loses a claim, and fast-forwards overdue recurring jobs instead of replaying every missed tick.
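For a legacy interval job, fast-forwarding an overdue cursor amounts to jumping to the first slot after the current time instead of replaying each missed tick. The sketch below assumes epoch-second timestamps and a hypothetical `fastForward` helper. Whether the real scheduler also fires once for the overdue slot is not stated in the docs, so this only computes the next future cursor.

```typescript
// Return the first scheduled time strictly after `now`, stepping in whole intervals
// from the persisted cursor. All values are epoch seconds (an assumption).
function fastForward(nextRunAt: number, intervalSecs: number, now: number): number {
  if (intervalSecs <= 0) {
    throw new RangeError("intervalSecs must be positive");
  }
  if (nextRunAt > now) {
    // Cursor is already in the future; nothing was missed.
    return nextRunAt;
  }
  // Number of interval steps needed to move past `now`.
  const steps = Math.floor((now - nextRunAt) / intervalSecs) + 1;
  return nextRunAt + steps * intervalSecs;
}
```

With a cursor of 1000, an interval of 60 seconds, and a current time of 1185, the cursor moves to 1240, collapsing the missed ticks at 1000, 1060, 1120, and 1180 into a single jump.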

On shutdown, all timer handles are aborted.

3 changes: 2 additions & 1 deletion docs/metrics.md
@@ -121,7 +121,8 @@ The `tier` label corresponds to the process type making the request: `channel`,

| Metric | Type | Labels | Description |
| ----------------------------------------------- | --------- | ----------------------------- | ----------------------------------- |
| `spacebot_cron_executions_total` | Counter | agent_id, task_type, result | Cron task executions |
| `spacebot_cron_executions_total` | Counter | agent_id, cron_id, result | Cron execution outcome only (`success`/`failure`) |
| `spacebot_cron_delivery_total` | Counter | agent_id, cron_id, result | Cron delivery outcome (`success`/`failure`/`skipped`) |
| `spacebot_ingestion_files_processed_total` | Counter | agent_id, result | Ingestion files processed |

## Useful PromQL Queries
13 changes: 11 additions & 2 deletions interface/src/api/client.ts
@@ -735,16 +735,25 @@ export interface CronJobWithStats {
run_once: boolean;
active_hours: [number, number] | null;
timeout_secs: number | null;
success_count: number;
failure_count: number;
execution_success_count: number;
execution_failure_count: number;
delivery_success_count: number;
delivery_failure_count: number;
delivery_skipped_count: number;
last_executed_at: string | null;
}

export interface CronExecutionEntry {
id: string;
cron_id: string | null;
executed_at: string;
success: boolean;
execution_succeeded: boolean;
delivery_attempted: boolean;
delivery_succeeded: boolean | null;
result_summary: string | null;
execution_error: string | null;
delivery_error: string | null;
}

export interface CronListResponse {
18 changes: 15 additions & 3 deletions interface/src/api/schema.d.ts
@@ -2498,7 +2498,13 @@ export interface components {
};
/** @description Entry in the cron execution log. */
CronExecutionEntry: {
cron_id?: string | null;
delivery_attempted: boolean;
delivery_error?: string | null;
delivery_succeeded?: boolean | null;
executed_at: string;
execution_error?: string | null;
execution_succeeded: boolean;
id: string;
result_summary?: string | null;
success: boolean;
@@ -2529,18 +2535,24 @@ export interface components {
] | null;
cron_expr?: string | null;
delivery_target: string;
/** Format: int64 */
delivery_failure_count: number;
/** Format: int64 */
delivery_skipped_count: number;
/** Format: int64 */
delivery_success_count: number;
enabled: boolean;
/** Format: int64 */
failure_count: number;
execution_failure_count: number;
/** Format: int64 */
execution_success_count: number;
id: string;
/** Format: int64 */
interval_secs: number;
last_executed_at?: string | null;
prompt: string;
run_once: boolean;
/** Format: int64 */
success_count: number;
/** Format: int64 */
timeout_secs?: number | null;
};
CronListResponse: {
115 changes: 91 additions & 24 deletions interface/src/routes/AgentCron.tsx
@@ -233,8 +233,12 @@ export function AgentCron({ agentId }: AgentCronProps) {

const totalJobs = data?.jobs.length ?? 0;
const enabledJobs = data?.jobs.filter((j) => j.enabled).length ?? 0;
const totalRuns = data?.jobs.reduce((sum, j) => sum + j.success_count + j.failure_count, 0) ?? 0;
const failedRuns = data?.jobs.reduce((sum, j) => sum + j.failure_count, 0) ?? 0;
const totalRuns =
data?.jobs.reduce((sum, j) => sum + j.execution_success_count + j.execution_failure_count, 0) ?? 0;
const executionFailures =
data?.jobs.reduce((sum, j) => sum + j.execution_failure_count, 0) ?? 0;
const deliveryFailures =
data?.jobs.reduce((sum, j) => sum + j.delivery_failure_count, 0) ?? 0;

return (
<div className="flex h-full flex-col">
@@ -243,8 +247,9 @@ export function AgentCron({ agentId }: AgentCronProps) {
<div className="flex items-center gap-2 border-b border-app-line px-6 py-3">
<Badge variant="accent" size="md">{totalJobs} total</Badge>
<Badge variant="green" size="md">{enabledJobs} enabled</Badge>
<Badge variant="outline" size="md">{totalRuns} runs</Badge>
{failedRuns > 0 && <Badge variant="red" size="md">{failedRuns} failed</Badge>}
<Badge variant="outline" size="md">{totalRuns} executions</Badge>
{executionFailures > 0 && <Badge variant="red" size="md">{executionFailures} exec failed</Badge>}
{deliveryFailures > 0 && <Badge variant="red" size="md">{deliveryFailures} delivery failed</Badge>}
{data?.timezone && (
<span className="text-tiny text-ink-faint">tz: {data.timezone}</span>
)}
@@ -599,8 +604,9 @@ function CronJobCard({
isToggling: boolean;
isTriggering: boolean;
}) {
const totalRuns = job.success_count + job.failure_count;
const successRate = totalRuns > 0 ? Math.round((job.success_count / totalRuns) * 100) : null;
const totalRuns = job.execution_success_count + job.execution_failure_count;
const executionSuccessRate =
totalRuns > 0 ? Math.round((job.execution_success_count / totalRuns) * 100) : null;
const schedule = formatCronSchedule(job.cron_expr, job.interval_secs);

return (
@@ -648,11 +654,31 @@ function CronJobCard({
<span>ran {formatTimeAgo(job.last_executed_at)}</span>
</>
)}
{successRate !== null && (
{executionSuccessRate !== null && (
<>
<span className="text-ink-faint/50">·</span>
<span className={successRate >= 90 ? "text-green-500" : successRate >= 50 ? "text-yellow-500" : "text-red-500"}>
{successRate}% ({job.success_count}/{totalRuns})
<span
className={
executionSuccessRate >= 90
? "text-green-500"
: executionSuccessRate >= 50
? "text-yellow-500"
: "text-red-500"
}
>
exec {executionSuccessRate}% ({job.execution_success_count}/{totalRuns})
</span>
</>
)}
{(job.delivery_success_count > 0 ||
job.delivery_failure_count > 0 ||
job.delivery_skipped_count > 0) && (
<>
<span className="text-ink-faint/50">·</span>
<span className="text-ink-faint">
delivery {job.delivery_success_count} sent
{job.delivery_failure_count > 0 ? `, ${job.delivery_failure_count} failed` : ""}
{job.delivery_skipped_count > 0 ? `, ${job.delivery_skipped_count} skipped` : ""}
</span>
</>
)}
@@ -728,22 +754,63 @@ function JobExecutions({ agentId, jobId }: { agentId: string; jobId: string }) {

return (
<div className="flex flex-col gap-1">
{data.executions.map((execution) => (
<div
key={execution.id}
className="flex items-center gap-3 rounded-lg px-3 py-1.5"
>
<span className={`h-1.5 w-1.5 rounded-full ${execution.success ? "bg-green-500" : "bg-red-500"}`} />
<span className="text-tiny tabular-nums text-ink-faint">
{formatTimeAgo(execution.executed_at)}
</span>
{execution.result_summary && (
<span className="min-w-0 flex-1 truncate text-tiny text-ink-dull">
{execution.result_summary}
{data.executions.map((execution) => {
const statusTone = !execution.execution_succeeded
? "bg-red-500"
: execution.delivery_attempted && execution.delivery_succeeded === false
? "bg-yellow-500"
: execution.delivery_attempted && execution.delivery_succeeded === true
? "bg-green-500"
: "bg-gray-500";
const detail =
execution.delivery_error ?? execution.execution_error ?? execution.result_summary;
const deliveryLabel = !execution.delivery_attempted
? "no delivery"
: execution.delivery_succeeded === true
? "delivered"
: execution.delivery_succeeded === false
? "delivery failed"
: "delivery unknown";

return (
<div
key={execution.id}
className="flex items-center gap-3 rounded-lg px-3 py-1.5"
>
<span className={`h-1.5 w-1.5 rounded-full ${statusTone}`} />
<span className="text-tiny tabular-nums text-ink-faint">
{formatTimeAgo(execution.executed_at)}
</span>
)}
</div>
))}
<span
className={`rounded px-1.5 py-0.5 text-tiny ${
execution.execution_succeeded
? "bg-green-500/10 text-green-400"
: "bg-red-500/10 text-red-400"
}`}
>
{execution.execution_succeeded ? "exec ok" : "exec failed"}
</span>
<span
className={`rounded px-1.5 py-0.5 text-tiny ${
!execution.delivery_attempted
? "bg-app-lightBox text-ink-faint"
: execution.delivery_succeeded === true
? "bg-green-500/10 text-green-400"
: execution.delivery_succeeded === false
? "bg-yellow-500/10 text-yellow-300"
: "bg-app-lightBox text-ink-faint"
}`}
>
{deliveryLabel}
</span>
{detail && (
<span className="min-w-0 flex-1 truncate text-tiny text-ink-dull">
{detail}
</span>
)}
</div>
);
})}
</div>
);
}
3 changes: 3 additions & 0 deletions migrations/20260329105813_cron_next_run.sql
@@ -0,0 +1,3 @@
-- Persist the scheduler cursor for each cron job so recurring schedules can
-- be claimed and advanced atomically before execution.
ALTER TABLE cron_jobs ADD COLUMN next_run_at TIMESTAMP;