spacedriveapp · vsumner · Apr 1, 2026 · Mar 28, 2026 · Mar 28, 2026 · Mar 28, 2026
diff --git a/README.md b/README.md
@@ -394,9 +394,13 @@ Scheduled recurring tasks. Each cron job gets a fresh short-lived channel with f
 - Multiple cron jobs run independently on wall-clock schedules (or legacy intervals)
 - Stored in the database, created via config, conversation, or programmatically
 - Cron expressions execute against the resolved cron timezone for predictable local-time firing
+- Persisted `next_run_at` cursor for deterministic restart behavior and missed-run fast-forwarding
+- Claim-before-run scheduling so multi-process or restarted schedulers do not double-fire recurring jobs
+- Run-once jobs use at-most-once claim semantics: the job is disabled before execution starts
 - Per-job `timeout_secs` to cap execution time
 - Circuit breaker auto-disables after 3 consecutive failures
 - Active hours support with midnight wrapping
+- Execution and delivery outcomes are logged separately, with bounded retry/backoff for proactive sends
 
 ### Multi-Agent
 

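A minimal sketch of what "bounded retry/backoff for proactive sends" could look like, written in TypeScript for illustration only. Every name here (`SendResult`, `RetryPolicy`, `backoffDelayMs`, `sendWithRetry`) is hypothetical, and the doubling schedule and attempt limit are assumptions; the diff does not show how `MessagingManager::broadcast_proactive()` implements this.

```typescript
// Hypothetical model of a proactive send outcome. The transient/permanent
// split mirrors the wording in the docs; the shape itself is an assumption.
type SendResult =
  | { kind: "ok" }
  | { kind: "transient"; error: string }
  | { kind: "permanent"; error: string };

interface RetryPolicy {
  maxAttempts: number; // total attempts, including the first one
  baseDelayMs: number; // delay before the first retry
  maxDelayMs: number;  // upper bound on any single delay
}

// Delay before retry number `retry` (1-based): doubles each time, then caps.
function backoffDelayMs(retry: number, policy: RetryPolicy): number {
  return Math.min(policy.baseDelayMs * Math.pow(2, retry - 1), policy.maxDelayMs);
}

// Retries transient failures up to the attempt limit; permanent failures
// return immediately. `sleep` is injectable so the loop can run without timers.
async function sendWithRetry(
  send: () => Promise<SendResult>,
  policy: RetryPolicy,
  sleep: (ms: number) => Promise<void> = (ms) =>
    new Promise<void>((resolve) => setTimeout(resolve, ms)),
): Promise<{ delivered: boolean; attempts: number; error: string | null }> {
  let lastError: string | null = null;
  for (let attempt = 1; attempt <= policy.maxAttempts; attempt++) {
    const result = await send();
    if (result.kind === "ok") {
      return { delivered: true, attempts: attempt, error: null };
    }
    lastError = result.error;
    if (result.kind === "permanent") {
      return { delivered: false, attempts: attempt, error: lastError };
    }
    if (attempt < policy.maxAttempts) {
      await sleep(backoffDelayMs(attempt, policy));
    }
  }
  return { delivered: false, attempts: policy.maxAttempts, error: lastError };
}
```

Under this sketch, a delivery that exhausts its attempts surfaces one terminal error, which is the point at which the docs say the failure counts toward the circuit breaker.
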
diff --git a/docs/content/docs/(features)/cron.mdx b/docs/content/docs/(features)/cron.mdx
@@ -20,17 +20,18 @@ Cron job "check-email" fires (every 30m, active 09:00-17:00)
     → Scheduler creates a fresh Channel
     → Channel receives the cron job prompt as a synthetic message
     → Channel runs the LLM loop (can branch, spawn workers, use tools)
-    → Channel produces OutboundResponse::Text
-    → Scheduler collects the response
-    → Scheduler delivers it via MessagingManager::broadcast("discord", "123456789")
+    → Channel produces the first user-visible OutboundResponse
+    → Scheduler treats that response as the terminal delivery payload
+    → Scheduler delivers the OutboundResponse via MessagingManager::broadcast_proactive(...)
+    → Scheduler records execution status and delivery outcome in the log
     → Channel shuts down
 
 Cron job "daily-summary" fires (every 24h)
     → Same flow, different prompt, different target
     → Runs independently even if "check-email" is still in-flight
 ```
 
-If the channel produces no text output, nothing is delivered. No magic tokens, no special markers — if there's nothing to say, the cron job is silent.
+If the channel produces no delivery response, nothing is delivered. Cron records that as execution success with delivery skipped.
 
 ## Storage
 
@@ -50,6 +51,9 @@ CREATE TABLE cron_jobs (
     active_start_hour INTEGER,
     active_end_hour INTEGER,
     enabled INTEGER NOT NULL DEFAULT 1,
+    run_once INTEGER NOT NULL DEFAULT 0,
+    next_run_at TIMESTAMP,
+    timeout_secs INTEGER,
     created_at TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP
 );
 ```
@@ -64,7 +68,9 @@ CREATE TABLE cron_jobs (
 | `active_start_hour` | Optional start of active window (0-23, 24h local time) |
 | `active_end_hour` | Optional end of active window (0-23, 24h local time) |
 | `enabled` | Flipped to 0 by the circuit breaker after consecutive failures |
-| `run_once` | If 1, the job auto-disables after its first execution attempt |
+| `run_once` | If 1, the job is claimed by disabling it before execution starts so the fire is at-most-once |
+| `next_run_at` | Persisted scheduler cursor used for deterministic restart/claim behavior |
+| `timeout_secs` | Optional per-job wall-clock timeout for the cron run |
 
 ### cron_executions
 
@@ -77,10 +83,17 @@ CREATE TABLE cron_executions (
     executed_at TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP,
     success INTEGER NOT NULL,
     result_summary TEXT,
+    execution_succeeded INTEGER,
+    delivery_attempted INTEGER,
+    delivery_succeeded INTEGER,
+    execution_error TEXT,
+    delivery_error TEXT,
     FOREIGN KEY (cron_id) REFERENCES cron_jobs(id) ON DELETE CASCADE
 );
 ```
 
+`success` remains as a backward-compatible aggregate flag. New rows also record whether the agent run succeeded, whether delivery was attempted, and whether proactive delivery actually succeeded.
+
 ## Delivery Targets
 
 The `delivery_target` field uses the format `adapter:target`:
@@ -91,15 +104,15 @@ The `delivery_target` field uses the format `adapter:target`:
 | `discord:987654321` | Send to a different Discord channel |
 | `webhook:some-endpoint` | Send via webhook adapter |
 
-The adapter name maps to a registered messaging adapter. The target string is adapter-specific — for Discord, it's a channel ID parsed to u64. Delivery goes through `MessagingManager::broadcast()`, which is the proactive (non-reply) message path.
+The adapter name maps to a registered messaging adapter. The target string is adapter-specific — for Discord, it's a channel ID parsed to u64. Delivery goes through `MessagingManager::broadcast_proactive()`, which applies bounded retry/backoff for transient proactive-send failures.
 
 ## Creation Paths
 
 Cron jobs enter the system three ways.
 
 ### 1. Config File
 
-Defined in `config.toml` under an agent. Seeded into the database on startup (upsert — won't overwrite runtime changes to existing IDs).
+Defined in `config.toml` under an agent. On startup, jobs are seeded into the database with an upsert keyed by ID, so `config.toml` remains the source of truth for prompt, schedule, target, `enabled`, `run_once`, and timeout settings, while a compatible persisted cursor is preserved. In practice, the `enabled` value from `config.toml` overrides any previously persisted disabled state on startup.
 
 ```toml
 [[agents]]
@@ -150,7 +163,7 @@ The tool persists to the database and registers with the running scheduler immed
 
 The tool also supports `list` (show all active cron jobs) and `delete` (remove by ID).
 
-For one-time reminders, set `run_once: true` on create. The scheduler disables the job after the first execution attempt.
+For one-time reminders, set `run_once: true` on create. The scheduler claims the fire by disabling the job before execution starts and clearing its persisted cursor, which gives at-most-once ownership across processes.
 
 ### 3. Programmatic
 
@@ -173,7 +186,7 @@ If active hours are not set, the cron job runs at all hours.
 
 If a configured timezone is invalid, Spacebot logs a warning and falls back to server local timezone.
 
-For cron-expression jobs, active hours are evaluated at fire time and can further gate delivery. For legacy interval jobs, active hours don't change tick cadence — ticks outside the window are skipped.
+For cron-expression jobs, active hours are evaluated at fire time and can further gate execution. For legacy interval jobs, active hours don't change tick cadence — ticks outside the window are skipped.
 
 ## Circuit Breaker
 
@@ -184,9 +197,9 @@ If a cron job fails 3 consecutive times, it's automatically disabled:
 3. The timer loop exits
 4. A warning is logged
 
-A "failure" is any error from `run_cron_job()` — LLM errors, channel failures, delivery failures. A successful execution (even one that produces no output) resets the failure counter to 0.
+A "failure" for breaker purposes is any terminal error from `run_cron_job()`: prompt dispatch failures, channel failures, timeouts, or delivery failures that persist after retries. A successful execution, including a run that produces no delivery response, resets the failure counter to 0.
 
-Disabled cron jobs are not loaded on restart (the store query filters `WHERE enabled = 1`). To re-enable a disabled cron job, update the database row directly or re-seed it from config with `enabled = true`.
+Disabled cron jobs are filtered out by the store query on restart (`WHERE enabled = 1`), but config-defined jobs are seeded first. If `config.toml` sets a job's `enabled = true`, the upsert re-enables the persisted row before the scheduler loads enabled jobs, so a config-defined job disabled by the circuit breaker comes back on restart. For database-only jobs, re-enable by updating the row directly.
 
 ## Execution Flow
 
@@ -198,15 +211,17 @@ When the scheduler fires a cron job:
 
 3. **Run** — The channel processes the message through its normal LLM loop. It can use all channel tools (reply, branch, spawn_worker, memory_save, etc).
 
-4. **Collect** — The scheduler reads from the channel's `response_tx`. Text responses are collected. Status updates and stream events are ignored.
+4. **Collect** — The scheduler reads from the channel's `response_tx`. The first user-visible delivery response becomes the terminal payload. Status updates and stream events are ignored.
+
+5. **Timeout** — If the channel doesn't finish within `timeout_secs` (default 120s), it's aborted and logged as execution failure.
 
-5. **Timeout** — If the channel doesn't finish within 120 seconds, it's aborted.
+6. **Claim and cursor advance** — Recurring jobs advance `next_run_at` before execution starts. Run-once jobs disable themselves before execution starts. This gives deterministic ownership and prevents replay bursts after restart.
 
-6. **Log** — The execution is recorded in `cron_executions` with success status and a summary of the output.
+7. **Deliver** — If there is a delivery response, the Scheduler sends the `OutboundResponse` to the target via `MessagingManager::broadcast_proactive()`. Transient send failures retry with bounded backoff; permanent failures fail immediately. Unsupported proactive variants are treated as delivery failures, not silent skips.
 
-7. **Deliver** — If there's non-empty text, it's sent to the delivery target via `MessagingManager::broadcast()`. If the output is empty, delivery is skipped.
+8. **Log** — The execution is recorded in `cron_executions` with split execution and delivery outcomes plus any execution/delivery error text.
 
-8. **Teardown** — The channel's sender is dropped after sending the prompt, so the channel's event loop exits naturally after processing the single message.
+9. **Teardown** — The channel sender is dropped after sending the prompt, so the channel behaves as a one-shot conversation and exits naturally once processing completes.
 
 ## Scheduler Lifecycle
 
@@ -219,7 +234,7 @@ The scheduler is created per-agent after messaging adapters are initialized (it
 5. Each cron job is registered, starting its timer loop
 6. The `cron` tool is registered on the agent's `ToolServerHandle`
 
-Timer loops skip the first tick — cron jobs wait one full interval before their first execution. This prevents a burst of activity on startup.
+Timer loops are driven by the persisted `next_run_at` cursor rather than by in-memory interval state alone. On startup the scheduler initializes missing cursors and fast-forwards overdue recurring jobs instead of replaying every missed tick. When a timer loop loses a claim, it reloads its stale state from the store.
 
 On shutdown, all timer handles are aborted.
 

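The fast-forward behavior described in the Scheduler Lifecycle text can be illustrated with a small sketch for legacy interval jobs. `fastForwardCursor` is a hypothetical helper, not a function from this patch, and timestamps are modeled as epoch milliseconds as an assumption. Cron-expression jobs would instead ask the expression for its next occurrence after now. The sketch also takes no position on whether a single catch-up run fires for the overdue tick.

```typescript
// Returns the first scheduled instant strictly after `nowMs`, staying on the
// cadence anchored at `cursorMs`. Missed ticks are skipped in one step rather
// than replayed one by one.
function fastForwardCursor(cursorMs: number, intervalMs: number, nowMs: number): number {
  if (intervalMs <= 0) {
    throw new RangeError("intervalMs must be positive");
  }
  if (cursorMs > nowMs) {
    return cursorMs; // not overdue, nothing to skip
  }
  // Number of whole intervals needed to land strictly after `nowMs`.
  const steps = Math.floor((nowMs - cursorMs) / intervalMs) + 1;
  return cursorMs + steps * intervalMs;
}
```

For example, with a 30-minute interval and a cursor that is 95 minutes overdue, the new cursor lands 120 minutes after the old one: three missed ticks are skipped and the schedule keeps its original phase.
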
diff --git a/docs/metrics.md b/docs/metrics.md
@@ -121,7 +121,8 @@ The `tier` label corresponds to the process type making the request: `channel`,
 
 | Metric                                          | Type      | Labels                        | Description                         |
 | ----------------------------------------------- | --------- | ----------------------------- | ----------------------------------- |
-| `spacebot_cron_executions_total`                | Counter   | agent_id, task_type, result   | Cron task executions                |
+| `spacebot_cron_executions_total`                | Counter   | agent_id, cron_id, result     | Cron execution outcome only (`success`/`failure`) |
+| `spacebot_cron_delivery_total`                  | Counter   | agent_id, cron_id, result     | Cron delivery outcome (`success`/`failure`/`skipped`) |
 | `spacebot_ingestion_files_processed_total`      | Counter   | agent_id, result              | Ingestion files processed           |
 
 ## Useful PromQL Queries

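One way to read the `result` label on `spacebot_cron_delivery_total` is as a function of the `delivery_attempted` and `delivery_succeeded` fields logged per execution. The sketch below is an assumption about that mapping, not code from this patch; `deliveryResultLabel` and `tallyDelivery` are hypothetical names. In particular, it treats an attempted delivery with an unknown (`null`) outcome as a failure, which the real implementation may handle differently.

```typescript
type DeliveryResult = "success" | "failure" | "skipped";

// Field names match the execution log columns added in this change.
interface DeliveryOutcome {
  delivery_attempted: boolean;
  delivery_succeeded: boolean | null;
}

// Assumed mapping: no attempt means "skipped"; otherwise only an explicit
// `true` counts as success.
function deliveryResultLabel(outcome: DeliveryOutcome): DeliveryResult {
  if (!outcome.delivery_attempted) {
    return "skipped";
  }
  return outcome.delivery_succeeded === true ? "success" : "failure";
}

// Aggregates a list of outcomes into per-label counts, similar in spirit to
// the delivery_*_count fields exposed by the API.
function tallyDelivery(outcomes: DeliveryOutcome[]): Record<DeliveryResult, number> {
  const counts: Record<DeliveryResult, number> = { success: 0, failure: 0, skipped: 0 };
  for (const outcome of outcomes) {
    counts[deliveryResultLabel(outcome)] += 1;
  }
  return counts;
}
```
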
diff --git a/interface/src/api/client.ts b/interface/src/api/client.ts
@@ -735,16 +735,25 @@ export interface CronJobWithStats {
 	run_once: boolean;
 	active_hours: [number, number] | null;
 	timeout_secs: number | null;
-	success_count: number;
-	failure_count: number;
+	execution_success_count: number;
+	execution_failure_count: number;
+	delivery_success_count: number;
+	delivery_failure_count: number;
+	delivery_skipped_count: number;
 	last_executed_at: string | null;
 }
 
 export interface CronExecutionEntry {
 	id: string;
+	cron_id: string | null;
 	executed_at: string;
 	success: boolean;
+	execution_succeeded: boolean;
+	delivery_attempted: boolean;
+	delivery_succeeded: boolean | null;
 	result_summary: string | null;
+	execution_error: string | null;
+	delivery_error: string | null;
 }
 
 export interface CronListResponse {

diff --git a/interface/src/api/schema.d.ts b/interface/src/api/schema.d.ts
@@ -2498,7 +2498,13 @@ export interface components {
         };
         /** @description Entry in the cron execution log. */
         CronExecutionEntry: {
+            cron_id?: string | null;
+            delivery_attempted: boolean;
+            delivery_error?: string | null;
+            delivery_succeeded?: boolean | null;
             executed_at: string;
+            execution_error?: string | null;
+            execution_succeeded: boolean;
             id: string;
             result_summary?: string | null;
             success: boolean;
@@ -2529,18 +2535,24 @@ export interface components {
             ] | null;
             cron_expr?: string | null;
             delivery_target: string;
+            /** Format: int64 */
+            delivery_failure_count: number;
+            /** Format: int64 */
+            delivery_skipped_count: number;
+            /** Format: int64 */
+            delivery_success_count: number;
             enabled: boolean;
             /** Format: int64 */
-            failure_count: number;
+            execution_failure_count: number;
+            /** Format: int64 */
+            execution_success_count: number;
             id: string;
             /** Format: int64 */
             interval_secs: number;
             last_executed_at?: string | null;
             prompt: string;
             run_once: boolean;
             /** Format: int64 */
-            success_count: number;
-            /** Format: int64 */
             timeout_secs?: number | null;
         };
         CronListResponse: {

diff --git a/interface/src/routes/AgentCron.tsx b/interface/src/routes/AgentCron.tsx
@@ -233,8 +233,12 @@ export function AgentCron({ agentId }: AgentCronProps) {
 
 	const totalJobs = data?.jobs.length ?? 0;
 	const enabledJobs = data?.jobs.filter((j) => j.enabled).length ?? 0;
-	const totalRuns = data?.jobs.reduce((sum, j) => sum + j.success_count + j.failure_count, 0) ?? 0;
-	const failedRuns = data?.jobs.reduce((sum, j) => sum + j.failure_count, 0) ?? 0;
+	const totalRuns =
+		data?.jobs.reduce((sum, j) => sum + j.execution_success_count + j.execution_failure_count, 0) ?? 0;
+	const executionFailures =
+		data?.jobs.reduce((sum, j) => sum + j.execution_failure_count, 0) ?? 0;
+	const deliveryFailures =
+		data?.jobs.reduce((sum, j) => sum + j.delivery_failure_count, 0) ?? 0;
 
 	return (
 		<div className="flex h-full flex-col">
@@ -243,8 +247,9 @@ export function AgentCron({ agentId }: AgentCronProps) {
 				<div className="flex items-center gap-2 border-b border-app-line px-6 py-3">
 					<Badge variant="accent" size="md">{totalJobs} total</Badge>
 					<Badge variant="green" size="md">{enabledJobs} enabled</Badge>
-					<Badge variant="outline" size="md">{totalRuns} runs</Badge>
-					{failedRuns > 0 && <Badge variant="red" size="md">{failedRuns} failed</Badge>}
+					<Badge variant="outline" size="md">{totalRuns} executions</Badge>
+					{executionFailures > 0 && <Badge variant="red" size="md">{executionFailures} exec failed</Badge>}
+					{deliveryFailures > 0 && <Badge variant="red" size="md">{deliveryFailures} delivery failed</Badge>}
 					{data?.timezone && (
 						<span className="text-tiny text-ink-faint">tz: {data.timezone}</span>
 					)}
@@ -599,8 +604,9 @@ function CronJobCard({
 	isToggling: boolean;
 	isTriggering: boolean;
 }) {
-	const totalRuns = job.success_count + job.failure_count;
-	const successRate = totalRuns > 0 ? Math.round((job.success_count / totalRuns) * 100) : null;
+	const totalRuns = job.execution_success_count + job.execution_failure_count;
+	const executionSuccessRate =
+		totalRuns > 0 ? Math.round((job.execution_success_count / totalRuns) * 100) : null;
 	const schedule = formatCronSchedule(job.cron_expr, job.interval_secs);
 
 	return (
@@ -648,11 +654,31 @@ function CronJobCard({
 								<span>ran {formatTimeAgo(job.last_executed_at)}</span>
 							</>
 						)}
-						{successRate !== null && (
+						{executionSuccessRate !== null && (
 							<>
 								<span className="text-ink-faint/50">·</span>
-								<span className={successRate >= 90 ? "text-green-500" : successRate >= 50 ? "text-yellow-500" : "text-red-500"}>
-									{successRate}% ({job.success_count}/{totalRuns})
+								<span
+									className={
+										executionSuccessRate >= 90
+											? "text-green-500"
+											: executionSuccessRate >= 50
+												? "text-yellow-500"
+												: "text-red-500"
+									}
+								>
+									exec {executionSuccessRate}% ({job.execution_success_count}/{totalRuns})
+								</span>
+							</>
+						)}
+						{(job.delivery_success_count > 0 ||
+							job.delivery_failure_count > 0 ||
+							job.delivery_skipped_count > 0) && (
+							<>
+								<span className="text-ink-faint/50">·</span>
+								<span className="text-ink-faint">
+									delivery {job.delivery_success_count} sent
+									{job.delivery_failure_count > 0 ? `, ${job.delivery_failure_count} failed` : ""}
+									{job.delivery_skipped_count > 0 ? `, ${job.delivery_skipped_count} skipped` : ""}
 								</span>
 							</>
 						)}
@@ -728,22 +754,63 @@ function JobExecutions({ agentId, jobId }: { agentId: string; jobId: string }) {
 
 	return (
 		<div className="flex flex-col gap-1">
-			{data.executions.map((execution) => (
-				<div
-					key={execution.id}
-					className="flex items-center gap-3 rounded-lg px-3 py-1.5"
-				>
-					<span className={`h-1.5 w-1.5 rounded-full ${execution.success ? "bg-green-500" : "bg-red-500"}`} />
-					<span className="text-tiny tabular-nums text-ink-faint">
-						{formatTimeAgo(execution.executed_at)}
-					</span>
-					{execution.result_summary && (
-						<span className="min-w-0 flex-1 truncate text-tiny text-ink-dull">
-							{execution.result_summary}
+			{data.executions.map((execution) => {
+				const statusTone = !execution.execution_succeeded
+					? "bg-red-500"
+					: execution.delivery_attempted && execution.delivery_succeeded === false
+						? "bg-yellow-500"
+						: execution.delivery_attempted && execution.delivery_succeeded === true
+							? "bg-green-500"
+							: "bg-gray-500";
+				const detail =
+					execution.delivery_error ?? execution.execution_error ?? execution.result_summary;
+				const deliveryLabel = !execution.delivery_attempted
+					? "no delivery"
+					: execution.delivery_succeeded === true
+						? "delivered"
+						: execution.delivery_succeeded === false
+							? "delivery failed"
+							: "delivery unknown";
+
+				return (
+					<div
+						key={execution.id}
+						className="flex items-center gap-3 rounded-lg px-3 py-1.5"
+					>
+						<span className={`h-1.5 w-1.5 rounded-full ${statusTone}`} />
+						<span className="text-tiny tabular-nums text-ink-faint">
+							{formatTimeAgo(execution.executed_at)}
 						</span>
-					)}
-				</div>
-			))}
+						<span
+							className={`rounded px-1.5 py-0.5 text-tiny ${
+								execution.execution_succeeded
+									? "bg-green-500/10 text-green-400"
+									: "bg-red-500/10 text-red-400"
+							}`}
+						>
+							{execution.execution_succeeded ? "exec ok" : "exec failed"}
+						</span>
+						<span
+							className={`rounded px-1.5 py-0.5 text-tiny ${
+								!execution.delivery_attempted
+									? "bg-app-lightBox text-ink-faint"
+									: execution.delivery_succeeded === true
+										? "bg-green-500/10 text-green-400"
+										: execution.delivery_succeeded === false
+											? "bg-yellow-500/10 text-yellow-300"
+											: "bg-app-lightBox text-ink-faint"
+							}`}
+						>
+							{deliveryLabel}
+						</span>
+						{detail && (
+							<span className="min-w-0 flex-1 truncate text-tiny text-ink-dull">
+								{detail}
+							</span>
+						)}
+					</div>
+				);
+			})}
 		</div>
 	);
 }
diff --git a/migrations/20260329105813_cron_next_run.sql b/migrations/20260329105813_cron_next_run.sql
@@ -0,0 +1,3 @@
+-- Persist the scheduler cursor for each cron job so recurring schedules can
+-- be claimed and advanced atomically before execution.
+ALTER TABLE cron_jobs ADD COLUMN next_run_at TIMESTAMP;
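
The migration comment says schedules are "claimed and advanced atomically before execution". A minimal in-memory sketch of that idea follows, assuming the claim is a compare-and-swap on `next_run_at`. In SQL this would plausibly be a conditional `UPDATE` whose affected-row count decides the winner, but that is an assumption, and `CronStore` and `claim` are hypothetical names rather than code from this patch.

```typescript
interface CronRow {
  id: string;
  enabled: boolean;
  run_once: boolean;
  next_run_at: number | null; // epoch ms here; the real column is a TIMESTAMP
}

// In-memory stand-in for the cron_jobs table.
class CronStore {
  private rows = new Map<string, CronRow>();

  upsert(row: CronRow): void {
    this.rows.set(row.id, { ...row });
  }

  get(id: string): CronRow | undefined {
    const row = this.rows.get(id);
    return row ? { ...row } : undefined;
  }

  // Compare-and-swap claim: succeeds only if the job is enabled and its
  // cursor still equals the value the caller observed. Recurring jobs advance
  // the cursor; run-once jobs are disabled and their cursor is cleared.
  claim(id: string, expectedCursor: number, nextCursor: number): boolean {
    const row = this.rows.get(id);
    if (!row || !row.enabled || row.next_run_at !== expectedCursor) {
      return false; // someone else claimed this fire, or the job is disabled
    }
    if (row.run_once) {
      row.enabled = false;
      row.next_run_at = null;
    } else {
      row.next_run_at = nextCursor;
    }
    return true;
  }
}
```

If two schedulers observe the same cursor and both call `claim`, only the first call returns `true`. The loser sees `false` and would reload its state from the store instead of running the job, which is how this model avoids double-firing.
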