Skip to content

Spec: daemon-hotfix self-improvement plugin #81

Description

@0x4007

Problem

We want continuous self-improvement across UOS plugins without bloating the kernel. Today, the kernel logs plugin dispatch failures and skips workflow/check events to avoid loops, so a dedicated hotfix plugin cannot react to those failures.

Goals

  • Route actionable failure signals to a dedicated plugin (daemon-hotfix) with minimal kernel changes.
  • Create/update issues in the appropriate plugin repo with enough context to reproduce.
  • Optionally propose fixes (PRs) with tests, while remaining safe by default.
  • Keep kernel lean: routing + policy gates only.

Non-goals

  • No fully autonomous merges by default.
  • No attempt to modify UOS kernel logic beyond minimal event plumbing.
  • No brittle keyword/regex-only tool selection for AI-driven decisions.

High-level architecture

  1. Kernel detects plugin dispatch failures (HTTP worker or GitHub Action dispatch) and emits a synthetic event.
  2. daemon-hotfix subscribes to that synthetic event.
  3. daemon-hotfix triages, de-dupes, files issues, and optionally drafts PRs.
  4. PRs go through normal CI; auto-merge only if explicitly enabled and risk tier is low.

Minimal kernel changes (proposed)

  1. Synthetic failure event

    • Emit kernel.plugin_error (or similar) when a plugin dispatch fails.
    • Include enough metadata to find the failing plugin repo and reproduce.
  2. Optional workflow/check allowlist

    • Today workflow_*, check_*, check_run.*, check_suite.* are skipped to prevent loops.
    • Add an allowlist so only daemon-hotfix can receive these events if configured.
    • This enables detection of failed workflow runs in plugin repos (e.g., compute.yml failures).
  3. Safety guardrails

    • Rate-limit synthetic events per repo.
    • Suppress events caused by the hotfix plugin itself to prevent loops.

Failure event payload (draft)

{
  "event": "kernel.plugin_error",
  "timestamp": "2025-01-01T00:00:00Z",
  "stateId": "<kernel state id>",
  "source": {
    "kernelRepo": "ubiquity-os/ubiquity-os-kernel",
    "environment": "prod|dev"
  },
  "trigger": {
    "githubEvent": "issue_comment.created",
    "deliveryId": "<github delivery id>",
    "repo": "owner/repo",
    "issueOrPr": 123,
    "actor": "octocat"
  },
  "plugin": {
    "type": "http|workflow",
    "id": "owner/repo:workflow.yml@ref OR https://plugin.url",
    "owner": "owner",
    "repo": "repo",
    "workflowId": "compute.yml",
    "ref": "main",
    "manifest": {
      "name": "...",
      "version": "..."
    }
  },
  "error": {
    "message": "HTTP 502: ...",
    "status": 502,
    "stack": "...",
    "category": "dispatch|http|workflow|timeout|validation",
    "raw": "..." 
  },
  "context": {
    "requestId": "...",
    "pluginInputs": {"redacted": true},
    "responseSnippet": "...",
    "retryCount": 0
  }
}

daemon-hotfix behavior (MVP)

  • Triage & De-dupe

    • Cluster by (plugin.id + error.category + error.message hash) within a time window.
    • Reopen existing issue if the same error recurs.
  • Issue creation

    • Open a GitHub issue in the plugin repo with:
      • Summary of failure + timestamp + correlation id
      • Link to triggering issue/PR
      • Sanitized error details + repro steps (event replay, if available)
      • Suggested next actions (test/trace)
  • Optional auto-fix (guarded)

    • If configured and risk tier is low, open a PR with:
      • Minimal patch
      • Regression test or replay fixture
      • Reasoning summary + risk notes
    • Never auto-merge by default.

Safety, permissions, and loops

  • Permission model options

    1. Run hotfix workflow inside the target plugin repo (simplest for auth).
    2. Central hotfix service that mints installation tokens per repo (more powerful but riskier).
  • Loop prevention

    • If a failure originates from daemon-hotfix, ignore it.
    • If a failure is already under investigation (issue open), update issue instead of spamming.
  • Redaction

    • Strip secrets/tokens from logs and payloads.
    • Limit stored raw event payloads unless opt-in.

Configuration (example)

plugins:
  ubiquity-os-marketplace/daemon-hotfix:
    with:
      enabled: true
      modes: ["issue", "pr"]
      riskPolicy:
        autoMerge: false
        maxAutofixPerDay: 3
      issueLabels: ["bug", "autofix"]
      notify: ["@team-bot"]
      allowlistRepos: ["ubiquity-os-marketplace/command-codex"]

MVP acceptance criteria

  • A 5xx from an HTTP plugin dispatch results in a single issue in that plugin repo.
  • A failed workflow run (if allowlisted) results in a single issue in that repo.
  • Duplicate failures update the existing issue instead of creating new ones.
  • No infinite loops or noisy spam during normal operations.

Open questions

  • Should the kernel emit kernel.plugin_error for both HTTP and workflow dispatch failures, or only HTTP?
  • Should the hotfix plugin support optional local reproduction with cached webhook payloads?
  • Do we want a global dashboard for error rates per plugin?

Metadata

Metadata

Assignees

No one assigned

    Type

    No type
    No fields configured for issues without a type.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions