Changelog

All notable changes to Clawd Cursor will be documented in this file.

[0.8.3] - 2026-04-19 — Hotfix: "Outlook keeps opening" + runaway guard

User reported Outlook launching repeatedly during a test. Root-cause diagnosis traced to three compounding failures: (1) PlatformAdapter.openApp spawned a new instance even when the app was already running, (2) the escalation ladder (router → blind → hybrid → vision) re-ran open_app at each rung because earlier rungs couldn't verify success through New Outlook's sparse WebView2 accessibility tree, (3) clawdcursor stop only killed the start process on port 3847, missing serve (different port / same port different process) and mcp (stdio, no port) entirely. A stale serve kept receiving MCP traffic after the user thought they'd stopped everything.

Fixed

openApp / launchApp idempotency (Windows + macOS + Linux). When the target app already has a visible window AND the caller didn't set alwaysNewInstance: true AND no url is passed, the adapter now focuses the existing window and returns its pid instead of spawning another instance. Match policy: case-insensitive exact processName → processName substring → title substring → UWP AppId tail. Closes the "N windows of Outlook stacking up" class of bug under any retry loop. src/v2/platform/{windows,macos,linux}.ts.
Agent runaway guard — if the agent calls the same tool + identical args ≥ 3 times within the last 6 turns, the loop exits with give_up and a targeted message suggesting detect_webview_apps when the target is likely Electron/WebView2. Prevents the generalized "retry-loop-because-a11y-is-opaque" anti-pattern. src/pipeline/agent/agent.ts.
clawdcursor stop now sweeps all modes. After the graceful /stop on port 3847, iterates every pidfile in ~/.clawdcursor/*.pid, SIGTERMs any live pid, SIGKILLs after 500ms if still running, and unlinks the pidfile. Catches mcp (stdio-only), zombie serve, and any start/serve on a non-default port. src/index.ts.

Notes

Stale-pidfile cleanup at startup was already correct via claimPidFile (checks isProcessAlive(existingPid) and overwrites when dead) — no code change needed there; the issue was exclusively stop.
Tests: 429 / 430 pass (1 skipped, same as 0.8.2). No schema snapshot change — these are behavioral fixes, not catalog changes.

[0.8.2] - 2026-04-19 — Session reliability, force-focus, Electron bridge

First-time-user review surfaced six concrete pain points. This release fixes every one.

Fixed

Silent 401 mid-session (the session-killer). Previous versions compared the incoming Bearer token against an in-memory SERVER_TOKEN only. A second clawdcursor process (stale pidfile takeover, or a concurrent mode) rewrote the token FILE without updating the first server's in-memory copy — clients reading the file silently lost auth. /health kept returning 200 so the failure was invisible. Fix: requireAuth now accepts EITHER the in-memory token OR the current on-disk token (mtime-cached, ~free). Drift is logged once with a recovery hint. src/server.ts.
focus_window force-to-front on Windows. Previous implementation called SetForegroundWindow which the OS blocks when the caller isn't the current foreground process. New implementation uses the full sequence: ShowWindow(SW_RESTORE) → topmost-toggle → AttachThreadInput with the current foreground thread → AllowSetForegroundWindow(ASFW_ANY) → BringWindowToTop → SetForegroundWindow, with an Alt-key synthetic fallback. Raises any window through Windows' foreground lock. scripts/ps-bridge.ps1.
Richer validation errors. REST /execute rejections now carry the full expected tool signature. A missing param returns Missing required parameter "target". Expected smart_click(target: string, processId?: number). — agents no longer have to roundtrip to /docs. src/tool-server.ts.

Added

Electron / WebView2 detection. New MCP tools detect_webview_apps and relaunch_with_cdp (also exposed via compact system({"action":"detect_webview"}) / system({"action":"relaunch_with_cdp"})). Recognises olk (New Outlook), Teams, Discord, Slack, VS Code, GitHub Desktop, Notion, Obsidian, Spotify. When detected, probes ports 9222/9223/9229/8315 for a live CDP endpoint; if found, tells the agent to attach via browser({"action":"connect"}). If not, shows the exact relaunch command (e.g. discord --remote-debugging-port=9222) so CDP can be enabled and the sparse UIA tree bypassed entirely. src/tools/electron_bridge.ts.
drag_path documentation clarity. Existing mouse_drag_stepped / compact computer({"action":"drag_path","path":"[...]"}) now explicitly documented for freehand curve drawing (Paint, Figma, canvas apps). SKILL.md "Quick reference" covers when to use drag_path vs drag.

Changed

SKILL.md pushes compact mode harder. Top of doc now carries a directive callout: "If you are an LLM reading this: YOU SHOULD BE USING COMPACT MODE." with MCP config + REST URL. Granular stays available but is explicitly labeled the power-user / larger-prompt option.
SKILL.md web-app keyboard warning. Web-wrapped apps (Outlook, Teams, Gmail) treat Escape as "close dialog/modal" — sometimes closing the compose window. Documented: do not use Escape to dismiss autocompletes in web apps; use arrow keys + Enter or click-away.
Error-recovery table expanded with Electron-vs-true-canvas split, v0.8.2 auth recovery, v0.8.2 force-focus note, and the drag_path vs drag distinction.

Tests

429 / 430 passing (one skipped, same as 0.8.0).
Schema snapshot regenerated → 74 granular tools (72 + 2 Electron bridge).
Live smoke: token auth survives a second clawdcursor serve; focus_window raises Paint through a full-screen window; detect_webview_apps correctly flags Outlook / Teams / VS Code when any are open.

Consolidates v0.8.1 (never tagged)

0.8.1-alpha.0 through -alpha.N shipped unified-pipeline + compact-MCP + Linux AT-SPI + Wayland routing on the feature branch. They roll into 0.8.2 as a single stable release. See the v0.8.1-alpha tag range in the git history for per-tranche detail; headline features:

Unified blind/hybrid/vision agent — one loop, three modes. Replaces the v0.8.0 split text-agent + vision-agent with a single harness using native tool_use (Anthropic) / tool_calls (OpenAI) / prose-JSON fallback.
Compact MCP surface — 6 compound tools (computer, accessibility, window, system, browser, task) that collapse the full capability into ~1,500 tokens of catalog. Anthropic-Computer-Use shape extended across the whole product. clawdcursor mcp --compact or GET /tools?mode=compact.
PlatformAdapter widened — mouseDown/Up, keyDown/Up, setWindowState, setWindowBounds, listDisplays, waitForElement, widened InvokeAction (expand/collapse/toggle/select/get-value), richer UiElement state flags.
Linux AT-SPI bridge — read-only first pass via python3-gi + gir1.2-atspi-2.0. Linux a11y methods (getUiTree, findElements, getFocusedElement, waitForElement) now return real data on boxes where the bridge dependencies are present. invokeElement still stubbed — tracked for a follow-up pass.
Linux Wayland input routing — ydotool (mouse + keyboard) or wtype (keyboard fallback) detected at init. X11 path unchanged; Wayland no longer silently mis-fires through nut-js.
Per-capability palettes + compound vision tools — text-agent turns now see a 6-10 tool scoped palette based on the subtask's capability (app_launch / text_input / navigation / form_fill / spatial / file_ops / window_mgmt / general). Vision-agent turns see 3 compound mouse / keyboard / window tools with action enums. ~12× fewer catalog tokens per turn.
Pretty TTY logs with HH:MM:SS timestamps — layer-tagged ([router], [blind], [vision], [safety], etc.), no per-line repetition, CLAWD_LOG=pretty default on TTY.
SKILL.md rewrite — reviewed by a Sonnet subagent against legacy v0.6.3/v0.7.14 tone, verified model-agnostic + OS-agnostic, restored "USE AS A FALLBACK" + "IMPORTANT — READ THIS BEFORE ANYTHING ELSE" directive callouts and Sensitive App Policy.

[0.8.0] - 2026-04-16 — V2 Architecture (opt-in)

A ground-up reimagining of the internal pipeline. Opt in with clawdcursor start --v2. The legacy pipeline is unchanged and remains the default.

Added

--v2 flag on clawdcursor start — activates the new 3-layer architecture: Router → VisionAgent → Verifier. No effect on MCP, serve, or legacy start.
src/v2/platform/ — platform abstraction. Single PlatformAdapter interface with macos.ts, windows.ts, linux.ts implementations. Replaces 142+ scattered if (process.platform === 'darwin') branches across 34 files. Business logic no longer sees process.platform. Adding a new OS = one file.
src/v2/verifier/ — GroundTruthVerifier. Six independent signals decide whether a task actually completed: pixel diff, window change, focus change, OCR delta, task-specific assertions (send_email, navigate_url, open_app, type_text, search, compose_message, create_file), and anti-patterns (error dialogs, "cannot send", "draft saved", invalid recipient, auth failed). Weighted voting with hard-fail rules on anti-patterns. Cannot be fooled by LLM self-reported "done".
src/v2/agent/ — VisionAgent: a single vision-first tool-use loop. 16 tools (screenshot, read_screen, list_windows, click, drag, scroll, type, key, invoke_element, set_field_value, open_app, focus_window, read_clipboard, write_clipboard, wait, done). 6-rule system prompt (down from 36). Model-agnostic via existing callVisionLLM.
src/v2/orchestrator.ts — PipelineV2 wires Router → VisionAgent → Verifier with before/after state capture.
Hardened JSON parser — tolerates trailing braces, markdown code fences, and other common LLM malformations. Balanced-brace extraction as fallback.

Fixed

False positives — legacy pipeline reports UNVERIFIED_SUCCESS when the agent claims "done" but the screen didn't change. V2 verifier catches this class: in a live email-send test the agent said "Email sent" but a "Cannot send" dialog was on screen. V2 correctly rejected the claim. (Legacy still does what it does; this fix only applies when --v2 is set.)

Testing

Smoke-tested on macOS with Anthropic Claude Haiku (text) + Sonnet (vision):

Task	Time	Verdict
Open TextEdit and type	30s	✅ (4/6 signals)
Calculator: 47+53=100	65s	✅ (5/6 signals, zero parse errors)
Safari → github.com	45s	✅ (6/6 signals)
Notes: create note	182s	✅ (6/6 signals)
Email send (failing server)	86s	❌ Correctly rejected — legacy would have reported success

Platform Safety

No legacy code modified. Windows, Linux, and MCP paths untouched. v2 code is entirely under src/v2/.

[0.7.14] - 2026-04-13 — Full macOS Keyboard Automation + Platform-Aware Pipeline

Fixed

macOS keystrokes silently dropped — root cause: CGEvent.post() from the Swift helper is blocked by macOS TCC when the helper is spawned as a child of Node.js. keyPress() and typeText() on macOS now route through osascript + System Events (the Apple-sanctioned method). All keyboard shortcuts (Cmd+V, Cmd+N, Shift+Cmd+D, etc.) now work correctly.
Single-char keys losing modifiers — keycodeForCharacter() lookup added to ClawdCursorHelper; modifiers are no longer discarded for Cmd+letter combos.
asDouble() coercion — click/drag coordinates sent as integers (common from some LLMs) no longer fail with a type mismatch in the Swift helper.
keycodeForCharacter fallback — now returns an error for unmapped characters instead of silently falling back to the 'v' keycode.
Permission check inconsistency — doctor, status, and readiness.ts all now query the same canonical path: Host /status → permission-check binary → direct fallback. No more false "granted" reports.
Screenshot capture CPU spin — replaced CGWindowListCreateImage (triggers ReplayKit CPU spin bug on macOS 14+) with a delegated screenshot-helper subprocess.
A11y false positive — isShellAvailable() now tests actual window access (p.windows.length) instead of processes.length, which worked without Accessibility permission.
Node.js v25 crash — EINVAL/setTypeOfService socket error from undici's internal QoS call is now caught and suppressed (non-fatal).
Dock click zone — reduced from 60px to 30px on macOS (Dock is thinner than the Windows taskbar).
Browser URL bar shortcut — Cmd+L used on macOS (was Ctrl+L, which does nothing in macOS browsers).

Added

macMailEmailFlow — deterministic email flow for macOS Mail.app (Cmd+N, Tab to subject/body, Cmd+Shift+D to send).
clawdcursor grant command — triggers macOS system permission dialogs directly from the CLI.
115 Apple shortcuts — Mail, Safari, Notes, Messages, Terminal added to the shortcut database.
scripts/test-macos-fixes.sh — one-shot E2E verification script: rebuild, binary check, permission consistency, screenshot capture, doctor cross-check.
--request-screen-recording flag on permission-check binary — optional TCC dialog trigger for Screen Recording.
processPath + bundleId in all permission check responses — aids TCC debugging.
30s TTL cache on A11y shell availability — permission grants mid-session are now detected without restart.
macOS native binary verification in scripts/verify-install.js — warns on missing binaries at npm install time.
setup script auto-builds native binaries on macOS (inside npm run setup).

Changed

build.sh — marked executable in git, fails fast on missing binaries (was silently warning), better error guidance.
Installer — verifies all 4 required binaries (not just ClawdCursorHost), uses bash ./build.sh for portability.
doctor.ts — permission check unified via native-helper module; triggers system permission dialogs if denied.
Email flow keyboard shortcuts — platform-aware: Ctrl+Enter → Shift+Cmd+D on macOS, Ctrl+H → Cmd+Option+F for Find & Replace.
sharp bumped ^0.33.0 → ^0.33.5.

Platform Safety

No Windows or Linux code paths affected. All macOS changes are gated behind IS_MAC / process.platform === 'darwin' / isMacOS().

[0.7.13] - 2026-04-10 — Unified Permission Checks + Screenshot Helper

Fixed

Permission check fragmentation — doctor, status, and readiness each used different permission APIs, producing contradictory results. All now route through ClawdCursorHost /status → permission-check binary → direct AXIsProcessTrusted fallback.
Screenshot CPU spin — delegated takeScreenshot() to screenshot-helper subprocess, eliminating the ReplayKit CPU spike on macOS 14+.
Installer binary verification — now checks all 4 required binaries (ClawdCursorHost, clawdcursor-helper, screenshot-helper, permission-check) instead of just ClawdCursorHost.
build.sh silent failures — swift build errors now fail the build immediately with actionable guidance.

Added

clawdcursor grant command — triggers macOS system permission dialogs for Accessibility and Screen Recording.
processPath + bundleId in permission check responses for TCC debugging.
--request-screen-recording flag on permission-check binary.

[0.7.12] - 2026-04-09 — Comprehensive macOS TCC Fix

Fixed

Bash pipeline bug — set -o pipefail added; build failures now properly detected (was silently passing due to pipeline exit status bug)
Ad-hoc signing by default — build.sh now always signs the app (required for TCC on macOS 26+ Tahoe where unsigned binaries don't appear in privacy settings)
Build error capture — uses temp file instead of pipe to properly capture exit status
TCC permission check — runs permission-check after build to show current accessibility/screen recording status

Changed

build.sh rewritten — cleaner structure, ad-hoc signing is default (not optional), signature verification added
Codesign uses --deep — ensures all nested binaries are signed
Installer shows TCC status — tells user exactly which permissions need to be granted and where

Technical Details

The core issue was TCC (Transparency, Consent, and Control) on macOS binds permissions to the code signing identity. Without signing:

On macOS 26+ (Tahoe), unsigned binaries don't appear in System Settings privacy panels at all
Users saw "ClawdCursorHost binary not found" errors even though install appeared to succeed

Reference: mediar-ai/mcp-server-macos-use for TCC permission handling patterns.

[0.7.11] - 2026-04-09 — macOS Installer Fix

Fixed

macOS installer now fails loudly if native host build fails — was silently swallowing build errors and claiming "optional fallback" that doesn't exist
Added verification step — installer explicitly checks ClawdCursorHost binary exists before declaring success
Show build output — Swift build errors are now visible instead of redirected to /dev/null
Clear error messages — tells users exactly what went wrong and how to fix it (xcode-select --install, manual rebuild, etc.)

Changed

macOS native host is now correctly marked as REQUIRED, not optional
Installer exits with error code 1 if native build fails on macOS

[0.7.10] - 2026-04-08 — Guided Setup Flow

Changed

Installer shows next steps — after install, displays clear guidance: clawdcursor doctor → clawdcursor start
Doctor shows run options — after passing all checks, shows both start (full agent) and serve (tools-only) modes
Consent shows next step — after granting consent, directs users to clawdcursor doctor

[0.7.9] - 2026-04-08 — UX Improvements

Changed

macOS permission messages — now direct users to enable "ClawdCursor" instead of "Terminal/Node"
Screen Recording path — updated to "Screen & System Audio Recording" (macOS Sequoia naming)

[0.7.8] - 2026-04-08 — Documentation Fix

Fixed

Installer comments updated — example version references now point to v0.7.8

[0.7.7] - 2026-04-08 — Installer Fixes

Fixed

Installers default to main branch — install.sh and install.ps1 now use main instead of hardcoded non-existent tag
macOS installer builds native helper — install.sh now runs ./native/build.sh on Darwin if Swift is available
Version override support — VERSION=v0.7.7 curl ... | bash or $env:VERSION='v0.7.7' to install specific release
Auto-pull on update — installers now run git pull after checkout to get latest changes

[0.7.6] - 2026-04-08 — macOS Native Host App

Added

macOS Host App (ClawdCursorHost) — new native Swift executable that runs as the app bundle's main process, owning all TCC permissions (Accessibility, Screen Recording) under a single app identity
Localhost IPC server — host app exposes GET /health, GET /status, POST /rpc on 127.0.0.1:3848 for CLI→host communication
Token-based authentication — ~/.clawdcursor/host-token (mode 0600) secures the IPC channel
Auto-launch/stop — clawdcursor start ensures host is running; clawdcursor stop gracefully quits it
New Swift helper methods — moveMouse, dragMouse, captureScreen for smoother native macOS automation
Menu bar presence — host app shows 🐾 icon in menu bar for visibility

Security

Localhost-only binding — IPC server uses NWParameters.requiredLocalEndpoint to bind to 127.0.0.1 only, rejecting connections from other machines
Token file permissions — host-token created with mode 0600 (owner read/write only)

Changed

src/native-helper.ts — routes all macOS desktop operations through host IPC instead of direct stdio
src/native-desktop.ts — 11 platform-guarded code paths delegate to host on macOS
src/index.ts — start/stop commands manage host app lifecycle
native/ClawdCursor.app/Contents/Info.plist — bundle identifier changed to com.clawdcursor.app, executable to ClawdCursorHost

Unchanged

Windows/Linux — all macOS code behind IS_MAC && this.helper guards; no behavior changes on other platforms
172 tests pass — full test suite unchanged

[0.6.3] - 2026-03-01 — Universal Pipeline, Multi-App Workflows, Provider-Agnostic

Added

LLM-based universal task pre-processor — one cheap text LLM call decomposes any natural language into {app, navigate, task, contextHints}, replacing brittle regex parsing
Multi-app workflow support — copy/paste between apps (e.g. Wikipedia → Notepad) with 6-checkpoint tracking: first_app_focused → first_app_action_done → content_copied → second_app_opened → content_pasted → result_visible
Site-specific keyboard shortcuts — Reddit (j/k/a/c), Twitter/X (j/k/l/t/r), YouTube (Space/f/m), Gmail (j/k/e/r/c), GitHub (s/t/l), Slack (Ctrl+k), plus generic hints
OS-level default browser detection — reads Windows registry (HKCU ProgId) or macOS LaunchServices instead of hardcoded Edge/Safari
3 verification retries with step log analysis — when verification fails, builds a digest of recent actions + checkpoint status so the vision LLM can fix the specific missed step
Mixed-provider pipeline support — e.g. kimi for text, anthropic for Computer Use, with per-layer API key resolution from OpenClaw auth-profiles
ComputerUseOverrides interface — apiKey, model, baseUrl per-layer for mixed-provider setups
resolveProviderApiKey() helper — reads OpenClaw auth-profiles to find the right API key per provider

Fixed

Checkpoint system overhaul — removed auto-termination (completionRatio ≥ 0.90 early exit and isComplete() mid-loop kill), strict detection: content_pasted requires Ctrl+V, content_copied requires Ctrl+C, second_app_opened detects any window switch universally
Pipeline context passing — priorContext[] accumulator flows from pre-processing through to Computer Use (no more amnesia between layers)
Credential resolution order — .clawdcursor-config → auth-profiles.json → openclaw.json (with template expansion) → env vars
loadPipelineConfig() path resolution — checks package dir first, then cwd (fixes global npm installs)
Smart Interaction model lookup — uses PROVIDERS registry instead of hardcoded model/baseUrl maps; fixes stale claude-haiku-3-5-20241022 fallback
Scroll behavior — system prompts instruct PageDown/Space instead of tiny mouse scrolls; default scroll delta 3 → 15
Provider-agnostic internals — all comments and logs say "vision LLM" instead of "Claude"
Verification retry limit — max 3 retries prevents infinite verification loops
Universal checkpoint detection — no hardcoded app lists; detectTaskType() uses action patterns only

Changed

Pipeline architecture: LLM Pre-processor → Pre-open app + navigate → L0 Browser → L1 Action Router + Shortcuts → L1.5 Smart Interaction → L2 A11y Reasoner → L3 Computer Use
Pre-processor prompt hardened with NEVER rules (never summarize, never drop steps) and VALIDATION RULE
MULTI-APP WORKFLOWS section added to both Mac and Windows Computer Use system prompts
Checkpoint thresholds tightened: early completion 75% → 90%, skip-verification 50% → 80%

[0.6.5] - 2026-02-28 — Checkpoint System, Task Completion Detection

Added

Checkpoint-based task completion — Computer Use tracks milestones (compose opened → fields filled → send pressed → compose closed) and stops when all checkpoints are met. No more wasted calls after successful completion.
Task type detection — auto-classifies tasks (email, form, navigate, draw, file_save) and applies appropriate checkpoint templates.
Smart early termination — when Claude says "done" and ≥75% checkpoints confirmed, accepts completion immediately.
Auto-config on first run — clawdcursor start auto-detects providers without needing clawdcursor doctor.
Universal provider support — any OpenAI-compatible endpoint works via --base-url.
CLI model selection — --text-model and --vision-model flags.

Fixed

Email domain extraction bug — "send to user@hotmail.com" no longer navigates to hotmail.com. Email addresses are stripped before URL matching.
Verification override bug — verification no longer contradicts confirmed checkpoint completion. Skipped when ≥50% checkpoints met.
Context loss between layers — Computer Use now receives full context of what pre-processing already did.
Drawing quality — minimum 50px drag distances enforced via system prompt.
OpenClaw credential discovery — multi-provider scan, template variable resolution, no false overrides.
Pipeline gate — Action Router always runs, shortcuts work everywhere.

Changed

Pipeline pre-processes "open X and Y" tasks — opens app via Action Router (free), then hands remaining task to deeper layers.
Smart Interaction detects visual loop tasks (draw, paint) and skips to Computer Use.
Computer Use system prompt includes Snap Assist handling and drawing guidelines.

[0.6.2] - 2026-02-28 — Universal Provider Support, Auto-Config

Added

Auto-config on first run — clawdcursor start auto-detects and configures providers without needing clawdcursor doctor first. Doctor is now optional for fine-tuning.
Universal provider support — any OpenAI-compatible endpoint works. Not limited to 7 hardcoded providers. Use --base-url + --api-key for custom endpoints.
CLI model selection — --text-model and --vision-model flags on start command.
Dynamic OpenClaw provider mapping — reads ALL providers from OpenClaw config, not just known ones. NVIDIA, Fireworks, Mistral, etc. work automatically.

Changed

clawdcursor start now auto-runs setup if no config exists (non-interactive)
Provider detection accepts any provider name, falling back to OpenAI-compatible API
detectProvider() returns 'generic' for unknown providers instead of defaulting to 'openai'

[0.6.1] - 2026-02-28 — Keyboard Shortcuts, Pipeline Fixes

Added

Keyboard shortcuts registry (src/shortcuts.ts) — 30+ common actions mapped to direct keystrokes. Scroll, copy, paste, undo, reddit upvote/downvote, browser shortcuts, and more. Zero LLM calls.
Fuzzy shortcut matching — "scroll the page down" fuzzy-matches to scroll-down shortcut. Context-aware matching for social media actions.
Router telemetry — Action Router now logs match type, confidence, and shortcut hits.
CDP→UIDriver fallback — Smart Interaction falls back to accessibility tree automation when browser CDP path fails.
Gmail, Outlook, Hotmail added to Browser Layer site map.

Fixed

Pipeline gate bug — Action Router was gated behind !isBrowserTask, causing shortcuts to be skipped for browser-context tasks (e.g., "reddit upvote" matched browser regex but should use shortcut). Action Router now always runs after Browser Layer.
URL extraction false positives — "open gmail and send email to foo@bar.com" no longer extracts bar.com. URL extraction now isolates the navigation clause before matching.
Reliable force-stop — clawdcursor stop now force-kills lingering processes via PID file.
Provider label inference — startup logs now clearly show text and vision provider names separately.

Changed

Pipeline order: Browser Layer (L0) → Action Router + Shortcuts (L1) → Smart Interaction (L1.5) → A11y Reasoner (L2) → Vision (L3). Action Router no longer gated.
extractUrl() uses navigation clause isolation instead of matching against full task text.

[0.6.0] - 2026-02-28 — Universal Provider Support, OpenClaw Integration

Added

OpenClaw credential integration — auto-discovers all configured providers from OpenClaw's auth-profiles.json and openclaw.json. No separate API key needed when running as an OpenClaw skill.
Universal provider support — added Groq, Together AI, DeepSeek as first-class providers with profiles, env var detection, and key prefix recognition.
Auto-detection as default — provider defaults to auto instead of hardcoding Anthropic. Doctor picks the best available provider automatically.
Mixed provider pipelines — use Ollama for text (free) + any cloud provider for vision (best quality). Vision credentials preserved when brain reconfigures for text.
Dynamic Ollama model selection — doctor picks the best available Ollama model instead of hardcoding qwen2.5:7b.
Anthropic vision routing fix — detects Anthropic vision by key prefix (sk-ant-) independently of the main provider field, so split-provider setups work correctly.

Changed

Default config no longer assumes any specific provider or model
Provider scan loop iterates all registered providers dynamically
Help text and doctor output are provider-agnostic
--provider CLI flag accepts any string (not limited to 4 providers)
README updated with 7-provider compatibility table

Security

SKILL.md hardened — removed aggressive autonomy language ("use without asking", "be independent")
Sensitive App Policy — agents must ask the user before accessing email, banking, messaging, or password managers
Safety tiers as hard rules — 🔴 Confirm actions must never be self-approved by agents
Data flow transparency — expanded security section documents network isolation, per-provider data flow, and Ollama = fully offline
No credentials in skill directory — OpenClaw users get auto-discovery from local config; no keys stored in skill files

Fixed

Vision model crash when main provider set to Ollama but vision uses Anthropic (model not found error)
Brain reconfiguration was wiping vision credentials — now preserved

[0.5.6] - 2026-02-27 — Fluid Decomposition, Interactive Doctor, Smart Vision Fallback

Added

Fluid LLM task decomposition — decompose prompt now tells the LLM to reason about what ANY app needs. No more hardcoded examples. "Write me a sentence about dogs" generates actual content instead of typing the literal instruction.
Interactive doctor onboarding — after scanning providers, doctor shows all working TEXT and VISION LLM options with ★ recommendations. User picks by number, Enter for default. Shows GPU info (VRAM via nvidia-smi) to help decide local vs cloud.
Cloud provider guidance — doctor shows unconfigured providers with signup URLs and lets you paste an API key inline (auto-detects provider, saves to .env).
Smart vision fallback for compound tasks — when Router or Reasoner handles part of a multi-step task but fails midway, ALL remaining subtasks are bundled and handed to Computer Use (vision). Prevents false-success trapping in cheap layers.
Ollama auto-detection — brain auto-reconfigures to use local Ollama for decomposition when no cloud API key is set. hasApiKey now recognizes local LLMs.
Compound task guard — action router detects multi-step/compound tasks (commas, "then", "and then") and skips to deeper layers.

Fixed

Case-preserving action router — all regex matches against raw (unmodified) task text. Typed text and URLs no longer get lowercased.
Flexible click matching — click Blank document works without quotes (was requiring click "Blank document"). Single unified regex for quoted and unquoted element names.
PowerShell encoding — replaced emoji (🐾) and em dash (—) in task console title that broke on Windows PowerShell due to encoding.
Stale config — .clawdcursor-config.json now correctly reflects Ollama when doctor detects it (was stuck on Anthropic).
Brain provider mismatch — decomposition no longer calls Anthropic API when only Ollama is available.

Changed

npm run setup — new script that builds and registers clawdcursor as a global command via npm link. Works on Windows, macOS, and Linux.
Stop/kill port validation — port input is now sanitized (parseInt + range check 1-65535) to prevent command injection
Kill health verification — kill command now verifies /health returns a Clawd Cursor response before force-killing
Install instructions updated — README and docs now use npm run setup

Test Results

Task	Pipeline Path	Steps	LLM Calls	Time	Result
Open Notepad	Action Router	1	0	1.5s	✅
Open Notepad + write haiku	Router → Smart Interaction → Computer Use	6	7	58.8s	✅ Verified
Open Google Doc in Edge + write sentence	Browser → Computer Use	17	9	78.8s	✅ Verified

[0.5.5] - 2026-02-26 — Install/Uninstall, OpenClaw Auto-Registration, Doctor UX

Added

clawdcursor install — one command to set up API key, configure pipeline, and register as OpenClaw skill
clawdcursor uninstall — clean removal of all config, data, and OpenClaw skill registration
Doctor auto-registers as OpenClaw skill — symlinks into ~/.openclaw/workspace/skills/clawdcursor
Doctor quick fix commands — shows exact commands for missing text LLM and vision LLM in summary
Dashboard favorites — star commands to save them, click to re-run, persists across server restarts
Credential detection — warns when starring tasks that contain API keys or passwords
OS tabs on website — Windows/macOS/Linux with auto-detect
Post-build help message — shows all available commands after npm run build
Dynamic OS detection — system prompt uses actual OS instead of hardcoded "Windows 11" (thanks @molty)

Fixed

Windows skill detection — removed requires.bins from SKILL.md; OpenClaw's hasBinary() doesn't handle Windows PATHEXT (.exe/.cmd), causing the skill to show as "missing" even when node is installed

Changed

SKILL.md rewritten — agent identity shift framing, trigger lists, CDP direct path, async polling, error recovery
Security hardened — agents cannot self-approve confirm-tier actions, autonomous use scoped to read-only
Privacy language clarified — explicit per-provider data flow
Website Get Started simplified — 3 lines, commands shown in terminal post-build
Anthropic text model updated — claude-haiku-4-5 (was claude-3-5-haiku-20241022)

[0.5.4] - 2026-02-25 — SKILL.md Rewrite + Security Hardening

Changed

Privacy language clarified — explicit per-provider data flow (Ollama = fully local, cloud = data to that API only)
Added homepage and source URLs to skill metadata
Removed hard-coded paths from SKILL.md
Security section expanded — includes localhost bind verification command
Security scan addressed — all flagged documentation gaps resolved

[0.5.3] - 2026-02-25 — SKILL.md Rewrite for Agent Autonomy

Changed

SKILL.md rewritten — agents now understand they have full desktop control and stop asking users to do things they can do themselves
Agent identity shift framing — blockquote at top overrides default "I can't do desktop things" behavior
"When to Use This" trigger list — comprehensive decision framework for when to reach for Clawd Cursor
Two paths documented — REST API (port 3847) for full desktop control, CDP Direct (port 9222) for fast browser reads
Async flow clarified — concrete polling pattern agents can follow step-by-step
Error recovery table — 8 common problems with exact solutions
Expanded task examples — cross-app workflows, data extraction, verification scenarios
README — added OpenClaw Integration section

[0.5.2] - 2026-02-25 — Web Dashboard + Browser Foreground Focus

Added

Web Dashboard — full single-page UI served at GET / (port 3847). Task submission, real-time logs, status indicators, approve/reject for safety confirmations, kill switch. Dark theme, fully responsive, zero external dependencies.
clawdcursor dashboard — CLI command to open the dashboard in your default browser
clawdcursor kill — CLI command to send a stop signal to the running server
GET /logs — API endpoint returning last 200 log entries with timestamps and levels
Browser foreground focus — Playwright navigation now brings Chrome to the front via page.bringToFront() + OS-level window activation (PowerShell SetForegroundWindow on Windows, osascript on macOS). The AI acts like a visible cursor — you see everything it does.
Console hook — hookConsole() intercepts all server logs for the dashboard log feed with auto-classification (error/success/warn/info)

Changed

Smart task handoff — Browser layer no longer uses regex word lists to detect multi-step tasks. Pure navigation ("open youtube") completes in browser layer; anything more complex falls through to SmartInteraction where the LLM plans the steps. No more missed verbs.

Architecture

Layer 0: Browser (Playwright) — navigate + foreground focus
    ↓ more than navigation? → fall through
Layer 1: Action Router — regex patterns, zero LLM calls
    ↓ no match? → fall through
Layer 1.5: Smart Interaction — 1 LLM call plans steps, CDP/UIDriver executes
    ↓ failed? → fall through
Layer 2: Accessibility Reasoner — reads UI tree, cheap LLM
    ↓ failed? → fall through
Layer 3: Screenshot + Vision — full screenshot, Computer Use API

[0.5.1] - 2026-02-23 — HD Screenshots + Focus Stability

Fixed

HD screenshots — LLM resolution increased from 1024px to 1280px (scale 2x instead of 2.5x). Claude can now reliably identify toolbar icons, buttons, and small UI elements.
JPEG quality — bumped from 55 to 65 for clearer icon identification
Window focus stability — Win+D minimizes all windows before task execution, preventing the Clawd terminal from stealing focus from target apps
Paint drawing reliability — pencil tool guidance in system prompt, mandatory checkpoint after tool selection
Stale file cleanup — restored get-windows.ps1 shim (still referenced by accessibility.ts), removed dead setup.ps1 and get-ui-tree.ps1

Performance (Paint stickman benchmark)

Metric	v0.5.0	v0.5.1
Time	~250s	55s
API calls	30	6
Success rate	~50%	~90%

[0.5.0] - 2026-02-23 — Smart Pipeline + Doctor + Batch Execution

Added

clawdcursor doctor — auto-diagnoses setup, tests models, configures optimal pipeline
3-layer pipeline — Action Router → Accessibility Reasoner → Screenshot fallback
Layer 2: Accessibility Reasoner (src/a11y-reasoner.ts) — text-only LLM reads the UI tree, no screenshots needed. Uses cheap models (Haiku, Qwen, GPT-4o-mini).
Batch action execution — Claude returns multiple actions per response (3.6 avg), skipping screenshots between batched actions. Drawing tasks execute 10+ actions in a single API call.
Focus hints — each screenshot includes a FOCUS directive telling Claude where to look, reducing output tokens and decision time
Auto-maximize — apps launched via Action Router are automatically maximized (Win+Up) for consistent layout
Region capture — captureRegionForLLM() crops screenshots to specific areas (2-30KB vs 58KB full)
Checkpoint strategy — screenshots only after critical state changes (app open, dialog appear), not after every action
Multi-provider support — Anthropic, OpenAI, Ollama (local/free), Kimi. Same codebase, auto-detected.
Provider model map (src/providers.ts) — auto-selects cheap/expensive models per provider
Self-healing — doctor falls back if a model is unavailable (e.g., Haiku → Qwen). Circuit breaker disables failing layers at runtime.
Streaming LLM responses — early JSON return saves 1-3s per call
Combined accessibility script (scripts/get-screen-context.ps1) — 1 PowerShell spawn instead of 3
Benchmark harness (test-perf-comparison.ts)

Performance

Screenshots: 120KB → ~80KB, 1280px target (HD for reliable icon identification)
JPEG quality: 70 → 65
Delays: 200-1500ms → 50-600ms across the board
System prompts: ~60% smaller (fewer tokens per call)
Accessibility tree: filtered to interactive elements only, 3000 char cap
Taskbar cache: 30s TTL (was queried every call)
Screen context cache: 500ms → 2s TTL

Benchmarks

Task	v0.4	v0.5 (Ollama, $0)	v0.5 (Anthropic)	v0.5 + Batch
Calculator	43s	2.6s	20.1s	—
Notepad	73s	2.0s	54.2s	—
File Explorer	53s	1.9s	22.1s	—
Paint stickman	~250s (30 calls)	—	~124s (19 calls)	101s (11 calls)
GitHub profile	—	—	~106s (15 calls)	—

[0.4.0] - 2026-02-22 — Native Desktop Control

VNC removed. Clawd Cursor now controls the desktop natively via @nut-tree-fork/nut-js. No VNC server required.

Breaking Changes

--vnc-host, --vnc-port, --vnc-password CLI flags removed
VNC_PASSWORD, VNC_HOST, VNC_PORT environment variables no longer used
rfb2 dependency removed
setup.ps1 no longer installs TightVNC

Added

NativeDesktop class (src/native-desktop.ts) — drop-in replacement for VNCClient
Direct screen capture via @nut-tree-fork/nut-js (~50ms vs ~850ms)
Direct mouse/keyboard control via OS-level APIs
Simplified onboarding: npm install && npm start

Performance

Screenshots: ~850ms → ~50ms (17× faster)
Connect time: ~200ms → ~38ms (5× faster)
Simple task (Google Docs sentence): ~120s → ~102s
Complex task (GitHub → Notepad → save): ~200s → ~156s

Removed

VNC server dependency (TightVNC)
rfb2 npm package
VNC-related CLI flags and environment variables
BGRA→RGBA color swap (nut-js returns RGBA natively)

[0.3.3] - 2025-03-15

Bulletproof Headless Setup

setup.ps1 now completes end-to-end in a single run on fresh systems, even in non-interactive/headless AI agent shells
Generate random VNC password when --vnc-password not provided non-interactively
Replace Start-Process -NoNewWindow -Wait with -PassThru -WindowStyle Hidden + try/catch (msiexec crash fix)
Wrap Start-Service in its own try/catch (post-install crash fix)
Replace all emoji with ASCII tags for cp1252 headless terminal compatibility

[0.3.1] - 2025-03-10

SKILL.md Security Hardening

Added YAML frontmatter, explicit credential declarations, privacy disclosure, and security considerations for ClaWHub publishing.

[0.3.0] - 2025-03-01

Performance Optimizations (~70% faster)

Screenshot hash cache — skips LLM calls when the screen hasn't changed
Adaptive VNC frame wait — captures in ~200ms instead of fixed 800ms
Parallel screenshot + accessibility fetch — runs concurrently via Promise.all
Accessibility context cache — 500ms TTL eliminates redundant PowerShell queries
Async debug writes — no longer blocks the event loop
Exponential backoff with jitter — better retry resilience for API calls

[0.2.0] - 2025-02-21

🚀 Major: Anthropic Computer Use API

Clawd Cursor now supports Anthropic's native Computer Use API (computer_20250124) as the primary execution path. This is a fundamentally different approach — the full task goes directly to Claude with native computer use tools. No decomposition, no routing. Claude sees screenshots, plans, and executes natively.

Dual Execution Paths

The agent now has two separate code paths selected by provider:

Path A — Computer Use API (--provider anthropic): Full task sent to Claude with computer_20250124 tool. Claude sees the screen, plans multi-step sequences, and executes them natively. Handles complex, multi-app workflows reliably.
Path B — Decompose + Action Router (--provider openai / offline): Original approach from v0.1.0. Parse task → subtasks → Action Router (UI Automation, zero LLM) → Vision fallback. Faster and cheaper for simple tasks, works without an API key.

Added

Anthropic Computer Use integration — native computer_20250124 tool type with anthropic-beta: computer-use-2025-01-24 header
Adaptive delays — per-action timing: 1000ms for app launch, 800ms for navigation, 100ms for typing, 300ms default
Verification hints — post-action verification prompts after each Computer Use step
Mouse drag — mouseDrag, mouseDown, mouseUp with smooth interpolation between points
Bulletproof system prompt — planning rules, ctrl+l for URL navigation, recovery strategies for failed actions
Display scaling — automatic resolution scaling to 1280×720 for Computer Use API compatibility
Vision model — claude-sonnet-4-20250514 for Computer Use path

Test Results

Task	Time	API Calls	Result
Google Docs: open Chrome, go to Docs, write a paragraph	187s	14	✅ All succeeded
GitHub: open Chrome, navigate to profile, screenshot	102s	—	✅ All succeeded
Notepad: open, write haiku, save to desktop	~180s	—	✅ File saved correctly
Paint: draw a stick figure	~90s	16	✅ Drawing completed

Breaking Changes

Provider selection now determines execution path. --provider anthropic uses Computer Use API (Path A). --provider openai or no provider uses the original Decompose + Action Router pipeline (Path B). This is a fundamental change in behavior — the same task will execute via completely different code paths depending on the provider.

Performance Characteristics

	Path A (Computer Use)	Path B (Action Router)
Best for	Complex multi-step tasks	Simple single-action tasks
Reliability	Very high	Good for supported patterns
Speed	~90–190s for complex tasks	~2s for simple tasks
Cost	Higher (multiple API calls with screenshots)	Lower (1 text call or zero)
Offline	No	Yes (for common patterns)

[0.1.0] - 2025-01-15

Initial Release

Action Router with Windows UI Automation — 80% of common tasks with zero LLM calls
Vision fallback for complex/unfamiliar UI
Smart task decomposition (single text-only LLM call)
Three-tier safety system (Auto / Preview / Confirm)
REST API and CLI interface
Windows setup script

FilesExpand file tree

CHANGELOG.md

Latest commit

History

CHANGELOG.md

File metadata and controls

Changelog

[0.8.3] - 2026-04-19 — Hotfix: "Outlook keeps opening" + runaway guard

Fixed

Notes

[0.8.2] - 2026-04-19 — Session reliability, force-focus, Electron bridge

Fixed

Added

Changed

Tests

Consolidates v0.8.1 (never tagged)

[0.8.0] - 2026-04-16 — V2 Architecture (opt-in)

Added

Fixed

Testing

Platform Safety

[0.7.14] - 2026-04-13 — Full macOS Keyboard Automation + Platform-Aware Pipeline

Fixed

Added

Changed

Platform Safety

[0.7.13] - 2026-04-10 — Unified Permission Checks + Screenshot Helper

Fixed

Added

[0.7.12] - 2026-04-09 — Comprehensive macOS TCC Fix

Fixed

Changed

Technical Details

[0.7.11] - 2026-04-09 — macOS Installer Fix

Fixed

Changed

[0.7.10] - 2026-04-08 — Guided Setup Flow

Changed

[0.7.9] - 2026-04-08 — UX Improvements

Changed

[0.7.8] - 2026-04-08 — Documentation Fix

Fixed

[0.7.7] - 2026-04-08 — Installer Fixes

Fixed

[0.7.6] - 2026-04-08 — macOS Native Host App

Added

Security

Changed

Unchanged

[0.6.3] - 2026-03-01 — Universal Pipeline, Multi-App Workflows, Provider-Agnostic

Added

Fixed

Changed

[0.6.5] - 2026-02-28 — Checkpoint System, Task Completion Detection

Added

Fixed

Changed

[0.6.2] - 2026-02-28 — Universal Provider Support, Auto-Config

Added

Changed

[0.6.1] - 2026-02-28 — Keyboard Shortcuts, Pipeline Fixes

Added

Fixed

Changed

[0.6.0] - 2026-02-28 — Universal Provider Support, OpenClaw Integration

Added

Changed

Security

Fixed

[0.5.6] - 2026-02-27 — Fluid Decomposition, Interactive Doctor, Smart Vision Fallback

Added

Fixed

Changed

Test Results

[0.5.5] - 2026-02-26 — Install/Uninstall, OpenClaw Auto-Registration, Doctor UX

Added

Fixed

Changed

[0.5.4] - 2026-02-25 — SKILL.md Rewrite + Security Hardening

Changed