An MCP server + skill library that lets any MCP-capable LLM client (Claude Code, Claude Desktop, Cursor, Codex CLI, Goose, Continue, Zed, etc.) drive other macOS apps in the background — without activating them, bringing them to the front, or moving your cursor.
Two things ship here:
- MCP server (`server/cua_server.py`): exposes nine tools (`list_apps`, `get_app_state`, `click`, `drag`, `perform_secondary_action`, `press_key`, `scroll`, `set_value`, `type_text`) wrapping the macOS Accessibility, CGEvent, and `CGWindowList` APIs. Pure stdio MCP, so any compliant client works.
- Skill library (`skills/`): markdown files with YAML frontmatter that teach the model the one-snapshot-per-turn loop and app-specific heuristics for Safari, Chrome, Clock, Numbers, Apple Music, Spotify, and Notion. Any skill-aware client auto-loads them; for clients without native skill support, you can read the files directly into a system prompt.
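For clients without native skill support, reading the skill files into a system prompt could look roughly like the sketch below. It assumes every skill is a `*.md` file somewhere under `skills/` and that frontmatter is delimited by `---` lines; the helper names and the `## Skill:` heading format are illustrative, not part of this repo.

```python
from pathlib import Path


def strip_frontmatter(text: str) -> str:
    """Drop a leading YAML frontmatter block delimited by '---' lines."""
    lines = text.splitlines()
    if lines and lines[0].strip() == "---":
        for i in range(1, len(lines)):
            if lines[i].strip() == "---":
                return "\n".join(lines[i + 1:]).lstrip("\n")
    return text  # no (or unterminated) frontmatter: return unchanged


def build_skill_prompt(skills_dir: str) -> str:
    """Concatenate every markdown file under skills_dir into one prompt string."""
    sections = []
    for path in sorted(Path(skills_dir).rglob("*.md")):
        body = strip_frontmatter(path.read_text(encoding="utf-8")).strip()
        if body:
            # Hypothetical heading format: label each section by its folder name.
            sections.append(f"## Skill: {path.parent.name}\n\n{body}")
    return "\n\n".join(sections)
```

Calling `build_skill_prompt("skills")` from the repo root would return a single string you can paste into, or prepend to, your client's system prompt.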
It started as a parity port of the Codex desktop's built-in computer-use tool surface, with a few capabilities the original didn't have: animated scroll, Chrome scrolling via AppleScript JS, Electron accessibility auto-enable, and strict refusal to silently steal focus.
Works on macOS 15, should work on 13+. The tool surface is stable; the
per-app hint files grow as new quirks are found. See
docs/APPS.md for the app-by-app coverage matrix.
- macOS 13 or newer (15+ recommended).
- Python 3.10+ available as `python3`.
- Accessibility and Screen Recording permissions granted to the process that runs your MCP client (your terminal if you're using a CLI client, the desktop app's bundle if you're using Claude Desktop, Cursor, etc.). Grant both under System Settings → Privacy & Security.
Clone this repo, then point your client's MCP configuration at
bin/cua-server. The launcher provisions a virtualenv on first run
and execs the stdio server.
Example MCP config snippet (same shape used by Claude Desktop, Cursor, Claude Code, Goose, etc.):
{
  "mcpServers": {
    "background-computer-use": {
      "command": "/absolute/path/to/background-computer-use/bin/cua-server",
      "env": {
        "CUA_PLUGIN_ROOT": "/absolute/path/to/background-computer-use"
      }
    }
  }
}

Config file locations by client:

| Client | Config path |
|---|---|
| Claude Code | project .mcp.json or ~/.claude.json |
| Claude Desktop | ~/Library/Application Support/Claude/claude_desktop_config.json |
| Cursor | .cursor/mcp.json (project) or ~/.cursor/mcp.json (user) |
| Goose | ~/.config/goose/config.yaml (different syntax) |
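
If you would rather script the edit than paste the snippet by hand, a minimal sketch for the JSON-based clients follows. The function name is hypothetical, and it only assumes the `mcpServers` shape shown above; it does not apply to Goose's YAML config.

```python
import json
from pathlib import Path


def add_cua_server(config_path: str, repo_root: str) -> dict:
    """Merge the background-computer-use entry into a JSON MCP config file."""
    path = Path(config_path).expanduser()
    text = path.read_text(encoding="utf-8") if path.exists() else ""
    config = json.loads(text) if text.strip() else {}

    # Keep any servers that are already configured; only add or replace ours.
    servers = config.setdefault("mcpServers", {})
    root = Path(repo_root).expanduser().resolve()
    servers["background-computer-use"] = {
        "command": str(root / "bin" / "cua-server"),
        "env": {"CUA_PLUGIN_ROOT": str(root)},
    }

    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(config, indent=2) + "\n", encoding="utf-8")
    return config
```

For example, `add_cua_server("~/.cursor/mcp.json", "~/src/background-computer-use")` would register the server for Cursor at user scope, assuming the repo was cloned to that (illustrative) location.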
After restart, the nine tools should appear in your client's tool
list. Loading the skills is client-specific — see
docs/SKILLS.md.
This repo also ships a Claude Code plugin manifest so you can install it via Claude Code's marketplace system:
/plugin marketplace add Panchangam18/background-computer-use
/plugin install background-computer-use@background-computer-use
The .claude-plugin/ directory exists only to enable this path; if
you're using a different MCP client, you can ignore it.
git clone https://github.com/Panchangam18/background-computer-use.git
cd background-computer-use
claude --plugin-dir .

The first run takes ~15 seconds while the launcher provisions a
virtualenv under .venv/ with PyObjC, Pillow, and the MCP SDK.
Subsequent runs reuse it.
See docs/DEVELOPING.md for environment
variables, the ghost-cursor playground, and troubleshooting.
With the server loaded in your MCP client:
> What's on my Safari start page?
> Open https://en.wikipedia.org/wiki/Octopus in Safari.
> Scroll down to "Locomotion" in that article.
> Add 2+2 in Calculator and tell me what it says.
> Switch to the Dolphin Wikipedia tab.
All of the above happen without activating the target app. Your cursor stays where it is.
While the agent works, a translucent "ghost" cursor animates to each click target and pulses on landing so you can follow along without losing track of your real cursor. Two cursor styles ship built in:
- `default`: a bold blue arrow with a white outline (used by default).
- `claude`: the Claude logo.
You can also point CUA_CURSOR at any SVG/PNG on disk.
Each MCP client gets its own ghost cursor that persists for the life of its server process. After ~1.5s of inactivity the ghost drifts toward the interior of whichever app window the agent is currently working with, so when multiple agents share this Mac each cursor visibly "belongs to" its own app instead of all piling up on the last click location. The ghost exits automatically when its parent server process dies.
See docs/DEVELOPING.md
for the related env vars (CUA_CURSOR, CUA_CURSOR_HOTSPOT,
CUA_CLICK_PRESS_SCALE, CUA_CLICK_RING, CUA_CLICK_OVERLAY,
CUA_GHOST_PARK_IDLE_S, CUA_GHOST_HARD_IDLE_S).
| Tool | Purpose |
|---|---|
| `list_apps` | Discover running apps (display name + bundle id). |
| `get_app_state(app)` | Return the key window's accessibility tree, a window-scoped screenshot, and per-app `<app_hints>` (if any). Call this once per turn before interacting with an app. |
| `click(app, element_index=… \| x/y=…)` | Click an AX element or raw coordinate. Uses `AXPress` when advertised; falls back to AppleScript, then `CGEventPostToPid`. |
| `drag(app, from_x, from_y, to_x, to_y)` | Smooth ~20-step drag via `CGEventPostToPid`. |
| `perform_secondary_action(app, element_index, action)` | Invoke a named AX action (`Pick`, `Increment`, `ScrollToVisible`, etc.). Refuses focus-stealing actions (`ShowMenu`, `ShowAlternateUI`, `Raise`) unless `allow_foreground_activation=True`. |
| `press_key(app, key, element_index=…)` | xdotool-style keys (`Return`, `cmd+l`, `Page_Down`, `shift+command+t`). Can pre-focus a specific AX element. |
| `scroll(app, element_index, direction, pages=1, smooth=True)` | Smooth, multi-page scroll. Tries AX `AXScroll*ByPage`, then scroll-bar `AXValue`, then Chromium AppleScript JS, then pixel wheel events. Refuses to silently steal focus. |
| `set_value(app, element_index, value, submit=True)` | Set an AX value. Auto-falls-back to a `cmd+a` → `type_text` → `Return` typing sequence for Safari's URL bar and other sticky text fields. |
| `type_text(app, text, element_index=…)` | Freeform Unicode typing via synthetic keyboard events. |
| `acquire_desktop(reason, ttl_s=30, wait_s=8)` | Hold the desktop exclusively across multiple tool calls so another agent can't slip in between your turns. See Multiple agents below. |
| `release_desktop()` | Release a previously-acquired long-lived lease. |
| `desktop_status()` | Read-only: report which agent currently holds the desktop (if any). |
| `check_accessibility_permission()` | Diagnostic: confirm whether macOS Accessibility permission is granted, and if not, name the exact app bundle the user needs to toggle on. |
| `open_accessibility_settings()` | Open System Settings at Privacy & Security → Accessibility so the user can grant permission without hunting for the pane. |
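
Several tools above describe an ordered fallback (for `click`: AXPress, then AppleScript, then CGEventPostToPid). The sketch below illustrates that pattern in general terms. It is not the server's actual code: the three strategy functions are placeholders that do no real clicking, and `run_fallback_chain` is a hypothetical helper.

```python
from typing import Callable, Sequence


class AllStrategiesFailed(RuntimeError):
    """Raised when no strategy in the chain handled the request."""


def run_fallback_chain(strategies: Sequence[tuple[str, Callable[[], bool]]]) -> str:
    """Try each (name, strategy) in order; return the name of the first success.

    A strategy signals "not applicable" by returning False or by raising.
    """
    failures = []
    for name, strategy in strategies:
        try:
            if strategy():
                return name
            failures.append(f"{name}: declined")
        except Exception as exc:  # keep going, but remember why it failed
            failures.append(f"{name}: {exc}")
    raise AllStrategiesFailed("; ".join(failures))


# Placeholder strategies standing in for the real click mechanisms.
def ax_press() -> bool:
    return False  # pretend the element does not advertise AXPress


def applescript_click() -> bool:
    raise RuntimeError("app is not scriptable")


def cgevent_click() -> bool:
    return True  # pretend the synthetic event was delivered


used = run_fallback_chain([
    ("AXPress", ax_press),
    ("AppleScript", applescript_click),
    ("CGEventPostToPid", cgevent_click),
])
```

Collecting the reason each earlier strategy failed keeps the final error message useful when every option has been exhausted.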
The physical Mac desktop is a singleton peripheral — one keyboard, one
mouse, one focus — so multiple MCP clients (or subagents) sharing this
server can't literally act in parallel, but they can take turns
without stepping on each other. Every mutating tool call
(click, drag, type_text, press_key, scroll, set_value,
perform_secondary_action) automatically acquires a brief
cross-process lease backed by flock on /tmp/cua-desktop.lock.
Uncoordinated calls from different agents serialize cleanly — one
runs, the other waits up to CUA_LEASE_DEFAULT_WAIT_S (default 8s)
and then raises a "desktop busy: held by 'agent-X'" error.
For multi-step sequences that must be atomic (open menu → re-read
state → pick a submenu item), call acquire_desktop first and
release_desktop when done. The underlying lock is released
automatically if your process crashes, so a dead agent never wedges
the desktop permanently. Set CUA_AGENT_LABEL in your MCP client's
env block to give your agent a readable name in contention errors.
Read-only tools (list_apps, get_app_state, desktop_status) do
not take the lease and can be called by any agent at any time.
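
The brief lease described above can be sketched with `fcntl.flock` as follows. This is an illustration under assumptions, not the server's implementation: the `desktop_lease` and `DesktopBusy` names are hypothetical, and storing the holder's label inside the lock file is one possible way to produce the "held by" message.

```python
import fcntl
import os
import time
from contextlib import contextmanager


class DesktopBusy(RuntimeError):
    """Raised when the lease could not be acquired within wait_s."""


@contextmanager
def desktop_lease(lock_path: str, label: str, wait_s: float = 8.0, poll_s: float = 0.05):
    """Hold an exclusive flock on lock_path for the duration of the with-block."""
    fd = os.open(lock_path, os.O_RDWR | os.O_CREAT, 0o644)
    try:
        deadline = time.monotonic() + wait_s
        while True:
            try:
                # Non-blocking attempt, so we can enforce our own timeout.
                fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
                break
            except BlockingIOError:
                if time.monotonic() >= deadline:
                    raw = os.pread(fd, 256, 0).decode("utf-8", "replace").strip()
                    raise DesktopBusy(f"desktop busy: held by '{raw or 'unknown'}'") from None
                time.sleep(poll_s)
        # Assumption: record the holder's label in the file for error messages.
        os.ftruncate(fd, 0)
        os.pwrite(fd, label.encode("utf-8"), 0)
        try:
            yield
        finally:
            os.ftruncate(fd, 0)
            fcntl.flock(fd, fcntl.LOCK_UN)
    finally:
        # Closing the descriptor also drops the lock, which is why a crashed
        # process cannot hold it forever: the kernel closes its descriptors.
        os.close(fd)
```

A mutating call would then be wrapped as `with desktop_lease("/tmp/cua-desktop.lock", "agent-A"): ...`, with `wait_s` playing the role of `CUA_LEASE_DEFAULT_WAIT_S`.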
Four things this library is opinionated about:
- Background-safe by default. No tool silently brings the target app to the foreground. When that isn't possible, the tool errors out and makes the caller opt in explicitly.
- Accessibility first. The tool surface prefers AX primitives (AXPress, AXValue, AXScroll*ByPage) over coordinate-based event synthesis. Coordinate fallbacks exist, but they're the second choice.
- One snapshot per turn. Element indexes are snapshot-local. `get_app_state` is cheap; call it after any UI-changing action.
- Per-app knowledge belongs in skills and `<app_hints>`, not in the tool implementation. The server is a thin wrapper around the OS; app-specific quirks (Safari's URL bar, Chrome's scroll, Electron's renderer opt-in) are documented in per-app markdown files that ship with the plugin and get injected at tool-call time.
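
The one-snapshot-per-turn rule can be illustrated with a small client-side sketch. Everything about the data shape here is an assumption for illustration: `call_tool` stands in for however your MCP client invokes a tool, and the `elements` list with `index`, `role`, and `title` keys is a hypothetical simplification of what `get_app_state` returns.

```python
from typing import Any, Callable, Optional

# Hypothetical signature: call_tool(tool_name, arguments) -> parsed result.
ToolCaller = Callable[[str, dict[str, Any]], dict[str, Any]]


def find_index(snapshot: dict[str, Any], role: str, title: str) -> Optional[int]:
    """Look up an element index inside ONE snapshot (indexes are snapshot-local)."""
    for element in snapshot["elements"]:
        if element["role"] == role and element["title"] == title:
            return element["index"]
    return None


def click_by_title(call_tool: ToolCaller, app: str, role: str, title: str) -> dict[str, Any]:
    """Snapshot, resolve the element in that snapshot, click, then re-snapshot."""
    snapshot = call_tool("get_app_state", {"app": app})
    index = find_index(snapshot, role, title)
    if index is None:
        raise LookupError(f"no {role} titled {title!r} in the current snapshot")
    call_tool("click", {"app": app, "element_index": index})
    # The click may have changed the UI, so every old index is now stale.
    return call_tool("get_app_state", {"app": app})
```

The point of returning the fresh snapshot is that the caller never has an excuse to reuse an index from before the click.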
.
├── .claude-plugin/ # Claude Code plugin shim (optional)
│ ├── plugin.json
│ └── marketplace.json
├── .mcp.json # Claude Code per-project MCP config
├── bin/
│ └── cua-server # Bash launcher: provisions .venv/ and execs the server
├── server/
│ ├── cua_server.py # Main FastMCP stdio server (~2,200 LoC)
│ ├── cursor_ghost.py # Ghost-cursor overlay daemon
│ ├── cursor_paths.py # Bezier path generator for the ghost
│ ├── cursor_playground.py # Dev playground for the ghost
│ ├── requirements.txt
│ └── app-hints/ # Per-app <app_hints> payloads (server-private)
├── skills/ # Skill library (any skill-aware MCP client)
│ ├── computer-use/ # Main operating-loop skill
│ ├── safari/
│ ├── chrome/
│ ├── clock/
│ ├── numbers/
│ ├── apple-music/
│ ├── spotify/
│ └── notion/
└── docs/
├── DESIGN.md # Reverse-engineering notes + design rationale
├── APPS.md # App-by-app coverage matrix and limitations
├── SKILLS.md # How to use the skill library with each MCP client
└── DEVELOPING.md # Local-dev loop, env vars, troubleshooting
MIT. See LICENSE.