feat: add open-operator — AI agent browser automation by jackwener · Pull Request #614 · jackwener/opencli

jackwener · 2026-03-30T15:13:14Z

Summary

Add the `opencli operate` command: an AI agent that autonomously controls the browser to complete tasks described in natural language, then saves successful operations as reusable TypeScript CLI skills.

End-to-end verified: successfully tested on example.com (data extraction) and Twitter/X (DM listing, reusing the existing login state).

Key capabilities

`opencli operate "task"`: LLM-driven browser control loop (observe → reason → act → repeat)
`opencli operate --save-as site/name`: saves successful operations as .ts adapters via LLM code generation
Native CDP input: `Input.dispatchMouseEvent` / `Input.dispatchKeyEvent` produce `isTrusted: true` events, with automatic fallback to JS injection
Rich trace capture: a network interceptor records API responses, which feed the strategy selection for generated adapters
API discovery: analyzes captured requests to find the "golden API" and recommends the best strategy (PUBLIC / COOKIE / UI)
Self-repair: generated adapters are syntax-validated, and any errors are fed back to the LLM for an automatic fix
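The native-CDP-input capability (trusted events first, JS injection as a fallback) can be sketched roughly as follows. This is a minimal illustration, not the code in `src/agent/action-executor.ts`: the `CdpSend` and `EvalInPage` function types and the `clickAt` helper are hypothetical stand-ins for whatever transport the extension exposes. The `Input.dispatchMouseEvent` method and its `type`, `x`, `y`, `button`, and `clickCount` parameters are part of the Chrome DevTools Protocol.

```typescript
// Hypothetical transport types: send a raw CDP command, or evaluate JS in the page.
type CdpSend = (method: string, params: Record<string, unknown>) => Promise<unknown>;
type EvalInPage = (expression: string) => Promise<unknown>;

type ClickResult = { method: "cdp" | "js" };

/**
 * Click at viewport coordinates (x, y). Tries CDP mouse events first
 * (the page sees them with isTrusted === true) and falls back to a
 * synthetic JS click on the element under that point if CDP fails.
 */
async function clickAt(
  send: CdpSend,
  evaluate: EvalInPage,
  x: number,
  y: number,
): Promise<ClickResult> {
  try {
    const base = { x, y, button: "left", clickCount: 1 };
    await send("Input.dispatchMouseEvent", { type: "mouseMoved", x, y });
    await send("Input.dispatchMouseEvent", { ...base, type: "mousePressed" });
    await send("Input.dispatchMouseEvent", { ...base, type: "mouseReleased" });
    return { method: "cdp" };
  } catch {
    // Fallback: JS-dispatched events are untrusted (isTrusted === false),
    // so some sites may ignore them, but they work when CDP is unavailable.
    await evaluate(
      `document.elementFromPoint(${x}, ${y})?.dispatchEvent(new MouseEvent('click', { bubbles: true }))`,
    );
    return { method: "js" };
  }
}
```

The commit message below describes a three-strategy chain (CDP → page.click → evaluate JS); this sketch keeps only the two ends of that chain to show the pattern.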

Architecture

CLI → Agent Loop → DOM Snapshot + LLM → Execute Actions → Observe → Repeat
                                                              ↓
                                              Rich Trace (actions + network + auth)
                                                              ↓
                                              API Discovery → LLM Code Gen → .ts Adapter
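The API Discovery stage at the end of the diagram can be illustrated with a simple scoring heuristic. This is a hypothetical sketch, not the scoring in `src/agent/api-discovery.ts`: the `CapturedRequest` shape, the weights, and the rule that maps the top-ranked request to PUBLIC / COOKIE / UI are assumptions chosen for illustration.

```typescript
// Hypothetical shape of one request recorded by the network interceptor.
interface CapturedRequest {
  url: string;
  method: string;
  status: number;
  contentType: string;
  bodyLength: number;
  sentCookies: boolean;      // whether the browser attached cookies
  containsTaskData: boolean; // whether the response contains the extracted values
}

type Strategy = "PUBLIC" | "COOKIE" | "UI";

// Illustrative weights: successful JSON responses carrying the task's data score highest.
function scoreRequest(r: CapturedRequest): number {
  if (r.status < 200 || r.status >= 300) return 0;
  let score = 0;
  if (r.contentType.includes("json")) score += 3;
  if (r.containsTaskData) score += 5;
  if (/\/api\/|graphql/i.test(r.url)) score += 2;
  if (r.method === "GET") score += 1;
  score += Math.min(r.bodyLength / 10000, 1); // mild preference for richer payloads
  return score;
}

function recommendStrategy(requests: CapturedRequest[]): {
  strategy: Strategy;
  golden?: CapturedRequest;
} {
  const ranked = requests
    .filter((r) => scoreRequest(r) > 0) // drop failed or useless requests
    .sort((a, b) => scoreRequest(b) - scoreRequest(a));
  const golden = ranked[0];
  // No JSON endpoint carrying the data: fall back to driving the UI.
  if (!golden || !golden.containsTaskData || !golden.contentType.includes("json")) {
    return { strategy: "UI" };
  }
  // A golden API that needed cookies implies the logged-in session must be reused.
  return { strategy: golden.sentCookies ? "COOKIE" : "PUBLIC", golden };
}
```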

New files

| File | Purpose |
| --- | --- |
| `src/agent/agent-loop.ts` | Core LLM-driven control loop |
| `src/agent/action-executor.ts` | Action dispatch with CDP→JS fallback |
| `src/agent/dom-context.ts` | DOM snapshot + element coordinate map |
| `src/agent/prompts.ts` | System prompt & step message builder |
| `src/agent/llm-client.ts` | Anthropic SDK wrapper (supports `ANTHROPIC_BASE_URL`) |
| `src/agent/types.ts` | Zod schemas for actions & responses |
| `src/agent/trace-recorder.ts` | Rich context capture (network + auth + thinking) |
| `src/agent/api-discovery.ts` | API scoring & strategy recommendation |
| `src/agent/skill-saver.ts` | LLM-powered TS adapter generation + validation |
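The control loop in `src/agent/agent-loop.ts` follows an observe → reason → act cycle. The sketch below shows that shape under stated assumptions: the `Browser` and `Llm` interfaces, the three-action subset, and the `maxSteps` budget are hypothetical, and a hand-written type guard stands in for the Zod schemas in `src/agent/types.ts` so that the example has no dependencies.

```typescript
// Hypothetical subset of the action types the LLM may return.
type Action =
  | { type: "click"; index: number }
  | { type: "type"; index: number; text: string }
  | { type: "done"; result: string };

interface Browser {
  snapshot(): Promise<string>;              // serialized DOM with indexed elements
  execute(action: Action): Promise<string>; // short description of the outcome
}

interface Llm {
  next(task: string, history: string[], snapshot: string): Promise<unknown>;
}

// Stand-in for schema validation: reject anything that is not a well-formed Action.
function isAction(v: unknown): v is Action {
  if (typeof v !== "object" || v === null) return false;
  const a = v as Record<string, unknown>;
  switch (a.type) {
    case "click": return typeof a.index === "number";
    case "type": return typeof a.index === "number" && typeof a.text === "string";
    case "done": return typeof a.result === "string";
    default: return false;
  }
}

async function runAgent(task: string, browser: Browser, llm: Llm, maxSteps = 20): Promise<string | null> {
  const history: string[] = [];
  for (let step = 1; step <= maxSteps; step++) {
    const snapshot = await browser.snapshot();                // observe
    const proposal = await llm.next(task, history, snapshot); // reason
    if (!isAction(proposal)) {
      // Feed the validation failure back so the model can correct itself.
      history.push(`step ${step}: invalid action ${JSON.stringify(proposal)}`);
      continue;
    }
    if (proposal.type === "done") return proposal.result;
    const outcome = await browser.execute(proposal);          // act
    history.push(`step ${step}: ${proposal.type} -> ${outcome}`);
  }
  return null; // step budget exhausted without a "done" action
}
```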

Security

CDP passthrough is restricted to a method allowlist (22 permitted methods)
Network response bodies are sanitized before being written to disk (tokens and passwords are redacted)
Auth tokens are recorded only as boolean flags (present/absent), never as actual values
`--save-as` names are validated against path traversal (only `[a-zA-Z0-9_-]` is allowed)
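The allowlist, name-validation, and redaction guards can each be sketched as a small pure function. This is an illustrative sketch, not the project's code: the allowlist entries are a hypothetical subset (the PR's 22 methods are not enumerated here), the secret-key pattern is an assumption, and validating `site/name` per segment is one plausible reading of the `[a-zA-Z0-9_-]` rule.

```typescript
// Hypothetical subset of a CDP method allowlist; the real list has 22 entries.
const CDP_ALLOWLIST = new Set<string>([
  "Input.dispatchMouseEvent",
  "Input.dispatchKeyEvent",
  "Input.insertText",
  "Page.captureScreenshot",
  "Runtime.evaluate",
]);

function isAllowedCdpMethod(method: string): boolean {
  return CDP_ALLOWLIST.has(method);
}

// Each segment of "site/name" may only contain [a-zA-Z0-9_-], which rules out
// "..", absolute paths, and extra separators.
const SEGMENT = /^[a-zA-Z0-9_-]+$/;

function isValidSaveAsName(name: string): boolean {
  const parts = name.split("/");
  return parts.length === 2 && parts.every((p) => SEGMENT.test(p));
}

// Assumed pattern for keys whose values must never reach disk.
const SECRET_KEY = /token|password|secret|authorization|cookie/i;

// Recursively copy a parsed JSON body, replacing values under secret-looking keys.
function redactSecrets(value: unknown): unknown {
  if (Array.isArray(value)) return value.map(redactSecrets);
  if (value !== null && typeof value === "object") {
    const out: Record<string, unknown> = {};
    for (const [k, v] of Object.entries(value)) {
      out[k] = SECRET_KEY.test(k) ? "[REDACTED]" : redactSecrets(v);
    }
    return out;
  }
  return value;
}
```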

Dependencies

zod — Runtime validation of LLM structured output
@anthropic-ai/sdk — Anthropic API client

Test plan

opencli operate --help shows correct usage
opencli operate --url https://example.com "extract heading": completes in 1 step
opencli operate --url https://x.com "get my DMs": navigates to the DMs and extracts 10 of them using the existing login state
opencli operate --save-as test/heading "extract heading": generates a .ts adapter
TypeScript compilation passes (main project + extension)
npm run build succeeds (392 manifest entries)
Replay generated adapter: opencli test heading

Complete AI agent browser automation system for OpenCLI:

Core Agent (src/agent/):
- LLM-driven browser control loop with 14 action types
- Planning system with plan CRUD and replan nudges
- Sliding-window loop detection with page fingerprinting
- Autocomplete field detection with 3-layer protocol
- Value mismatch detection (post-type readback)
- Final response after failure (captures partial results)
- Message compaction with epistemic guard (unverified labels)
- Sensitive data masking before LLM

Multi-Provider LLM:
- OPENCLI_PROVIDER (anthropic/openai) + OPENCLI_MODEL + OPENCLI_API_KEY
- Model aliases: sonnet, opus, haiku, gpt-5.4, gpt-4.1, o3
- HTML proxy detection with actionable error messages
- Prompt caching for Anthropic, token tracking for both

Browser Control:
- CDP passthrough with method allowlist (22 permitted methods)
- scrollIntoView before click/type (fixes off-viewport elements)
- Three-strategy click fallback: CDP → page.click → evaluate JS
- Two-layer retry for extension interference (operate: 5x/1.5s, normal: 2x/500ms)
- about:blank for blank tabs (prevents extension hijacking)
- opencli browse commands for Claude Code skill integration

Skill Generation:
- Rich trace capture (network interceptor + auth context + thinking log)
- API discovery: score requests to find golden API, recommend strategy
- LLM-powered TS adapter generation with validation + self-repair
- node_modules symlink for user TS adapter package resolution

AutoResearch:
- 59-task eval framework (49 train + 10 test, 8 categories)
- Claude Code research instructions (program.md)
- Baseline: 56/59 (95%), train 48/49 (98%)

Documentation:
- OPERATE.md: full user guide with configuration, costs, troubleshooting
- SKILL.md split into 3 specialized skills (cli, operate, adapter-dev)
- README updated with AI Agent section

New CLI commands:
- opencli operate|op <task>: AI agent browser automation
- opencli browse open/state/click/type/eval/screenshot/scroll/back/close
- opencli doctor: now shows LLM provider status

Dependencies: zod, @anthropic-ai/sdk, openai
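The sliding-window loop detection listed under Core Agent can be sketched as follows. The window size, repeat threshold, and fingerprint (URL plus a 32-bit FNV-1a hash of the DOM snapshot, computed over UTF-16 code units) are hypothetical choices for illustration; the PR does not specify them, and `LoopDetector` and `fnv1a` are illustrative names.

```typescript
// 32-bit FNV-1a over UTF-16 code units: a cheap, dependency-free page fingerprint.
function fnv1a(text: string): number {
  let h = 0x811c9dc5; // FNV offset basis
  for (let i = 0; i < text.length; i++) {
    h ^= text.charCodeAt(i);
    h = Math.imul(h, 0x01000193) >>> 0; // multiply by FNV prime, keep unsigned 32 bits
  }
  return h;
}

class LoopDetector {
  private readonly recent: string[] = [];
  private readonly size: number;
  private readonly threshold: number;

  constructor(size = 6, threshold = 3) {
    this.size = size;
    this.threshold = threshold;
  }

  /** Record one step; returns true when the same (page, action) pair occurs
   *  `threshold` or more times within the last `size` steps. */
  record(url: string, snapshot: string, action: string): boolean {
    const key = `${url}|${fnv1a(snapshot)}|${action}`;
    this.recent.push(key);
    if (this.recent.length > this.size) this.recent.shift(); // slide the window
    const repeats = this.recent.filter((k) => k === key).length;
    return repeats >= this.threshold;
  }
}
```

Fingerprinting the page as well as the action means that repeating the same click is only flagged when the page did not change in between; when `record` returns true, the agent could, for example, inject a replan nudge into the next prompt instead of repeating the action.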

jackwener force-pushed the open-operator branch 4 times, most recently from 68e5e94 to 7fde454 (April 1, 2026 17:15)

jackwener force-pushed the open-operator branch from 7fde454 to dc99964 (April 1, 2026 17:17)

jackwener wants to merge 1 commit into main from open-operator