browser-auto

LLM-driven browser automation harness. POST a natural-language task and a bag of variables; the service spins up a cloud browser, runs an autonomous agent against it, and exposes a live preview URL plus a polling endpoint for status. Built on Kernel (cloud Chrome via CDP), Playwright, and a provider-neutral agent loop that drives Claude (Anthropic), GPT (OpenAI), or Gemini (Google) — pick the model per request.

With recipes enabled (§4), early runs on a merchant use the LLM end-to-end and record the action trajectory to disk. Once the recipe is validated, later runs on the same merchant replay the recorded steps deterministically, with no per-step LLM calls, and fall back to single-step healing if the page has drifted. Replay cuts typical run time from ~30–60s to ~10–15s and LLM cost per run to ~$0.

1. Setup

For AI agents installing this: follow the steps in order. Do not skip verification. Every step shows the exact command and the output you should expect.

1.1 Prerequisites

Tool	Version	How to check	Install if missing
Node.js	≥ 20.17 (LTS)	`node --version`	https://nodejs.org/en/download
npm	≥ 10	`npm --version`	bundled with Node
git	any recent	`git --version`	https://git-scm.com/downloads

No local Chrome / Playwright browsers are needed. The harness connects to a remote Kernel browser over CDP; the Playwright npm package ships the JS client we use.

1.2 Get the required API keys

You always need one Kernel key (cloud browser). For the agent's LLM you need at least one of: Anthropic, OpenAI, or Google — whichever provider matches the model you'll request. You only need a key for providers you actually plan to use.

Kernel — provides the cloud Chrome the agent drives.

Sign up at https://www.kernel.sh.
After login, go to Dashboard → API Keys (or visit https://app.kernel.sh/keys directly).
Click Create new key. Copy the value — it looks like kn_live_xxxxxxxxxxxxxxxxxxxxxxxxxxxx.
Kernel gives new accounts a small credit balance; that's enough for ~30 minutes of headful sessions. Top up under Billing if needed.

Anthropic — for Claude models (claude-opus-4-7, claude-sonnet-4-6, claude-haiku-4-5, etc.).

Sign up at https://console.anthropic.com.
Go to Settings → API Keys (https://console.anthropic.com/settings/keys).
Click Create Key, name it (e.g. browser-auto-local), copy the value — it looks like sk-ant-api03-xxxxxxxx.... You will not be able to view it again.
Make sure your workspace has access to the model you intend to use under Settings → Models. If only older models show, add a payment method under Billing.

OpenAI — for GPT / o* models (gpt-5, gpt-4.1, o3, o4-mini, etc.).

Sign up at https://platform.openai.com.
Go to https://platform.openai.com/api-keys.
Click Create new secret key, name it, copy the value — it looks like sk-proj-xxxxxxxx....
Add a billing method at https://platform.openai.com/settings/organization/billing.

Google — for Gemini models (gemini-2.5-pro, gemini-2.5-flash, etc.).

Go to https://aistudio.google.com.
Click Get API key → Create API key.
Copy the value — it looks like AIzaSyXXXXXXXX....
Free tier covers light usage; for production, link a billing project in Google Cloud and enable the Generative Language API.

1.3 Clone and install

git clone <this-repo-url> browser-auto
cd browser-auto
npm install

Expected output: ends with something like added N packages, found 0 vulnerabilities. Warnings about npm version mismatches are safe to ignore.

1.4 Create the `.env` file

The file must live at the repo root (./.env, NOT src/.env). Create it by copying the example:

cp .env.example .env

Then edit .env. Set KERNEL_API_KEY and at least one provider key (only the ones you'll use):

KERNEL_API_KEY=kn_live_paste-your-kernel-key-here

# Set the keys for providers you intend to use. Unused lines are fine left blank.
ANTHROPIC_API_KEY=sk-ant-api03-paste-your-anthropic-key-here
OPENAI_API_KEY=sk-proj-paste-your-openai-key-here
GOOGLE_API_KEY=AIzaSy-paste-your-google-key-here

# Default model used when a request doesn't specify one.
DEFAULT_MODEL=claude-opus-4-7

PORT=3000
HOST=0.0.0.0
LOG_LEVEL=info
RECIPE_DIR=./recipes
SCREENSHOT_DIR=./screenshots

DEFAULT_MODEL, PORT, HOST, LOG_LEVEL, RECIPE_DIR, SCREENSHOT_DIR are optional — the defaults shown above are fine for local development.

Security: .env is in .gitignore. Do not commit it. Do not paste the keys into a chat or any tracked file. If you need to share them between developers, use a secrets manager (1Password, Doppler, Vault) — not git.

1.5 Verify your setup

Run the smoke test. It spins up a Kernel browser, attaches Playwright, asks the agent to read example.com, and tears the session down. This typically takes 8–25 seconds and costs a couple of US cents.

npm run smoke                                                  # default model (claude-opus-4-7)
MODEL=gpt-4.1 npm run smoke                                    # OpenAI
MODEL=gemini-3.1-pro-preview-customtools npm run smoke         # Google

Expected output (truncated):

  ┃ LIVE VIEW → https://proxy.<region>.onkernel.com:8443/browser/live/...
  ┃ session   → <session-id>

[..] INFO: kernel session ready
[..] INFO:   ➜ navigate({"url":"https://example.com"})
[..] INFO:   ➜ done({"success":true,"summary":"Example Domain: This domain is for use ..."})
[..] INFO: smoke ok
[..] INFO: kernel session deleted

Exit code 0 means setup is good. If you see:

KERNEL_API_KEY not set → .env is missing or the key line is malformed. Confirm cat .env | grep KERNEL_API_KEY shows your key.
ANTHROPIC_API_KEY not set / OPENAI_API_KEY not set / GOOGLE_API_KEY not set → the LLM provider for the model you're using needs its key set. The smoke test defaults to Claude; set MODEL=gpt-4.1 or MODEL=gemini-2.5-pro to switch providers.
401 or 403 from Kernel → key is wrong or expired; regenerate at https://app.kernel.sh/keys.
not_found_error: model: <id> → your workspace doesn't have access to that model yet; add a payment method or pick another model. Switch with MODEL=<id> for the smoke.
Cannot infer provider from model id "<id>" → the model id doesn't match a known prefix (claude-*, gpt-*, o[134]*, gemini-*). Use a recognised prefix or the explicit provider/model form (e.g. anthropic/my-internal-alias).
ECONNREFUSED / hang → outbound network blocked. The harness needs to reach *.onkernel.com, api.anthropic.com, api.openai.com, and/or generativelanguage.googleapis.com depending on which provider you use.

1.6 Start the API server

npm run start

Expected output:

[..] INFO: browser-auto API listening
    port: 3000
    host: "0.0.0.0"

Leave this running. From another terminal, sanity-check:

curl -s http://localhost:3000/health
# → {"ok":true}

For watch-mode development (auto-restart on file changes):

npm run dev

1.7 Submit your first task

curl -s -X POST http://localhost:3000/tasks \
  -H "Content-Type: application/json" \
  -d '{
    "task": "Buy one pair of shoelaces from allbirds.com as a guest. Stop on the order confirmation page.",
    "merchant_url": "https://www.allbirds.com/",
    "recipes": { "host": "allbirds.com" },
    "variables": {
      "buyerEmail":     { "value": "test@example.com",   "description": "guest checkout email" },
      "firstName":      { "value": "Test",               "description": "first name" },
      "lastName":       { "value": "User",               "description": "last name" },
      "addressLine1":   { "value": "555 4th St.",        "description": "street address" },
      "city":           { "value": "San Francisco",      "description": "city" },
      "stateCode":      { "value": "CA",                 "description": "state 2-letter code" },
      "postalCode":     { "value": "94107",              "description": "ZIP" },
      "phone":          { "value": "4155550123",         "description": "phone" },
      "cardNumber":     { "value": "4242424242424242",   "description": "Visa test card" },
      "cardExpiry":     { "value": "12/27",              "description": "card MM/YY" },
      "cardCvv":        { "value": "123",                "description": "CVV" },
      "cardholderName": { "value": "Test User",          "description": "name on card" }
    }
  }'

Response (202 Accepted):

{
  "task_id": "task_a1b2c3d4e5f6",
  "status": "running",
  "preview_url": "https://proxy.iad-...onkernel.com:8443/browser/live/abc",
  "kernel_session_id": "z9...",
  "poll_url": "/tasks/task_a1b2c3d4e5f6"
}

Open preview_url in a browser to watch the agent work in real time. Poll the task to check status (recommended every 2–5s):

curl -s http://localhost:3000/tasks/task_a1b2c3d4e5f6 | jq

When status becomes succeeded, failed, or needs_human, the run is terminal — stop polling.

2. API reference

`POST /tasks`

Start a new agent run. The server returns 202 Accepted once the Kernel browser session is created (typically <2s); the agent then runs in the background.

Field	Type	Required	Notes
`task`	string	yes	Natural-language goal for the agent.
`variables`	object	yes	Map of `name` → `{ value, description }`. See §3.
`merchant_url`	string	no	If set, the page is navigated here before the agent starts. Saves the agent one navigation step.
`recipes`	object	no	`{ host: string, flow_key?: string }`. Opts into record/replay (§4).
`max_steps`	number	no	Agent step budget. Default 60.
`model`	string	no	Model id. Provider is inferred from the prefix (see Supported models below). Default = server's `DEFAULT_MODEL`.
`proxy`	object	no	`{ country?: string (ISO-2), state?: string, type?: "isp" \| "residential" \| "datacenter" }`. Default: `{ country: "US", type: "isp" }`.
`headless`	boolean	no	Default `false` so Kernel returns a `preview_url`. Set `true` for cheaper headless runs without preview.

Response 202:

{
  "task_id": "task_...",
  "status": "running",
  "preview_url": "https://...",
  "kernel_session_id": "...",
  "poll_url": "/tasks/task_..."
}

Response 400 on validation errors ({ error: "invalid_input", message: "..." }).
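
A client can catch some 400 responses before sending anything. The sketch below mirrors the field table above as a TypeScript type and checks only the rules this README states: `task` and `variables` are required, variable names match `^[a-zA-Z0-9_]+$`, and each variable has a description. The type and function names are illustrative and not taken from the repo, and the server may enforce more than this.

```typescript
// Hypothetical client-side types; field names follow the POST /tasks table.
interface TaskVariable {
  value: string;
  description: string;
}

interface TaskRequest {
  task: string;
  variables: Record<string, TaskVariable>;
  merchant_url?: string;
  recipes?: { host: string; flow_key?: string };
  max_steps?: number;
  model?: string;
  proxy?: { country?: string; state?: string; type?: "isp" | "residential" | "datacenter" };
  headless?: boolean;
}

const VARIABLE_NAME = /^[a-zA-Z0-9_]+$/;

// Returns a list of problems; an empty list means the body passes these checks.
function validateTaskRequest(req: TaskRequest): string[] {
  const problems: string[] = [];
  // Assumption: an empty task string is treated as missing.
  if (req.task.trim() === "") problems.push("task must be a non-empty string");
  for (const name of Object.keys(req.variables)) {
    const variable = req.variables[name];
    if (!VARIABLE_NAME.test(name)) {
      problems.push(`variable name "${name}" must match ${VARIABLE_NAME}`);
    }
    if (variable.description.trim() === "") {
      problems.push(`variable "${name}" needs a description`);
    }
  }
  return problems;
}
```

Calling `validateTaskRequest(body)` before the POST and refusing to send when the list is non-empty keeps malformed variable maps from reaching the server.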

`GET /tasks/:id`

Returns the full task record. Poll this for live progress — you don't need to open the Kernel preview URL to see what the agent is doing. The response includes a latest_step field that shows exactly what just happened, in human-readable text. This is the same per-step commentary that the live preview would show.

{
  "task_id": "task_a1b2c3d4e5f6",
  "status": "running",                    // queued | running | succeeded | failed | needs_human
  "preview_url": "https://...",
  "kernel_session_id": "...",
  "step_count": 14,
  "latest_step": {                        // freshest action — useful for live status
    "step": 14,
    "tool": "fill_card",
    "args": { "field": "cvc", "value": "%cardCvv%" },
    "result": "filled cvc via visible input[autocomplete=\"cc-csc\"]#0"
  },
  "steps": [
    { "step": 1, "tool": "navigate", "args": {...}, "result": "navigated to ..." },
    { "step": 2, "tool": "click",    "args": {...}, "result": "clicked button \"Add to Cart\"" },
    ...
  ],
  "result": {                             // populated when status is terminal
    "success": true,
    "summary": "Order placed",
    "order_id": "AB12345",
    "total": "$22.38",
    "final_url": "https://www.allbirds.com/checkouts/.../thank-you",
    "final_title": "Order confirmation — Allbirds"
  },
  "error": null,
  "created_at": "2026-05-17T13:00:00.000Z",
  "updated_at": "2026-05-17T13:02:14.391Z"
}

Recommended polling: every 2–5 seconds. Display latest_step.result to your user as the live status line. result is null until the run terminates.
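
The polling advice above can be sketched as a small helper. This is a minimal example under stated assumptions: `TaskSnapshot` covers only the fields used here, and `pollUntilTerminal`, `getTask`, `onStep`, and `sleep` are hypothetical names. The fetcher is injected so the loop can be exercised without a running server.

```typescript
type TaskStatus = "queued" | "running" | "succeeded" | "failed" | "needs_human";

interface TaskSnapshot {
  status: TaskStatus;
  latest_step?: { step: number; tool: string; result: string } | null;
}

// Terminal statuses as listed in this README.
function isTerminal(status: TaskStatus): boolean {
  return status === "succeeded" || status === "failed" || status === "needs_human";
}

// Polls until the task reaches a terminal status and returns the final record.
// In real use, getTask would wrap something like:
//   () => fetch(`${baseUrl}/tasks/${taskId}`).then((r) => r.json())
async function pollUntilTerminal(
  getTask: () => Promise<TaskSnapshot>,
  onStep: (line: string) => void = () => {},
  intervalMs = 3000, // inside the recommended 2-5s window
  sleep: (ms: number) => Promise<void> = (ms) =>
    new Promise<void>((resolve) => setTimeout(resolve, ms)),
): Promise<TaskSnapshot> {
  let lastStep = 0;
  for (;;) {
    const task = await getTask();
    const step = task.latest_step;
    if (step && step.step > lastStep) {
      lastStep = step.step;
      onStep(step.result); // show each new step once as the live status line
    }
    if (isTerminal(task.status)) return task;
    await sleep(intervalMs);
  }
}
```

Tracking the last seen step number avoids printing the same status line twice when two polls land between agent actions.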

`GET /tasks/:id/events` (Server-Sent Events stream)

If you'd rather not poll, subscribe to a live event stream. One persistent HTTP connection delivers each step the moment it happens — same data as the polling response's latest_step, but pushed.

curl -N http://localhost:3000/tasks/task_a1b2c3d4e5f6/events

Stream output (one event per \n\n block):

event: status
data: {"type":"status","status":"running","at":"2026-05-17T13:00:01.123Z"}

event: preview_ready
data: {"type":"preview_ready","preview_url":"https://proxy.iad-...","at":"2026-05-17T13:00:01.234Z"}

event: step
data: {"type":"step","step":{"step":1,"tool":"navigate","args":{"url":"..."},"result":"navigated to ..."},"at":"2026-05-17T13:00:02.000Z"}

event: step
data: {"type":"step","step":{"step":2,"tool":"click","args":{"index":29,...},"result":"clicked button \"Add to Cart\""},"at":"2026-05-17T13:00:04.500Z"}

...

event: end
data: {"type":"end","status":"succeeded","at":"2026-05-17T13:02:14.391Z"}

The server replays the full event history when you connect, then streams new events live, then closes the connection on a terminal status. Heartbeats (: heartbeat\n\n comment lines) are sent every 15s to keep proxies happy.

Event types: status (a status change), preview_ready (sent when the Kernel session is up and preview_url is available), step (each agent action), end (terminal; the connection then closes).

Client patterns:

Node: EventSource from the eventsource package, or fetch(...) with a streaming response body.
Browser: new EventSource("/tasks/<id>/events").
Shell: curl -N (the -N flag disables buffering).
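
For clients that read the stream with fetch rather than EventSource, the wire format shown above is simple enough to parse by hand. The sketch below handles already-buffered text only; splitting partial chunks at network boundaries is left out. `parseSse` and `SseEvent` are illustrative names, not part of the repo.

```typescript
interface SseEvent {
  event: string;
  data: unknown;
}

// Parses buffered SSE text into events. Blocks are separated by a blank line;
// lines starting with ":" are comments (the heartbeats) and are skipped.
function parseSse(text: string): SseEvent[] {
  const events: SseEvent[] = [];
  for (const block of text.split(/\r?\n\r?\n/)) {
    let event = "message"; // SSE default when no "event:" field is present
    const dataLines: string[] = [];
    for (const line of block.split(/\r?\n/)) {
      if (line === "" || line.charAt(0) === ":") continue;
      const colon = line.indexOf(":");
      const field = colon === -1 ? line : line.slice(0, colon);
      let value = colon === -1 ? "" : line.slice(colon + 1);
      if (value.charAt(0) === " ") value = value.slice(1); // strip one leading space
      if (field === "event") event = value;
      else if (field === "data") dataLines.push(value);
    }
    // This service sends JSON payloads, so data is parsed as JSON here.
    if (dataLines.length > 0) {
      events.push({ event, data: JSON.parse(dataLines.join("\n")) });
    }
  }
  return events;
}
```

A consumer would stop reading once it sees an event named `end`, since the server closes the connection at that point.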

`GET /health`

Returns { "ok": true }. No auth.

Supported models

The model field on POST /tasks accepts any id that maps to one of the three providers via prefix inference, or an explicit provider/model form.

Provider	Model id pattern	Examples
Anthropic	`claude-*` or `anthropic/<id>`	`claude-opus-4-7`, `claude-sonnet-4-6`, `claude-haiku-4-5`
OpenAI	`gpt-`, `o\d`, `o\d-`, or `openai/<id>`	`gpt-5`, `gpt-4.1`, `o3`, `o4-mini`
Google	`gemini-*` or `google/<id>`	`gemini-3.1-pro-preview-customtools`, `gemini-2.5-pro`

The provider's API key must be set in .env (see §1.2). If you submit a task with a model whose provider key is missing, the task fails fast with a runner_crash error and a clear message.
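
The prefix rules in the table can be read as the following sketch. It is a hypothetical re-implementation for illustration; the real logic lives in src/agent/llm/index.ts and may differ, for example in whether the explicit `provider/` prefix is stripped from the id passed to the SDK.

```typescript
type Provider = "anthropic" | "openai" | "google";

interface ParsedModel {
  provider: Provider;
  model: string;
}

// Maps a model id to a provider using the patterns from the table above.
function inferProvider(modelId: string): ParsedModel {
  // The explicit "provider/model" form wins over prefix inference.
  const explicit = /^(anthropic|openai|google)\/(.+)$/.exec(modelId);
  if (explicit) {
    // Assumption: the provider prefix is removed from the model id.
    return { provider: explicit[1] as Provider, model: explicit[2] };
  }
  if (/^claude-/.test(modelId)) return { provider: "anthropic", model: modelId };
  if (/^gpt-/.test(modelId) || /^o\d(-|$)/.test(modelId)) {
    return { provider: "openai", model: modelId };
  }
  if (/^gemini-/.test(modelId)) return { provider: "google", model: modelId };
  throw new Error(`Cannot infer provider from model id "${modelId}"`);
}
```

For example, `inferProvider("o4-mini")` resolves to OpenAI, while an id such as `llama-3` throws the same message the troubleshooting list in §1.5 describes.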

Recommended model per provider for this agent

This agent does DOM-based browsing (snapshot interactive elements + function-calling tools). Pick the model variant tuned for that pattern:

Provider	Recommended model id	Why
Anthropic	`claude-opus-4-7`	Smartest currently available; works well with our tool surface. ✓ verified end-to-end.
OpenAI	`gpt-4.1` or `o4-mini`	`gpt-4.1` for accuracy, `o4-mini` for cost. Either uses standard function calling.
Google	`gemini-3.1-pro-preview-customtools`	Gemini 3.1 has several preview variants; the `-customtools` one is tuned for custom function-calling workflows (which is what our agent uses). ✓ verified end-to-end.

Two Gemini variants worth knowing about:

gemini-3.1-pro-preview — general-purpose Gemini 3.1 Pro. Works but the customtools variant is better when (as here) the workflow is custom function calls rather than free-form text.
gemini-2.5-computer-use-preview-10-2025 — not for this harness. That's Google's Computer-Use Agent (CUA) model which takes screenshots and emits coordinates. Our agent is DOM-based, not coordinate-based; using a CUA model here will give odd results.

Switching providers per request

# Anthropic (default)
curl -X POST http://localhost:3000/tasks -d '{
  "task": "...",
  "variables": {},
  "model": "claude-opus-4-7"
}'

# OpenAI
curl -X POST http://localhost:3000/tasks -d '{
  "task": "...",
  "variables": {},
  "model": "gpt-4.1"
}'

# Google
curl -X POST http://localhost:3000/tasks -d '{
  "task": "...",
  "variables": {},
  "model": "gemini-3.1-pro-preview-customtools"
}'

Behaviour is identical across providers: each adapter converts our internal tool/message format to its own native shape and forces a single function call per turn. Recipes recorded by one provider replay on another — the recipe stores tool calls in a provider-neutral format, so a trajectory recorded with Claude will replay correctly when the next run uses Gemini or GPT.

3. Variables and the security model

Personal and payment data is passed as variables. In your task prompt and in tool arguments the agent references variables by name, never by literal value. The harness substitutes the real value at execution time inside the Playwright call, so the substituted string never enters the model's context window or transcript.

In the task prompt:

"Fill the card details using the variables provided."

In tool calls the model emits:

{ "tool": "fill_card", "args": { "field": "number", "value": "%cardNumber%" } }

The runtime expands %cardNumber% to the actual digits just before calling page.fill(). Logs and the steps array preserve the %name% placeholder.
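
A minimal sketch of that expansion step, assuming the `%name%` token syntax shown above. The function names are hypothetical, and leaving unknown tokens untouched is a choice made for this example; the actual runtime may throw instead.

```typescript
type Variables = Record<string, { value: string; description: string }>;

// Replaces %name% tokens in one string. Unknown names are left as-is
// (assumption for this sketch).
function expandString(input: string, variables: Variables): string {
  return input.replace(/%([a-zA-Z0-9_]+)%/g, (token: string, name: string) =>
    Object.prototype.hasOwnProperty.call(variables, name) ? variables[name].value : token,
  );
}

// Returns a copy of the tool args with string values expanded. The original
// object is not mutated, so it can still be logged with placeholders intact.
function expandArgs(
  args: Record<string, unknown>,
  variables: Variables,
): Record<string, unknown> {
  const out: Record<string, unknown> = {};
  for (const key of Object.keys(args)) {
    const value = args[key];
    out[key] = typeof value === "string" ? expandString(value, variables) : value;
  }
  return out;
}
```

Because `expandArgs` copies rather than mutates, the object recorded in `steps` keeps `%cardNumber%` while only the copy handed to Playwright carries the digits.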

Variable name rules: must match ^[a-zA-Z0-9_]+$. Each variable must include a description — that's what the agent reads to decide which variable goes where.

This is not a PCI substitute. The values still travel over the network to the server and through Playwright into the browser; the substitution only keeps them out of the LLM transcript. The API has no authentication of its own (§9), so anyone who can reach the service can submit tasks and read task records. Treat the service the same as you would any payment-handling endpoint.

4. Recipes and playbooks

The agent records a successful trajectory once and replays it from then on. This is opt-in per request via the recipes field:

"recipes": { "host": "allbirds.com" }

When set, the harness:

Loads ./recipes/<host>/<flow_key>.json if present.
Loads ./recipes/<host>/playbook.md if present (hints prepended to the agent's system prompt — speeds up the first run too).
Replay mode (recipe trust == "validated"): walks the recorded steps; re-snapshots the page before each step and resolves the target element by tag + name. On a miss, issues a single-step heal LLM call.
Record mode (no recipe, or trust == "draft", or trust == "healing"): runs the full LLM loop and records every successful tool call.

Trust transitions:

State	After successful run	After failed run
`draft` (success_count < 3)	success_count++; if ≥3 → `validated`	stays `draft`
`validated`	success_count++; stays `validated`	→ `healing`
`healing`	→ `validated`	stays `healing`

If too many steps (default >3) drift in one replay, the recipe is auto-dropped and the next run re-records.
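
The mode selection and trust table above amount to a small state machine. The sketch below encodes them directly; `RecipeState`, `selectMode`, and `nextState` are illustrative names rather than the types in src/recipes/types.ts. The table does not say whether success_count changes when a healing recipe recovers, so this example leaves it unchanged.

```typescript
type Trust = "draft" | "validated" | "healing";

interface RecipeState {
  trust: Trust;
  success_count: number;
}

const VALIDATION_THRESHOLD = 3; // from the table: draft promotes at 3 successes

// Replay only runs for validated recipes; everything else records.
function selectMode(recipe: RecipeState | undefined): "replay" | "record" {
  return recipe !== undefined && recipe.trust === "validated" ? "replay" : "record";
}

// Applies one row of the trust table and returns the next state.
function nextState(recipe: RecipeState, runSucceeded: boolean): RecipeState {
  if (!runSucceeded) {
    // draft and healing stay put; validated is demoted to healing.
    return { ...recipe, trust: recipe.trust === "validated" ? "healing" : recipe.trust };
  }
  switch (recipe.trust) {
    case "draft": {
      const success_count = recipe.success_count + 1;
      const trust: Trust = success_count >= VALIDATION_THRESHOLD ? "validated" : "draft";
      return { trust, success_count };
    }
    case "validated":
      return { trust: "validated", success_count: recipe.success_count + 1 };
    case "healing":
      return { ...recipe, trust: "validated" };
  }
}
```

Starting from a fresh draft, three successful runs produce a validated recipe, and the fourth run is the first one eligible for replay.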

File layout:

recipes/<host>/default.json   ← runtime state, gitignored
recipes/<host>/playbook.md    ← hand-tuned hints, committed

Example playbook lives at recipes/allbirds.com/playbook.md.

5. Live preview

There are two ways to watch a running task. Pick whichever fits your UX:

Method	Use when	Visual?
Polling `GET /tasks/:id`	You want to render the agent's status in your own UI as plain text. The `latest_step.result` field is a human-readable description of every action.	No — text only
SSE `GET /tasks/:id/events`	Same content as polling but pushed instead of pulled — lower latency, lower request cost. Best for dashboards.	No — text only
Kernel `preview_url`	You want a visual stream of the actual browser. Useful for demos, first runs on a new merchant, or debugging tricky UI state.	Yes — a live video of Chrome

The preview_url returned by POST /tasks is a Kernel-hosted live view of the cloud Chrome. Open it in any browser to watch the agent visually. It's only present when the session is headful (headless: false, the default). It's valid for the lifetime of the Kernel session and stops working when the task terminates.

You do not need to open the preview URL to know what's happening. The polling and SSE channels carry the full per-step commentary. Use the preview when you want pixels; use polling/SSE for everything else.

6. Local development tools

npm run dev              # watch-mode server (tsx watch)
npm run smoke            # tiny end-to-end smoke against example.com
npm run allbirds         # standalone allbirds checkout script
npm run peek             # screenshot whatever the KEEP_ALIVE session shows
npm run kernel:close     # tear down the KEEP_ALIVE session
npm run diagnose:card    # dump every iframe's inputs (debug payment forms)
npm run typecheck        # tsc --noEmit

Picking the model for CLI scripts: smoke and allbirds honour the MODEL env var. The server, by contrast, uses DEFAULT_MODEL from .env for any request that doesn't pass model. They're separate so you can keep the server on Claude as the default while spot-testing other providers from the CLI:

MODEL=gpt-4.1 npm run smoke
MODEL=gemini-3.1-pro-preview-customtools npm run smoke
KEEP_ALIVE=1 MODEL=gpt-4.1 npm run allbirds

For interactive iteration on a single merchant: set KEEP_ALIVE=1 and run scripts/allbirds.ts. The Kernel session is persisted to .kernel-session.json and reattached on the next run, so there is no fresh browser spin-up and no lost cart or cookie state. Use npm run peek to inspect, and npm run kernel:close when done.

7. Environment variables

Var	Required	Default	Notes
`KERNEL_API_KEY`	yes	—	Kernel cloud Chrome account.
`ANTHROPIC_API_KEY`	conditional	—	Required when using `claude-*` models.
`OPENAI_API_KEY`	conditional	—	Required when using `gpt-*` / `o*` models (e.g. `o3`, `o4-mini`).
`GOOGLE_API_KEY`	conditional	—	Required when using `gemini-*` models. `GEMINI_API_KEY` is accepted as a synonym.
`DEFAULT_MODEL`	no	`claude-opus-4-7`	Used when a request doesn't pass `model`.
`MODEL`	no	`claude-opus-4-7`	Used by `npm run smoke` / `npm run allbirds` standalone scripts.
`PORT`	no	3000	HTTP port.
`HOST`	no	0.0.0.0	Bind interface.
`LOG_LEVEL`	no	info	Pino level.
`RECIPE_DIR`	no	./recipes	Where recipes + playbooks live.
`SCREENSHOT_DIR`	no	./screenshots	Where per-run screenshots go.
`KERNEL_PROXY_TYPE`	no	residential	Default proxy type for standalone scripts.
`KERNEL_HEADLESS`	no	unset	Set to `1` to default to headless (no preview URL).
`KEEP_ALIVE`	no	unset	Set to `1` in local scripts to persist the Kernel session between runs.
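
The server-side rows of this table could be loaded along the following lines. This is a sketch under assumptions: the `Config` shape and `loadConfig` name are invented for the example, and preferring GOOGLE_API_KEY over GEMINI_API_KEY when both are set is a guess, since the table only says the two are synonyms.

```typescript
interface Config {
  kernelApiKey: string;
  anthropicApiKey?: string;
  openaiApiKey?: string;
  googleApiKey?: string;
  defaultModel: string;
  port: number;
  host: string;
  logLevel: string;
  recipeDir: string;
  screenshotDir: string;
}

// Builds a config object from an env-like map, applying the table's defaults.
// Blank values (e.g. unused lines left empty in .env) count as unset.
function loadConfig(env: Record<string, string | undefined>): Config {
  const kernelApiKey = env.KERNEL_API_KEY;
  if (!kernelApiKey) throw new Error("KERNEL_API_KEY not set");
  return {
    kernelApiKey,
    anthropicApiKey: env.ANTHROPIC_API_KEY || undefined,
    openaiApiKey: env.OPENAI_API_KEY || undefined,
    // Assumption: GOOGLE_API_KEY takes precedence over its synonym.
    googleApiKey: env.GOOGLE_API_KEY || env.GEMINI_API_KEY || undefined,
    defaultModel: env.DEFAULT_MODEL || "claude-opus-4-7",
    port: Number(env.PORT || "3000"),
    host: env.HOST || "0.0.0.0",
    logLevel: env.LOG_LEVEL || "info",
    recipeDir: env.RECIPE_DIR || "./recipes",
    screenshotDir: env.SCREENSHOT_DIR || "./screenshots",
  };
}
```

In a Node process this would be called as `loadConfig(process.env)` after the .env file has been loaded.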

8. File layout

src/
├── agent/
│   ├── loop.ts        ← runAgent: dispatches replay vs record, persists recipe
│   ├── replay.ts      ← deterministic step execution + single-step heal
│   ├── playbook.ts    ← loads recipes/<host>/playbook.md
│   ├── elements.ts    ← cross-frame DOM snapshot with stable resolve hints
│   ├── tools.ts       ← click / fill / fill_card / navigate / scroll / etc.
│   └── llm/
│       ├── types.ts        ← provider-neutral types (LLMClient, ChatMessage, ToolDef)
│       ├── anthropic.ts    ← Claude adapter
│       ├── openai.ts       ← GPT / o-series adapter
│       ├── google.ts       ← Gemini adapter
│       └── index.ts        ← createLLMClient(modelId) factory
├── recipes/
│   ├── types.ts       ← Recipe, RecipeStep, Trust transitions
│   └── store.ts       ← DiskRecipeStore (JSON files)
├── kernel/session.ts  ← Kernel lifecycle + KEEP_ALIVE
├── obs/logger.ts      ← pino structured logging
└── server/
    ├── index.ts       ← entrypoint
    ├── api.ts         ← Fastify routes
    ├── store.ts       ← in-memory TaskStore
    └── runner.ts      ← drives one task end-to-end

scripts/
├── allbirds.ts        ← standalone demo (KEEP_ALIVE-friendly)
├── smoke.ts           ← ~30-line agent smoke
├── peek.ts            ← screenshot the live KEEP_ALIVE session
├── close.ts           ← tear down KEEP_ALIVE session
└── diagnose-card.ts   ← dump iframe inputs (payment-form debugging)

recipes/<host>/
├── default.json       ← recorded action trajectory (runtime state, gitignored)
└── playbook.md        ← hand-tuned merchant hints (tracked in git)

9. Production notes (not yet shipped)

This is a v1. Before deploying:

No authentication on the API. Put it behind a reverse proxy / auth layer of your choice.
In-memory task store. Tasks are lost on restart. Swap for Redis or Postgres if you need persistence or multi-instance scale.
No queue, no concurrency limit. Each POST /tasks immediately spins a Kernel session. Add a queue + worker pool for fairness and to bound Kernel cost.
Screenshots stay on the server's disk under ./screenshots/. If you need them remotely, layer an S3 uploader on top of the runner.
3DS / CAPTCHA: the agent is instructed to call done(success=false, failure_reason="three_ds_required" | "captcha_required") rather than attempt to solve them. Handle these failure reasons on your end.
Geo / proxy: default is US-ISP. Set proxy.country / proxy.state per request for geo-specific runs. Residential rotates per TCP connection; prefer ISP for checkout-style flows where stickiness matters.

10. Anatomy of an agent step

For reference, here's what the agent loop does on every iteration in record mode:

1. Snapshot: enumerate visible interactive elements across the main page and all iframes; tag each with a unique data-agent-id.
2. Call the LLM: send the page state + history + variable legend + tool definitions through the active LLMClient adapter; force exactly one tool call per response. The same internal call shape works against any of the three providers; the adapter translates to provider-native format.
3. Substitute variables: replace %name% tokens with real values just before the Playwright call.
4. Execute: click / fill / navigate / scroll / etc. via Playwright.
5. Record (if recipes opt-in): append the tool call to the recipe with element resolve hints (tag + name + frame).
6. Loop until done(...) is called or max_steps is hit.

In replay mode, steps 1, 3, 4 are deterministic; the LLM call (step 2) is only made for steps that fail to resolve or execute — at most 3 per run.
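
The heal budget can be illustrated with a toy model. It does not drive a browser; it only counts how many LLM calls a replay would make given which recorded steps resolve cleanly, and whether drift exceeds the budget. `simulateReplay` and `ReplayOutcome` are hypothetical names, and the real logic in src/agent/replay.ts may handle the limit differently.

```typescript
interface ReplayOutcome {
  healedSteps: number[]; // 1-based indices of recorded steps that needed a heal call
  llmCalls: number;
  dropped: boolean; // true when drift exceeded the budget and the recipe should be re-recorded
}

// resolved[i] says whether recorded step i+1 resolved and executed deterministically.
function simulateReplay(resolved: boolean[], maxHeals = 3): ReplayOutcome {
  const healedSteps: number[] = [];
  for (let i = 0; i < resolved.length; i++) {
    if (resolved[i]) continue; // deterministic path, no LLM call
    if (healedSteps.length === maxHeals) {
      // One more drifted step than the budget allows: give up on this recipe.
      return { healedSteps, llmCalls: healedSteps.length, dropped: true };
    }
    healedSteps.push(i + 1); // single-step heal via the LLM
  }
  return { healedSteps, llmCalls: healedSteps.length, dropped: false };
}
```

With one drifted step out of four, the model reports a single LLM call; with four drifted steps it reports three calls and a dropped recipe, matching the "more than 3" rule in §4.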

Provider adapters

Each adapter lives in src/agent/llm/:

anthropic.ts — @anthropic-ai/sdk. Tools forwarded as Anthropic's native shape; forced single-call via tool_choice: { type: "any", disable_parallel_tool_use: true }.
openai.ts — openai SDK. Tools wrapped as function; forced single-call via tool_choice: "required" + parallel_tool_calls: false. Tool results split into role:"tool" messages per OpenAI's spec.
google.ts — @google/genai SDK. Tools wrapped as functionDeclarations; forced single-call via toolConfig: { functionCallingConfig: { mode: "ANY" } }. Fabricates tool-use ids (Gemini doesn't emit them) and maps id→name so tool results can be matched by function name on the way back.

Adding a fourth provider is small: implement the LLMClient interface from src/agent/llm/types.ts and add a prefix branch in src/agent/llm/index.ts:parseModelId.
