| 1 | +# AGENTS.md — Plasmate Codebase Guide |
| 2 | + |
| 3 | +This file is for AI coding agents (Cursor, Devin, Claude Code, Copilot, etc.). It tells you what the codebase does, how it is structured, and how to make changes safely. |
| 4 | + |
| 5 | +## What Plasmate is |
| 6 | + |
| 7 | +Plasmate is a headless browser engine that compiles web pages into a **Semantic Object Model (SOM)**, structured JSON optimised for LLM consumption, instead of returning raw HTML. SOM output averages a 17x token reduction. Plasmate needs no API key and no cloud service. |
| 8 | + |
| 9 | +It runs as a CLI, a persistent daemon, an MCP server, and a CDP server. |
| 10 | + |
| 11 | +## Build |
| 12 | + |
| 13 | +```bash |
| 14 | +~/.cargo/bin/cargo build # debug |
| 15 | +~/.cargo/bin/cargo build --release # release |
| 16 | +``` |
| 17 | + |
| 18 | +Requires Rust stable (1.77+). No system dependencies beyond a C linker. |
| 19 | + |
| 20 | +## Test |
| 21 | + |
| 22 | +```bash |
| 23 | +~/.cargo/bin/cargo test # all tests |
| 24 | +~/.cargo/bin/cargo test som:: # SOM tests only |
| 25 | +~/.cargo/bin/cargo test mcp:: # MCP tests only |
| 26 | +RUST_LOG=debug ~/.cargo/bin/cargo test -- --nocapture # with logging |
| 27 | +``` |
| 28 | + |
| 29 | +There are 224+ tests. All must pass before you open a PR. |
| 30 | + |
| 31 | +## Key directories |
| 32 | + |
| 33 | +``` |
| 34 | +src/ |
| 35 | + main.rs CLI entry point (fetch, compile, diff, mcp, serve, daemon, screenshot) |
| 36 | + mcp/ |
| 37 | + mod.rs MCP server, JSON-RPC router, session manager |
| 38 | + tools.rs ALL MCP tool definitions + handlers (add new tools here) |
| 39 | + sessions.rs Persistent browser session state |
| 40 | + som/ |
| 41 | + mod.rs SOM data structures and serialisation |
| 42 | + filter.rs apply_selector() — shared between CLI and MCP |
| 43 | + compiler.rs HTML → SOM compiler (the core algorithm) |
| 44 | + js/ |
| 45 | + runtime.rs V8-backed JS execution |
| 46 | + pipeline.rs Full fetch+JS+compile pipeline |
| 47 | + network/ |
| 48 | + fetch.rs HTTP client (reqwest) |
| 49 | +sdk/python/ Python SDK (MCP client) |
| 50 | +sdk/node/ Node.js SDK (MCP client) |
| 51 | +integrations/ LangChain, LlamaIndex, Browser Use, etc. |
| 52 | +packages/ som-parser-python, som-parser-node |
| 53 | +``` |
| 54 | + |
| 55 | +## How to add an MCP tool |
| 56 | + |
| 57 | +1. Add a `struct YourToolParams` with `#[derive(Deserialize)]` in `src/mcp/tools.rs` |
| 58 | +2. Write `pub fn your_tool_definition() -> ToolDefinition` with name, description, and input_schema |
| 59 | +3. Write the handler `pub async fn handle_your_tool(arguments: &Value, ...) -> Value` |
| 60 | +4. Register both in `src/mcp/mod.rs` — add to `list_tools()` and to the match in `call_tool()` |
| 61 | +5. Add tests in `src/mcp/tools.rs` under `#[cfg(test)]` |
| 62 | + |
| 63 | +Look at `extract_links_definition()` and `handle_extract_links()` for a clean minimal example. |
| 64 | + |
| 65 | +## MCP tool description guidelines |
| 66 | + |
| 67 | +Tool descriptions are read by LLMs (Claude, GPT-4, etc.) to decide which tool to call. Write them as action-oriented instructions, not feature lists: |
| 68 | + |
| 69 | +- State WHAT it returns concretely |
| 70 | +- State WHEN to use it vs alternatives |
| 71 | +- Include any token-saving tips (`selector='main'`) |
| 72 | +- Avoid vague phrases like "token-efficient" without numbers |
| 73 | + |
| 74 | +## SOM selector syntax |
| 75 | + |
| 76 | +`apply_selector(som, sel)` in `src/som/filter.rs` — supported values: |
| 77 | + |
| 78 | +| Selector | Matches | |
| 79 | +|----------|---------| |
| 80 | +| `main` | `<main>` and `role=main` regions | |
| 81 | +| `nav` | Navigation regions | |
| 82 | +| `header` / `footer` | Header / footer regions | |
| 83 | +| `aside` | Sidebar regions | |
| 84 | +| `content` | Article / content regions | |
| 85 | +| `form` | Form regions | |
| 86 | +| `dialog` | Dialog/modal regions | |
| 87 | +| `#foo` | Region with id `foo` | |
| 88 | + |
| 89 | +If the selector matches nothing, `apply_selector()` returns the full SOM (graceful fallback). |
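The no-match fallback is the part of the selector contract that is easiest to break, so here is a minimal sketch of it. This is not the code in `src/som/filter.rs`: the `Som` and `Region` types, their fields, and the name `apply_selector_sketch` are simplified stand-ins invented for illustration, and the real matching rules (for example `content` covering article regions) are richer than a plain role comparison.

```rust
// Hypothetical, simplified stand-ins for the real SOM types.
#[derive(Clone, Debug, PartialEq)]
pub struct Region {
    pub role: String,       // e.g. "main", "nav", "form"
    pub id: Option<String>, // element id, if the region has one
}

#[derive(Clone, Debug, PartialEq)]
pub struct Som {
    pub regions: Vec<Region>,
}

/// Keep only the regions matching `sel`.
/// On no match, return the full SOM unchanged instead of an empty one.
pub fn apply_selector_sketch(som: &Som, sel: &str) -> Som {
    let matched: Vec<Region> = som
        .regions
        .iter()
        .filter(|r| match sel.strip_prefix('#') {
            // "#foo" selects by id
            Some(id) => r.id.as_deref() == Some(id),
            // anything else is compared against the region role
            None => r.role == sel,
        })
        .cloned()
        .collect();

    if matched.is_empty() {
        // Graceful fallback: never panic, never return an empty SOM.
        som.clone()
    } else {
        Som { regions: matched }
    }
}
```

A test for any change to the real function can follow the same shape: one assertion for a role match, one for an `#id` match, and one confirming that an unknown selector returns the input unchanged.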
| 90 | + |
| 91 | +## Python SDK |
| 92 | + |
| 93 | +Located in `sdk/python/`. Run tests with: |
| 94 | + |
| 95 | +```bash |
| 96 | +cd sdk/python && PYTHONPATH=src python3 -m pytest tests/ -v |
| 97 | +``` |
| 98 | + |
| 99 | +The key helper to know is `_extract_last_json(text)` in `client.py`, a hardened JSON parser used by both the sync and async `_call_tool`. It handles mixed output (progress lines before the JSON, JSON embedded in log messages). |
| 100 | + |
| 101 | +## Common patterns |
| 102 | + |
| 103 | +**Error responses (Rust MCP handlers):** |
| 104 | +```rust |
| 105 | +return error_response("descriptive message here"); |
| 106 | +``` |
| 107 | + |
| 108 | +**Returning SOM as MCP content:** |
| 109 | +```rust |
| 110 | +return tool_response(serde_json::to_string(&result).unwrap_or_default()); |
| 111 | +``` |
| 112 | + |
| 113 | +**Applying selector before responding:** |
| 114 | +```rust |
| 115 | +let effective_som = if let Some(ref sel) = params.selector { |
| 116 | + crate::som::filter::apply_selector(&page_result.som, sel) |
| 117 | +} else { |
| 118 | + page_result.som.clone() |
| 119 | +}; |
| 120 | +``` |
| 121 | + |
| 122 | +## What NOT to do |
| 123 | + |
| 124 | +- Do not call `reqwest::blocking` from inside a V8 callback or Tokio async context — use `std::thread::spawn` + `mpsc::channel` to escape (see PR #27 for the pattern) |
| 125 | +- Do not add `unwrap()` on network operations — always handle errors and return `error_response()` |
| 126 | +- Do not break the `apply_selector()` contract — it must return full SOM on no-match, never panic |
| 127 | +- Do not change the `--format` or `--selector` CLI flags without updating both `main.rs` and `src/mcp/tools.rs` |
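The first rule above names `std::thread::spawn` plus `mpsc::channel` as the escape hatch for blocking calls. The sketch below shows that general shape using only the standard library. It is not copied from PR #27: the helper name `run_blocking_off_thread` and the `String` error type are assumptions made for illustration, and the closure stands in for whatever blocking call (such as a `reqwest::blocking` request) needs to run.

```rust
use std::sync::mpsc;
use std::thread;

/// Run a blocking closure on a dedicated OS thread and wait for its result.
/// Hypothetical helper; the name and error type are illustrative only.
pub fn run_blocking_off_thread<T, F>(work: F) -> Result<T, String>
where
    T: Send + 'static,
    F: FnOnce() -> Result<T, String> + Send + 'static,
{
    let (tx, rx) = mpsc::channel();

    thread::spawn(move || {
        // The blocking work runs here, outside the caller's thread.
        // A send error only means the receiver is gone, so it is ignored.
        let _ = tx.send(work());
    });

    // recv() fails only if the worker dropped the sender without sending,
    // for example because the closure panicked. Map that to an error
    // instead of unwrapping.
    rx.recv()
        .map_err(|e| format!("worker thread failed: {e}"))?
}
```

The caller still waits for the result, so this does not make the work asynchronous. It only moves the blocking call onto a thread that is not owned by the async runtime or the V8 callback. Errors come back as a `Result`, which a handler can turn into `error_response()` rather than calling `unwrap()`.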
| 128 | + |
| 129 | +## CI |
| 130 | + |
| 131 | +GitHub Actions runs `cargo test` and `cargo clippy` on every PR. Both must pass. |