Merge pull request #33 from dbhurley/feat/ai-discoverability

dbhurley · web-flow · commit ea5a7516fcda · 2026-04-02T07:12:20.000-04:00
LGTM. cargo check clean. Tool descriptions updated with accurate 17x stat and action-oriented guidance. AGENTS.md is comprehensive — covers build/test, directory map, how to add tools, description guidelines, selector syntax, and anti-patterns.
diff --git a/AGENTS.md b/AGENTS.md
@@ -0,0 +1,131 @@
+# AGENTS.md — Plasmate Codebase Guide
+
+This file is for AI coding agents (Cursor, Devin, Claude Code, Copilot, etc.). It tells you what the codebase does, how it is structured, and how to make changes safely.
+
+## What Plasmate is
+
+Plasmate is a headless browser engine that compiles web pages into a **Semantic Object Model (SOM)** — structured JSON optimised for LLM consumption — instead of returning raw HTML. 17x average token reduction. No API key, no cloud.
+
+It runs as a CLI, a persistent daemon, an MCP server, and a CDP server.
+
+## Build
+
+```bash
+~/.cargo/bin/cargo build              # debug
+~/.cargo/bin/cargo build --release    # release
+```
+
+Requires Rust stable (1.77+). No system dependencies beyond a C linker.
+
+## Test
+
+```bash
+~/.cargo/bin/cargo test               # all tests
+~/.cargo/bin/cargo test som::         # SOM tests only
+~/.cargo/bin/cargo test mcp::         # MCP tests only
+RUST_LOG=debug ~/.cargo/bin/cargo test -- --nocapture  # with logging
+```
+
+There are 224+ tests. All must pass before a PR.
+
+## Key directories
+
+```
+src/
+  main.rs          CLI entry point (fetch, compile, diff, mcp, serve, daemon, screenshot)
+  mcp/
+    mod.rs         MCP server, JSON-RPC router, session manager
+    tools.rs       ALL MCP tool definitions + handlers (add new tools here)
+    sessions.rs    Persistent browser session state
+  som/
+    mod.rs         SOM data structures and serialisation
+    filter.rs      apply_selector() — shared between CLI and MCP
+    compiler.rs    HTML → SOM compiler (the core algorithm)
+  js/
+    runtime.rs     V8-backed JS execution
+    pipeline.rs    Full fetch+JS+compile pipeline
+  network/
+    fetch.rs       HTTP client (reqwest)
+sdk/python/        Python SDK (MCP client)
+sdk/node/          Node.js SDK (MCP client)
+integrations/      LangChain, LlamaIndex, Browser Use, etc.
+packages/          som-parser-python, som-parser-node
+```
+
+## How to add an MCP tool
+
+1. Add a `struct YourToolParams` with `#[derive(Deserialize)]` in `src/mcp/tools.rs`
+2. Write `pub fn your_tool_definition() -> ToolDefinition` with name, description, and input_schema
+3. Write `pub async fn handle_your_tool(arguments: &Value, ...) -> Value` handler
+4. Register both in `src/mcp/mod.rs` — add to `list_tools()` and to the match in `call_tool()`
+5. Add tests in `src/mcp/tools.rs` under `#[cfg(test)]`
+
+Look at `extract_links_definition()` and `handle_extract_links()` for a clean minimal example.
+
+## MCP tool description guidelines
+
+Tool descriptions are read by LLMs (Claude, GPT-4, etc.) to decide which tool to call. Write them as action-oriented instructions, not feature lists:
+
+- State WHAT it returns concretely
+- State WHEN to use it vs alternatives
+- Include any token-saving tips (`selector='main'`)
+- Avoid vague phrases like "token-efficient" without numbers
+
+## SOM selector syntax
+
+`apply_selector(som, sel)` in `src/som/filter.rs` — supported values:
+
+| Selector | Matches |
+|----------|---------|
+| `main` | `<main>` and `role=main` regions |
+| `nav` | Navigation regions |
+| `header` / `footer` | Header / footer regions |
+| `aside` | Sidebar regions |
+| `content` | Article / content regions |
+| `form` | Form regions |
+| `dialog` | Dialog/modal regions |
+| `#foo` | Region with id `foo` |
+
+Returns full SOM if selector matches nothing (graceful fallback).
+
+## Python SDK
+
+Located in `sdk/python/`. Run tests with:
+
+```bash
+cd sdk/python && PYTHONPATH=src python3 -m pytest tests/ -v
+```
+
+The key helper to know: `_extract_last_json(text)` in `client.py` — hardened JSON parser used by both sync and async `_call_tool`. It handles mixed output (progress lines before JSON, embedded JSON in log messages).
+
+## Common patterns
+
+**Error responses (Rust MCP handlers):**
+```rust
+return error_response("descriptive message here");
+```
+
+**Returning SOM as MCP content:**
+```rust
+return tool_response(serde_json::to_string(&result).unwrap_or_default());
+```
+
+**Applying selector before responding:**
+```rust
+let effective_som = if let Some(ref sel) = params.selector {
+    crate::som::filter::apply_selector(&page_result.som, sel)
+} else {
+    page_result.som.clone()
+};
+```
+
+## What NOT to do
+
+- Do not call `reqwest::blocking` from inside a V8 callback or Tokio async context — use `std::thread::spawn` + `mpsc::channel` to escape (see PR #27 for the pattern)
+- Do not add `unwrap()` on network operations — always handle errors and return `error_response()`
+- Do not break the `apply_selector()` contract — it must return full SOM on no-match, never panic
+- Do not change the `--format` or `--selector` CLI flags without updating both `main.rs` and `src/mcp/tools.rs`
+
+## CI
+
+GitHub Actions runs `cargo test` and `cargo clippy` on every PR. Both must pass.
diff --git a/src/mcp/tools.rs b/src/mcp/tools.rs
@@ -67,7 +67,7 @@ struct ExtractTextParams {
 pub fn fetch_page_definition() -> ToolDefinition {
     ToolDefinition {
         name: "fetch_page".to_string(),
-        description: "Fetch a web page and return its Semantic Object Model (SOM) - a structured, token-efficient representation of the page content. Use this instead of raw HTML fetching for 10x token savings.".to_string(),
+        description: "Fetch a web page and return its Semantic Object Model (SOM) - structured JSON with typed regions, interactive elements with stable IDs, and clean text content. Averages 17x fewer tokens than raw HTML (up to 117x on complex pages). Prefer this over raw HTTP fetches for any web content in agent pipelines. Add selector='main' to strip nav/footer and reduce tokens further.".to_string(),
         input_schema: json!({
             "type": "object",
             "properties": {
@@ -97,7 +97,7 @@ pub fn fetch_page_definition() -> ToolDefinition {
 pub fn extract_text_definition() -> ToolDefinition {
     ToolDefinition {
         name: "extract_text".to_string(),
-        description: "Fetch a web page and return only the clean, readable text content. No HTML, no structure - just the text a human would read.".to_string(),
+        description: "Fetch a web page and return only the clean, readable text - no markup, no structure, no element IDs. Use this (instead of fetch_page) when you only need the written content and do not need to interact with the page or reference specific elements.".to_string(),
         input_schema: json!({
             "type": "object",
             "properties": {
@@ -682,7 +682,7 @@ struct ClosePageParams {
 pub fn open_page_definition() -> ToolDefinition {
     ToolDefinition {
         name: "open_page".to_string(),
-        description: "Open a web page in a persistent browser session. Returns a session ID and the initial SOM. Use with click, type, and evaluate for multi-step interactions.".to_string(),
+        description: "Open a URL in a persistent browser session. Returns a session_id and the initial SOM. Use this (instead of fetch_page) when you need to interact with the page - click buttons, fill forms, navigate, or run JavaScript. Pair with click, type_text, navigate_to, and evaluate.".to_string(),
         input_schema: json!({
             "type": "object",
             "properties": {