Home
The ast-mcp-server repo is already a solid proof-of-concept: it wraps Tree-sitter parsing and a lightweight ASG builder in a Model Context Protocol (MCP) service, so any Claude/LLM client can pull structural graphs on demand. The design lines up with current best practice: FastMCP scaffolding for tool exposure, Tree-sitter for incremental ASTs, and simple graph edges that can later feed structure-aware models such as GraphCodeBERT.
- Uses FastMCP so each analysis function is a self-describing “tool,” instantly discoverable by any MCP-aware agent.
- Caches resources by `ast://<hash>` and `asg://<hash>`, which keeps prompts small and encourages re-use across chat turns.
- Tree-sitter is the right call for multi-language, incremental parsing; it’s battle-tested in IDEs and can update on every keystroke.
- `build_parsers.py` builds local C parsers, avoiding binary wheels and keeping install friction low.
- Edges for definitions and references emulate the data-flow heads that boosted GraphCodeBERT accuracy.
- The separation of `parse_code_to_ast` → `create_asg_from_ast` mirrors the Semantic Code Graph literature, making it easy to switch to richer schemas later.
- README ships a ready-made Claude desktop config snippet; few open MCP repos do this yet.
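
The hash-keyed resource scheme mentioned above can be sketched roughly as follows. This is a minimal illustration, not the repo's actual implementation: Python's built-in `ast` module stands in for Tree-sitter so the example runs without extra dependencies, and the names `_RESOURCES`, `content_hash`, `parse_to_tree`, `cache_ast`, and `read_resource` are hypothetical.

```python
import ast
import hashlib
import json

# Hypothetical in-memory store; the real server's cache layout is an assumption here.
_RESOURCES: dict[str, str] = {}


def content_hash(code: str, language: str) -> str:
    """Return a short, stable digest of the language tag plus the source text."""
    digest = hashlib.sha256(f"{language}:{code}".encode("utf-8"))
    return digest.hexdigest()[:16]


def parse_to_tree(code: str) -> dict:
    """Stand-in parser: uses Python's ast module instead of Tree-sitter."""

    def convert(node: ast.AST) -> dict:
        return {
            "type": type(node).__name__,
            "line": getattr(node, "lineno", None),
            "children": [convert(child) for child in ast.iter_child_nodes(node)],
        }

    return convert(ast.parse(code))


def cache_ast(code: str, language: str = "python") -> str:
    """Parse once, store the JSON under ast://<hash>, and return the URI."""
    uri = f"ast://{content_hash(code, language)}"
    if uri not in _RESOURCES:
        _RESOURCES[uri] = json.dumps(parse_to_tree(code))
    return uri


def read_resource(uri: str) -> dict:
    """Fetch a previously cached tree by its URI."""
    return json.loads(_RESOURCES[uri])


uri = cache_ast("x = 1\nprint(x)\n")
assert uri == cache_ast("x = 1\nprint(x)\n")  # identical input, identical URI
print(uri, read_resource(uri)["type"])
```

Because the URI is derived from the content, a client can pass the short `ast://…` handle in later chat turns instead of resending the tree, which is the prompt-size benefit described in the list.
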
| Area | Why it matters | Suggested action |
|---|---|---|
| Packaging & CI | Easier adoption and repeatability | Add a pyproject.toml, publish to TestPyPI, and wire up GitHub Actions for lint + unit tests |
| Edge completeness | Data/control-flow edges are partial; complex scopes may resolve incorrectly | Keep a scope stack while walking the AST; look at the graph-based semantics paper for multi-level token→stmt→graph capture. |
| Performance | Large repos will hit memory when you JSON-dump whole ASTs | Offer a “diff” mode that returns only changed sub-trees; Tree-sitter can give edit ranges natively. |
| Security | The server executes arbitrary user input (parsers) | Sandbox parsing in a restricted process or use seccomp/py-seccomp for Linux targets |
| Testing corpus | Ensures language coverage and guards refactors | Pull tiny fixtures from GitHub’s corpus-manager samples or Rosetta Code and add pytest golden files |
| Docs & examples | Drives contributions | Expand the examples/ folder: show a full round-trip where Claude asks for “refactor functions over 20 LOC” and the tool replies with positions |
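
The "keep a scope stack while walking the AST" suggestion in the table could look something like the sketch below. It again uses Python's built-in `ast` module as a stand-in for a Tree-sitter walk, and the class name `ScopedEdgeBuilder` and the edge tuple layout are hypothetical. It deliberately simplifies Python's real scoping rules (no `global`/`nonlocal`, no classes, no forward references).

```python
import ast


class ScopedEdgeBuilder(ast.NodeVisitor):
    """Collects (reference_line, name, definition_line) edges using a scope stack."""

    def __init__(self) -> None:
        self.scopes: list[dict[str, int]] = [{}]  # innermost scope is last
        self.edges: list[tuple[int, str, int]] = []

    def _define(self, name: str, line: int) -> None:
        self.scopes[-1][name] = line

    def _resolve(self, name: str):
        # Search from the innermost scope outward.
        for scope in reversed(self.scopes):
            if name in scope:
                return scope[name]
        return None

    def visit_FunctionDef(self, node: ast.FunctionDef) -> None:
        self._define(node.name, node.lineno)  # the name binds in the enclosing scope
        self.scopes.append({})                # parameters and locals get a new scope
        for arg in node.args.args:
            self._define(arg.arg, arg.lineno)
        for stmt in node.body:
            self.visit(stmt)
        self.scopes.pop()

    def visit_Assign(self, node: ast.Assign) -> None:
        self.visit(node.value)  # the right-hand side is read before targets are bound
        for target in node.targets:
            self.visit(target)

    def visit_Name(self, node: ast.Name) -> None:
        if isinstance(node.ctx, ast.Store):
            self._define(node.id, node.lineno)
        elif isinstance(node.ctx, ast.Load):
            line = self._resolve(node.id)
            if line is not None:
                self.edges.append((node.lineno, node.id, line))


source = "x = 1\ndef f(x):\n    y = x\n    return y\nz = x\n"
builder = ScopedEdgeBuilder()
builder.visit(ast.parse(source))
print(builder.edges)  # [(3, 'x', 2), (4, 'y', 3), (5, 'x', 1)]
```

With a single flat symbol table, the parameter `x` on line 2 would overwrite the module-level `x`, and the reference on line 5 would resolve to the wrong definition. Popping the function scope on exit is what keeps that last edge pointing at line 1.
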
- Graph storage back-end: Persist ASGs in Neo4j or DuckDB and expose a query tool (Cypher for Neo4j, SQL for DuckDB) so LLMs can answer “Which functions mutate global state?” on large codebases.
- LLM-guided repair: Pipe ASG slices into a fine-tuned GraphCodeBERT or Mistral-7B-Code to generate safe patches automatically; send the diff back through MCP.
- Language-Server (LSP) bridge: Create an LSP that proxies to your MCP server; devs would get semantic diagnostics in VS Code while the same graphs feed the chat agent.
- Streaming mode: Upgrade FastMCP handlers to support server-sent events so the client sees partial ASTs/ASGs as soon as they’re ready, which is useful for real-time copilots.
- Benchmark harness: Integrate the CodeQL Benchmark suite or Defects4J to track “bugs fixed per minute” as you evolve the ASG heuristics.
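
To make the graph storage idea concrete, here is a rough sketch of persisting ASG nodes and edges and answering the "Which functions mutate global state?" question with a query. It uses the standard-library `sqlite3` module purely as a stand-in for DuckDB or Neo4j so that it runs anywhere. The two-table schema, the `writes`/`reads` edge kinds, and all function names are assumptions for illustration, not part of the repo.

```python
import sqlite3


def open_store(path: str = ":memory:") -> sqlite3.Connection:
    """Create the (hypothetical) node and edge tables if they do not exist."""
    conn = sqlite3.connect(path)
    conn.executescript(
        """
        CREATE TABLE IF NOT EXISTS nodes (
            id   INTEGER PRIMARY KEY,
            kind TEXT NOT NULL,
            name TEXT NOT NULL
        );
        CREATE TABLE IF NOT EXISTS edges (
            src  INTEGER NOT NULL REFERENCES nodes(id),
            dst  INTEGER NOT NULL REFERENCES nodes(id),
            kind TEXT NOT NULL
        );
        """
    )
    return conn


def add_node(conn: sqlite3.Connection, kind: str, name: str) -> int:
    cur = conn.execute("INSERT INTO nodes (kind, name) VALUES (?, ?)", (kind, name))
    return cur.lastrowid


def add_edge(conn: sqlite3.Connection, src: int, dst: int, kind: str) -> None:
    conn.execute("INSERT INTO edges (src, dst, kind) VALUES (?, ?, ?)", (src, dst, kind))


def functions_mutating_globals(conn: sqlite3.Connection) -> list[str]:
    """Return names of function nodes that have a 'writes' edge to a global node."""
    rows = conn.execute(
        """
        SELECT DISTINCT f.name
        FROM edges e
        JOIN nodes f ON f.id = e.src AND f.kind = 'function'
        JOIN nodes g ON g.id = e.dst AND g.kind = 'global'
        WHERE e.kind = 'writes'
        ORDER BY f.name
        """
    ).fetchall()
    return [name for (name,) in rows]


conn = open_store()
counter = add_node(conn, "global", "counter")
bump = add_node(conn, "function", "bump")
peek = add_node(conn, "function", "peek")
add_edge(conn, bump, counter, "writes")
add_edge(conn, peek, counter, "reads")
print(functions_mutating_globals(conn))  # ['bump']
```

An MCP tool wrapping a query like this would return a short list of names rather than a serialized graph, which matters once the ASG no longer fits in a prompt.
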
For an initial drop, ast-mcp-server nails the essentials: lightweight, language-agnostic parsing exposed through a forward-looking protocol that big tooling vendors (Anthropic, Replit, Sourcegraph) are converging on. Tighten up packaging, flesh out semantic edges, and add CI tests