Home

The ast-mcp-server repo is already a solid proof of concept: it wraps Tree-sitter parsing and a lightweight abstract semantic graph (ASG) builder in a Model Context Protocol (MCP) service, so any MCP-capable LLM client, such as Claude, can pull structural graphs on demand. The design lines up with current best practice: FastMCP scaffolding for tool exposure, Tree-sitter for incremental parsing, and simple graph edges that can later feed structure-aware models such as GraphCodeBERT.

What’s working well

Clean MCP surface

Uses FastMCP so each analysis function is a self-describing “tool,” instantly discoverable by any MCP-aware agent.
Caches resources by ast://<hash> and asg://<hash>, which keeps prompts small and encourages re-use across chat turns.
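
The hash-keyed resource scheme described above could look roughly like the sketch below. It is a plain-Python illustration, not the repo's code: `ResourceCache`, `put`, and `get` are hypothetical names, and hashing the source with SHA-256 truncated to 16 hex characters is an assumption. Only the `ast://<hash>` URI form comes from the text.

```python
import hashlib
import json


class ResourceCache:
    """In-memory store keyed by content-derived URIs (hypothetical helper)."""

    def __init__(self, scheme: str) -> None:
        self.scheme = scheme  # e.g. "ast" or "asg"
        self._store: dict[str, str] = {}

    def put(self, source: str, payload: dict) -> str:
        # Hash the source text so identical code always maps to the same URI.
        # The 16-character truncation is an assumption made for readability.
        digest = hashlib.sha256(source.encode("utf-8")).hexdigest()[:16]
        uri = f"{self.scheme}://{digest}"
        # setdefault keeps the first stored copy when the same code is seen again.
        self._store.setdefault(uri, json.dumps(payload))
        return uri

    def get(self, uri: str) -> dict:
        # Raises KeyError for unknown URIs; a real server would map that to an error reply.
        return json.loads(self._store[uri])


ast_cache = ResourceCache("ast")
uri = ast_cache.put("x = 1\n", {"type": "module", "children": []})
same_uri = ast_cache.put("x = 1\n", {"type": "module", "children": []})
```

Because the URI depends only on the source text, a client that has already seen a snippet can pass the short URI in later turns instead of the full tree.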

Pragmatic AST pipeline

Tree-sitter is the right call for multi-language, incremental parsing; it’s battle-tested in IDEs and can update on every keystroke.
build_parsers.py compiles the parsers locally from their C sources, avoiding binary wheels and keeping install friction low.

Early—but extensible—ASG builder

Edges for definitions and references approximate the data-flow edges that GraphCodeBERT adds to its input and credits for its accuracy gains.
The separation of parse_code_to_ast → create_asg_from_ast mirrors the Semantic Code Graph literature, making it easy to switch to richer schemas later.
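
The two-stage split can be illustrated with a small sketch. The function names `parse_code_to_ast` and `create_asg_from_ast` come from the text above, but the bodies here are assumptions: Python's built-in `ast` module stands in for Tree-sitter so the example runs without extra dependencies, and the node and edge dictionaries are a made-up schema, not the repo's.

```python
import ast


def parse_code_to_ast(code: str) -> ast.AST:
    # Stand-in for the Tree-sitter step: Python's own parser, Python source only.
    return ast.parse(code)


class _EdgeBuilder(ast.NodeVisitor):
    """Collect identifier nodes and link each read to the latest definition."""

    def __init__(self) -> None:
        self.nodes: list[dict] = []
        self.edges: list[dict] = []
        self.last_def: dict[str, int] = {}  # variable name -> node id

    def visit_Assign(self, node: ast.Assign) -> None:
        # Visit the right-hand side first so `x = x + 1` reads the old x.
        self.visit(node.value)
        for target in node.targets:
            self.visit(target)

    def visit_Name(self, node: ast.Name) -> None:
        node_id = len(self.nodes)
        is_def = isinstance(node.ctx, ast.Store)
        kind = "definition" if is_def else "reference"
        self.nodes.append({"id": node_id, "name": node.id,
                           "line": node.lineno, "kind": kind})
        if is_def:
            self.last_def[node.id] = node_id
        elif node.id in self.last_def:
            # Edge points from the use to the definition it reads.
            self.edges.append({"type": "reference", "source": node_id,
                               "target": self.last_def[node.id]})


def create_asg_from_ast(tree: ast.AST) -> dict:
    builder = _EdgeBuilder()
    builder.visit(tree)
    return {"nodes": builder.nodes, "edges": builder.edges}


asg = create_asg_from_ast(parse_code_to_ast("x = 1\ny = x + 1\nx = y\n"))
```

Keeping the parse step separate means the graph builder only depends on the tree it receives, so a richer edge schema can replace `_EdgeBuilder` without touching parsing. This sketch ignores scopes entirely, so it is only reliable for straight-line code.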

LLM integration friendly

README ships a ready-made Claude desktop config snippet; few open MCP repos do this yet.

Gaps & concrete improvements

| Area | Why it matters | Suggested action |
| --- | --- | --- |
| Packaging & CI | Easier adoption and repeatability | Add a `pyproject.toml`, publish to TestPyPI, and wire up GitHub Actions for lint + unit tests |
| Edge completeness | Data/control-flow edges are partial; complex scopes may resolve incorrectly | Keep a scope stack while walking the AST; look at the graph-based semantics paper for multi-level token→statement→graph capture |
| Performance | Large repos will exhaust memory when whole ASTs are dumped to JSON | Offer a "diff" mode that returns only changed sub-trees; Tree-sitter can report the changed ranges between an old and a new tree |
| Security | The server runs native parsers on arbitrary, untrusted user input | Sandbox parsing in a restricted process or use `seccomp`/`py-seccomp` for Linux targets |
| Testing corpus | Ensures language coverage and guards refactors | Pull tiny fixtures from GitHub's corpus-manager samples or Rosetta Code and add pytest golden files |
| Docs & examples | Drives contributions | Expand the `examples/` folder: show a full round-trip where Claude asks for "refactor functions over 20 LOC" and the tool replies with positions |
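
The scope-stack suggestion in the Edge completeness row can be sketched as follows. This is an illustration under stated assumptions, not the repo's implementation: `ScopedResolver` is a hypothetical name, Python's built-in `ast` module again stands in for a Tree-sitter walk, and only module and function scopes are handled (no classes, comprehensions, `global`, or `nonlocal`).

```python
import ast


class ScopedResolver(ast.NodeVisitor):
    """Resolve each name read to its definition line, innermost scope first."""

    def __init__(self) -> None:
        self.scopes: list[dict[str, int]] = [{}]  # stack; index 0 is module scope
        self.resolved: list[tuple[str, int, int]] = []  # (name, use_line, def_line)

    def visit_FunctionDef(self, node: ast.FunctionDef) -> None:
        # The function name itself is bound in the enclosing scope.
        self.scopes[-1][node.name] = node.lineno
        # Push a new scope seeded with the positional parameters.
        self.scopes.append({arg.arg: node.lineno for arg in node.args.args})
        for stmt in node.body:
            self.visit(stmt)
        self.scopes.pop()  # leaving the function discards its local bindings

    def visit_Assign(self, node: ast.Assign) -> None:
        self.visit(node.value)  # reads happen before the write
        for target in node.targets:
            self.visit(target)

    def visit_Name(self, node: ast.Name) -> None:
        if isinstance(node.ctx, ast.Store):
            self.scopes[-1][node.id] = node.lineno
            return
        # Search from the innermost scope outward; the first hit wins.
        for scope in reversed(self.scopes):
            if node.id in scope:
                self.resolved.append((node.id, node.lineno, scope[node.id]))
                return


SOURCE = (
    "x = 1\n"         # line 1: module-level x
    "def f(y):\n"     # line 2
    "    x = y\n"     # line 3: local x shadows the module-level one
    "    return x\n"  # line 4
    "z = x\n"         # line 5
)
resolver = ScopedResolver()
resolver.visit(ast.parse(SOURCE))
```

In this example the read of `x` on line 4 resolves to the local definition on line 3, while the read on line 5 resolves to line 1, which a single flat symbol table would get wrong.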

Road-map ideas

Graph storage back-end
Persist ASGs in Neo4j or DuckDB and expose a Cypher-query tool so LLMs can answer “Which functions mutate global state?” on large codebases.
LLM-guided repair
Pipe ASG slices into a fine-tuned GraphCodeBERT or Mistral-7B-Code to generate safe patches automatically; send the diff back through MCP.
Language-Server (LSP) bridge
Create an LSP that proxies to your MCP server; devs would get semantic diagnostics in VS Code while the same graphs feed the chat agent.
Streaming mode
Upgrade FastMCP handlers to support server-sent events so the client sees partial ASTs/ASGs as soon as they’re ready—useful for real-time copilots.
Benchmark harness
Integrate the CodeQL Benchmark suite or Defects4J to track “bugs fixed per minute” as you evolve the ASG heuristics.
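
To make the graph storage idea above concrete, here is a rough sketch of persisting an ASG and answering the "Which functions mutate global state?" question. It swaps in the standard-library `sqlite3` module and SQL in place of Neo4j/DuckDB and Cypher so that it runs without extra dependencies. The `nodes`/`edges` schema, the `writes`/`reads` edge types, and the sample rows are all assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE nodes (id INTEGER PRIMARY KEY, kind TEXT, name TEXT);
    CREATE TABLE edges (src INTEGER, dst INTEGER, type TEXT);
""")

# Hypothetical ASG: two functions and one module-level variable.
conn.executemany("INSERT INTO nodes VALUES (?, ?, ?)", [
    (1, "function", "increment"),
    (2, "function", "report"),
    (3, "global", "counter"),
])
conn.executemany("INSERT INTO edges VALUES (?, ?, ?)", [
    (1, 3, "writes"),  # increment assigns to counter
    (2, 3, "reads"),   # report only reads it
])


def functions_mutating_globals(db: sqlite3.Connection) -> list[str]:
    # A function mutates global state if it has a 'writes' edge to a global node.
    rows = db.execute("""
        SELECT DISTINCT f.name
        FROM edges e
        JOIN nodes f ON f.id = e.src AND f.kind = 'function'
        JOIN nodes g ON g.id = e.dst AND g.kind = 'global'
        WHERE e.type = 'writes'
        ORDER BY f.name
    """).fetchall()
    return [name for (name,) in rows]


mutators = functions_mutating_globals(conn)
```

With a graph database the same question would be a pattern match over nodes and relationships rather than two joins, but the stored data would have a similar shape.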

Verdict

For an initial drop, ast-mcp-server nails the essentials: lightweight, language-agnostic parsing exposed through a forward-looking protocol that big tooling vendors (Anthropic, Replit, Sourcegraph) are converging on. Tighten up packaging, flesh out semantic edges, and add CI tests
