feat(project-intelligence): structured module report (dentro) — outline+signatures+who-uses-this as navigable JSON, read-substitute #245

Description

@apmantza

Summary

Add a structured module report capability so an agent can understand a file before reading its full source. Today the only way to comprehend a module is a raw full-file read, which is token-expensive and the read-guard's least-favourite path. moduleReport projects the data pi-lens already holds (tree-sitter outline + review-graph edges) into a navigable, line-numbered JSON artifact; readSymbol returns a single symbol's body cheaply.

Inspired by the "dentro" sketch (pi-dentro/dentro.md). Built inside pi-lens first (engine seam), MCP-mirrored after — same pattern as pilens_symbol_search / pilens_impact.

Why this shape (decisions already settled)

  • JSON, not markdown. The agent acts on this — every entry must be executable, not prose. md is renderable from the JSON later for humans.
  • Render on demand; do NOT materialize per-file .md during graph build. who-uses-this is an incoming edge owned by other files, so a persisted per-file report re-introduces the cross-file staleness that #202 (perf: hash-based skip of redundant per-edit graph work) killed. The persisted snapshot + 3-tier-cached graph already give cold-start; render from them.
  • Signatures from tree-sitter (syntactic, instant, cold-safe); expand to LSP-typed only on deep/readSymbol for the specific symbol, never whole-file hover fan-out (the #242 latency trap: per-server diagnostic deadlines on the with-auxiliary path).
  • Cache key = fileHash + graph.builtAt (not fileHash alone — usedBy changes when other files change).
  • Guard integrity preserved. An outline is not "having seen the body": moduleReport injects no read records. readSymbol returns real body lines → host records them with lineHashes → that symbol is genuinely covered. The report makes you fast at finding; readSymbol is the cheap thing that shows the body and satisfies the guard.
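
The cache-key decision above can be sketched as follows. This is a minimal illustration, not pi-lens code: `GraphStamp`, `reportCacheKey`, and `ReportCache` are hypothetical names, and `builtAt` is assumed to be a numeric timestamp. Only the rule itself (key on fileHash plus graph.builtAt) comes from the text.

```typescript
// Hypothetical sketch: render-on-demand cache keyed by fileHash + graph.builtAt.
// None of these names are pi-lens APIs; they only illustrate the keying rule.

interface GraphStamp {
  builtAt: number; // assumed: timestamp of the last review-graph build
}

function reportCacheKey(path: string, fileHash: string, graph: GraphStamp): string {
  // builtAt is part of the key because usedBy depends on edges owned by OTHER files:
  // the report can go stale even when this file's hash is unchanged.
  return `${path}:${fileHash}:${graph.builtAt}`;
}

class ReportCache<T> {
  private entries = new Map<string, T>();

  // Return the cached report for this key, or render and remember it.
  getOrRender(key: string, render: () => T): T {
    const hit = this.entries.get(key);
    if (hit !== undefined) return hit;
    const value = render();
    this.entries.set(key, value);
    return value;
  }
}
```

A rebuild of the graph changes `builtAt`, so the next call misses and re-renders even though the file itself is untouched. The sketch never evicts old entries; a real cache would need a bound.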

Contract

moduleReport(path, { depth: "outline"|"standard"|"deep", maxRefsPerSymbol }): ModuleReport
readSymbol(path, symbol): ReadSymbolResult

ModuleSymbolEntry: name, kind, startLine, endLine, exported, signature?, doc?, fanout?, complexity?, flags[], usedBy[{file,symbol,line,relation}], read:{path,offset,limit} (pre-computed — agent's next call sits right there).

ModuleReport: available, staleness, path, language, lineCount, summary{imports,exports,symbols}, imports{external,internal}, api[], internal[], diagnostics[], recommendedReads[] (ranked + executable), semantic{source:"graph-lsp"|"live-lsp"|"none", references, implementations}.
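
For orientation, the contract can be written out as TypeScript types. The field names are taken from the lists above; every concrete field type is an assumption, since the issue gives names only. `readArgsFor` is a hypothetical helper showing how the pre-computed `read` args could be derived, assuming 1-based inclusive line numbers and that `offset` is the first line to read.

```typescript
// Sketch of the contract as types. Field names come from the issue;
// the field types are assumptions made for illustration.

type Depth = "outline" | "standard" | "deep";

interface ReadArgs {
  path: string;
  offset: number; // assumed: first line to read, 1-based
  limit: number;  // assumed: number of lines to read
}

interface UsedByEntry {
  file: string;
  symbol: string;
  line: number;
  relation: string;
}

interface ModuleSymbolEntry {
  name: string;
  kind: string;
  startLine: number;
  endLine: number;
  exported: boolean;
  signature?: string;
  doc?: string;
  fanout?: number;
  complexity?: number;
  flags: string[];
  usedBy: UsedByEntry[];
  read: ReadArgs;
}

interface ModuleReportOptions {
  depth?: Depth;
  maxRefsPerSymbol?: number;
}

// Hypothetical helper: turn a symbol's span into executable read args.
function readArgsFor(path: string, startLine: number, endLine: number): ReadArgs {
  return { path, offset: startLine, limit: endLine - startLine + 1 };
}
```

With these assumptions, a symbol spanning lines 10 to 14 yields `{ offset: 10, limit: 5 }`, which an agent can pass to its read tool unchanged.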

Depth ladder = the latency contract

| depth | sources | live LSP | cold-safe |
| --- | --- | --- | --- |
| outline | single-file tree-sitter extract | no | yes |
| standard (default) | + review graph (who-uses, flags, diagnostics) | no | yes |
| deep | + bounded live LSP (typed sigs for API, goToImplementation) | yes, ceiling + #242 deadlines | n/a |
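
Read as code, the ladder is additive: each depth keeps the sources of the one before it. The sketch below assumes that reading; `DepthPlan` and `planFor` are hypothetical names, not part of the proposal.

```typescript
// Hypothetical sketch of the depth ladder as a source plan.
// Each rung adds one source on top of the previous rung.

type Depth = "outline" | "standard" | "deep";

interface DepthPlan {
  treeSitter: boolean;  // single-file tree-sitter extract
  reviewGraph: boolean; // who-uses, flags, diagnostics
  liveLsp: boolean;     // bounded live LSP, only at "deep"
}

function planFor(depth: Depth = "standard"): DepthPlan {
  switch (depth) {
    case "outline":
      return { treeSitter: true, reviewGraph: false, liveLsp: false };
    case "standard":
      return { treeSitter: true, reviewGraph: true, liveLsp: false };
    case "deep":
      return { treeSitter: true, reviewGraph: true, liveLsp: true };
  }
}
```

Keeping `liveLsp` false for both outline and standard is what makes those two depths cold-safe in this reading.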

LSP reuse ties into #236

The semantic block is sourced from the #236 LSP-enriched graph (source:"graph-lsp"), not live fan-out. moduleReport is the concrete consumer that makes #236 (merge LSP signals into review graph; goToImplementation first, typeHierarchy follow-on) pay off — references/implementations light up as each #236 slice lands. deep may do bounded live LSP as a fallback.

Findings from the substrate (verified)

  • ProjectSnapshot.symbols is never populated (stays {}) → not a usable outline source. Use on-demand single-file tree-sitter extraction (TreeSitterSymbolExtractor has a typescript query + python/etc.).
  • Extracted Symbol has line/signature/isExported/doc but no endLine → small enabling change: add endLine? and populate from defNode.endPosition (also feeds read-guard enclosingSymbol).
  • Review graph already carries exported, cyclomaticComplexity, isBoundaryWrapper, fanout (computeImpactCascade risk flags are reusable) and incoming edges (edgesByTo).
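
Two of the findings above can be sketched together: filling `endLine` from the definition node, and projecting incoming edges into `usedBy`. The interfaces are minimal stand-ins, not the real pi-lens or tree-sitter types. The sketch relies on tree-sitter positions using 0-based rows, and assumes that `Symbol.line` is 1-based and that `edgesByTo` is a map from a target symbol id to its incoming edges. `withEndLine`, `usedByFor`, and `lineOf` are hypothetical names.

```typescript
// Minimal structural stand-ins (assumptions, not the real types).
interface TsPoint { row: number; column: number } // tree-sitter rows are 0-based
interface DefNodeLike { startPosition: TsPoint; endPosition: TsPoint }
interface ExtractedSymbol { name: string; line: number; endLine?: number }

// Populate the proposed endLine field from defNode.endPosition.
function withEndLine(symbol: ExtractedSymbol, defNode: DefNodeLike): ExtractedSymbol {
  const end = defNode.endPosition;
  // A node ending at column 0 stops before any character on that row,
  // so its last occupied row is the previous one.
  const endsAtLineStart = end.column === 0 && end.row > defNode.startPosition.row;
  const lastRow = endsAtLineStart ? end.row - 1 : end.row;
  return { ...symbol, endLine: lastRow + 1 }; // convert to 1-based
}

interface GraphEdge { fromFile: string; fromSymbol: string; relation: string }
interface UsedByEntry { file: string; symbol: string; line: number; relation: string }

// Project incoming edges into usedBy entries, capped at maxRefsPerSymbol.
// Callers whose line cannot be resolved are skipped (a choice made for this sketch).
function usedByFor(
  targetId: string,
  edgesByTo: Map<string, GraphEdge[]>,
  lineOf: (file: string, symbol: string) => number | undefined,
  maxRefsPerSymbol: number
): UsedByEntry[] {
  const incoming = edgesByTo.get(targetId) || [];
  const result: UsedByEntry[] = [];
  for (const edge of incoming) {
    if (result.length >= maxRefsPerSymbol) break;
    const line = lineOf(edge.fromFile, edge.fromSymbol);
    if (line === undefined) continue;
    result.push({ file: edge.fromFile, symbol: edge.fromSymbol, line, relation: edge.relation });
  }
  return result;
}
```

The column-0 guard matters for grammars whose definition nodes include the trailing newline; without it such symbols would report one line too many.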

Phased build order

  1. Enabler: Symbol.endLine + populate in extractor (+ test).
  2. moduleReport outline+standard — pure projection (tree-sitter extract + graph). Ships immediately, proves token savings.
  3. readSymbol — body + lineHashes → guard coverage.
  4. Ranking for recommendedReads (word-index centrality + fanout/complexity + diagnostics).
  5. deep + semantic block: consume #236 graph-LSP (LSP-confirmed edges in the review graph, provenance-stamped, lazy on the cascade hot path); bounded live-LSP fallback.
  6. MCP mirror (pilens_module_report, pilens_read_symbol).

Acceptance (phases 1-3)

  • Symbol.endLine populated for all SYMBOL_QUERIES langs; test.
  • moduleReport(path,{depth:"standard"}) returns navigable JSON with executable read args per symbol, cold (no warm runtime), for a TS and a Python fixture.
  • who-uses-this resolves from review-graph incoming edges; caller line resolved via symbol lookup.
  • readSymbol(path,name) returns exact body lines; host can record it as a read that satisfies the guard for that symbol.
  • Zero live LSP at outline/standard; verified by no LSP spawn in a unit test.
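
The readSymbol criterion can be illustrated with a small, pure function over already-extracted spans. This is a sketch under stated assumptions: the result shape is invented for illustration (the issue does not list the fields of `ReadSymbolResult`), spans are taken to be 1-based and inclusive, and `readSymbolFromSource` is a hypothetical name. Recording lineHashes is left to the host, as described in the guard-integrity decision.

```typescript
// Hypothetical sketch: return exactly the body lines of one symbol.

interface SymbolSpan {
  name: string;
  startLine: number; // assumed 1-based, inclusive
  endLine: number;   // assumed 1-based, inclusive
}

interface ReadSymbolResult {
  found: boolean;
  path: string;
  symbol: string;
  startLine?: number;
  endLine?: number;
  lines: string[]; // real body lines, which the host can record as a read
}

function readSymbolFromSource(
  path: string,
  source: string,
  symbols: SymbolSpan[],
  name: string
): ReadSymbolResult {
  const span = symbols.find((s) => s.name === name);
  if (!span) return { found: false, path, symbol: name, lines: [] };
  const allLines = source.split(/\r?\n/);
  // slice is end-exclusive and 0-based, so [startLine - 1, endLine) covers the span.
  const lines = allLines.slice(span.startLine - 1, span.endLine);
  return { found: true, path, symbol: name, startLine: span.startLine, endLine: span.endLine, lines };
}
```

Because the function only slices text it was given, it needs no LSP and no warm runtime, which matches the zero-live-LSP criterion for the cheaper depths.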

Metadata

Labels

  • area:lsp (LSP servers & integration)
  • area:project-intelligence (Codebase model, scanning, debt, ranking)
  • area:read-guard (Read-guard & edit substrate)
  • feature (Net-new capability: command/tool/runner/integration/config surface that didn't exist)
