## Problem
When `/understand --full` dispatches parallel file-analyzer subagents, there is no deterministic enforcement of node ID format or complexity enum values. The prompt specifies the correct formats, but the assembly pipeline trusts LLM output without validation, so inconsistent batches silently corrupt the final graph.
## Issue 1: No runtime enforcement of node ID format
The file-analyzer prompt (`skills/understand/file-analyzer-prompt.md`, lines 219–227) specifies:
| Node Type | Required Format | Example |
|-----------|-----------------|---------|
| File | `file:<relative-path>` | `file:src/index.ts` |
| Function | `func:<relative-path>:<name>` | `func:src/utils.ts:formatDate` |
| Class | `class:<relative-path>:<name>` | `class:src/models/User.ts:User` |
However, the Zod schema only validates `id: z.string()` (`packages/core/src/schema.ts`, line 13), so any string passes. Neither Phase 3 (ASSEMBLE) nor the `GraphBuilder` (`packages/core/src/analyzer/graph-builder.ts`, line 84) validates ID prefix format on merged batch output.
Since subagents are LLMs writing JSON directly to `batch-<N>.json` files, they can produce:
- Project-name-prefixed IDs: `myproject:backend/main.py`
- Double-prefixed IDs: `myproject:service:docker-compose.yml`
- Bare paths with no prefix: `frontend/src/utils/constants.ts`
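These malformed shapes are trivially detectable. A minimal check (a hypothetical sketch, not the project's code) using the prefix set proposed under Suggested Fixes:

```typescript
// Hypothetical check: the prefix set comes from the Suggested Fixes section.
// None of the malformed ID shapes above match it; well-formed IDs do.
const ID_PREFIX = /^(file|func|class|module|concept):/;

const malformed = [
  "myproject:backend/main.py",            // project-name prefix
  "myproject:service:docker-compose.yml", // double prefix
  "frontend/src/utils/constants.ts",      // bare path, no prefix
];
const wellFormed = "file:src/index.ts";
```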
Evidence from a 226-file project run appears under Impact below.
## Issue 2: Complexity enum not validated during assembly
The schema defines `complexity: z.enum(["simple", "moderate", "complex"])` (`packages/core/src/schema.ts`, line 20). The `parseFileAnalysisResponse` function (`packages/core/src/analyzer/llm-analyzer.ts`, lines 97–116) normalizes invalid values to `"moderate"`, but this function is only used in the programmatic `GraphBuilder` path.
The `/understand` skill's subagents write batch JSON files directly to disk. Phase 3 merges these files without passing nodes through `parseFileAnalysisResponse`, so invalid complexity values survive into the assembled graph.
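The gap can be closed deterministically. A sketch of a normalizer (hypothetical code; it mirrors the `"moderate"` fallback described above plus the `low`/`high` mappings proposed under Suggested Fixes):

```typescript
// Hypothetical complexity normalizer: valid enum values pass through,
// "low"/"high" map to the nearest enum member (assumed mapping), and
// anything else (numeric, free-text, missing) falls back to "moderate".
type Complexity = "simple" | "moderate" | "complex";

function normalizeComplexity(value: unknown): Complexity {
  if (value === "simple" || value === "moderate" || value === "complex") {
    return value;
  }
  if (value === "low") return "simple";
  if (value === "high") return "complex";
  return "moderate"; // numeric, free-text, or missing values
}
```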
## Issue 3: Cascading edge drops at assembly
Phase 3 (`skills/understand/SKILL.md`, line 155) removes edges whose `source` or `target` references a non-existent node ID. When Issue 1 causes ID mismatches between what edges reference and what nodes were actually saved as, valid edges are silently dropped.
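A minimal illustration of the failure mode (hypothetical code; the filter mimics the Phase 3 behavior described above, not the skill's actual implementation):

```typescript
// Hypothetical Phase 3-style edge filter: an edge survives only if both
// endpoints resolve to saved node IDs. The edge below is semantically valid,
// but Issue 1 saved the node under a project-prefixed ID, so the
// file:-prefixed reference no longer resolves and the edge is dropped.
type Edge = { source: string; target: string };

function pruneEdges(edges: Edge[], nodeIds: Set<string>): Edge[] {
  return edges.filter((e) => nodeIds.has(e.source) && nodeIds.has(e.target));
}

const savedNodeIds = new Set(["myproject:src/index.ts", "file:src/utils.ts"]);
const edges: Edge[] = [
  { source: "file:src/index.ts", target: "file:src/utils.ts" }, // source mismatch
];
```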
## Issue 4: All-or-nothing dashboard validation
The dashboard calls `validateGraph()` on load (`packages/dashboard/src/App.tsx`, lines 122–130). Since this is Zod `safeParse`, any invalid node (e.g., `complexity: "low"`) causes the entire graph to fail to load with an error banner. There is no partial load or per-node auto-correction.
This means if even one batch produces a non-enum complexity value and it survives to the final JSON, the dashboard won't render at all.
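A stand-in for the all-or-nothing semantics (a hypothetical sketch; `loadGraph` is not the real `validateGraph`, only an illustration of the failure behavior):

```typescript
// Hypothetical illustration of safeParse-style all-or-nothing validation:
// one node with a non-enum complexity value rejects the entire graph,
// rather than dropping or correcting the single offending node.
type GraphNode = { id: string; complexity: string };
const VALID_COMPLEXITY = new Set(["simple", "moderate", "complex"]);

function loadGraph(nodes: GraphNode[]): { success: boolean; nodes: GraphNode[] } {
  const allValid = nodes.every((n) => VALID_COMPLEXITY.has(n.complexity));
  return allValid ? { success: true, nodes } : { success: false, nodes: [] };
}
```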
## Impact
On a 226-file project, we observed nodes with three different ID formats across batches, non-standard complexity values (`"low"`, `"high"`, numeric, free-text), and layers with depleted node counts due to ID mismatches during assembly.
## Root Cause
The assembly pipeline (Phase 3 in `SKILL.md`) trusts that all subagent batches produce consistent output. The only format enforcement is:
- Prompt instructions (not deterministic)
- Graph reviewer in Phase 6 (also LLM-based, not deterministic)
- Zod schema at final write/load (catches invalid values but rejects the whole graph rather than fixing them)
There is no deterministic normalization step between batch merge and final write.
## Suggested Fixes
- **Add a normalization pass in Phase 3 (ASSEMBLE):** After merging batches, deterministically normalize all node IDs to `type:path` format (strip project-name prefixes, add missing `file:` prefixes) and map complexity values to the valid enum (`low`→`simple`, `high`→`complex`, numeric/other→`moderate`).
- **Validate node ID prefix format in the Zod schema:** Change `id: z.string()` to `id: z.string().regex(/^(file|func|class|module|concept):/)` so invalid IDs are caught at validation time rather than silently passing.
- **Run `parseFileAnalysisResponse`-style normalization on batch JSON:** Before merging batch files in Phase 3, pass each node through the same complexity normalization that `llm-analyzer.ts` already implements.
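The ID portion of the first two fixes can be sketched together. This is a minimal sketch under stated assumptions (the prefix set from the schema fix; bare paths default to `file:`; unknown leading labels such as project names are stripped), not the actual implementation:

```typescript
// Hypothetical ID normalization pass for merged batch output. Assumes the
// prefix set proposed for the Zod schema and a "file:" default for bare paths.
const ID_PREFIX = /^(file|func|class|module|concept):/;

function normalizeNodeId(raw: string): string {
  let id = raw;
  // Strip unknown leading labels (project names, stray type words) until a
  // valid prefix appears or no colon-delimited label remains.
  while (!ID_PREFIX.test(id) && /^[\w-]+:/.test(id)) {
    id = id.replace(/^[\w-]+:/, "");
  }
  // Bare paths get the file: prefix by default.
  return ID_PREFIX.test(id) ? id : "file:" + id;
}
```

Running this between batch merge and final write would collapse all three malformed shapes from Issue 1 into the `file:`-prefixed form, which also keeps the edge references of Issue 3 resolvable.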