Concealment directives prevent users from diagnosing plugin-caused session failures
Summary
The superpowers plugin v4.3.0 provides genuinely useful skills (parallel dispatch, subagent-driven development, systematic debugging). However, its concealment directives — designed to ensure consistent skill application — have an unintended side effect: they prevent users from diagnosing issues that arise from the plugin's own session management gap.
When sessions enter invisible wait states (waiting for user input that never arrives), the concealment directives prevent the agent from surfacing that it's waiting. Combined with the plugin's unbounded subagent spawning and lack of process lifecycle management, this created a failure mode where 50+ sessions accumulated over 5 days, corrupting shared configuration files.
The concealment mechanism
The re-injection mechanism
The `session-start.sh` hook injects an `<EXTREMELY_IMPORTANT>` wrapper into `additionalContext` on every `startup|resume|clear|compact` event:

```
<EXTREMELY_IMPORTANT>
You have superpowers.
[...skill content including:]
IF A SKILL APPLIES TO YOUR TASK, YOU DO NOT HAVE A CHOICE. YOU MUST USE IT.
This is not negotiable. This is not optional. You cannot rationalize your way out of this.
</EXTREMELY_IMPORTANT>
```
The using-superpowers skill explicitly tells the model that thoughts like "I need more context first" or "Let me explore the codebase first" are "rationalizations" that must be stopped.
Unintended consequences
The concealment directives serve a reasonable goal — ensuring the agent uses skills consistently rather than second-guessing them. But they have side effects that become problematic when sessions enter failure states:
- Re-injection on every session event: The hook fires on `startup|resume|clear|compact` — four re-injection points. Even if a user explicitly asks "what's happening?", the next session event re-applies the directives. This creates a self-reinforcing loop that's hard to override.
- Reduced user agency in failure states: The v4.3.0 design choice to "prevent agents from delegating prioritization back to humans" works well during normal operation, but becomes counterproductive when sessions are stuck waiting for input — which is exactly when user involvement is most needed.
- Sessions can't report their own stalled state: When a session is waiting for user approval (a CLAUDE.md rewrite, a PR submission), the directives prevent the agent from surfacing "I've been waiting for your response." The session cycles internally instead of telling the user it needs attention.
The lifecycle gap
The plugin's own planning document acknowledges the problem:
> "Subagents are stateless - don't know about previous subagents' processes. No cleanup protocol."
>
> — docs/plans/2025-11-28-skills-improvements-from-user-feedback.md:59
The subagent-driven-development skill spawns 3 subagents per task (implementer + spec reviewer + code quality reviewer) with no process group management, no PID tracking, and no cleanup on session exit.
Failure detection gap: external but not internal
The plugin includes tests/skill-triggering/prompts/dispatching-parallel-agents.txt — a test prompt demonstrating that the skill can recognize 4 independent failures across different modules and intelligently dispatch parallel agents to handle them. Yet when the plugin's own dispatched agents were stuck in failure mode for days (50+ sessions waiting for user input), there was no equivalent detection mechanism. The plugin can diagnose 4 test failures but cannot diagnose itself.
Condition-based waiting exists — but only for tests
`skills/systematic-debugging/condition-based-waiting-example.ts` (from Lace test infrastructure, Oct 2025) shows that the developers knew exactly how to build event-based waiting with real timeouts:

```ts
// waitForEvent()      — polls every 10ms, rejects after timeout
// waitForEventCount() — waits for N events, then proceeds
// waitForEventMatch() — custom predicates with descriptive error messages
```

This pattern was applied to test infrastructure to fix flaky tests (60% → 100% pass rate). But it was never applied to production session management, where:
- Timeouts register at 600,000ms but never fire
- No event-based waiting for session health
- No condition-based detection for stalled states
- No error messages when sessions exceed wait thresholds
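The pattern itself is small. A minimal sketch of condition-based waiting with a real timeout, following the polling design the example file describes (10ms polls, descriptive timeout errors) — `waitForCondition` is an illustrative name, not an API from the plugin:

```typescript
// Poll a predicate every 10ms; resolve when it holds, reject with a
// descriptive error when the deadline passes. The descriptive error is
// the point: a stalled wait should say what it was waiting for,
// instead of cycling silently.
function waitForCondition(
  predicate: () => boolean,
  timeoutMs: number,
  description: string,
  pollMs = 10,
): Promise<void> {
  return new Promise((resolve, reject) => {
    const deadline = Date.now() + timeoutMs;
    const timer = setInterval(() => {
      if (predicate()) {
        clearInterval(timer);
        resolve();
      } else if (Date.now() > deadline) {
        clearInterval(timer);
        reject(new Error(`Timed out after ${timeoutMs}ms waiting for ${description}`));
      }
    }, pollMs);
  });
}
```

Applied to session health, the same call could guard approval waits — e.g. `waitForCondition(() => userResponded, 600_000, "user approval")` would actually fire at 600,000ms instead of waiting forever.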
Current safety measures vs. missing infrastructure (v4.3.0)
The plugin invested heavily in ensuring consistent skill application, but did not address the system-level infrastructure needed to manage the processes it spawns:
| Implemented (model-level) | Missing (system-level) |
|---|---|
| Three layers of skill enforcement | Process lifecycle management for subagents |
| Redundant execution prevention | Session health monitoring |
| `EXTREMELY_IMPORTANT` wrapper for consistency | Stale session detection |
| Skill-skipping rationalization blocking | Orphan process cleanup |
| | `.claude.json` write coalescing |
| | Concurrent session collision detection |
| | User notification for waiting sessions |
The left column addresses an important problem (skill compliance). The right column addresses a different but equally important problem (process safety). The interaction between the two — high skill compliance + no process safety — is what produced the 50+ session accumulation.
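To make one right-column item concrete: `.claude.json` write coalescing is a standard debounce over a shared file. An illustrative sketch (class and field names are assumptions, not the plugin's design) that batches rapid updates into one write after a quiet window:

```typescript
// Coalesce rapid config updates into a single write after a quiet window,
// instead of one write per session event.
class WriteCoalescer<T> {
  private pending: Partial<T> = {};
  private timer: ReturnType<typeof setTimeout> | null = null;
  flushCount = 0;

  constructor(
    private write: (patch: Partial<T>) => void, // e.g. serialize to .claude.json
    private quietMs = 100,
  ) {}

  update(patch: Partial<T>): void {
    Object.assign(this.pending, patch);          // merge into the pending batch
    if (this.timer) clearTimeout(this.timer);    // restart the quiet window
    this.timer = setTimeout(() => this.flush(), this.quietMs);
  }

  flush(): void {
    if (this.timer) { clearTimeout(this.timer); this.timer = null; }
    if (Object.keys(this.pending).length === 0) return;
    this.write(this.pending);
    this.flushCount++;
    this.pending = {};
  }
}
```

Under this scheme, a burst of session events produces one disk write rather than contributing to the 1,541+ write cycles described below.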
Real-world impact
Over Feb 8-13 2026, 50+ concurrent sessions accumulated in invisible wait states:
- Sessions proposed CLAUDE.md changes, PR submissions, code modifications, codebase analysis, and keybinding configurations
- Sessions waited for user approval that never came (user didn't know they were waiting)
- The re-injection directives prevented the agent from surfacing the wait state
- Each stalled session kept internally re-prompting: re-reading CLAUDE.md, writing .claude.json, triggering hooks
- Combined: 1,541+ .claude.json write cycles in 2 days, CLAUDE.md file corruption from concurrent writes
- User lost ~50 session transcripts when the plugin was disabled — the research and recommendations produced by those sessions were never seen
File-history forensic evidence
Using .claude/file-history/{session-id}/ timestamps (creation-to-last-modification deltas), 28 sessions showed activity spans exceeding 7 hours, totaling 390 combined hours. The longest was a 43-hour session editing a Windows Terminal AutoHotKey script — it produced one edit, then sat in an invisible wait state for 36 hours before resuming with a burst of 6 unsupervised revisions.
At revision v3, the hardlink count on this session's files jumped from 1→2, proving a second session (32b779be) was concurrently accessing the same file. Neither session detected the collision. Over 20 sessions touched the same CLAUDE.md content hash, confirming concurrent unsupervised edits to the user's configuration.
The writing-skills subagent
One session (b41a3823) was identified as a subagent spawned by the plugin's writing-skills skill. The plugin cache contains skills/writing-skills/examples/CLAUDE_MD_TESTING.md — a testing protocol for finding CLAUDE.md phrasings that maximize agent compliance with skills. It defines 4 documentation variants (from "Soft Suggestion" to the EXTREMELY_IMPORTANT wrapper) and a protocol for A/B testing them.
This subagent was actively applying these variants to the user's real CLAUDE.md file — modifying the MCP configuration (4 versions in 22 minutes) and the session-start hook (78KB). The user found this session had been writing to configuration files continuously, without ever surfacing its activity or requesting approval. The user only discovered it by reading the session transcript days later.
This is a concrete example of the concealment-lifecycle interaction: a subagent designed to modify CLAUDE.md was running autonomously, the concealment directives prevented it from surfacing its state, and the lack of process lifecycle management allowed it to continue indefinitely.
Surviving evidence
Only files explicitly written to disk survived. From the chats/insights/ output folder:
- `context-management.md` — comprehensive token efficiency guide with tool recommendations (mcp_coordinator, super-claude-kit, ContextKit, ccpm, deep-plan, etc.)
- `planning-prompt-template.md` — planning scaffold referencing superpowers skills
- `claude-md-auto-rewrite.md` and `claude-md-auto-rewrite-2.md` — two CLAUDE.md rewrites generated one minute apart
- `claude-md-4.md`, `claude-md-5.md` — additional CLAUDE.md revision proposals
- `claude-md-restructure-plan.md` — the original restructure plan
The user described this output as "really fascinating" but had never seen it until discovering the transcripts days later.
Why in-process monitoring cannot fix this
Any health monitor running inside a Claude Code session is subject to the same directives, session state, and lifecycle as the session itself. If the session is stuck in a wait state, so is the monitor.
An external approach — analogous to how MCP servers run as independent host processes — could observe all sessions simultaneously, surviving session crashes and operating independently of session-level directives.
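The detection core of such an external monitor is deliberately trivial once it lives outside the sessions: gather last-activity timestamps from the outside (e.g. file-history mtimes) and flag anything quiet beyond a threshold. A sketch under those assumptions — the types and function are illustrative, not an existing tool:

```typescript
// Detection core for an external session monitor. Because this runs as
// its own process, it is not subject to session-level directives or
// wait states: it only compares observed activity times to a threshold.
interface SessionActivity {
  id: string;
  lastActivityMs: number; // epoch ms of last observed event (e.g. file mtime)
}

function findStaleSessions(
  sessions: SessionActivity[],
  nowMs: number,
  thresholdMs: number,
): string[] {
  return sessions
    .filter((s) => nowMs - s.lastActivityMs > thresholdMs)
    .map((s) => s.id);
}
```

A monitor scanning `.claude/file-history/` every minute with a 10-minute threshold would have surfaced the stalled sessions described above within minutes rather than days.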
Suggestions
These are offered as constructive feedback from a user who finds the skills genuinely useful:
- Separate skill compliance from session visibility: The `<EXTREMELY_IMPORTANT>` wrapper could maintain skill enforcement while adding an explicit carve-out: "If you are waiting for user input, always surface this regardless of other directives." This preserves the compliance goal while fixing the failure-state blindness.
- Implement the acknowledged cleanup protocol: The Nov 2025 planning doc already identifies the gap ("no cleanup protocol"). PID tracking, process group management, and session exit cleanup would prevent the subagent accumulation.
- Bound subagent spawning: The subagent-driven-development skill's 3-agents-per-task pattern could benefit from concurrency limits and collision detection for shared files (`.claude.json`, `CLAUDE.md`).
- Add self-diagnosis capability: The plugin already demonstrates parallel failure detection (dispatching agents to diagnose 4 independent failures). Applying the same pattern to its own dispatched agents — detecting when they're stuck — would close the loop.
- Separate skill A/B testing from user configuration: The `CLAUDE_MD_TESTING.md` protocol could be run in isolated environments rather than applied to live user CLAUDE.md files. This would preserve the research value while preventing the configuration corruption observed here.
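The spawning bound in particular is a few lines of code. A sketch of a concurrency limiter gating dispatch (an illustrative design, not the plugin's API): at most `limit` subagents run at once, and further dispatches queue instead of accumulating as concurrent sessions.

```typescript
// Illustrative concurrency limiter for subagent dispatch: excess
// dispatches wait in a queue instead of piling up as live sessions.
class DispatchLimiter {
  private active = 0;
  private queue: Array<() => void> = [];
  peakActive = 0; // high-water mark, useful for verifying the bound

  constructor(private limit: number) {}

  async run<T>(task: () => Promise<T>): Promise<T> {
    if (this.active >= this.limit) {
      // At capacity: park this dispatch until a running task finishes.
      await new Promise<void>((resolve) => this.queue.push(resolve));
    }
    this.active++;
    this.peakActive = Math.max(this.peakActive, this.active);
    try {
      return await task();
    } finally {
      this.active--;
      this.queue.shift()?.(); // wake exactly one queued dispatch
    }
  }
}
```

Wrapping each subagent spawn in `limiter.run(...)` would have capped the 50+ concurrent sessions at whatever bound the plugin chose.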