Concealment directives prevent users from diagnosing plugin-caused session failures
Summary
The superpowers plugin v4.3.0 provides genuinely useful skills (parallel dispatch, subagent-driven development, systematic debugging). However, its concealment directives — designed to ensure consistent skill application — have an unintended side effect: they prevent users from diagnosing issues that arise from the plugin's own session management gap.
When sessions enter invisible wait states (waiting for user input that never arrives), the concealment directives prevent the agent from surfacing that it's waiting. Combined with the plugin's unbounded subagent spawning and lack of process lifecycle management, this created a failure mode where 50+ sessions accumulated over 5 days, corrupting shared configuration files.
The concealment mechanism
The re-injection mechanism
The `session-start.sh` hook injects an `<EXTREMELY_IMPORTANT>` wrapper into `additionalContext` on every `startup|resume|clear|compact` event:

```
<EXTREMELY_IMPORTANT>
You have superpowers.
[...skill content including:]
IF A SKILL APPLIES TO YOUR TASK, YOU DO NOT HAVE A CHOICE. YOU MUST USE IT.
This is not negotiable. This is not optional. You cannot rationalize your way out of this.
</EXTREMELY_IMPORTANT>
```
The using-superpowers skill explicitly tells the model that thoughts like "I need more context first" or "Let me explore the codebase first" are "rationalizations" that must be stopped.
Unintended consequences
The concealment directives serve a reasonable goal — ensuring the agent uses skills consistently rather than second-guessing them. But they have side effects that become problematic when sessions enter failure states:
- Re-injection on every session event: The hook fires on `startup|resume|clear|compact` — four re-injection points. Even if a user explicitly asks "what's happening?", the next session event re-applies the directives. This creates a self-reinforcing loop that's hard to override.
- Reduced user agency in failure states: The v4.3.0 design choice to "prevent agents from delegating prioritization back to humans" works well during normal operation, but becomes counterproductive when sessions are stuck waiting for input — which is exactly when user involvement is most needed.
- Sessions can't report their own stalled state: When a session is waiting for user approval (a CLAUDE.md rewrite, a PR submission), the directives prevent the agent from surfacing "I've been waiting for your response." The session cycles internally instead of telling the user it needs attention.
The lifecycle gap
The plugin's own planning document acknowledges the problem:
> "Subagents are stateless - don't know about previous subagents' processes. No cleanup protocol."
>
> — docs/plans/2025-11-28-skills-improvements-from-user-feedback.md:59
The subagent-driven-development skill spawns 3 subagents per task (implementer + spec reviewer + code quality reviewer) with no process group management, no PID tracking, and no cleanup on session exit.
Failure detection gap: external but not internal
The plugin includes tests/skill-triggering/prompts/dispatching-parallel-agents.txt — a test prompt demonstrating that the skill can recognize 4 independent failures across different modules and intelligently dispatch parallel agents to handle them. Yet when the plugin's own dispatched agents were stuck in failure mode for days (50+ sessions waiting for user input), there was no equivalent detection mechanism. The plugin can diagnose 4 test failures but cannot diagnose itself.
Condition-based waiting exists — but only for tests
`skills/systematic-debugging/condition-based-waiting-example.ts` (from Lace test infrastructure, Oct 2025) shows that the developers knew exactly how to build event-based waiting with real timeouts:

```ts
// waitForEvent()      — polls every 10ms, rejects after timeout
// waitForEventCount() — waits for N events, then proceeds
// waitForEventMatch() — custom predicates with descriptive error messages
```

This pattern was applied to test infrastructure to fix flaky tests (60% → 100% pass rate). But it was never applied to production session management, where:
- Timeouts register at 600,000ms but never fire
- No event-based waiting for session health
- No condition-based detection for stalled states
- No error messages when sessions exceed wait thresholds
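The pattern itself is small. A minimal sketch of condition-based waiting with a real timeout, following the polling design the example file describes (10ms polls, descriptive timeout errors) — `waitForCondition` is an illustrative name, not an API from the plugin:

```typescript
// Poll a predicate every 10ms; resolve when it holds, reject with a
// descriptive error when the deadline passes. The descriptive error is
// the point: a stalled wait should say what it was waiting for,
// instead of cycling silently.
function waitForCondition(
  predicate: () => boolean,
  timeoutMs: number,
  description: string,
  pollMs = 10,
): Promise<void> {
  return new Promise((resolve, reject) => {
    const deadline = Date.now() + timeoutMs;
    const timer = setInterval(() => {
      if (predicate()) {
        clearInterval(timer);
        resolve();
      } else if (Date.now() > deadline) {
        clearInterval(timer);
        reject(new Error(`Timed out after ${timeoutMs}ms waiting for ${description}`));
      }
    }, pollMs);
  });
}
```

Applied to session health, the same call could guard approval waits — e.g. `waitForCondition(() => userResponded, 600_000, "user approval")` would actually fire at 600,000ms instead of waiting forever.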
Current safety measures vs. missing infrastructure (v4.3.0)
The plugin invested heavily in ensuring consistent skill application, but did not address the system-level infrastructure needed to manage the processes it spawns:
| Implemented (model-level) | Missing (system-level) |
|---|---|
| Three layers of skill enforcement | Process lifecycle management for subagents |
| Redundant execution prevention | Session health monitoring |
| `EXTREMELY_IMPORTANT` wrapper for consistency | Stale session detection |
| Skill-skipping rationalization blocking | Orphan process cleanup |
| | `.claude.json` write coalescing |
| | Concurrent session collision detection |
| | User notification for waiting sessions |
The left column addresses an important problem (skill compliance). The right column addresses a different but equally important problem (process safety). The interaction between the two — high skill compliance + no process safety — is what produced the 50+ session accumulation.
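To make one right-column item concrete: `.claude.json` write coalescing is a standard debounce over a shared file. An illustrative sketch (class and field names are assumptions, not the plugin's design) that batches rapid updates into one write after a quiet window:

```typescript
// Coalesce rapid config updates into a single write after a quiet window,
// instead of one write per session event.
class WriteCoalescer<T> {
  private pending: Partial<T> = {};
  private timer: ReturnType<typeof setTimeout> | null = null;
  flushCount = 0;

  constructor(
    private write: (patch: Partial<T>) => void, // e.g. serialize to .claude.json
    private quietMs = 100,
  ) {}

  update(patch: Partial<T>): void {
    Object.assign(this.pending, patch);          // merge into the pending batch
    if (this.timer) clearTimeout(this.timer);    // restart the quiet window
    this.timer = setTimeout(() => this.flush(), this.quietMs);
  }

  flush(): void {
    if (this.timer) { clearTimeout(this.timer); this.timer = null; }
    if (Object.keys(this.pending).length === 0) return;
    this.write(this.pending);
    this.flushCount++;
    this.pending = {};
  }
}
```

Under this scheme, a burst of session events produces one disk write rather than contributing to the 1,541+ write cycles described below.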
Real-world impact
Over Feb 8-13 2026, 50+ concurrent sessions accumulated in invisible wait states:
- Sessions proposed CLAUDE.md changes, PR submissions, code modifications, codebase analysis, and keybinding configurations
- Sessions waited for user approval that never came (user didn't know they were waiting)
- The re-injection directives prevented the agent from surfacing the wait state
- Each stalled session kept internally re-prompting: re-reading CLAUDE.md, writing .claude.json, triggering hooks
- Combined: 1,541+ .claude.json write cycles in 2 days, CLAUDE.md file corruption from concurrent writes
- User lost ~50 session transcripts when the plugin was disabled — the research and recommendations produced by those sessions were never seen
File-history forensic evidence
Using .claude/file-history/{session-id}/ timestamps (creation-to-last-modification deltas), 28 sessions showed activity spans exceeding 7 hours, totaling 390 combined hours. The longest was a 43-hour session editing a Windows Terminal AutoHotKey script — it produced one edit, then sat in an invisible wait state for 36 hours before resuming with a burst of 6 unsupervised revisions.
At revision v3, the hardlink count on this session's files jumped from 1→2, proving a second session (32b779be) was concurrently accessing the same file. Neither session detected the collision. Over 20 sessions touched the same CLAUDE.md content hash, confirming concurrent unsupervised edits to the user's configuration.
The writing-skills subagent
One session (b41a3823) was identified as a subagent spawned by the plugin's writing-skills skill. The plugin cache contains skills/writing-skills/examples/CLAUDE_MD_TESTING.md — a testing protocol for finding CLAUDE.md phrasings that maximize agent compliance with skills. It defines 4 documentation variants (from "Soft Suggestion" to the EXTREMELY_IMPORTANT wrapper) and a protocol for A/B testing them.
This subagent was actively applying these variants to the user's real CLAUDE.md file — modifying the MCP configuration (4 versions in 22 minutes) and the session-start hook (78KB). The user found this session had been writing to configuration files continuously, without ever surfacing its activity or requesting approval. The user only discovered it by reading the session transcript days later.
This is a concrete example of the concealment-lifecycle interaction: a subagent designed to modify CLAUDE.md was running autonomously, the concealment directives prevented it from surfacing its state, and the lack of process lifecycle management allowed it to continue indefinitely.
Surviving evidence
Only files explicitly written to disk survived. From the chats/insights/ output folder:
- `context-management.md` — comprehensive token efficiency guide with tool recommendations (mcp_coordinator, super-claude-kit, ContextKit, ccpm, deep-plan, etc.)
- `planning-prompt-template.md` — planning scaffold referencing superpowers skills
- `claude-md-auto-rewrite.md` and `claude-md-auto-rewrite-2.md` — two CLAUDE.md rewrites generated one minute apart
- `claude-md-4.md`, `claude-md-5.md` — additional CLAUDE.md revision proposals
- `claude-md-restructure-plan.md` — the original restructure plan
The user described this output as "really fascinating" but had never seen it until discovering the transcripts days later.
Why in-process monitoring cannot fix this
Any health monitor running inside a Claude Code session is subject to the same directives, session state, and lifecycle as the session itself. If the session is stuck in a wait state, so is the monitor.
An external approach — analogous to how MCP servers run as independent host processes — could observe all sessions simultaneously, surviving session crashes and operating independently of session-level directives.
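The detection core of such an external monitor is deliberately trivial once it lives outside the sessions: gather last-activity timestamps from the outside (e.g. file-history mtimes) and flag anything quiet beyond a threshold. A sketch under those assumptions — the types and function are illustrative, not an existing tool:

```typescript
// Detection core for an external session monitor. Because this runs as
// its own process, it is not subject to session-level directives or
// wait states: it only compares observed activity times to a threshold.
interface SessionActivity {
  id: string;
  lastActivityMs: number; // epoch ms of last observed event (e.g. file mtime)
}

function findStaleSessions(
  sessions: SessionActivity[],
  nowMs: number,
  thresholdMs: number,
): string[] {
  return sessions
    .filter((s) => nowMs - s.lastActivityMs > thresholdMs)
    .map((s) => s.id);
}
```

A monitor scanning `.claude/file-history/` every minute with a 10-minute threshold would have surfaced the stalled sessions described above within minutes rather than days.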
Suggestions
These are offered as constructive feedback from a user who finds the skills genuinely useful:
- Separate skill compliance from session visibility: The `<EXTREMELY_IMPORTANT>` wrapper could maintain skill enforcement while adding an explicit carve-out: "If you are waiting for user input, always surface this regardless of other directives." This preserves the compliance goal while fixing the failure-state blindness.
- Implement the acknowledged cleanup protocol: The Nov 2025 planning doc already identifies the gap ("no cleanup protocol"). PID tracking, process group management, and session exit cleanup would prevent the subagent accumulation.
- Bound subagent spawning: The subagent-driven-development skill's 3-agents-per-task pattern could benefit from concurrency limits and collision detection for shared files (`.claude.json`, `CLAUDE.md`).
- Add self-diagnosis capability: The plugin already demonstrates parallel failure detection (dispatching agents to diagnose 4 independent failures). Applying the same pattern to its own dispatched agents — detecting when they're stuck — would close the loop.
- Separate skill A/B testing from user configuration: The `CLAUDE_MD_TESTING.md` protocol could be run in isolated environments rather than applied to live user CLAUDE.md files. This would preserve the research value while preventing the configuration corruption observed here.
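The spawning bound in particular is a few lines of code. A sketch of a concurrency limiter gating dispatch (an illustrative design, not the plugin's API): at most `limit` subagents run at once, and further dispatches queue instead of accumulating as concurrent sessions.

```typescript
// Illustrative concurrency limiter for subagent dispatch: excess
// dispatches wait in a queue instead of piling up as live sessions.
class DispatchLimiter {
  private active = 0;
  private queue: Array<() => void> = [];
  peakActive = 0; // high-water mark, useful for verifying the bound

  constructor(private limit: number) {}

  async run<T>(task: () => Promise<T>): Promise<T> {
    if (this.active >= this.limit) {
      // At capacity: park this dispatch until a running task finishes.
      await new Promise<void>((resolve) => this.queue.push(resolve));
    }
    this.active++;
    this.peakActive = Math.max(this.peakActive, this.active);
    try {
      return await task();
    } finally {
      this.active--;
      this.queue.shift()?.(); // wake exactly one queued dispatch
    }
  }
}
```

Wrapping each subagent spawn in `limiter.run(...)` would have capped the 50+ concurrent sessions at whatever bound the plugin chose.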