Here's a section that describes failure layers:
Core Concepts
With these scenarios in mind, these concepts are no longer just jargon:
Capability Gap: The huge gulf between model performance on benchmarks and performance on real tasks. A 50-60% pass rate on SWE-bench Verified means nearly half of real issues can't be resolved.
Harness: Everything outside the model — instructions, tools, environment, state management, verification feedback. If it's not model weights, it's harness. What we've been calling the "saddle."
Harness-Induced Failure: The model has enough capability, but the execution environment has structural defects. Anthropic's controlled experiment already proved this.
Verification Gap: The gap between the agent's confidence in its output and actual correctness. The agent says "I'm done" when it's not done — this is the most common failure mode.
Diagnostic Loop: Execute, observe failure, attribute to a specific harness layer, fix that layer, re-execute. This is the core methodology of harness engineering.
Definition of Done: A set of machine-verifiable conditions — tests pass, lint is clean, type checks pass. Without an explicit definition of done, the agent will invent its own.
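As a rough illustration of what "machine-verifiable" could mean for the Definition of Done item, here is a minimal sketch. The names `Check`, `evaluate_done`, and `example_checks` are hypothetical and do not come from the quoted section. The checks are stand-in lambdas; a real setup would presumably wrap the test runner, linter, and type checker.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical type: each check is a zero-argument callable returning True on success.
Check = Callable[[], bool]


def evaluate_done(checks: Dict[str, Check]) -> Tuple[bool, List[str]]:
    """Run every check and return (all_passed, names_of_failed_checks)."""
    failed = [name for name, check in checks.items() if not check()]
    return (not failed, failed)


# Stand-in checks (assumed for illustration); in practice each would wrap a
# real command such as a test runner, linter, or type checker and inspect
# its exit code.
example_checks: Dict[str, Check] = {
    "tests_pass": lambda: True,
    "lint_clean": lambda: False,
    "types_check": lambda: True,
}

done, failed = evaluate_done(example_checks)
print(done, failed)  # False ['lint_clean']
```

The point of the design is that "done" is decided by the checks rather than by the agent's own claim, and the list of failed checks says what still needs work.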
Following that section is this passage:
Attribute every failure to a specific layer. Don't just say "the model sucks." Ask: was the task unclear? Was context insufficient? Were there no verification methods? Map each failure to one of the five layers above. Build this habit, and you'll find "the model isn't good enough" appearing less and less in your logs.
There are actually six layers in the Core Concepts, not five. An inconsistency like this undermines trust. It is also the kind of error people point to when they say "AI is terrible"; don't give those people ammunition.