feat(diagnostics): unified 3-layer DiagnosticResult model (#204) by rahuldass19 · Pull Request #206 · QWED-AI/qwed-verification

Rahul Dass (rahuldass19) · 2026-06-18T16:44:54Z

Summary

Establishes the structured verification diagnostic contract from #204 — a unified DiagnosticResult model with three disclosure layers:

Layer	Field	Audience
1 — Agent-Safe	`agent_message: str`	Agents/models (no internals leaked)
2 — Developer	`developer_fields: dict`	Application developers (constraint_id, evidence, advisory_checks)
3 — Proof	`proof_ref: Optional[str]`	Auditors/operators (sha256 hash of retained proof artifact)

Key Design Decisions

Tri-state status only — no proliferation

DiagnosticStatus has exactly 3 values: VERIFIED, UNVERIFIABLE, BLOCKED. No HEURISTIC, AMBIGUOUS, or CORRECTION_NEEDED. Richer distinctions live in developer_fields.constraint_id. This resolves the #190 discussion (Keesan/Rahul debate) — ambiguity IS unverifiability; the distinction is structured, not status-level.
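Below is a minimal Python sketch of the tri-state taxonomy. It assumes a `str`-backed `Enum`, which is how the walkthrough describes `DiagnosticStatus`. The strict parser `parse_status`, its case and whitespace normalization, and the `constraint_id` value are all illustrative assumptions. The point is that any label outside the three members raises an error instead of becoming a fourth status.

```python
from enum import Enum


class DiagnosticStatus(str, Enum):
    """Tri-state taxonomy: exactly three members and no others."""

    VERIFIED = "VERIFIED"          # proof-bearing; the only state that can be authoritative
    UNVERIFIABLE = "UNVERIFIABLE"  # e.g. ambiguity, missing evidence, heuristic-only support
    BLOCKED = "BLOCKED"            # e.g. errors or fail-closed denials


def parse_status(raw: str) -> DiagnosticStatus:
    """Strictly parse a status label; anything outside the three members raises ValueError."""
    # Value lookup on an Enum raises ValueError for unknown values, so labels such as
    # "HEURISTIC" or "AMBIGUOUS" cannot become a fourth status.
    return DiagnosticStatus(raw.strip().upper())


# Finer distinctions travel as data in developer_fields, not as new statuses.
ambiguous = {
    "status": parse_status("unverifiable"),
    "developer_fields": {"constraint_id": "math.ambiguous_mode_input"},  # made-up id
}
```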

`proof_ref` is the authority bit

No separate authoritative boolean. proof_ref is not None = authoritative (admissible for control flow); proof_ref is None = reject. Downstream gates get a mechanical rule:

if not result.is_authoritative:
    block_decision()
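The same rule can be applied mechanically to a whole batch. This is only a sketch. `GateResult` is a hypothetical stand-in for `DiagnosticResult`, reduced to the authority bit, and `admit` is a hypothetical gate. The gate calls `getattr(..., False)` so that an object with no authority bit at all, such as a legacy dict, is rejected rather than raising.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass(frozen=True)
class GateResult:
    """Hypothetical stand-in for DiagnosticResult, reduced to the authority bit."""

    agent_message: str
    proof_ref: Optional[str] = None

    @property
    def is_authoritative(self) -> bool:
        # Authority is derived from proof_ref alone; there is no separate boolean field.
        return self.proof_ref is not None


def admit(results: Iterable[object]) -> List[object]:
    """Keep only results admissible for control flow; everything else is blocked."""
    admitted: List[object] = []
    for result in results:
        # getattr(..., False): objects that lack the authority bit are rejected, not trusted.
        if not getattr(result, "is_authoritative", False):
            continue  # stands in for block_decision()
        admitted.append(result)
    return admitted


batch = [
    GateResult("Verified.", proof_ref="sha256:" + "0" * 64),
    GateResult("Could not verify this claim."),
    {"status": "VERIFIED"},  # legacy dict without an authority bit
]
print(len(admit(batch)))  # 1
```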

VERIFIED requires proof — structurally enforced

__post_init__ raises ValueError if status == VERIFIED and proof_ref is None. This makes "VERIFIED without proof" impossible to construct — not a caller convention, a type-level invariant.
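Here is one way the invariant could be enforced in a frozen dataclass. The sketch also includes the checks that the later Greptile summary in this thread reports: a non-empty `proof_ref` for VERIFIED, `proof_ref=None` for every other status, and a mandatory `agent_message`. The field order and the string-to-enum normalization in `__post_init__` are assumptions, not the PR's actual code.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Any, Dict, Optional


class DiagnosticStatus(str, Enum):
    VERIFIED = "VERIFIED"
    UNVERIFIABLE = "UNVERIFIABLE"
    BLOCKED = "BLOCKED"


@dataclass(frozen=True)
class DiagnosticResult:
    status: DiagnosticStatus
    agent_message: str                                              # Layer 1: agent-safe
    developer_fields: Dict[str, Any] = field(default_factory=dict)  # Layer 2: developer
    proof_ref: Optional[str] = None                                 # Layer 3: proof

    def __post_init__(self) -> None:
        # Normalize plain strings such as "VERIFIED" to the enum; unknown values raise.
        # object.__setattr__ is the usual way to adjust a field on a frozen dataclass.
        object.__setattr__(self, "status", DiagnosticStatus(self.status))
        if not self.agent_message:
            raise ValueError("agent_message is required")
        if self.status is DiagnosticStatus.VERIFIED and not self.proof_ref:
            # A falsy check rejects proof_ref="" as well as None.
            raise ValueError("VERIFIED requires a non-empty proof_ref")
        if self.status is not DiagnosticStatus.VERIFIED and self.proof_ref is not None:
            raise ValueError(f"{self.status.value} must not carry a proof_ref")

    @property
    def is_authoritative(self) -> bool:
        return self.proof_ref is not None
```

With this sketch, "VERIFIED without proof" fails at construction time, so no downstream code ever holds such an object.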

Advisory checks never touch status

AdvisoryCheck (LLM fallback, NLI entailment, VLM interpretation) populates developer_fields.advisory_checks with advisory_only=True. They never set status or proof_ref. This structurally enforces the #204 constraint: "diagnostics must never originate from model reasoning, confidence, or self-assessment."
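The following sketch shows one way to keep advisory output confined to Layer 2, under stated assumptions. The `source` and `outcome` fields on `AdvisoryCheck` and the `attach_advisory` helper are hypothetical. Only `advisory_only` and the `developer_fields.advisory_checks` location come from the PR. The helper takes and returns developer fields only, so it has no path to `status` or `proof_ref`.

```python
from dataclasses import dataclass
from typing import Any, Dict


@dataclass(frozen=True)
class AdvisoryCheck:
    source: str   # hypothetical, e.g. "llm_fallback", "nli_entailment", "vlm_interpretation"
    outcome: str  # hypothetical: what the advisory path suggested
    advisory_only: bool = True

    def __post_init__(self) -> None:
        # Structural guard: an advisory check can never present itself as authoritative.
        if self.advisory_only is not True:
            raise ValueError("AdvisoryCheck requires advisory_only=True")

    def to_dict(self) -> Dict[str, Any]:
        return {"source": self.source, "outcome": self.outcome, "advisory_only": True}


def attach_advisory(developer_fields: Dict[str, Any], check: AdvisoryCheck) -> Dict[str, Any]:
    """Return a copy of developer_fields with the check appended to advisory_checks.

    The helper only ever sees Layer 2, so it cannot set status or proof_ref.
    """
    updated = dict(developer_fields)
    updated["advisory_checks"] = list(updated.get("advisory_checks", [])) + [check.to_dict()]
    return updated


fields = attach_advisory(
    {"constraint_id": "fact.no_deterministic_source"},  # made-up constraint_id
    AdvisoryCheck(source="llm_fallback", outcome="likely_supported"),
)
```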

Migration helper for existing engines

from_legacy_dict() converts ad-hoc engine dicts to DiagnosticResult for fail-closed states (CORRECTION_NEEDED → UNVERIFIABLE, ERROR → BLOCKED). It raises for legacy VERIFIED results — proof artifacts were discarded by pre-#204 engines, so backfilling is impossible. Callers must use DiagnosticResult.verified() with explicit evidence.
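The hypothetical `map_legacy` function below sketches the decision order of `from_legacy_dict`. To stay self-contained, it returns a `(status, constraint_id)` tuple instead of a `DiagnosticResult`. The `status`, `error`, and `is_correct` keys follow the walkthrough's description. The exact `constraint_id` strings are assumptions, and so is treating a falsy `is_correct` as UNVERIFIABLE. The sketch already includes the later revisions described further down this thread:

- `bool()` truthiness instead of identity checks
- only an explicit VERIFIED status is rejected up front
- unrecognized input fails loudly

```python
from typing import Any, Dict, Tuple


def map_legacy(engine: str, legacy: Dict[str, Any]) -> Tuple[str, str]:
    """Map a legacy engine dict to (status, constraint_id). Hypothetical sketch."""
    status = str(legacy.get("status", "")).strip().upper()

    # Only an explicit VERIFIED claim is rejected up front: the proof artifact
    # was never retained, so a proof_ref cannot be backfilled.
    if status == "VERIFIED":
        raise ValueError(f"{engine}: legacy VERIFIED has no retained proof; use verified()")

    if status == "ERROR" or legacy.get("error"):
        return "BLOCKED", f"{engine}.legacy_error"
    if status == "CORRECTION_NEEDED":
        return "UNVERIFIABLE", f"{engine}.legacy_correction_needed"

    is_correct = legacy.get("is_correct")
    # bool() rather than `is False`, so engines that emit 0 are still caught.
    if is_correct is not None and not bool(is_correct):
        return "UNVERIFIABLE", f"{engine}.legacy_incorrect"

    # Anything else, including a truthy is_correct with an unknown status, fails loudly.
    raise ValueError(
        f"{engine}: unrecognized legacy result status={status!r}, is_correct={is_correct!r}"
    )
```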

What This PR Does NOT Do

- Does NOT refactor existing engines; those are separate PRs against this contract:
  - #129 [Bug]: MathVerifier resolves ambiguous mode inputs heuristically and can return VERIFIED
  - #130 [Bug]: MathVerifier can mark incomplete eigenvalue claims as VERIFIED
  - #131 [Bug]: MathVerifier can return VERIFIED for IRR without convergence proof
  - #133 [Bug]: FactVerifier allows LLM fallback to overwrite deterministic verdicts
  - #134 [Bug]: ImageVerifier allows VLM fallback to decide final verification verdicts
  - #162 [Bug]: Logic verification mutates proof semantics through implicit variable inference and undeclared symbol creation
  - #163 [Bug]: GraphFactVerifier can return VERIFIED from probabilistic or partial support
  - #164 [Bug]: ReasoningVerifier can return valid results without an actual proof or provider path
  - #190 [Bug]: FactVerifier enables non-deterministic LLM fallback in verification path, allowing provider-dependent verdict drift at trust boundary
  - #205 [Tech-Debt]: SecureCodeExecutor fail-closed denial uses (bool, str) tuple — conform to DiagnosticResult model (#204)
- Does NOT add explainability features
- Does NOT change API/SDK response shapes
- Does NOT introduce confidence scores (forbidden by the constraints of #204, Structured Verification Diagnostics — Architectural Completion of 3-Layer Diagnostic Model)

Files

src/qwed_new/core/diagnostics.py (new) — DiagnosticStatus, DiagnosticResult, AdvisoryCheck, compute_proof_ref
tests/test_diagnostics.py (new) — 68 tests
src/qwed_new/core/__init__.py (modified) — exports
pyproject.toml (modified) — version 5.1.2 → 5.2.0

Test Results

68 new tests pass (status taxonomy, all 3 layers, authority contract, fail-closed enforcement, advisory checks, proof hashing, serialization, legacy migration, realistic scenarios)
Core regression tests pass (97 passed, 21 skipped)
Boundary check passes
Ruff lint clean

Unblocks

#129, #130, #131, #133, #134, #162, #163, #164, #190, #205

Closes #204.

Summary by CodeRabbit

New Features
- Introduced a structured 3-layer verification diagnostics model with Verified, Unverifiable, and Blocked outcomes.
- Added deterministic proof references for Verified results and fail-closed behavior for non-authoritative outcomes.
- Enabled advisory-only metadata attached to diagnostics; made the diagnostics utilities publicly available.
Tests
- Added a comprehensive test suite covering construction rules, deterministic proof generation, serialization, and legacy migration.
Chores
- Bumped the package version to 5.2.0.

Establishes the structured verification diagnostic contract:
- Layer 1: agent_message (agent-safe, no internals leaked)
- Layer 2: developer_fields (constraint_id, advisory_checks, evidence)
- Layer 3: proof_ref (sha256 hash of retained proof artifact)

Key design:
- DiagnosticStatus: tri-state only (VERIFIED / UNVERIFIABLE / BLOCKED). No HEURISTIC/AMBIGUOUS proliferation; the distinction lives in developer_fields.constraint_id
- proof_ref is the authority bit: present = admissible for control flow, None = reject. No separate 'authoritative' boolean needed (resolves #190 Keesan/Rahul debate)
- VERIFIED requires proof_ref is not None; structurally enforced in __post_init__, not by caller discipline
- AdvisoryCheck: non-proof-bearing analysis (LLM fallback, NLI, VLM) populates developer_fields.advisory_checks, never status or proof_ref
- compute_proof_ref: deterministic sha256 hash of JSON-serialized evidence
- from_legacy_dict: migration helper for ad-hoc engine dicts (fail-closed states only; raises for legacy VERIFIED since proof artifacts were discarded by pre-#204 engines)

Unblocks: #129, #130, #131, #133, #134, #162, #163, #164, #190, #205

Does NOT refactor existing engines; those are separate PRs against this contract.

Tests: 68 new tests covering status taxonomy, all 3 layers, authority contract, fail-closed enforcement, advisory checks, proof hashing, serialization round-trip, legacy migration, and realistic scenarios drawn from the blocked issues.

Version: 5.1.2 -> 5.2.0 (architectural enhancement)

chatgpt-codex-connector · 2026-06-18T16:45:02Z

You have reached your Codex usage limits for code reviews.

coderabbitai · 2026-06-18T16:45:08Z

Warning

Review limit reached

@rahuldass19, we couldn't start this review because you've reached your PR review rate limit.

More reviews will be available in 36 minutes and 28 seconds.

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 68920db3-9e7a-4ef7-9d8c-cb3975e6dc81

📥 Commits

Reviewing files that changed from the base of the PR and between 75851e0 and 9666285.

📒 Files selected for processing (2)

src/qwed_new/core/diagnostics.py
tests/test_diagnostics.py

📝 Walkthrough

Introduces src/qwed_new/core/diagnostics.py, a new module implementing a unified 3-layer diagnostic model with DiagnosticStatus (VERIFIED/UNVERIFIABLE/BLOCKED), DiagnosticResult, AdvisoryCheck, and compute_proof_ref. The module is wired into core/__init__.py public exports, covered by 653 lines of new tests, and the package version is bumped to 5.2.0.

Changes

3-Layer Structured Verification Diagnostics

Layer / File(s)	Summary
DiagnosticStatus, AdvisoryCheck, and compute_proof_ref `src/qwed_new/core/diagnostics.py`	Defines the three-state `DiagnosticStatus` string enum (VERIFIED, UNVERIFIABLE, BLOCKED) with proof-authorization semantics per state; `AdvisoryCheck` non-verdict metadata container with `advisory_only=True` enforcement and `to_dict`/`from_dict`; and `compute_proof_ref` which SHA-256-hashes JSON-serialized evidence (with `sort_keys=True`) into a `sha256:`-prefixed proof reference.
DiagnosticResult: core dataclass, invariants, and helper properties `src/qwed_new/core/diagnostics.py`	Implements `DiagnosticResult` with `__post_init__` enforcing that `VERIFIED` requires a non-`None` `proof_ref` and non-verified statuses require `proof_ref=None`; `is_authoritative`, `is_fail_closed`, and `is_verified` authority properties; `constraint_id` and `advisory_checks` field extractors with defensive deserialization; `to_dict()`/`from_dict()` with tolerant string-status parsing and required `agent_message` validation.
DiagnosticResult factories and legacy migration `src/qwed_new/core/diagnostics.py`	Adds `verified(...)`, `unverifiable(...)`, and `blocked(...)` factory classmethods for primary construction; `from_legacy_dict(...)` maps legacy engine dict patterns (status strings, error keys, `is_correct` flags) to fail-closed `UNVERIFIABLE`/`BLOCKED` results with engine-namespaced `constraint_id` values, raising `ValueError` when legacy dict claims `VERIFIED` (proof artifacts not retained).
Public API wiring and package version bump `src/qwed_new/core/__init__.py`, `pyproject.toml`	Imports and re-exports `DiagnosticStatus`, `DiagnosticResult`, `AdvisoryCheck`, `compute_proof_ref` via `__all__`; bumps package version from `5.1.2` to `5.2.0`.
Diagnostic contract and integration tests `tests/test_diagnostics.py`	Validates status membership strictness, `agent_message` enforcement, proof-ref/authority invariants and constructor validation, `is_authoritative`/`is_fail_closed` properties, admission gate behavior, `compute_proof_ref` determinism and key-order independence, `AdvisoryCheck` constraints and serialization, `developer_fields` extraction with defaults, `to_dict`/`from_dict` tolerant parsing and round-trips, legacy migration mapping and error cases, and scenario-based integration tests across realistic blocked-issue examples with fail-closed behavior validation and mechanical admission gate (only authoritative results admitted).
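`compute_proof_ref`, as described (SHA-256 over JSON serialized with `sort_keys=True`, with a `sha256:` prefix), can be sketched in a few lines. The compact `separators` argument is an assumption; any fixed choice keeps the hash deterministic. Raising `ValueError` on non-serializable evidence matches the follow-up commit in this thread that removed `default=str`.

```python
import hashlib
import json
from typing import Any


def compute_proof_ref(evidence: Any) -> str:
    """Deterministic sha256 reference for JSON-serializable evidence (sketch)."""
    try:
        # sort_keys makes the hash independent of dict insertion order; the compact
        # separators are an assumption, not necessarily what the PR uses.
        canonical = json.dumps(evidence, sort_keys=True, separators=(",", ":"))
    except TypeError as exc:
        # No default=str fallback: non-serializable evidence fails closed.
        raise ValueError(f"proof evidence is not JSON-serializable: {exc}") from exc
    digest = hashlib.sha256(canonical.encode("utf-8")).hexdigest()
    return f"sha256:{digest}"


# Same evidence with a different key order yields the same reference.
a = compute_proof_ref({"claim": "2+2=4", "solver": "sympy"})
b = compute_proof_ref({"solver": "sympy", "claim": "2+2=4"})
```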

Sequence Diagram(s)

sequenceDiagram
  participant Engine
  participant DiagnosticResult
  participant compute_proof_ref
  participant DownstreamGate

  Engine->>DiagnosticResult: verified(agent_message, proof_evidence, developer_fields)
  DiagnosticResult->>compute_proof_ref: hash(proof_evidence) with sort_keys=True
  compute_proof_ref-->>DiagnosticResult: "sha256:<hex>"
  DiagnosticResult-->>Engine: DiagnosticResult(status=VERIFIED, proof_ref="sha256:...", is_authoritative=True)
  Engine->>DownstreamGate: pass DiagnosticResult
  DownstreamGate->>DownstreamGate: check is_authoritative == True
  DownstreamGate-->>Engine: admitted

  Engine->>DiagnosticResult: from_legacy_dict(legacy_engine_dict)
  DiagnosticResult->>DiagnosticResult: map status/error patterns → UNVERIFIABLE or BLOCKED
  DiagnosticResult-->>Engine: DiagnosticResult(status=UNVERIFIABLE|BLOCKED, proof_ref=None, is_fail_closed=True)
  Engine->>DownstreamGate: pass DiagnosticResult
  DownstreamGate->>DownstreamGate: check is_authoritative == False
  DownstreamGate-->>Engine: rejected (fail-closed)

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related issues

#204 (Structured Verification Diagnostics — Architectural Completion of 3-Layer Diagnostic Model): This PR directly implements the unified DiagnosticResult base protocol, the VERIFIED/UNVERIFIABLE/BLOCKED tri-state taxonomy, proof_ref evidence binding via sha256: hashing, constraint_id on failures, agent_message enforcement, and fail-closed invariants described as the target deliverable for v5.2.0 in issue #204.
#205: This is an identified follow-up issue that explicitly depends on this PR's DiagnosticResult and DiagnosticStatus framework to refactor legacy (bool, str) tuple returns in secure_code_executor.py into the new structured diagnostic model; this PR establishes the direct prerequisite for that work.

Poem

🐇 Hoppity-hop through the diagnostic maze,
Three states to sort through the verification haze!
VERIFIED gets its sha256: crown,
BLOCKED and UNVERIFIABLE — fail-closed, don't frown.
Legacy dicts mapped, the old ways replaced —
The proof is in the hash, all neatly laced! 🔐

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 33.33% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (4 passed)

Check name	Status	Explanation
Title check	✅ Passed	The title directly references the main change: a unified 3-layer DiagnosticResult model implementing issue #204's diagnostic architecture.
Description check	✅ Passed	The description comprehensively covers all required template sections: Summary (3-layer model explained), Key Design Decisions (tri-state status, proof_ref authority, enforcement, advisory checks, migration), context on what is NOT done, Files changed, Test Results, and Unblocks.
Linked Issues check	✅ Passed	The PR fully implements the core objectives from issue #204: unified DiagnosticResult with 3-layer architecture (Agent-Safe, Developer, Proof), tri-state DiagnosticStatus, proof_ref authority enforcement, advisory check isolation, and legacy migration support.
Out of Scope Changes check	✅ Passed	All changes are in scope: diagnostics.py implements the #204 contract, tests validate it, __init__.py exports the new public APIs, and the version bump reflects the architectural change. No unrelated refactoring or feature creep.


codspeed-hq · 2026-06-18T16:46:27Z

Merging this PR will not alter performance

✅ 20 untouched benchmarks

Comparing feat/diagnostic-result-model (9666285) with main (260990f)

codecov · 2026-06-18T16:50:29Z

Codecov Report

❌ Patch coverage is 99.22481% with 1 line in your changes missing coverage. Please review.

Files with missing lines	Patch %	Lines
src/qwed_new/core/diagnostics.py	99.21%	1 Missing ⚠️


coderabbitai

Actionable comments posted: 2

🧹 Nitpick comments (2)

src/qwed_new/core/diagnostics.py (1)
444-450: ⚖️ Poor tradeoff

Consider raising on truly unrecognized legacy patterns.

This fallback silently returns UNVERIFIABLE for inputs that don't match any known legacy pattern. While UNVERIFIABLE is fail-closed (safe), per QWED_RULES "fail loudly" principle, unrecognized data could indicate unexpected engine behavior that should surface as an error rather than be silently absorbed.

The current approach is pragmatically safe; raising here is more strict and aids debugging when legacy engines produce unexpected output formats.
Optional stricter approach
-        return cls.unverifiable(
-            agent_message="Verification inconclusive — unrecognized legacy result",
-            developer_fields={
-                "constraint_id": f"{engine}.legacy_unrecognized",
-                "legacy_status": legacy_status,
-            },
-        )
+        raise ValueError(
+            f"from_legacy_dict cannot interpret unrecognized legacy data from {engine!r}: "
+            f"status={legacy_status!r}, is_correct={is_correct!r}. "
+            "Review engine output format and add explicit handling."
+        )
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/qwed_new/core/diagnostics.py` around lines 444 - 450, The fallback case
that returns cls.unverifiable() for unrecognized legacy patterns should raise an
exception instead to follow the "fail loudly" principle. Replace the
cls.unverifiable() return statement with raising an appropriate exception (such
as ValueError or a domain-specific exception) that includes the legacy_status
and engine information in the error message. This will surface unexpected engine
behavior rather than silently absorbing it as an UNVERIFIABLE result.
Source: Coding guidelines
tests/test_diagnostics.py (1)
504-533: Avoid "deterministic_confidence" naming in test validation metadata.

The test at line 510 uses "deterministic_confidence": 0.4 in developer_fields to validate advisory-only validation scenarios. While the DiagnosticResult status is correctly UNVERIFIABLE (fail-closed), the field naming creates conceptual confusion because "deterministic" implies binary outcome, not probabilistic scoring.

Since this appears only in test data and is placed in developer_fields (which per QWED_RULES holds advisory-only information that does not affect proof authority), the concern is primarily about clarity and preventing misinterpretation by future maintainers rather than a functional bypass.

Consider renaming deterministic_confidence to something that clarifies it represents evidence sufficiency or test context (e.g., evidence_coverage_ratio or test_scenario_confidence_for_advisory_path) to prevent confusion with authority-level confidence scores in production verification.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/test_diagnostics.py` around lines 504 - 533, In the
test_fact_verifier_llm_advisory_only method, rename the field
"deterministic_confidence" in the developer_fields dictionary to a name that
clarifies it represents advisory test context rather than a deterministic
outcome. Use a name like "evidence_coverage_ratio" or
"test_scenario_confidence_for_advisory_path" to better reflect that this is
advisory-only metadata that does not affect proof authority, preventing
conceptual confusion for future maintainers.
Source: Coding guidelines

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@src/qwed_new/core/diagnostics.py`:
- Around line 148-155: The json.dumps call in the proof_ref generation logic
uses default=str which causes non-deterministic hashing because str() on
non-serializable objects includes memory addresses that vary between runs,
violating the determinism requirement stated in the docstring. To fix this,
remove the default=str parameter from the json.dumps call so that
non-serializable evidence raises a TypeError instead (fail-closed approach),
which aligns with the documented requirement that callers must pre-convert
non-serializable values before calling this function. Alternatively, if you need
to support non-serializable types, use a deterministic fallback like
default=lambda o: f"<{type(o).__name__}>" instead of default=str.

In `@tests/test_diagnostics.py`:
- Around line 229-236: The tests test_non_serializable_falls_back_to_str and
test_nested_non_serializable_falls_back_to_str currently verify that
non-serializable objects are accepted and produce a hash, but this violates
determinism requirements since str(object()) includes memory addresses that vary
across runs. Update both tests to verify fail-closed behavior by asserting that
compute_proof_ref raises ValueError when passed non-serializable objects, rather
than checking that it returns a sha256 hash. Remove the assertions checking for
the sha256: prefix and instead use self.assertRaises(ValueError) or similar to
verify that the function rejects non-serializable input.
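The determinism problem, and the trade-off in the reviewer's alternative, can be seen directly. This comparison script uses a hypothetical helper `ref_with_default` and a stand-in class `Opaque`. With `default=str`, two structurally identical payloads hash differently. The type-name fallback is stable, but it gives every `Opaque` instance the same reference. That collapse is one reason failing closed may be the safer choice for a proof contract.

```python
import hashlib
import json
from typing import Any, Callable


def ref_with_default(evidence: Any, default: Callable[[Any], Any]) -> str:
    """Hash evidence with a json.dumps fallback for non-serializable values (comparison only)."""
    canonical = json.dumps(evidence, sort_keys=True, default=default)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


class Opaque:
    """Stand-in for a non-JSON-serializable evidence object (hypothetical)."""


first, second = Opaque(), Opaque()  # two live objects, hence two distinct ids

# default=str embeds repr(), e.g. "<__main__.Opaque object at 0x...>", so the hash
# depends on the memory address rather than on the evidence itself.
r1 = ref_with_default({"artifact": first}, default=str)
r2 = ref_with_default({"artifact": second}, default=str)


def type_name(o: Any) -> str:
    """Deterministic fallback suggested in the review: keep only the type name."""
    return f"<{type(o).__name__}>"


# Deterministic, but content-blind: every Opaque instance maps to the same reference.
r3 = ref_with_default({"artifact": first}, default=type_name)
r4 = ref_with_default({"artifact": second}, default=type_name)
```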


ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 881c1e98-0a83-4a96-8fc8-7c5357a75c8d

📥 Commits

Reviewing files that changed from the base of the PR and between 260990f and 57ec9d2.

📒 Files selected for processing (4)

pyproject.toml
src/qwed_new/core/__init__.py
src/qwed_new/core/diagnostics.py
tests/test_diagnostics.py

greptile-apps · 2026-06-18T16:56:15Z

Greptile Summary

This PR introduces the unified 3-layer DiagnosticResult model from issue #204: an agent-safe agent_message, structured developer_fields, and a cryptographic proof_ref authority bit. All previously-flagged issues (frozen dataclass, empty proof_ref bypass, advisory_only enforcement, identity-check bypass in from_legacy_dict, advisory_checks raising at access time) have been addressed in this version.

DiagnosticResult is a frozen=True dataclass enforcing that VERIFIED requires a non-empty proof_ref, non-VERIFIED states require proof_ref=None, and agent_message is mandatory — all via __post_init__.
from_legacy_dict correctly uses bool(is_correct) instead of identity checks, handling truthy non-bool legacy values with 13 dedicated tests.
One remaining concern: from_dict accepts any non-empty string as proof_ref without validating the sha256:<64-hex> format, allowing a crafted serialized payload to produce is_authoritative=True with no cryptographic basis.

Confidence Score: 4/5

Safe to merge after addressing the proof_ref format validation gap in from_dict.

All previously-reviewed structural issues have been addressed. The one new finding is that from_dict does not validate the proof_ref string format — any non-empty string produces is_authoritative=True, which is the documented control-flow admission gate. A crafted or corrupted serialized dict could bypass the authority check without a real cryptographic proof reference. The fix is a single regex guard in from_dict.

src/qwed_new/core/diagnostics.py — specifically the from_dict method's handling of the proof_ref field.
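Below is a sketch of the single regex guard suggested above. The pattern assumes lowercase hex, which is what `hashlib`'s `hexdigest()` produces. `parse_proof_ref` is a hypothetical helper that `from_dict` could call.

```python
import re
from typing import Any, Dict, Optional

# Assumed format, per the review: "sha256:" followed by 64 lowercase hex digits.
PROOF_REF_RE = re.compile(r"sha256:[0-9a-f]{64}")


def parse_proof_ref(data: Dict[str, Any]) -> Optional[str]:
    """Validate the proof_ref field of a serialized DiagnosticResult (sketch)."""
    proof_ref = data.get("proof_ref")
    if proof_ref is None:
        return None  # non-authoritative; status/proof_ref consistency is checked elsewhere
    if not isinstance(proof_ref, str) or PROOF_REF_RE.fullmatch(proof_ref) is None:
        # Reject rather than admit: a malformed ref must never yield is_authoritative=True.
        raise ValueError(f"malformed proof_ref: {proof_ref!r}")
    return proof_ref
```

A format check only rejects malformed references. It cannot show that a well-formed `sha256:` value corresponds to a retained proof artifact.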

Important Files Changed

Filename	Overview
src/qwed_new/core/diagnostics.py	Core new module implementing the 3-layer DiagnosticResult model. All previously-threaded issues addressed. One remaining issue: from_dict accepts any non-empty string as proof_ref with no format validation, allowing a crafted dict to produce is_authoritative=True with no cryptographic basis.
tests/test_diagnostics.py	68 comprehensive tests covering all layers, authority contract, fail-closed enforcement, advisory checks, serialization, legacy migration, and realistic scenarios.
src/qwed_new/core/__init__.py	Adds public exports for DiagnosticStatus, DiagnosticResult, AdvisoryCheck, and compute_proof_ref. Clean change.
pyproject.toml	Version bump 5.1.2 to 5.2.0, appropriate for a new public API surface.

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A[Caller] -->|DiagnosticResult.verified| B["__post_init__\n(frozen dataclass)"]
    A -->|DiagnosticResult.unverifiable| B
    A -->|DiagnosticResult.blocked| B
    A -->|DiagnosticResult.from_dict| C[from_dict]
    A -->|DiagnosticResult.from_legacy_dict| D[from_legacy_dict]

    C -->|validate agent_message\nparse status| B
    D -->|map legacy status\nfail on VERIFIED| B

    B -->|status==VERIFIED and proof_ref falsy?| E{Raises ValueError}
    B -->|status!=VERIFIED and proof_ref not None?| E
    B -->|agent_message empty?| E
    B -->|all OK| F[DiagnosticResult]

    F -->|is_authoritative| G{proof_ref is not None}
    G -->|True| H[Admit for control flow]
    G -->|False| I[Block decision]

    F -->|advisory_checks property| J[Defensively iterate\nskip invalid items]
    F -->|to_dict| K[Serialized dict\nincludes is_authoritative]

    K -->|from_dict| C

Reviews (7): Last reviewed commit: "fix(diagnostics): empty proof_ref bypass..."

- compute_proof_ref: remove default=str; non-serializable evidence now raises ValueError (fail-closed). Prevents non-deterministic, memory-address-dependent hashes from entering the proof contract
- AdvisoryCheck: add __post_init__ enforcing advisory_only is True; structurally enforced, no longer just a docstring claim
- from_legacy_dict: replace is True/is False identity checks with bool() truthiness; catches legacy engines using 1/0 instead of True/False (Sentry MEDIUM, Greptile P1)
- from_legacy_dict: raise ValueError on unrecognized legacy patterns instead of silent UNVERIFIABLE; fail loudly per QWED_RULES
- from_dict: raise a clear ValueError on missing/empty agent_message instead of crashing via __post_init__ with a confusing error (Sentry HIGH)
- tests: update non-serializable tests to expect ValueError
- tests: add AdvisoryCheck(advisory_only=False) raises test
- tests: add integer is_correct (1/0) bypass tests
- tests: add from_dict missing agent_message raises test
- tests: add unrecognized truthy legacy raises test
- tests: rename deterministic_confidence -> evidence_coverage_ratio

73 tests pass (up from 68).

…ntry) Early guard was rejecting all truthy is_correct as VERIFIED, making the final unrecognized-pattern raise unreachable dead code. Now only explicit status=='VERIFIED' is rejected early; truthy is_correct with unknown status falls through to the correct unrecognized-pattern raise with the appropriate error message.

coderabbitai

🧹 Nitpick comments (1)

src/qwed_new/core/diagnostics.py (1)

247-254: 💤 Low value

Consider stricter type validation for fail-closed consistency.

The property passes through non-dict items assuming they're already AdvisoryCheck instances. If developer_fields["advisory_checks"] contains malformed data (e.g., [1, "str"]), these would be returned in a list typed as List[AdvisoryCheck], violating the type contract.

Per QWED_RULES fail-closed principle, consider validating or filtering items:

♻️ Suggested refinement

     @property
     def advisory_checks(self) -> List[AdvisoryCheck]:
         """Advisory checks from developer_fields, deserialized to AdvisoryCheck."""
         raw = self.developer_fields.get("advisory_checks", [])
         if not isinstance(raw, list):
             return []
-        return [AdvisoryCheck.from_dict(item) if isinstance(item, dict) else item
-                for item in raw]
+        result = []
+        for item in raw:
+            if isinstance(item, dict):
+                result.append(AdvisoryCheck.from_dict(item))
+            elif isinstance(item, AdvisoryCheck):
+                result.append(item)
+            # Skip malformed items (fail-closed: don't propagate garbage)
+        return result
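The sketch below combines this suggestion with two follow-ups from later in the thread: `bool()` coercion of `advisory_only` in `from_dict`, and an explicit `continue` in the `except` block. The `source` field is an assumption, as is the default of `True` for a missing `advisory_only`. To keep the example short, `advisory_checks` is written as a free function rather than a property.

```python
from dataclasses import dataclass
from typing import Any, Dict, List


@dataclass(frozen=True)
class AdvisoryCheck:
    source: str  # hypothetical field
    advisory_only: bool = True

    def __post_init__(self) -> None:
        if self.advisory_only is not True:
            raise ValueError("AdvisoryCheck requires advisory_only=True")

    @classmethod
    def from_dict(cls, data: Dict[str, Any]) -> "AdvisoryCheck":
        if "source" not in data:
            raise ValueError("advisory check is missing 'source'")
        # bool() coercion: producers in other languages may send 1/0 instead of true/false.
        return cls(source=str(data["source"]),
                   advisory_only=bool(data.get("advisory_only", True)))


def advisory_checks(developer_fields: Dict[str, Any]) -> List[AdvisoryCheck]:
    """Return only well-formed advisory checks; malformed entries are dropped."""
    raw = developer_fields.get("advisory_checks", [])
    if not isinstance(raw, list):
        return []
    result: List[AdvisoryCheck] = []
    for item in raw:
        if isinstance(item, AdvisoryCheck):
            result.append(item)  # already validated at construction
        elif isinstance(item, dict):
            try:
                result.append(AdvisoryCheck.from_dict(item))
            except ValueError:
                # Invalid advisory metadata is non-authoritative; skip it.
                continue
        # Any other type (int, str, None, ...) falls through and is not appended.
    return result
```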

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/qwed_new/core/diagnostics.py` around lines 247 - 254, The advisory_checks
property method contains a type safety issue where non-dict items are passed
through without validation, assuming they are already AdvisoryCheck instances.
This violates the type contract List[AdvisoryCheck] if malformed data (like
integers or strings) exists in developer_fields["advisory_checks"]. To fix this
following the fail-closed principle, modify the list comprehension to strictly
validate items: only include items that are either dicts that can be converted
to AdvisoryCheck via the from_dict method, or items that are already valid
AdvisoryCheck instances. Filter out or skip any items that don't meet these
criteria rather than passing them through unchecked.

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 57955ff1-b84f-47c2-8e99-b8bb732a0d86

📥 Commits

Reviewing files that changed from the base of the PR and between 57ec9d2 and f73aba9.

📒 Files selected for processing (2)

src/qwed_new/core/diagnostics.py
tests/test_diagnostics.py

🚧 Files skipped from review as they are similar to previous changes (1)

tests/test_diagnostics.py

- AdvisoryCheck.from_dict: coerce advisory_only to bool; fixes cross-language APIs that send 1 instead of True (Sentry HIGH)
- advisory_checks property: skip malformed/invalid items instead of raising ValueError at access time (Greptile P1, CodeRabbit)
- tests: add integer coercion tests, invalid-item skip tests, and an existing-instance passthrough test

77 tests pass (up from 73).

coderabbitai

Actionable comments posted: 1

🧹 Nitpick comments (1)

src/qwed_new/core/diagnostics.py (1)

263-266: ⚡ Quick win

Make the intentional skip explicit in the except block.

The current pass preserves the intended defensive skip, but CodeQL flags the empty handler. A commented continue keeps behavior unchanged and makes the audit intent clear.

Proposed cleanup

                 try:
                     result.append(AdvisoryCheck.from_dict(item))
                 except ValueError:
-                    pass
+                    # Invalid advisory metadata is non-authoritative; skip it.
+                    continue
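A minimal sketch of the fail-closed accessor with the proposed `continue`, written as a standalone function. `AdvisoryCheck` here is a simplified stand-in: the `check_type` field and the `read_advisory_checks` helper are hypothetical, not the real API in `src/qwed_new/core/diagnostics.py`.

```python
from dataclasses import dataclass
from typing import Any, Dict, List


@dataclass(frozen=True)
class AdvisoryCheck:
    # Simplified stand-in: `check_type` is a hypothetical field name.
    check_type: str
    advisory_only: bool = True

    @classmethod
    def from_dict(cls, data: Dict[str, Any]) -> "AdvisoryCheck":
        check_type = data.get("check_type")
        if not isinstance(check_type, str) or not check_type:
            raise ValueError("check_type must be a non-empty string")
        return cls(check_type=check_type)


def read_advisory_checks(developer_fields: Dict[str, Any]) -> List[AdvisoryCheck]:
    """Return only well-formed advisory checks; drop everything else."""
    raw = developer_fields.get("advisory_checks", [])
    if not isinstance(raw, list):
        return []
    result: List[AdvisoryCheck] = []
    for item in raw:
        if isinstance(item, AdvisoryCheck):
            # Already a valid instance: pass it through unchanged.
            result.append(item)
        elif isinstance(item, dict):
            try:
                result.append(AdvisoryCheck.from_dict(item))
            except ValueError:
                # Invalid advisory metadata is non-authoritative; skip it.
                continue
        # Any other type (int, str, None, ...) is ignored, never propagated.
    return result


fields = {
    "advisory_checks": [
        {"check_type": "nli_entailment"},
        AdvisoryCheck("vlm_interpretation"),
        42,
        "garbage",
        {"check_type": ""},
    ]
}
checks = read_advisory_checks(fields)
print([c.check_type for c in checks])  # ['nli_entailment', 'vlm_interpretation']
```

Because advisory checks never set `status` or `proof_ref`, dropping a malformed entry can only remove advisory context. It cannot grant authority, so skipping is the fail-closed choice here.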

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/qwed_new/core/diagnostics.py` around lines 263 - 266, The empty except
block handling ValueError in the try-except statement around
AdvisoryCheck.from_dict contains only a pass statement, which CodeQL flags as an
issue. Replace the pass statement with a commented continue statement to make
the intentional skip explicit and audit-intent clear while preserving the
existing behavior of skipping invalid items during the append operation.

Source: Linters/SAST tools

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@src/qwed_new/core/diagnostics.py`:
- Line 126: Replace the `bool(data.get("advisory_only", True))` call with
explicit validation that accepts only boolean values or integers 0 and 1,
rejecting all other types including malformed string values like "false" or "0".
Raise a validation error at construction time when the advisory_only field
contains an invalid type or value outside of the accepted boolean/integer range,
rather than silently coercing it with the bool() function.
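A minimal sketch of the strict validation this comment asks for, assuming a hypothetical helper `parse_advisory_only` that `from_dict` would call on `data.get("advisory_only", True)`. The key point is that `bool("false")` and `bool("0")` are both `True`, so plain coercion silently misreads string input.

```python
from typing import Any


def parse_advisory_only(value: Any) -> bool:
    """Accept only bool or the integers 0/1; reject everything else."""
    # bool is a subclass of int, so check it first to keep True/False as-is.
    if isinstance(value, bool):
        return value
    # Cross-language APIs may send 1/0 instead of true/false.
    if isinstance(value, int) and value in (0, 1):
        return value == 1
    # Strings such as "false" or "0" are truthy under bool(), so refuse them.
    raise ValueError(
        f"advisory_only must be a bool or 0/1, got {type(value).__name__}: {value!r}"
    )


print(parse_advisory_only(1), parse_advisory_only(False))  # True False
```

Floats such as `1.0` and out-of-range integers such as `2` are rejected as well, so a construction-time `ValueError` surfaces the bad payload instead of hiding it.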

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: c708e206-e752-4a8a-a36c-70d61abb1dc5

📥 Commits

Reviewing files that changed from the base of the PR and between f73aba9 and 75851e0.

📒 Files selected for processing (2)

src/qwed_new/core/diagnostics.py
tests/test_diagnostics.py

🚧 Files skipped from review as they are similar to previous changes (1)

tests/test_diagnostics.py

…tion
- AdvisoryCheck.from_dict: reject malformed strings ('false', '0'); accept only bool or int 0/1, not truthy strings (CodeRabbit MAJOR)
- to_dict: serialize AdvisoryCheck instances to dicts, preventing a TypeError during JSON serialization (Sentry MEDIUM)
- advisory_checks property: add an explicit comment on the except block (CodeQL)
- tests: add a string rejection test, an AdvisoryCheck serialization test, and a JSON serializability test

80 tests pass.
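A minimal sketch of the `to_dict` fix, assuming simplified stand-ins for both classes (`status` is a plain string and `check_type` is a hypothetical field). Without the conversion, `json.dumps` raises `TypeError` because dataclass instances are not JSON serializable.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import Any, Dict, Optional


@dataclass(frozen=True)
class AdvisoryCheck:
    check_type: str  # hypothetical field name
    advisory_only: bool = True

    def to_dict(self) -> Dict[str, Any]:
        return asdict(self)


@dataclass(frozen=True)
class DiagnosticResult:
    # Simplified stand-in; `status` is a plain string here for brevity.
    status: str
    agent_message: str
    developer_fields: Dict[str, Any] = field(default_factory=dict)
    proof_ref: Optional[str] = None

    def to_dict(self) -> Dict[str, Any]:
        # Shallow-copy so serialization never mutates the original fields.
        dev = dict(self.developer_fields)
        checks = dev.get("advisory_checks")
        if isinstance(checks, list):
            dev["advisory_checks"] = [
                c.to_dict() if isinstance(c, AdvisoryCheck) else c for c in checks
            ]
        return {
            "status": self.status,
            "agent_message": self.agent_message,
            "developer_fields": dev,
            "proof_ref": self.proof_ref,
        }


result = DiagnosticResult(
    status="UNVERIFIABLE",
    agent_message="The claim could not be verified.",
    developer_fields={"advisory_checks": [AdvisoryCheck("nli_entailment")]},
)
print(json.dumps(result.to_dict()))
```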

…try HIGH)
- DiagnosticStatus(invalid_str) is now caught and re-raised with the valid options and a pointer to from_legacy_dict, instead of the unhelpful enum error
- tests: add an "invalid status raises with guidance" test

81 tests pass.
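A minimal sketch of the re-raise, assuming the enum values are the uppercase strings shown in the PR description and using a hypothetical `parse_status` helper. A legacy value such as `CORRECTION_NEEDED` now gets an actionable message instead of the bare enum error.

```python
from enum import Enum


class DiagnosticStatus(str, Enum):
    # Assumed string values; the real enum may differ.
    VERIFIED = "VERIFIED"
    UNVERIFIABLE = "UNVERIFIABLE"
    BLOCKED = "BLOCKED"


def parse_status(raw: str) -> DiagnosticStatus:
    """Convert a raw string to DiagnosticStatus with an actionable error."""
    try:
        return DiagnosticStatus(raw)
    except ValueError:
        valid = ", ".join(s.value for s in DiagnosticStatus)
        # Replace the terse enum error with the valid options and a migration hint.
        raise ValueError(
            f"Invalid diagnostic status {raw!r}; expected one of: {valid}. "
            "For legacy engine output, convert it with from_legacy_dict()."
        ) from None


try:
    parse_status("CORRECTION_NEEDED")
except ValueError as exc:
    print(exc)
```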

… P1)
- __post_init__: change `proof_ref is None` to `not proof_ref`, which catches an empty string '' bypassing the authority invariant
- DiagnosticResult + AdvisoryCheck: add frozen=True, which prevents post-construction mutation of proof_ref/status from bypassing the authority contract (short of an explicit object.__setattr__)
- tests: add an empty-string proof_ref raises test and a frozen mutation test

83 tests pass.
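A minimal sketch of the two hardening changes, assuming a simplified `DiagnosticResult` with only the fields named in the PR description. `not self.proof_ref` treats `""` like `None`, and `frozen=True` turns ordinary attribute assignment into `FrozenInstanceError`.

```python
from dataclasses import FrozenInstanceError, dataclass, field
from enum import Enum
from typing import Any, Dict, Optional


class DiagnosticStatus(str, Enum):
    # Assumed string values; the real enum may differ.
    VERIFIED = "VERIFIED"
    UNVERIFIABLE = "UNVERIFIABLE"
    BLOCKED = "BLOCKED"


@dataclass(frozen=True)
class DiagnosticResult:
    status: DiagnosticStatus
    agent_message: str
    developer_fields: Dict[str, Any] = field(default_factory=dict)
    proof_ref: Optional[str] = None

    def __post_init__(self) -> None:
        # `not self.proof_ref` rejects both None and "", so an empty string
        # can no longer satisfy the VERIFIED-requires-proof invariant.
        if self.status == DiagnosticStatus.VERIFIED and not self.proof_ref:
            raise ValueError("VERIFIED requires a non-empty proof_ref")


verified = DiagnosticResult(
    DiagnosticStatus.VERIFIED, "Verified.", proof_ref="sha256:abc123"
)

try:
    DiagnosticResult(DiagnosticStatus.VERIFIED, "Verified.", proof_ref="")
except ValueError as exc:
    print("rejected:", exc)

try:
    verified.proof_ref = None  # frozen=True blocks ordinary attribute assignment
except FrozenInstanceError:
    print("mutation blocked")
```

As the commit notes, `frozen=True` guards against accidental mutation only. A deliberate `object.__setattr__(verified, "proof_ref", None)` still succeeds.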

sonarqubecloud · 2026-06-18T21:19:43Z

Quality Gate passed

Issues
0 New issues
0 Accepted issues

Measures
0 Security Hotspots
99.2% Coverage on New Code
0.0% Duplication on New Code

See analysis details on SonarQube Cloud

sentry Bot reviewed Jun 18, 2026

Comment thread src/qwed_new/core/diagnostics.py Outdated

Comment thread src/qwed_new/core/diagnostics.py Outdated

coderabbitai Bot reviewed Jun 18, 2026

Comment thread src/qwed_new/core/diagnostics.py

Comment thread tests/test_diagnostics.py Outdated

greptile-apps Bot reviewed Jun 18, 2026

Comment thread src/qwed_new/core/diagnostics.py Outdated

Comment thread src/qwed_new/core/diagnostics.py Outdated

Comment thread src/qwed_new/core/diagnostics.py

Comment thread src/qwed_new/core/diagnostics.py

github-advanced-security AI found potential problems Jun 18, 2026

Comment thread src/qwed_new/core/diagnostics.py Fixed

github-code-quality Bot found potential problems Jun 18, 2026

Comment thread src/qwed_new/core/diagnostics.py Fixed

sentry Bot reviewed Jun 18, 2026

Comment thread src/qwed_new/core/diagnostics.py

sentry Bot reviewed Jun 18, 2026

Comment thread src/qwed_new/core/diagnostics.py

coderabbitai Bot reviewed Jun 18, 2026

greptile-apps Bot reviewed Jun 18, 2026

Comment thread src/qwed_new/core/diagnostics.py Outdated

github-advanced-security AI found potential problems Jun 18, 2026

Comment thread src/qwed_new/core/diagnostics.py Fixed

github-code-quality Bot found potential problems Jun 18, 2026

Comment thread src/qwed_new/core/diagnostics.py Fixed

sentry Bot reviewed Jun 18, 2026

Comment thread src/qwed_new/core/diagnostics.py

coderabbitai Bot reviewed Jun 18, 2026

Comment thread src/qwed_new/core/diagnostics.py Outdated

sentry Bot reviewed Jun 18, 2026

Comment thread src/qwed_new/core/diagnostics.py Outdated

greptile-apps Bot reviewed Jun 18, 2026

Comment thread src/qwed_new/core/diagnostics.py Outdated

Comment thread src/qwed_new/core/diagnostics.py Outdated

Rahul Dass (rahuldass19) merged commit 785390a into main Jun 18, 2026
37 checks passed

Rahul Dass (rahuldass19) mentioned this pull request Jun 18, 2026

release: v5.2.0 — Structured Verification Diagnostics #207

Merged

Rahul Dass (rahuldass19) mentioned this pull request Jun 19, 2026

docs: v5.2.0 — Structured Verification Diagnostics QWED-AI/docs#217

Merged

Conversation

Rahul Dass (rahuldass19) commented Jun 18, 2026 (edited by coderabbitai Bot)

Summary

Key Design Decisions

Tri-state status only — no proliferation

proof_ref is the authority bit

VERIFIED requires proof — structurally enforced

Advisory checks never touch status

Migration helper for existing engines

What This PR Does NOT Do

Files

Test Results

Unblocks

Summary by CodeRabbit

chatgpt-codex-connector Bot commented Jun 18, 2026

coderabbitai Bot commented Jun 18, 2026 (edited)

Review limit reached

Walkthrough

Changes

Sequence Diagram(s)

Estimated code review effort

Possibly related issues

Poem

❌ Failed checks (1 warning)

codspeed-hq Bot commented Jun 18, 2026 (edited)

Merging this PR will not alter performance

codecov Bot commented Jun 18, 2026 (edited)

Codecov Report

coderabbitai Bot left a comment

greptile-apps Bot commented Jun 18, 2026 (edited)

Greptile Summary

Confidence Score: 4/5

Important Files Changed

Flowchart

coderabbitai Bot left a comment

coderabbitai Bot left a comment

sonarqubecloud Bot commented Jun 18, 2026

Quality Gate passed