feat(diagnostics): unified 3-layer DiagnosticResult model (#204) #206

Merged
Rahul Dass (rahuldass19) merged 7 commits into main from feat/diagnostic-result-model
Jun 18, 2026

@rahuldass19 Rahul Dass (rahuldass19) commented Jun 18, 2026

Member

Summary

Establishes the structured verification diagnostic contract from #204 — a unified DiagnosticResult model with three disclosure layers:

| Layer | Field | Audience |
|---|---|---|
| 1: Agent-Safe | agent_message: str | Agents/models (no internals leaked) |
| 2: Developer | developer_fields: dict | Application developers (constraint_id, evidence, advisory_checks) |
| 3: Proof | proof_ref: Optional[str] | Auditors/operators (sha256 hash of retained proof artifact) |

Key Design Decisions

Tri-state status only — no proliferation

DiagnosticStatus has exactly 3 values: VERIFIED, UNVERIFIABLE, BLOCKED. No HEURISTIC, AMBIGUOUS, or CORRECTION_NEEDED. Richer distinctions live in developer_fields.constraint_id. This resolves the #190 discussion (Keesan/Rahul debate) — ambiguity IS unverifiability; the distinction is structured, not status-level.
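As a rough illustration of the tri-state rule, here is a minimal sketch assuming a string-valued enum. The member values and both `constraint_id` strings are hypothetical and are not taken from the module itself.

```python
from enum import Enum


class DiagnosticStatus(str, Enum):
    """Tri-state verification outcome (member values are assumed)."""

    VERIFIED = "VERIFIED"
    UNVERIFIABLE = "UNVERIFIABLE"
    BLOCKED = "BLOCKED"


# Finer distinctions ride in developer_fields rather than in new statuses.
# Both constraint_id strings below are hypothetical examples.
ambiguous_input = {
    "status": DiagnosticStatus.UNVERIFIABLE,
    "developer_fields": {"constraint_id": "math.ambiguous_expression"},
}
missing_evidence = {
    "status": DiagnosticStatus.UNVERIFIABLE,
    "developer_fields": {"constraint_id": "fact.insufficient_evidence"},
}
```

Both records share one status, so a downstream gate treats them identically, while a developer can still tell them apart by `constraint_id`.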

proof_ref is the authority bit

No separate authoritative boolean. proof_ref is not None = authoritative (admissible for control flow); proof_ref is None = reject. Downstream gates get a mechanical rule:

if not result.is_authoritative:
    block_decision()

VERIFIED requires proof — structurally enforced

__post_init__ raises ValueError if status == VERIFIED and proof_ref is None. This makes "VERIFIED without proof" impossible to construct — not a caller convention, a type-level invariant.
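The two rules above (proof_ref as the authority bit, and VERIFIED requiring proof) could be sketched roughly as follows. This is a simplified stand-in, not the code in diagnostics.py: the field order, defaults, error text, and the `gate` helper are assumptions made for illustration.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Any, Dict, Optional


class DiagnosticStatus(str, Enum):
    VERIFIED = "VERIFIED"
    UNVERIFIABLE = "UNVERIFIABLE"
    BLOCKED = "BLOCKED"


@dataclass(frozen=True)
class DiagnosticResult:
    status: DiagnosticStatus
    agent_message: str
    developer_fields: Dict[str, Any] = field(default_factory=dict)
    proof_ref: Optional[str] = None

    def __post_init__(self) -> None:
        # VERIFIED without proof cannot be constructed at all.
        if self.status is DiagnosticStatus.VERIFIED and self.proof_ref is None:
            raise ValueError("VERIFIED requires a proof_ref")

    @property
    def is_authoritative(self) -> bool:
        # proof_ref doubles as the authority bit; no separate boolean.
        return self.proof_ref is not None


def gate(result: DiagnosticResult) -> str:
    """Hypothetical downstream gate applying the mechanical rule."""
    return "admit" if result.is_authoritative else "block"
```

Because the check lives in `__post_init__`, every construction path (direct call, factory, deserialization) goes through it, which is what makes the invariant structural rather than a convention.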

Advisory checks never touch status

AdvisoryCheck (LLM fallback, NLI entailment, VLM interpretation) populates developer_fields.advisory_checks with advisory_only=True. They never set status or proof_ref. This structurally enforces the #204 constraint: "diagnostics must never originate from model reasoning, confidence, or self-assessment."
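A minimal sketch of how an advisory check might stay out of the verdict path. The `source` and `summary` fields and the `attach_advisory` helper are hypothetical; only the `advisory_only=True` flag and the `developer_fields.advisory_checks` location come from the description above.

```python
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class AdvisoryCheck:
    source: str   # hypothetical field, e.g. "llm_fallback"
    summary: str  # hypothetical field: human-readable note
    advisory_only: bool = True

    def __post_init__(self) -> None:
        # An advisory check can never be promoted to a verdict.
        if self.advisory_only is not True:
            raise ValueError("AdvisoryCheck must be advisory_only=True")


def attach_advisory(result: dict, check: AdvisoryCheck) -> dict:
    """Return a copy of `result` with the check appended to developer_fields.

    Only developer_fields changes; status and proof_ref are copied untouched.
    """
    fields = dict(result.get("developer_fields", {}))
    fields["advisory_checks"] = [*fields.get("advisory_checks", []), asdict(check)]
    return {**result, "developer_fields": fields}
```

The helper has no code path that writes `status` or `proof_ref`, so model output can annotate a result but cannot change what a gate decides.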

Migration helper for existing engines

from_legacy_dict() converts ad-hoc engine dicts to DiagnosticResult for fail-closed states (CORRECTION_NEEDED → UNVERIFIABLE, ERROR → BLOCKED). It raises for legacy VERIFIED results because pre-#204 engines discarded their proof artifacts, so backfilling is impossible. Callers must use DiagnosticResult.verified() with explicit evidence.
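The migration rule can be sketched as a standalone function that returns plain dicts. This is an illustration under assumptions, not the module's method: the `status` key, the `engine` parameter, the agent message text, the `constraint_id` naming, and raising on unmapped statuses are all choices made for this sketch.

```python
from typing import Any, Dict

# Mapping stated in the description; anything else is out of scope here.
_LEGACY_TO_STATUS = {
    "CORRECTION_NEEDED": "UNVERIFIABLE",
    "ERROR": "BLOCKED",
}


def from_legacy_dict(legacy: Dict[str, Any], engine: str) -> Dict[str, Any]:
    """Convert a legacy engine dict into a fail-closed diagnostic dict."""
    legacy_status = str(legacy.get("status", "")).upper()
    if legacy_status == "VERIFIED":
        # Pre-#204 engines discarded proof artifacts, so there is nothing to hash.
        raise ValueError("legacy VERIFIED cannot be migrated; rebuild with explicit evidence")
    if legacy_status not in _LEGACY_TO_STATUS:
        # Assumption of this sketch: unknown statuses fail loudly.
        raise ValueError(f"unrecognized legacy status {legacy_status!r} from {engine!r}")
    return {
        "status": _LEGACY_TO_STATUS[legacy_status],
        "agent_message": "Verification did not succeed.",
        "developer_fields": {
            "constraint_id": f"{engine}.legacy_{legacy_status.lower()}",
            "legacy_status": legacy_status,
        },
        "proof_ref": None,  # fail-closed results never carry proof
    }
```

Every successful return carries `proof_ref=None`, so a migrated result can never pass an authority gate.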

What This PR Does NOT Do

Does not refactor existing engines; those are separate PRs against this contract.

Files

  • src/qwed_new/core/diagnostics.py (new) — DiagnosticStatus, DiagnosticResult, AdvisoryCheck, compute_proof_ref
  • tests/test_diagnostics.py (new) — 68 tests
  • src/qwed_new/core/__init__.py (modified) — exports
  • pyproject.toml (modified) — version 5.1.2 → 5.2.0

Test Results

  • 68 new tests pass (status taxonomy, all 3 layers, authority contract, fail-closed enforcement, advisory checks, proof hashing, serialization, legacy migration, realistic scenarios)
  • Core regression tests pass (97 passed, 21 skipped)
  • Boundary check passes
  • Ruff lint clean

Unblocks

#129, #130, #131, #133, #134, #162, #163, #164, #190, #205

Closes #204.

Summary by CodeRabbit

  • New Features
    • Introduced a structured 3-layer verification diagnostics model with Verified, Unverifiable, and Blocked outcomes.
    • Added deterministic proof references for Verified results and fail-closed behavior for non-authoritative outcomes.
    • Enabled advisory-only metadata attached to diagnostics; made the diagnostics utilities publicly available.
  • Tests
    • Added a comprehensive test suite covering construction rules, deterministic proof generation, serialization, and legacy migration.
  • Chores
    • Bumped the package version to 5.2.0.

Establishes the structured verification diagnostic contract:

Layer 1 — agent_message (agent-safe, no internals leaked)
Layer 2 — developer_fields (constraint_id, advisory_checks, evidence)
Layer 3 — proof_ref (sha256 hash of retained proof artifact)

Key design:
- DiagnosticStatus: tri-state only (VERIFIED / UNVERIFIABLE / BLOCKED)
  — no HEURISTIC/AMBIGUOUS proliferation; distinction lives in
  developer_fields.constraint_id
- proof_ref is the authority bit: present = admissible for control flow,
  None = reject. No separate 'authoritative' boolean needed (resolves
  #190 Keesan/Rahul debate)
- VERIFIED requires proof_ref is not None — structurally enforced in
  __post_init__, not by caller discipline
- AdvisoryCheck: non-proof-bearing analysis (LLM fallback, NLI, VLM)
  populates developer_fields.advisory_checks, never status or proof_ref
- compute_proof_ref: deterministic sha256 hash of JSON-serialized evidence
- from_legacy_dict: migration helper for ad-hoc engine dicts (fail-closed
  states only — raises for legacy VERIFIED since proof artifacts were
  discarded by pre-#204 engines)

Unblocks: #129, #130, #131, #133, #134, #162, #163, #164, #190, #205
Does NOT refactor existing engines — those are separate PRs against
this contract.

Tests: 68 new tests covering status taxonomy, all 3 layers, authority
contract, fail-closed enforcement, advisory checks, proof hashing,
serialization round-trip, legacy migration, and realistic scenarios
drawn from the blocked issues.

Version: 5.1.2 -> 5.2.0 (architectural enhancement)

@coderabbitai

coderabbitai Bot commented Jun 18, 2026

Walkthrough

Introduces src/qwed_new/core/diagnostics.py, a new module implementing a unified 3-layer diagnostic model with DiagnosticStatus (VERIFIED/UNVERIFIABLE/BLOCKED), DiagnosticResult, AdvisoryCheck, and compute_proof_ref. The module is wired into core/__init__.py public exports, covered by 653 lines of new tests, and the package version is bumped to 5.2.0.

Changes

3-Layer Structured Verification Diagnostics

Layer / File(s) Summary
DiagnosticStatus, AdvisoryCheck, and compute_proof_ref
src/qwed_new/core/diagnostics.py
Defines the three-state DiagnosticStatus string enum (VERIFIED, UNVERIFIABLE, BLOCKED) with proof-authorization semantics per state; AdvisoryCheck non-verdict metadata container with advisory_only=True enforcement and to_dict/from_dict; and compute_proof_ref which SHA-256-hashes JSON-serialized evidence (with sort_keys=True) into a sha256:-prefixed proof reference.
DiagnosticResult: core dataclass, invariants, and helper properties
src/qwed_new/core/diagnostics.py
Implements DiagnosticResult with __post_init__ enforcing that VERIFIED requires a non-None proof_ref and non-verified statuses require proof_ref=None; is_authoritative, is_fail_closed, and is_verified authority properties; constraint_id and advisory_checks field extractors with defensive deserialization; to_dict()/from_dict() with tolerant string-status parsing and required agent_message validation.
DiagnosticResult factories and legacy migration
src/qwed_new/core/diagnostics.py
Adds verified(...), unverifiable(...), and blocked(...) factory classmethods for primary construction; from_legacy_dict(...) maps legacy engine dict patterns (status strings, error keys, is_correct flags) to fail-closed UNVERIFIABLE/BLOCKED results with engine-namespaced constraint_id values, raising ValueError when legacy dict claims VERIFIED (proof artifacts not retained).
Public API wiring and package version bump
src/qwed_new/core/__init__.py, pyproject.toml
Imports and re-exports DiagnosticStatus, DiagnosticResult, AdvisoryCheck, compute_proof_ref via __all__; bumps package version from 5.1.2 to 5.2.0.
Diagnostic contract and integration tests
tests/test_diagnostics.py
Validates status membership strictness, agent_message enforcement, proof-ref/authority invariants and constructor validation, is_authoritative/is_fail_closed properties, admission gate behavior, compute_proof_ref determinism and key-order independence, AdvisoryCheck constraints and serialization, developer_fields extraction with defaults, to_dict/from_dict tolerant parsing and round-trips, legacy migration mapping and error cases, and scenario-based integration tests across realistic blocked-issue examples with fail-closed behavior validation and mechanical admission gate (only authoritative results admitted).

Sequence Diagram(s)

sequenceDiagram
  participant Engine
  participant DiagnosticResult
  participant compute_proof_ref
  participant DownstreamGate

  Engine->>DiagnosticResult: verified(agent_message, proof_evidence, developer_fields)
  DiagnosticResult->>compute_proof_ref: hash(proof_evidence) with sort_keys=True
  compute_proof_ref-->>DiagnosticResult: "sha256:<hex>"
  DiagnosticResult-->>Engine: DiagnosticResult(status=VERIFIED, proof_ref="sha256:...", is_authoritative=True)
  Engine->>DownstreamGate: pass DiagnosticResult
  DownstreamGate->>DownstreamGate: check is_authoritative == True
  DownstreamGate-->>Engine: admitted

  Engine->>DiagnosticResult: from_legacy_dict(legacy_engine_dict)
  DiagnosticResult->>DiagnosticResult: map status/error patterns → UNVERIFIABLE or BLOCKED
  DiagnosticResult-->>Engine: DiagnosticResult(status=UNVERIFIABLE|BLOCKED, proof_ref=None, is_fail_closed=True)
  Engine->>DownstreamGate: pass DiagnosticResult
  DownstreamGate->>DownstreamGate: check is_authoritative == False
  DownstreamGate-->>Engine: rejected (fail-closed)

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related issues

  • #204 (Structured Verification Diagnostics — Architectural Completion of 3-Layer Diagnostic Model): This PR directly implements the unified DiagnosticResult base protocol, the VERIFIED/UNVERIFIABLE/BLOCKED tri-state taxonomy, proof_ref evidence binding via sha256: hashing, constraint_id on failures, agent_message enforcement, and fail-closed invariants described as the target deliverable for v5.2.0 in issue #204.
  • #205: This is an identified follow-up issue that explicitly depends on this PR's DiagnosticResult and DiagnosticStatus framework to refactor legacy (bool, str) tuple returns in secure_code_executor.py into the new structured diagnostic model; this PR establishes the direct prerequisite for that work.

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 33.33% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Title check ✅ Passed The title directly references the main change: a unified 3-layer DiagnosticResult model implementing issue #204's diagnostic architecture.
Description check ✅ Passed The description comprehensively covers all required template sections: Summary (3-layer model explained), Key Design Decisions (tri-state status, proof_ref authority, enforcement, advisory checks, migration), context on what is NOT done, Files changed, Test Results, and Unblocks.
Linked Issues check ✅ Passed The PR fully implements the core objectives from issue #204: unified DiagnosticResult with 3-layer architecture (Agent-Safe, Developer, Proof), tri-state DiagnosticStatus, proof_ref authority enforcement, advisory check isolation, and legacy migration support.
Out of Scope Changes check ✅ Passed All changes are in-scope: diagnostics.py implements the #204 contract, tests validate it, __init__.py exports the new public APIs, and version bump reflects the architectural change. No unrelated refactoring or feature creep.

@codspeed-hq

codspeed-hq Bot commented Jun 18, 2026

Contributor

Merging this PR will not alter performance

✅ 20 untouched benchmarks


Comparing feat/diagnostic-result-model (9666285) with main (260990f)

Comment thread src/qwed_new/core/diagnostics.py Outdated
Comment thread src/qwed_new/core/diagnostics.py Outdated
@codecov

codecov Bot commented Jun 18, 2026

Codecov Report

❌ Patch coverage is 99.22481% with 1 line in your changes missing coverage. Please review.

Files with missing lines Patch % Lines
src/qwed_new/core/diagnostics.py 99.21% 1 Missing ⚠️

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 2

🧹 Nitpick comments (2)
src/qwed_new/core/diagnostics.py (1)

444-450: ⚖️ Poor tradeoff

Consider raising on truly unrecognized legacy patterns.

This fallback silently returns UNVERIFIABLE for inputs that don't match any known legacy pattern. While UNVERIFIABLE is fail-closed (safe), per QWED_RULES "fail loudly" principle, unrecognized data could indicate unexpected engine behavior that should surface as an error rather than be silently absorbed.

The current approach is pragmatically safe; raising here is more strict and aids debugging when legacy engines produce unexpected output formats.

Optional stricter approach
-        return cls.unverifiable(
-            agent_message="Verification inconclusive — unrecognized legacy result",
-            developer_fields={
-                "constraint_id": f"{engine}.legacy_unrecognized",
-                "legacy_status": legacy_status,
-            },
-        )
+        raise ValueError(
+            f"from_legacy_dict cannot interpret unrecognized legacy data from {engine!r}: "
+            f"status={legacy_status!r}, is_correct={is_correct!r}. "
+            "Review engine output format and add explicit handling."
+        )
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/qwed_new/core/diagnostics.py` around lines 444 - 450, The fallback case
that returns cls.unverifiable() for unrecognized legacy patterns should raise an
exception instead to follow the "fail loudly" principle. Replace the
cls.unverifiable() return statement with raising an appropriate exception (such
as ValueError or a domain-specific exception) that includes the legacy_status
and engine information in the error message. This will surface unexpected engine
behavior rather than silently absorbing it as an UNVERIFIABLE result.

Source: Coding guidelines

tests/test_diagnostics.py (1)

504-533: Avoid "deterministic_confidence" naming in test validation metadata.

The test at line 510 uses "deterministic_confidence": 0.4 in developer_fields to validate advisory-only validation scenarios. While the DiagnosticResult status is correctly UNVERIFIABLE (fail-closed), the field naming creates conceptual confusion because "deterministic" implies binary outcome, not probabilistic scoring.

Since this appears only in test data and is placed in developer_fields (which per QWED_RULES holds advisory-only information that does not affect proof authority), the concern is primarily about clarity and preventing misinterpretation by future maintainers rather than a functional bypass.

Consider renaming deterministic_confidence to something that clarifies it represents evidence sufficiency or test context (e.g., evidence_coverage_ratio or test_scenario_confidence_for_advisory_path) to prevent confusion with authority-level confidence scores in production verification.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/test_diagnostics.py` around lines 504 - 533, In the
test_fact_verifier_llm_advisory_only method, rename the field
"deterministic_confidence" in the developer_fields dictionary to a name that
clarifies it represents advisory test context rather than a deterministic
outcome. Use a name like "evidence_coverage_ratio" or
"test_scenario_confidence_for_advisory_path" to better reflect that this is
advisory-only metadata that does not affect proof authority, preventing
conceptual confusion for future maintainers.

Source: Coding guidelines

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@src/qwed_new/core/diagnostics.py`:
- Around line 148-155: The json.dumps call in the proof_ref generation logic
uses default=str which causes non-deterministic hashing because str() on
non-serializable objects includes memory addresses that vary between runs,
violating the determinism requirement stated in the docstring. To fix this,
remove the default=str parameter from the json.dumps call so that
non-serializable evidence raises a TypeError instead (fail-closed approach),
which aligns with the documented requirement that callers must pre-convert
non-serializable values before calling this function. Alternatively, if you need
to support non-serializable types, use a deterministic fallback like
default=lambda o: f"<{type(o).__name__}>" instead of default=str.

In `@tests/test_diagnostics.py`:
- Around line 229-236: The tests test_non_serializable_falls_back_to_str and
test_nested_non_serializable_falls_back_to_str currently verify that
non-serializable objects are accepted and produce a hash, but this violates
determinism requirements since str(object()) includes memory addresses that vary
across runs. Update both tests to verify fail-closed behavior by asserting that
compute_proof_ref raises ValueError when passed non-serializable objects, rather
than checking that it returns a sha256 hash. Remove the assertions checking for
the sha256: prefix and instead use self.assertRaises(ValueError) or similar to
verify that the function rejects non-serializable input.
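To make the determinism concern concrete, here is a minimal sketch of a fail-closed `compute_proof_ref`, assuming the `sha256:` prefix and `sort_keys=True` described in the walkthrough. The UTF-8 encoding, the default separators, and the error wording are assumptions; the real function may differ.

```python
import hashlib
import json
from typing import Any


def compute_proof_ref(evidence: Any) -> str:
    """Hash JSON-serialized evidence into a 'sha256:<hex>' reference."""
    try:
        # No default= fallback: unserializable evidence is rejected outright.
        payload = json.dumps(evidence, sort_keys=True)
    except TypeError as exc:
        raise ValueError(f"evidence is not JSON-serializable: {exc}") from exc
    return "sha256:" + hashlib.sha256(payload.encode("utf-8")).hexdigest()


# Why default=str is unsafe: the default repr embeds a memory address,
# so two equivalent objects serialize differently.
a, b = object(), object()
unstable = json.dumps({"x": a}, default=str) != json.dumps({"x": b}, default=str)
```

With `sort_keys=True` the hash is independent of dict insertion order, and dropping `default=str` turns address-dependent input into an error instead of an unstable hash.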

Comment thread src/qwed_new/core/diagnostics.py
Comment thread tests/test_diagnostics.py Outdated
@greptile-apps

greptile-apps Bot commented Jun 18, 2026

Greptile Summary

This PR introduces the unified 3-layer DiagnosticResult model from issue #204: an agent-safe agent_message, structured developer_fields, and a cryptographic proof_ref authority bit. All previously-flagged issues (frozen dataclass, empty proof_ref bypass, advisory_only enforcement, identity-check bypass in from_legacy_dict, advisory_checks raising at access time) have been addressed in this version.

  • DiagnosticResult is a frozen=True dataclass enforcing that VERIFIED requires a non-empty proof_ref, non-VERIFIED states require proof_ref=None, and agent_message is mandatory — all via __post_init__.
  • from_legacy_dict correctly uses bool(is_correct) instead of identity checks, handling truthy non-bool legacy values with 13 dedicated tests.
  • One remaining concern: from_dict accepts any non-empty string as proof_ref without validating the sha256:<64-hex> format, allowing a crafted serialized payload to produce is_authoritative=True with no cryptographic basis.

Confidence Score: 4/5

Safe to merge after addressing the proof_ref format validation gap in from_dict.

All previously-reviewed structural issues have been addressed. The one new finding is that from_dict does not validate the proof_ref string format — any non-empty string produces is_authoritative=True, which is the documented control-flow admission gate. A crafted or corrupted serialized dict could bypass the authority check without a real cryptographic proof reference. The fix is a single regex guard in from_dict.

src/qwed_new/core/diagnostics.py — specifically the from_dict method's handling of the proof_ref field.
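The regex guard suggested here might look like the sketch below. `validate_proof_ref` is a hypothetical helper name, and the lowercase-only hex pattern is an assumption based on the `sha256:<64-hex>` format mentioned above.

```python
import re
from typing import Optional

# Assumed canonical form: "sha256:" followed by 64 lowercase hex digits.
_PROOF_REF_RE = re.compile(r"sha256:[0-9a-f]{64}")


def validate_proof_ref(proof_ref: Optional[str]) -> Optional[str]:
    """Hypothetical guard for from_dict: pass None through, reject malformed refs."""
    if proof_ref is None:
        return None
    if not isinstance(proof_ref, str) or _PROOF_REF_RE.fullmatch(proof_ref) is None:
        raise ValueError("proof_ref must match sha256:<64 lowercase hex chars>")
    return proof_ref
```

A format check of this kind only rejects strings that could not have come from a SHA-256 digest; it does not prove that a matching proof artifact was actually retained.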

Important Files Changed

Filename Overview
src/qwed_new/core/diagnostics.py Core new module implementing the 3-layer DiagnosticResult model. All previously-threaded issues addressed. One remaining issue: from_dict accepts any non-empty string as proof_ref with no format validation, allowing a crafted dict to produce is_authoritative=True with no cryptographic basis.
tests/test_diagnostics.py 68 comprehensive tests covering all layers, authority contract, fail-closed enforcement, advisory checks, serialization, legacy migration, and realistic scenarios.
src/qwed_new/core/__init__.py Adds public exports for DiagnosticStatus, DiagnosticResult, AdvisoryCheck, and compute_proof_ref. Clean change.
pyproject.toml Version bump 5.1.2 to 5.2.0, appropriate for a new public API surface.

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A[Caller] -->|DiagnosticResult.verified| B["__post_init__\n(frozen dataclass)"]
    A -->|DiagnosticResult.unverifiable| B
    A -->|DiagnosticResult.blocked| B
    A -->|DiagnosticResult.from_dict| C[from_dict]
    A -->|DiagnosticResult.from_legacy_dict| D[from_legacy_dict]

    C -->|validate agent_message\nparse status| B
    D -->|map legacy status\nfail on VERIFIED| B

    B -->|status==VERIFIED and proof_ref falsy?| E{Raises ValueError}
    B -->|status!=VERIFIED and proof_ref not None?| E
    B -->|agent_message empty?| E
    B -->|all OK| F[DiagnosticResult]

    F -->|is_authoritative| G{proof_ref is not None}
    G -->|True| H[Admit for control flow]
    G -->|False| I[Block decision]

    F -->|advisory_checks property| J[Defensively iterate\nskip invalid items]
    F -->|to_dict| K[Serialized dict\nincludes is_authoritative]

    K -->|from_dict| C

Reviews (7): Last reviewed commit: "fix(diagnostics): empty proof_ref bypass..." | Re-trigger Greptile

Comment thread src/qwed_new/core/diagnostics.py Outdated
Comment thread src/qwed_new/core/diagnostics.py Outdated
Comment thread src/qwed_new/core/diagnostics.py
Comment thread src/qwed_new/core/diagnostics.py
- compute_proof_ref: remove default=str — non-serializable evidence now
  raises ValueError (fail-closed). Prevents non-deterministic memory-
  address-dependent hashes from entering proof contract
- AdvisoryCheck: add __post_init__ enforcing advisory_only is True —
  structurally enforced, no longer just docstring claim
- from_legacy_dict: replace is True/is False identity checks with
  bool() truthiness — catches legacy engines using 1/0 instead of
  True/False (Sentry MEDIUM, Greptile P1)
- from_legacy_dict: raise ValueError on unrecognized legacy patterns
  instead of silent UNVERIFIABLE — fail-loudly per QWED_RULES
- from_dict: raise clear ValueError on missing/empty agent_message
  instead of crashing via __post_init__ with confusing error (Sentry HIGH)
- tests: update non-serializable tests to expect ValueError
- tests: add AdvisoryCheck(advisory_only=False) raises test
- tests: add integer is_correct (1/0) bypass tests
- tests: add from_dict missing agent_message raises test
- tests: add unrecognized truthy legacy raises test
- tests: rename deterministic_confidence -> evidence_coverage_ratio

73 tests pass (up from 68).
Comment thread src/qwed_new/core/diagnostics.py Fixed
Comment thread src/qwed_new/core/diagnostics.py Fixed
Comment thread src/qwed_new/core/diagnostics.py
…ntry)

Early guard was rejecting all truthy is_correct as VERIFIED, making
the final unrecognized-pattern raise unreachable dead code. Now only
explicit status=='VERIFIED' is rejected early; truthy is_correct with
unknown status falls through to the correct unrecognized-pattern
raise with the appropriate error message.
Comment thread src/qwed_new/core/diagnostics.py

@coderabbitai coderabbitai Bot left a comment

🧹 Nitpick comments (1)
src/qwed_new/core/diagnostics.py (1)

247-254: 💤 Low value

Consider stricter type validation for fail-closed consistency.

The property passes through non-dict items assuming they're already AdvisoryCheck instances. If developer_fields["advisory_checks"] contains malformed data (e.g., [1, "str"]), these would be returned in a list typed as List[AdvisoryCheck], violating the type contract.

Per QWED_RULES fail-closed principle, consider validating or filtering items:

♻️ Suggested refinement
     @property
     def advisory_checks(self) -> List[AdvisoryCheck]:
         """Advisory checks from developer_fields, deserialized to AdvisoryCheck."""
         raw = self.developer_fields.get("advisory_checks", [])
         if not isinstance(raw, list):
             return []
-        return [AdvisoryCheck.from_dict(item) if isinstance(item, dict) else item
-                for item in raw]
+        result = []
+        for item in raw:
+            if isinstance(item, dict):
+                result.append(AdvisoryCheck.from_dict(item))
+            elif isinstance(item, AdvisoryCheck):
+                result.append(item)
+            # Skip malformed items (fail-closed: don't propagate garbage)
+        return result
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/qwed_new/core/diagnostics.py` around lines 247 - 254, The advisory_checks
property method contains a type safety issue where non-dict items are passed
through without validation, assuming they are already AdvisoryCheck instances.
This violates the type contract List[AdvisoryCheck] if malformed data (like
integers or strings) exists in developer_fields["advisory_checks"]. To fix this
following the fail-closed principle, modify the list comprehension to strictly
validate items: only include items that are either dicts that can be converted
to AdvisoryCheck via the from_dict method, or items that are already valid
AdvisoryCheck instances. Filter out or skip any items that don't meet these
criteria rather than passing them through unchecked.
📥 Commits

Reviewing files that changed from the base of the PR and between 57ec9d2 and f73aba9.

📒 Files selected for processing (2)
  • src/qwed_new/core/diagnostics.py
  • tests/test_diagnostics.py
🚧 Files skipped from review as they are similar to previous changes (1)
  • tests/test_diagnostics.py

- AdvisoryCheck.from_dict: coerce advisory_only to bool — fixes
  cross-language APIs that use 1 instead of True (Sentry HIGH)
- advisory_checks property: skip malformed/invalid items instead of
  raising ValueError at access time (Greptile P1, CodeRabbit)
- tests: add integer coercion tests, invalid-item skip tests,
  existing-instance passthrough test

77 tests pass (up from 73).
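
For readers following the thread, here is a minimal sketch of the skip-on-malformed behaviour this commit describes. It is not the code from `diagnostics.py`: the `AdvisoryCheck` stand-in, its `source` field, and the free function `parse_advisory_checks` are hypothetical names used only for illustration.

```python
from dataclasses import dataclass
from typing import Any, Dict, List


@dataclass
class AdvisoryCheck:
    """Hypothetical stand-in; the real class presumably carries more fields."""
    source: str  # assumed field name, not taken from the PR
    advisory_only: bool = True

    @classmethod
    def from_dict(cls, data: Dict[str, Any]) -> "AdvisoryCheck":
        if "source" not in data:
            raise ValueError("advisory check requires 'source'")
        return cls(source=str(data["source"]))


def parse_advisory_checks(developer_fields: Dict[str, Any]) -> List[AdvisoryCheck]:
    """Return only well-formed advisory checks; never raise at access time."""
    raw = developer_fields.get("advisory_checks", [])
    if not isinstance(raw, list):
        return []
    result: List[AdvisoryCheck] = []
    for item in raw:
        if isinstance(item, AdvisoryCheck):
            # Existing instances pass through unchanged.
            result.append(item)
        elif isinstance(item, dict):
            try:
                result.append(AdvisoryCheck.from_dict(item))
            except ValueError:
                # Invalid advisory metadata is non-authoritative; skip it.
                continue
        # Anything else (ints, strings, None) is dropped.
    return result
```

Because advisory checks never influence `status` or `proof_ref`, dropping a malformed entry loses only non-authoritative metadata, which is why skipping is a reasonable fail-closed choice here.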

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 1

🧹 Nitpick comments (1)
src/qwed_new/core/diagnostics.py (1)

263-266: ⚡ Quick win

Make the intentional skip explicit in the except block.

The current pass preserves the intended defensive skip, but CodeQL flags the empty handler. A commented continue keeps behavior unchanged and makes the audit intent clear.

Proposed cleanup
                 try:
                     result.append(AdvisoryCheck.from_dict(item))
                 except ValueError:
-                    pass
+                    # Invalid advisory metadata is non-authoritative; skip it.
+                    continue
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/qwed_new/core/diagnostics.py` around lines 263 - 266, The empty except
block handling ValueError in the try-except statement around
AdvisoryCheck.from_dict contains only a pass statement, which CodeQL flags as an
issue. Replace the pass statement with a commented continue statement to make
the intentional skip explicit and audit-intent clear while preserving the
existing behavior of skipping invalid items during the append operation.

Source: Linters/SAST tools

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@src/qwed_new/core/diagnostics.py`:
- Line 126: Replace the `bool(data.get("advisory_only", True))` call with
explicit validation that accepts only boolean values or integers 0 and 1,
rejecting all other types including malformed string values like "false" or "0".
Raise a validation error at construction time when the advisory_only field
contains an invalid type or value outside of the accepted boolean/integer range,
rather than silently coercing it with the bool() function.

📥 Commits

Reviewing files that changed from the base of the PR and between f73aba9 and 75851e0.

📒 Files selected for processing (2)
  • src/qwed_new/core/diagnostics.py
  • tests/test_diagnostics.py
🚧 Files skipped from review as they are similar to previous changes (1)
  • tests/test_diagnostics.py

…tion

- AdvisoryCheck.from_dict: reject malformed strings ('false', '0') —
  only accept bool or int 0/1, not truthy strings (CodeRabbit MAJOR)
- to_dict: serialize AdvisoryCheck instances to dicts — prevents
  TypeError on JSON serialization (Sentry MEDIUM)
- advisory_checks property: explicit comment on except block (CodeQL)
- tests: add string rejection test, AdvisoryCheck serialization test,
  JSON serializability test

80 tests pass.
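
The string-rejection change can be sketched as follows. This is an illustration under assumptions, not the merged code: `parse_advisory_only` is a hypothetical helper name, and the real validation may live inline in `AdvisoryCheck.from_dict`.

```python
from typing import Any


def parse_advisory_only(value: Any) -> bool:
    """Accept only real booleans or the integers 0 and 1; reject everything else."""
    # bool is a subclass of int in Python, so handle it first.
    if isinstance(value, bool):
        return value
    if isinstance(value, int) and value in (0, 1):
        return bool(value)
    raise ValueError(f"advisory_only must be a bool or 0/1, got {value!r}")
```

The motivation is that plain `bool()` treats every non-empty string as true, so `bool("false")` evaluates to `True` and would silently flip the caller's intent. Explicit validation turns that case into a construction-time error instead.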
…try HIGH)

- DiagnosticStatus(invalid_str) now caught and re-raised with valid
  options and from_legacy_dict pointer, instead of unhelpful enum error
- tests: add invalid status raises with guidance test

81 tests pass.
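
A rough sketch of the re-raise pattern described in this commit, assuming the enum values are the upper-case member names (the PR does not show them) and using a hypothetical `parse_status` helper:

```python
from enum import Enum


class DiagnosticStatus(Enum):
    # String values are assumed for illustration.
    VERIFIED = "VERIFIED"
    UNVERIFIABLE = "UNVERIFIABLE"
    BLOCKED = "BLOCKED"


def parse_status(raw: str) -> DiagnosticStatus:
    """Convert a raw string to DiagnosticStatus with an actionable error."""
    try:
        return DiagnosticStatus(raw)
    except ValueError as exc:
        valid = ", ".join(s.value for s in DiagnosticStatus)
        raise ValueError(
            f"Invalid status {raw!r}; expected one of: {valid}. "
            "For legacy payloads, use from_legacy_dict."
        ) from exc
```

Chaining with `from exc` keeps the original enum error in the traceback while the outer message tells the caller which values are valid and where legacy payloads should go.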
… P1)

- __post_init__: change proof_ref is None to not proof_ref —
  catches empty string '' bypassing authority invariant
- DiagnosticResult + AdvisoryCheck: add frozen=True — prevents
  post-construction proof_ref/status mutation bypassing the
  authority contract without explicit object.__setattr__
- tests: add empty string proof_ref raises test, frozen mutation test

83 tests pass.
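
The two changes in this commit can be illustrated with a reduced model. The field names come from the PR summary; the enum string values and the omission of every other method are simplifying assumptions, so treat this as a sketch rather than the merged class.

```python
from dataclasses import FrozenInstanceError, dataclass, field
from enum import Enum
from typing import Any, Dict, Optional


class DiagnosticStatus(Enum):
    # String values are assumed for illustration.
    VERIFIED = "VERIFIED"
    UNVERIFIABLE = "UNVERIFIABLE"
    BLOCKED = "BLOCKED"


@dataclass(frozen=True)
class DiagnosticResult:
    status: DiagnosticStatus
    agent_message: str
    developer_fields: Dict[str, Any] = field(default_factory=dict)
    proof_ref: Optional[str] = None

    def __post_init__(self) -> None:
        # `not proof_ref` rejects both None and the empty string.
        if self.status is DiagnosticStatus.VERIFIED and not self.proof_ref:
            raise ValueError("VERIFIED requires a non-empty proof_ref")


result = DiagnosticResult(
    status=DiagnosticStatus.VERIFIED,
    agent_message="ok",
    proof_ref="sha256:abc",  # placeholder value, not a real hash
)
try:
    result.proof_ref = None  # ordinary assignment is blocked by frozen=True
except FrozenInstanceError:
    mutation_blocked = True
else:
    mutation_blocked = False
```

Note that `frozen=True` only blocks ordinary attribute assignment. As the commit message says, `object.__setattr__` still bypasses it, so the guarantee is against accidental mutation, not a determined caller.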

@rahuldass19 Rahul Dass (rahuldass19) merged commit 785390a into main Jun 18, 2026
37 checks passed
Development

Successfully merging this pull request may close these issues.

Structured Verification Diagnostics — Architectural Completion of 3-Layer Diagnostic Model
