feat(viz): regression residual + error-distribution plots (#82) by Whatsonyourmind · Pull Request #143 · Khanz9664/TrustLens

Whatsonyourmind · 2026-06-18T19:41:13Z

Regression visualizations (phase 3 of #82)

Follows the merged metrics (#141) and pipeline/report integration (#142). Implements the two visualizations from the #82 checklist, as agreed in #142.

What's added

trustlens/visualization/regression_plots.py

plot_residuals(y_true, y_pred, prediction_intervals=None, …) — residuals (y_true − y_pred) vs. predicted value: the canonical view for heteroscedasticity (a fan/curve in the cloud) and bias (the cloud drifting off zero). Includes a dashed zero-error reference, an optional prediction-interval band drawn in residual space (lower − pred to upper − pred), and a bias / corr(|residual|, ŷ) annotation.
plot_error_distribution(y_true, y_pred, …) — signed-error histogram with a fitted-normal overlay (so skew / heavy tails are obvious) and an MAE/RMSE annotation.

Both follow the existing plot-module conventions: apply_style() theming, the save_path / show contract, and a returned matplotlib.figure.Figure. matplotlib stays lazily imported by the report (the module itself is imported inside the report methods, as the other viz modules are).
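The quantities that `plot_residuals` is described as drawing and annotating can be sketched without any plotting at all. The following is a minimal, standard-library-only illustration, not the module's actual code: the helper names `pearson` and `residual_diagnostics` and the returned dictionary keys are assumptions made for this example.

```python
import math


def pearson(xs, ys):
    """Pearson correlation; returns NaN when either input has zero variance."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return math.nan
    return cov / math.sqrt(var_x * var_y)


def residual_diagnostics(y_true, y_pred, prediction_intervals=None):
    """Hypothetical helper: the numbers behind a residuals-vs-predicted plot."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if len(y_true) == 0:
        raise ValueError("inputs must not be empty")

    # Signed residuals, y_true - y_pred, as in the description above.
    residuals = [t - p for t, p in zip(y_true, y_pred)]
    # Bias: mean residual; a value far from zero means the cloud drifts off zero.
    bias = sum(residuals) / len(residuals)
    # corr(|residual|, y_pred): positive values hint at a fan-shaped cloud.
    abs_resid_corr = pearson([abs(r) for r in residuals], y_pred)

    band = None
    if prediction_intervals is not None:
        lower, upper = prediction_intervals
        # Shift the interval into residual space: (lower - pred, upper - pred).
        band = [(lo - p, hi - p) for lo, hi, p in zip(lower, upper, y_pred)]

    return {
        "residuals": residuals,
        "bias": bias,
        "abs_resid_corr": abs_resid_corr,
        "band": band,
    }
```

With a symmetric interval of plus or minus 2 around each prediction, every entry of `band` comes out as `(-2.0, 2.0)`, which is why the band appears as a horizontal strip around the zero-error line in residual space.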

TrustReport methods

TrustReport.plot_residuals() / .plot_error_distribution(), guarded by a new _require_regression() — the mirror of the existing _require_classification(); calling them on a classification report raises NotImplementedError. The classification guard message now points at these new plots.
The report now retains prediction_intervals / predicted_variance (optional, None for classification) so the interval band is available end-to-end from report.plot_residuals(); wired through the regression pipeline. Backward-compatible (new constructor params default to None).
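To illustrate the guard pattern described above, here is a small sketch of a report class that keeps the optional uncertainty inputs and refuses regression plots on classification reports. `MiniReport` is a hypothetical stand-in, not the real `TrustReport`; the `task` attribute, the error message, and the returned dictionary are assumptions for illustration only.

```python
class MiniReport:
    """Hypothetical stand-in for TrustReport (attribute names are assumed)."""

    def __init__(self, task, y_true, y_pred,
                 prediction_intervals=None, predicted_variance=None):
        self.task = task
        self.y_true = y_true
        self.y_pred = y_pred
        # New optional parameters default to None, so existing callers
        # that never pass them keep working unchanged.
        self.prediction_intervals = prediction_intervals
        self.predicted_variance = predicted_variance

    def _require_regression(self, method):
        # Mirror of a classification-only guard: fail loudly on the wrong task.
        if self.task != "regression":
            raise NotImplementedError(
                f"{method}() is only available for regression reports"
            )

    def plot_residuals(self):
        self._require_regression("plot_residuals")
        # A real implementation would delegate to the plotting module here;
        # this sketch only returns the arguments it would forward.
        return {
            "y_true": self.y_true,
            "y_pred": self.y_pred,
            "prediction_intervals": self.prediction_intervals,
        }
```

Because the intervals live on the report instance, a caller does not need to pass them again when asking for the plot.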

Tests

tests/test_regression_visualization.py (16): figure/axes assertions, the interval band (present only when supplied), validation errors (shape mismatch, empty, bins < 1, mismatched bounds), TrustReport method integration, the classification guard, and a no-display (Agg) smoke test. Full suite 358 → 374, all green; ruff check, ruff format --check ., and mypy trustlens/ --follow-imports=skip clean.

Scoped as a dedicated PR, separate from the regression trust score (which, as discussed, needs its own design pass on weighting/thresholds). Happy to add closing notes or screenshots if useful.


Summary by CodeRabbit

Release Notes

New Features
- Added residuals and error-distribution visualization methods to regression reports.
- Regression reports now optionally display prediction-interval uncertainty as bands on residuals.
- Regression reports now store and surface prediction uncertainty details for later use.
Tests
- Added/updated headless regression-visualization tests to validate figure outputs, labels/legend entries, interval-band rendering, and input validation.
- Added smoke coverage to ensure plots can be generated without display.

Phase 3 of the regression track (after metrics Khanz9664#141 and pipeline/report integration Khanz9664#142). Adds the two visualizations from the Khanz9664#82 checklist:

- trustlens/visualization/regression_plots.py
  - plot_residuals(y_true, y_pred, prediction_intervals=None): residuals vs predicted, with a zero-error reference, an optional prediction-interval band (shown in residual space), and a bias / heteroscedasticity annotation (corr(|residual|, prediction)).
  - plot_error_distribution(y_true, y_pred): signed-error histogram with a fitted-normal overlay and MAE/RMSE annotation.
  Mirrors the existing plot modules: apply_style() theming, save_path/show contract, returns a matplotlib Figure; matplotlib imported lazily by report.
- TrustReport.plot_residuals() / .plot_error_distribution(), guarded by a new _require_regression() (classification reports raise NotImplementedError, the mirror of the existing _require_classification guard). Updated the classification guard message to point at the new regression plots.
- TrustReport now retains prediction_intervals / predicted_variance (optional, None for classification) so the interval band is available end-to-end from report.plot_residuals(); wired through the regression pipeline.

Tests: tests/test_regression_visualization.py (16) - figure/axes assertions, the interval band, validation errors, report-method integration, the classification guard, and a no-display (Agg) smoke test. Full suite 358 -> 374.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

coderabbitai · 2026-06-18T19:41:29Z

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: f1f2cb09-fa16-4e85-9eae-ac6483f759a1

📥 Commits

Reviewing files that changed from the base of the PR and between 3a990d9 and a4fda42.

📒 Files selected for processing (2)

tests/test_regression_visualization.py
trustlens/visualization/regression_plots.py

🚧 Files skipped from review as they are similar to previous changes (2)

trustlens/visualization/regression_plots.py
tests/test_regression_visualization.py

📝 Walkthrough

Adds trustlens/visualization/regression_plots.py with plot_residuals and plot_error_distribution functions. TrustReport gains prediction_intervals and predicted_variance constructor parameters, a _require_regression guard, and two public plot methods delegating to the new module. The regression pipeline branch is updated to pass these uncertainty inputs, and a headless pytest suite validates all behavior.
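The pipeline wiring mentioned here amounts to deciding when an interval pair exists at all. A minimal sketch, assuming a hypothetical helper name `build_uncertainty_kwargs` (the real pipeline code is not shown in this thread):

```python
def build_uncertainty_kwargs(lower=None, upper=None, variance=None):
    """Hypothetical helper: keyword arguments for the report constructor."""
    # An interval is only meaningful when both bounds are supplied;
    # a lone lower or upper bound is treated as "no interval".
    if lower is not None and upper is not None:
        prediction_intervals = (lower, upper)
    else:
        prediction_intervals = None
    return {
        "prediction_intervals": prediction_intervals,
        "predicted_variance": variance,
    }
```

Passing `None` through unchanged keeps classification reports and interval-free regression runs on the same code path.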

Changes

Regression Visualization Feature

| Layer / File(s) | Summary |
| --- | --- |
| regression_plots module `trustlens/visualization/regression_plots.py` | New module with `_as_1d` coercion helper, `plot_residuals` (residual scatter, optional prediction-interval band in residual space, bias/heteroscedasticity annotation), and `plot_error_distribution` (density histogram, fitted normal overlay when `sigma > 0`, MAE/RMSE annotation); both apply theming, support `save_path`/`show`, close the figure, and return it. |
| TrustReport uncertainty params and plot methods `trustlens/report.py` | `__init__` gains `prediction_intervals` and `predicted_variance` optional parameters stored on the instance; a `_require_regression` guard enforces regression-only usage; `plot_residuals()` and `plot_error_distribution()` public methods are added, delegating to `regression_plots` and passing stored intervals. |
| Pipeline wiring `trustlens/core/pipeline.py` | `TrustReport` construction in the regression branch is extended with `prediction_intervals` (derived from `lower`/`upper` when both are present, else `None`) and `predicted_variance` (the normalized `variance`). |
| Test suite `tests/test_regression_visualization.py` | Headless Agg test suite covering `plot_residuals` (figure/axes/labels, interval band presence, singleton shape normalization, `ValueError` for mismatch/empty/bad bounds), `plot_error_distribution` (figure/axes, normal overlay, `ValueError` for bad bins/mismatch), `TrustReport` regression and classification guard behavior, and a `show=True` smoke test. |
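The `plot_error_distribution` row above mentions an MAE/RMSE annotation and a normal overlay that is drawn only when `sigma > 0`. A standard-library sketch of those numbers follows; the function names are hypothetical, and using the population standard deviation for the fit is an assumption, since the thread does not say which estimator the module uses.

```python
import math


def error_summary(y_true, y_pred):
    """Hypothetical helper: MAE, RMSE and normal-fit parameters of signed errors."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if len(y_true) == 0:
        raise ValueError("inputs must not be empty")
    errors = [t - p for t, p in zip(y_true, y_pred)]
    n = len(errors)
    mae = sum(abs(e) for e in errors) / n
    rmse = math.sqrt(sum(e * e for e in errors) / n)
    mu = sum(errors) / n
    # Population standard deviation (assumption; a sample estimate would use n - 1).
    sigma = math.sqrt(sum((e - mu) ** 2 for e in errors) / n)
    return {"mae": mae, "rmse": rmse, "mu": mu, "sigma": sigma}


def normal_pdf(x, mu, sigma):
    """Density of N(mu, sigma^2); undefined for sigma <= 0, hence the guard."""
    if sigma <= 0:
        raise ValueError("sigma must be positive to draw a normal overlay")
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))
```

The `sigma > 0` condition matters when every error is identical (for example a perfect fit), because the fitted normal then degenerates to a point mass and has no density to plot.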

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Possibly related issues

[FEATURE] Add Support for Regression Models (Uncertainty & Error Analysis) #82: This PR directly implements plot_residuals(), plot_error_distribution(), and prediction_intervals/predicted_variance support in TrustReport, which are the remaining items called out in that regression support feature request.

Possibly related PRs

Khanz9664/TrustLens#142: This PR extends the regression integration introduced in #142 by wiring prediction_intervals/predicted_variance into TrustReport via report.py and core/pipeline.py, and adding the regression plotting methods and tests that build on that existing regression mode.

Suggested reviewers

Khanz9664

Poem

🐇 Hop, hop, through residuals I go,
Scatter plots blooming in a headless show,
Intervals band around predictions tight,
Normal curves fitting errors just right,
With bias and RMSE in view,
This bunny's regression dreams came true! 📈

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 33.33% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Title check | ✅ Passed | The title 'feat(viz): regression residual + error-distribution plots (`#82`)' clearly summarizes the main change: adding visualization capabilities for regression diagnostics. |
| Description check | ✅ Passed | The PR description covers all required sections: summary of changes, related issues (`#141`, `#142`, `#82`), type of change (new feature), detailed description of additions, testing approach with 16 new tests, and validation that linting/type-checking passes. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |

coderabbitai

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@trustlens/visualization/regression_plots.py`:
- Around line 114-116: Before indexing prediction_intervals at indices [0] and
[1] in the block where prediction_intervals is not None, add validation to
ensure that prediction_intervals has exactly 2 elements. If the length is not 2,
raise a ValueError with a clear message indicating that prediction_intervals
must be a sequence with 2 elements (lower and upper bounds). This validation
should occur immediately after the check for prediction_intervals is not None
and before attempting to access prediction_intervals[0] and
prediction_intervals[1] with the _as_1d function calls.
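The validation requested in this comment can be sketched as follows. This is an illustration of the check, not the code that was eventually committed; the function name `split_intervals`, the `n` parameter, and the exact messages are assumptions.

```python
def split_intervals(prediction_intervals, n):
    """Hypothetical helper: validate and unpack a (lower, upper) pair.

    `n` is the expected number of predictions.
    """
    if prediction_intervals is None:
        return None
    # Check the arity first so a malformed value fails with a clear
    # ValueError instead of an IndexError from indexing [0] / [1].
    if len(prediction_intervals) != 2:
        raise ValueError(
            "prediction_intervals must be a sequence of 2 elements "
            "(lower and upper bounds)"
        )
    lower, upper = (list(bound) for bound in prediction_intervals)
    if len(lower) != n or len(upper) != n:
        raise ValueError("lower and upper bounds must match the number of predictions")
    return lower, upper
```

Doing the length check before any indexing is the whole point: the caller sees what shape was expected rather than a traceback pointing at an index expression.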

ℹ️ Review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 6a74dd5b-d010-488c-b3f7-e576583e09ee

📥 Commits

Reviewing files that changed from the base of the PR and between 7c9a653 and 3a990d9.

📒 Files selected for processing (4)

tests/test_regression_visualization.py
trustlens/core/pipeline.py
trustlens/report.py
trustlens/visualization/regression_plots.py

llamapreview

AI Code Review by LlamaPReview

🎯 TL;DR & Recommendation

Recommendation: Request Changes

This PR adds regression visualization functions but breaks the returned Figure lifecycle by closing it before return, rendering the object unusable after retrieval.

📄 Documentation Diagram

This diagram documents the regression visualization workflow from data analysis to plot generation.

sequenceDiagram
    participant User
    participant Analyze as analyze()
    participant Report as TrustReport
    participant Viz as regression_plots
    User->>Analyze: call with regression data, intervals
    Analyze->>Report: create TrustReport with intervals
    Report->>User: return report
    User->>Report: report.plot_residuals()
    Report->>Viz: plot_residuals(..., prediction_intervals)
    Note over Viz: PR #35;143: new plot with interval band
    Viz-->>Report: return figure
    Report-->>User: return figure (usable)

🌟 Strengths

Solid test coverage with 16 comprehensive tests, including edge cases and integration.

| Priority | File | Category | Impact Summary (≤12 words) | Anchors |
| --- | --- | --- | --- | --- |
| P1 | trustlens/.../regression_plots.py | Bug | Returned figure closed; breaks contract | |
| P2 | trustlens/.../regression_plots.py | Maintainability | Boilerplate duplicate; DRY violation | |
| P2 | tests/test_regression_visualization.py | Testing | Interval band test only checks legend | path:t... |

📈 Risk Diagram

This diagram illustrates the risk where plt.close(fig) destroys the returned figure.

sequenceDiagram
    participant Report as TrustReport
    participant Viz as plot_residuals
    participant Caller
    Report->>Viz: plot_residuals()
    Viz->>Viz: plt.close(fig)
    Viz-->>Caller: return fig (closed)
    Note over Caller: R1(P1): Returned figure closed; axes operations fail

⚠️ **Unanchored Suggestions (Manual Review Recommended)**

The following suggestions could not be precisely anchored to a specific line in the diff. This can happen if the code is outside the changed lines, has been significantly refactored, or if the suggestion is a general observation. Please review them carefully in the context of the full file.

📁 File: `trustlens/visualization/regression_plots.py`

The boilerplate code for apply_style, figure/axes creation, axis labels, annotation box setup, legend, grid, save/show/close logic is duplicated verbatim between plot_residuals (approx. 90 lines) and plot_error_distribution (approx. 80 lines). This duplication violates the DRY principle, increases the surface for inconsistencies if the styling defaults ever change, and makes future similar plot functions harder to add. A shared private helper (e.g., _setup_axes, _finalize_figure) could encapsulate the common pattern.

Suggestion:

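from __future__ import annotations

import matplotlib.pyplot as plt


def _finalize_figure(
    fig: plt.Figure,
    ax: plt.Axes,
    theme,
    save_path: str | None,
    show: bool,
) -> plt.Figure:
    """Apply the shared legend/grid styling, then save and/or show the figure."""
    ax.legend(loc="best", fontsize=10)
    ax.grid(True, alpha=theme.grid["alpha"])
    if save_path:
        fig.savefig(save_path, dpi=theme.fig_defaults["savefig_dpi"], bbox_inches="tight")
    if show:
        plt.show()
    # Note: this helper returns the figure without calling plt.close(fig).
    return fig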

Related Code:

with apply_style() as theme:
    blue = theme.brand["blue"]
    orange = theme.brand["orange"]
    gray = theme.brand["muted_gray"]
    neutral = theme.semantic["neutral"]

    fig, ax = plt.subplots(figsize=(7, 5), constrained_layout=True)
    # ... plot-specific drawing ...

    ax.set_xlabel(...)
    ax.set_ylabel(...)
    ax.set_title(...)
    ax.legend(loc="best", fontsize=10)
    ax.grid(True, alpha=theme.grid["alpha"])

    if save_path:
        fig.savefig(save_path, dpi=theme.fig_defaults["savefig_dpi"], bbox_inches="tight")

    if show:
        plt.show()

    plt.close(fig)
    return fig

Addressing the automated review on Khanz9664#143:

- plot_residuals: validate prediction_intervals arity before indexing, so a malformed value (e.g. a 1-element tuple) raises a clear ValueError instead of an IndexError (CodeRabbit).
- test: the interval-band test now asserts the fill_between PolyCollection is actually drawn spanning the expected residual range (~[-2, +2]), not merely that a legend entry exists; plus a test for the new arity ValueError (LlamaPReview P2).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Whatsonyourmind · 2026-06-18T21:05:01Z

Pushed a4fda42 addressing the automated review:

Fixed

prediction_intervals arity (CodeRabbit) — a malformed value (e.g. a 1-element tuple) now raises a clear ValueError instead of an IndexError. Added a test.
Band test only checked the legend (LlamaPReview P2) — strengthened: it now asserts the fill_between PolyCollection is actually drawn, spanning the expected residual range (~[-2, +2]), not merely that a legend entry exists.
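A test in the spirit of the strengthened band check might look like the sketch below. It draws its own band with `Axes.fill_between` rather than calling the project's `plot_residuals`, whose exact drawing code is not shown in this thread, so the helper name `band_extent` and the sorting step are assumptions.

```python
import matplotlib

matplotlib.use("Agg")  # headless backend, no display required
import matplotlib.pyplot as plt
from matplotlib.collections import PolyCollection


def band_extent(y_pred, lower, upper):
    """Draw an interval band in residual space and report its vertical extent."""
    fig, ax = plt.subplots()
    order = sorted(range(len(y_pred)), key=y_pred.__getitem__)
    xs = [y_pred[i] for i in order]
    lo = [lower[i] - y_pred[i] for i in order]
    hi = [upper[i] - y_pred[i] for i in order]
    ax.fill_between(xs, lo, hi, alpha=0.2, label="prediction interval")

    # fill_between adds a PolyCollection to the axes; inspect its geometry
    # instead of only checking that a legend entry exists.
    bands = [c for c in ax.collections if isinstance(c, PolyCollection)]
    vertices = bands[0].get_paths()[0].vertices
    y_min = float(vertices[:, 1].min())
    y_max = float(vertices[:, 1].max())
    plt.close(fig)
    return len(bands), y_min, y_max
```

Checking the polygon's vertices catches a band that is labelled in the legend but drawn in the wrong coordinate space, for instance in raw prediction space instead of residual space.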

Looked at but kept as-is, with reasoning

plt.close(fig) before return fig (LlamaPReview P1, "returned figure unusable") — verified this isn't the case here: fig.savefig() works fine on the returned figure after plt.close() (matplotlib retains the Figure and its artists; only pyplot's state-machine reference is dropped), confirmed even under warnings-as-errors. It's also the existing convention across the repo's plot modules (calibration_plots.py etc.), and the function's own save_path writes before closing — so I kept it consistent rather than diverge.
DRY on the save/show/close tail (P2) — left to match the established per-module plot convention (calibration_plots.py / bias_plots.py repeat the same tail); happy to refactor that across all plot modules in a separate PR if you'd prefer.
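The claim about `plt.close(fig)` can be checked with a few lines of matplotlib. The sketch below is independent of the project's plotting code; `close_then_save` is a hypothetical name used only for this demonstration.

```python
import io

import matplotlib

matplotlib.use("Agg")  # headless backend, no display required
import matplotlib.pyplot as plt


def close_then_save():
    """Close a figure via pyplot, then try to save the same Figure object."""
    fig, ax = plt.subplots()
    ax.plot([0, 1], [0, 1])
    num = fig.number  # read the pyplot figure number before closing

    plt.close(fig)
    # pyplot no longer tracks the figure after close()...
    still_registered = num in plt.get_fignums()

    # ...but the Figure object and its artists are still alive,
    # so rendering it to an in-memory PNG still works.
    buffer = io.BytesIO()
    fig.savefig(buffer, format="png")
    return still_registered, buffer.getbuffer().nbytes
```

One practical consequence of the close-before-return convention is that a later `plt.show()` will not display the returned figure, since pyplot only shows figures it still manages; saving and inspecting the object remain possible.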

Full suite 374 → 375; ruff check, ruff format --check ., and mypy --follow-imports=skip all clean.

Khanz9664

@Whatsonyourmind

Thanks for the continued work on the regression support roadmap.

I've reviewed the implementation, tests, and follow-up changes. The scope is well aligned with the progression established by #141 and #142, and the visualization layer integrates cleanly with the existing TrustReport and visualization architecture.

A few things I particularly appreciated:

Regression-only visualization guards mirror the existing classification patterns cleanly.
The prediction-interval overlay is implemented in residual space rather than raw prediction space, which makes the diagnostic substantially more useful.
Input validation and shape handling are thorough.
The visualization test coverage is strong, and I appreciate the follow-up enhancement to verify that the interval band is actually rendered rather than only checking legend entries.

I also reviewed the discussion around the plt.close(fig) pattern. Since this matches the existing visualization modules and the returned figure remains usable for downstream operations such as savefig(), I don't consider that a blocker for this PR.

Overall this is a solid addition that completes the visualization portion of the regression roadmap and keeps the implementation appropriately scoped.

Thanks again for the thoughtful incremental approach to the regression work. 🚀

Khanz9664 · 2026-06-19T04:20:35Z

@Whatsonyourmind

Thanks again for the continued contributions and for helping move the regression roadmap forward.

With the merge of #141 (Regression Metrics), #142 (Regression Pipeline & TrustReport Integration), and #143 (Regression Visualizations), the majority of the original scope from #82 is now complete:

✅ Regression metrics (Error Distribution, PICP, Error-Variance Correlation)

✅ Regression pipeline auto-dispatch

✅ Regression report rendering and serialization

✅ Residual analysis visualization

✅ Error distribution visualization

The main remaining item is the Regression Trust Score.

Unlike the previous pieces, this requires a bit more design work because the existing Trust Score framework was built around classification-oriented concepts (calibration, confidence behavior, deployment risk, etc.). For regression we need to determine:

Which metrics should contribute to the score
How those metrics should be weighted
How uncertainty calibration (PICP) should influence trust
Whether the scoring system should remain comparable to classification reports or be regression-specific
How deployment verdicts should be generated for regression workloads
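For reference, PICP (prediction interval coverage probability) is usually defined as the fraction of true values that fall inside their predicted interval. The sketch below shows that definition and a simple gap against a nominal level; the function names and the gap measure are illustrative assumptions, not the project's `prediction_interval_coverage` API and not a proposal for how the score should use it.

```python
def picp(y_true, lower, upper):
    """Fraction of observations with lower <= y_true <= upper."""
    if not (len(y_true) == len(lower) == len(upper)):
        raise ValueError("y_true, lower and upper must have the same length")
    if len(y_true) == 0:
        raise ValueError("inputs must not be empty")
    covered = sum(1 for t, lo, hi in zip(y_true, lower, upper) if lo <= t <= hi)
    return covered / len(y_true)


def coverage_gap(y_true, lower, upper, nominal=0.9):
    """Observed coverage minus the nominal level (negative: intervals too narrow)."""
    return picp(y_true, lower, upper) - nominal
```

A model whose 90% intervals cover only half of the observations has a gap of about -0.4, which is the kind of signal the questions above would need to translate into a score.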

Rather than jumping directly into implementation, I'd recommend opening a design/RFC discussion first so we can align on the scoring philosophy before introducing a public API.

If you're interested in taking that on, feel free to open a proposal issue outlining a possible regression trust score framework and we can iterate on it together before implementation.

Thanks again for the thoughtful incremental approach throughout this feature series. 🚀

Whatsonyourmind · 2026-06-19T05:09:15Z

Thanks for merging the series, and for the thoughtful breakdown of what the Regression Trust Score needs — agreed that the scoring philosophy should be settled before any public API.

I've opened #145 as a design/RFC proposal that addresses each of your five questions, grounded in the merged regression metrics (error_distribution / prediction_interval_coverage / error_variance_correlation) and the existing trust_score.py. The core proposal: keep the interface identical (0–100, A/B/C/D, TrustScoreResult, weight-redistribution, blockers) but use three regression-specific dimensions, and — the one point I'd most like your read on — weight uncertainty reliability above raw accuracy, on the logic that a low-R² model with honest intervals is safer to deploy than a high-R² model with decorative ones.

Happy to iterate there before I touch any implementation. 🚀

Whatsonyourmind mentioned this pull request Jun 18, 2026

feat: regression analysis path + TrustReport integration (#82) #142

Merged

coderabbitai Bot reviewed Jun 18, 2026

llamapreview Bot reviewed Jun 18, 2026

Khanz9664 approved these changes Jun 19, 2026

Khanz9664 merged commit d1f9f9d into Khanz9664:main Jun 19, 2026
11 checks passed

Whatsonyourmind mentioned this pull request Jun 19, 2026

RFC: Regression Trust Score — scoring framework proposal (follow-up to #82) #145

Open

Whatsonyourmind mentioned this pull request Jun 19, 2026

feat: regression Trust Score (regression_trust_score) — implements RFC #145 #147

Draft
