Skip to content

feat(viz): regression residual + error-distribution plots (#82)#143

Merged
Khanz9664 merged 2 commits into
Khanz9664:mainfrom
Whatsonyourmind:feat/regression-viz
Jun 19, 2026
Merged

feat(viz): regression residual + error-distribution plots (#82)#143
Khanz9664 merged 2 commits into
Khanz9664:mainfrom
Whatsonyourmind:feat/regression-viz

Conversation

@Whatsonyourmind

@Whatsonyourmind Whatsonyourmind commented Jun 18, 2026

Copy link
Copy Markdown
Contributor

Regression visualizations (phase 3 of #82)

Follows the merged metrics (#141) and pipeline/report integration (#142). Implements the two visualizations from the #82 checklist, as agreed in #142.

What's added

trustlens/visualization/regression_plots.py

  • plot_residuals(y_true, y_pred, prediction_intervals=None, …) — residuals (y_true − y_pred) vs. predicted value: the canonical view for heteroscedasticity (a fan/curve in the cloud) and bias (the cloud drifting off zero). Includes a dashed zero-error reference, an optional prediction-interval band drawn in residual space (lower − pred to upper − pred), and a bias / corr(|residual|, ŷ) annotation.
  • plot_error_distribution(y_true, y_pred, …) — signed-error histogram with a fitted-normal overlay (so skew / heavy tails are obvious) and an MAE/RMSE annotation.

Both follow the existing plot-module conventions: apply_style() theming, the save_path / show contract, and a returned matplotlib.figure.Figure. matplotlib stays lazily imported by the report (the module itself is imported inside the report methods, as the other viz modules are).

TrustReport methods

  • TrustReport.plot_residuals() / .plot_error_distribution(), guarded by a new _require_regression() — the mirror of the existing _require_classification(); calling them on a classification report raises NotImplementedError. The classification guard message now points at these new plots.
  • The report now retains prediction_intervals / predicted_variance (optional, None for classification) so the interval band is available end-to-end from report.plot_residuals(); wired through the regression pipeline. Backward-compatible (new constructor params default to None).

Tests

tests/test_regression_visualization.py (16): figure/axes assertions, the interval band (present only when supplied), validation errors (shape mismatch, empty, bins < 1, mismatched bounds), TrustReport method integration, the classification guard, and a no-display (Agg) smoke test. Full suite 358 → 374, all green; ruff check, ruff format --check ., and mypy trustlens/ --follow-imports=skip clean.

Scoped as a dedicated PR, separate from the regression trust score (which, as discussed, needs its own design pass on weighting/thresholds). Closing notes / screenshots happy to add if useful.

Summary by CodeRabbit

Release Notes

  • New Features

    • Added residuals and error-distribution visualization methods to regression reports.
    • Regression reports now optionally display prediction-interval uncertainty as bands on residuals.
    • Regression reports now store and surface prediction uncertainty details for later use.
  • Tests

    • Added/updated headless regression-visualization tests to validate figure outputs, labels/legend entries, interval-band rendering, and input validation.
    • Added smoke coverage to ensure plots can be generated without display.

Phase 3 of the regression track (after metrics Khanz9664#141 and pipeline/report
integration Khanz9664#142). Adds the two visualizations from the Khanz9664#82 checklist:

- trustlens/visualization/regression_plots.py
  - plot_residuals(y_true, y_pred, prediction_intervals=None): residuals vs
    predicted, with a zero-error reference, an optional prediction-interval
    band (shown in residual space), and a bias / heteroscedasticity annotation
    (corr(|residual|, prediction)).
  - plot_error_distribution(y_true, y_pred): signed-error histogram with a
    fitted-normal overlay and MAE/RMSE annotation.
  Mirrors the existing plot modules: apply_style() theming, save_path/show
  contract, returns a matplotlib Figure; matplotlib imported lazily by report.

- TrustReport.plot_residuals() / .plot_error_distribution(), guarded by a new
  _require_regression() (classification reports raise NotImplementedError, the
  mirror of the existing _require_classification guard). Updated the
  classification guard message to point at the new regression plots.

- TrustReport now retains prediction_intervals / predicted_variance (optional,
  None for classification) so the interval band is available end-to-end from
  report.plot_residuals(); wired through the regression pipeline.

Tests: tests/test_regression_visualization.py (16) - figure/axes assertions,
the interval band, validation errors, report-method integration, the
classification guard, and a no-display (Agg) smoke test. Full suite 358 -> 374.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@coderabbitai

coderabbitai Bot commented Jun 18, 2026

Copy link
Copy Markdown

Review Change Stack

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: f1f2cb09-fa16-4e85-9eae-ac6483f759a1

📥 Commits

Reviewing files that changed from the base of the PR and between 3a990d9 and a4fda42.

📒 Files selected for processing (2)
  • tests/test_regression_visualization.py
  • trustlens/visualization/regression_plots.py
🚧 Files skipped from review as they are similar to previous changes (2)
  • trustlens/visualization/regression_plots.py
  • tests/test_regression_visualization.py

📝 Walkthrough

Walkthrough

Adds trustlens/visualization/regression_plots.py with plot_residuals and plot_error_distribution functions. TrustReport gains prediction_intervals and predicted_variance constructor parameters, a _require_regression guard, and two public plot methods delegating to the new module. The regression pipeline branch is updated to pass these uncertainty inputs, and a headless pytest suite validates all behavior.

Changes

Regression Visualization Feature

Layer / File(s) Summary
regression_plots module
trustlens/visualization/regression_plots.py
New module with _as_1d coercion helper, plot_residuals (residual scatter, optional prediction-interval band in residual space, bias/heteroscedasticity annotation), and plot_error_distribution (density histogram, fitted normal overlay when sigma > 0, MAE/RMSE annotation); both apply theming, support save_path/show, close the figure, and return it.
TrustReport uncertainty params and plot methods
trustlens/report.py
__init__ gains prediction_intervals and predicted_variance optional parameters stored on the instance; a _require_regression guard enforces regression-only usage; plot_residuals() and plot_error_distribution() public methods are added, delegating to regression_plots and passing stored intervals.
Pipeline wiring
trustlens/core/pipeline.py
TrustReport construction in the regression branch is extended with prediction_intervals (derived from lower/upper when both are present, else None) and predicted_variance (the normalized variance).
Test suite
tests/test_regression_visualization.py
Headless Agg test suite covering plot_residuals (figure/axes/labels, interval band presence, singleton shape normalization, ValueError for mismatch/empty/bad bounds), plot_error_distribution (figure/axes, normal overlay, ValueError for bad bins/mismatch), TrustReport regression and classification guard behavior, and a show=True smoke test.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Possibly related issues

Possibly related PRs

  • Khanz9664/TrustLens#142: This PR extends the regression integration introduced in #142 by wiring prediction_intervals/predicted_variance into TrustReport via report.py and core/pipeline.py, and adding the regression plotting methods and tests that build on that existing regression mode.

Suggested reviewers

  • Khanz9664

Poem

🐇 Hop, hop, through residuals I go,
Scatter plots blooming in a headless show,
Intervals band around predictions tight,
Normal curves fitting errors just right,
With bias and RMSE in view,
This bunny's regression dreams came true! 📈

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 33.33% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Title check ✅ Passed The title 'feat(viz): regression residual + error-distribution plots (#82)' clearly summarizes the main change—adding visualization capabilities for regression diagnostics.
Description check ✅ Passed The PR description covers all required sections: summary of changes, related issues (#141, #142, #82), type of change (new feature), detailed description of additions, testing approach with 16 new tests, and validation that linting/type-checking passes.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@trustlens/visualization/regression_plots.py`:
- Around line 114-116: Before indexing prediction_intervals at indices [0] and
[1] in the block where prediction_intervals is not None, add validation to
ensure that prediction_intervals has exactly 2 elements. If the length is not 2,
raise a ValueError with a clear message indicating that prediction_intervals
must be a sequence with 2 elements (lower and upper bounds). This validation
should occur immediately after the check for prediction_intervals is not None
and before attempting to access prediction_intervals[0] and
prediction_intervals[1] with the _as_1d function calls.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 6a74dd5b-d010-488c-b3f7-e576583e09ee

📥 Commits

Reviewing files that changed from the base of the PR and between 7c9a653 and 3a990d9.

📒 Files selected for processing (4)
  • tests/test_regression_visualization.py
  • trustlens/core/pipeline.py
  • trustlens/report.py
  • trustlens/visualization/regression_plots.py

Comment thread trustlens/visualization/regression_plots.py Outdated

@llamapreview llamapreview Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

AI Code Review by LlamaPReview

🎯 TL;DR & Recommendation

Recommendation: Request Changes

This PR adds regression visualization functions but breaks the returned Figure lifecycle by closing it before return, rendering the object unusable after retrieval.

📄 Documentation Diagram

This diagram documents the regression visualization workflow from data analysis to plot generation.

sequenceDiagram
    participant User
    participant Analyze as analyze()
    participant Report as TrustReport
    participant Viz as regression_plots
    User->>Analyze: call with regression data, intervals
    Analyze->>Report: create TrustReport with intervals
    Report->>User: return report
    User->>Report: report.plot_residuals()
    Report->>Viz: plot_residuals(..., prediction_intervals)
    Note over Viz: PR #35;143: new plot with interval band
    Viz-->>Report: return figure
    Report-->>User: return figure (usable)
Loading

🌟 Strengths

  • Solid test coverage with 16 comprehensive tests, including edge cases and integration.
Priority File Category Impact Summary (≤12 words) Anchors
P1 trustlens/.../regression_plots.py Bug Returned figure closed; breaks contract
P2 trustlens/.../regression_plots.py Maintainability Boilerplate duplicate; DRY violation
P2 tests/test_regression_visualization.py Testing Interval band test only checks legend path:t...

📈 Risk Diagram

This diagram illustrates the risk where plt.close(fig) destroys the returned figure.

sequenceDiagram
    participant Report as TrustReport
    participant Viz as plot_residuals
    participant Caller
    Report->>Viz: plot_residuals()
    Viz->>Viz: plt.close(fig)
    Viz-->>Caller: return fig (closed)
    Note over Caller: R1(P1): Returned figure closed; axes operations fail
Loading
⚠️ **Unanchored Suggestions (Manual Review Recommended)**

The following suggestions could not be precisely anchored to a specific line in the diff. This can happen if the code is outside the changed lines, has been significantly refactored, or if the suggestion is a general observation. Please review them carefully in the context of the full file.


📁 File: trustlens/visualization/regression_plots.py

The boilerplate code for apply_style, figure/axes creation, axis labels, annotation box setup, legend, grid, save/show/close logic is duplicated verbatim between plot_residuals (approx. 90 lines) and plot_error_distribution (approx. 80 lines). This duplication violates the DRY principle, increases the surface for inconsistencies if the styling defaults ever change, and makes future similar plot functions harder to add. A shared private helper (e.g., _setup_axes, _finalize_figure) could encapsulate the common pattern.

Suggestion:

def _finalize_figure(
    fig: plt.Figure,
    ax: plt.Axes,
    theme,
    save_path: str | None,
    show: bool,
) -> plt.Figure:
    ax.legend(loc="best", fontsize=10)
    ax.grid(True, alpha=theme.grid["alpha"])
    if save_path:
        fig.savefig(save_path, dpi=theme.fig_defaults["savefig_dpi"], bbox_inches="tight")
    if show:
        plt.show()
    return fig

Related Code:

with apply_style() as theme:
      blue = theme.brand["blue"]
      orange = theme.brand["orange"]
      gray = theme.brand["muted_gray"]
      neutral = theme.semantic["neutral"]

      fig, ax = plt.subplots(figsize=(7, 5), constrained_layout=True)
      # ... plot-specific drawing ...

      ax.set_xlabel(...)
      ax.set_ylabel(...)
      ax.set_title(...)
      ax.legend(loc="best", fontsize=10)
      ax.grid(True, alpha=theme.grid["alpha"])

      if save_path:
          fig.savefig(save_path, dpi=theme.fig_defaults["savefig_dpi"], bbox_inches="tight")

      if show:
          plt.show()

      plt.close(fig)
      return fig


💡 Have feedback? We'd love to hear it in our GitHub Discussions.
✨ This review was generated by LlamaPReview Advanced, which is free for all open-source projects. Learn more.

Comment thread trustlens/visualization/regression_plots.py
Comment thread tests/test_regression_visualization.py
Addressing the automated review on Khanz9664#143:
- plot_residuals: validate prediction_intervals arity before indexing, so a
  malformed value (e.g. a 1-element tuple) raises a clear ValueError instead of
  an IndexError (CodeRabbit).
- test: the interval-band test now asserts the fill_between PolyCollection is
  actually drawn spanning the expected residual range (~[-2, +2]), not merely
  that a legend entry exists; plus a test for the new arity ValueError
  (LlamaPReview P2).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@Whatsonyourmind

Copy link
Copy Markdown
Contributor Author

Pushed a4fda42 addressing the automated review:

Fixed

  • prediction_intervals arity (CodeRabbit) — a malformed value (e.g. a 1-element tuple) now raises a clear ValueError instead of an IndexError. Added a test.
  • Band test only checked the legend (LlamaPReview P2) — strengthened: it now asserts the fill_between PolyCollection is actually drawn, spanning the expected residual range (~[-2, +2]), not merely that a legend entry exists.

Looked at but kept as-is, with reasoning

  • plt.close(fig) before return fig (LlamaPReview P1, "returned figure unusable") — verified this isn't the case here: fig.savefig() works fine on the returned figure after plt.close() (matplotlib retains the Figure and its artists; only pyplot's state-machine reference is dropped), confirmed even under warnings-as-errors. It's also the existing convention across the repo's plot modules (calibration_plots.py etc.), and the function's own save_path writes before closing — so I kept it consistent rather than diverge.
  • DRY on the save/show/close tail (P2) — left to match the established per-module plot convention (calibration_plots.py / bias_plots.py repeat the same tail); happy to refactor that across all plot modules in a separate PR if you'd prefer.

Full suite 374 → 375; ruff check, ruff format --check ., and mypy --follow-imports=skip all clean.

@Khanz9664 Khanz9664 left a comment

Copy link
Copy Markdown
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Whatsonyourmind

Thanks for the continued work on the regression support roadmap.

I've reviewed the implementation, tests, and follow-up changes. The scope is well aligned with the progression established by #141 and #142, and the visualization layer integrates cleanly with the existing TrustReport and visualization architecture.

A few things I particularly appreciated:

  • Regression-only visualization guards mirror the existing classification patterns cleanly.
  • The prediction-interval overlay is implemented in residual space rather than raw prediction space, which makes the diagnostic substantially more useful.
  • Input validation and shape handling are thorough.
  • The visualization test coverage is strong, and I appreciate the follow-up enhancement to verify that the interval band is actually rendered rather than only checking legend entries.

I also reviewed the discussion around the plt.close(fig) pattern. Since this matches the existing visualization modules and the returned figure remains usable for downstream operations such as savefig(), I don't consider that a blocker for this PR.

Overall this is a solid addition that completes the visualization portion of the regression roadmap and keeps the implementation appropriately scoped.

Thanks again for the thoughtful incremental approach to the regression work. 🚀

@Khanz9664 Khanz9664 merged commit d1f9f9d into Khanz9664:main Jun 19, 2026
11 checks passed
@Khanz9664

Copy link
Copy Markdown
Owner

@Whatsonyourmind

Thanks again for the continued contributions and for helping move the regression roadmap forward.

With the merge of #141 (Regression Metrics), #142 (Regression Pipeline & TrustReport Integration), and #143 (Regression Visualizations), the majority of the original scope from #82 is now complete:

✅ Regression metrics (Error Distribution, PICP, Error-Variance Correlation)

✅ Regression pipeline auto-dispatch

✅ Regression report rendering and serialization

✅ Residual analysis visualization

✅ Error distribution visualization

The main remaining item is the Regression Trust Score.

Unlike the previous pieces, this requires a bit more design work because the existing Trust Score framework was built around classification-oriented concepts (calibration, confidence behavior, deployment risk, etc.). For regression we need to determine:

  • Which metrics should contribute to the score
  • How those metrics should be weighted
  • How uncertainty calibration (PICP) should influence trust
  • Whether the scoring system should remain comparable to classification reports or be regression-specific
  • How deployment verdicts should be generated for regression workloads

Rather than jumping directly into implementation, I'd recommend opening a design/RFC discussion first so we can align on the scoring philosophy before introducing a public API.

If you're interested in taking that on, feel free to open a proposal issue outlining a possible regression trust score framework and we can iterate on it together before implementation.

Thanks again for the thoughtful incremental approach throughout this feature series. 🚀

@Whatsonyourmind

Copy link
Copy Markdown
Contributor Author

Thanks for merging the series, and for the thoughtful breakdown of what the Regression Trust Score needs — agreed that the scoring philosophy should be settled before any public API.

I've opened #145 as a design/RFC proposal that addresses each of your five questions, grounded in the merged regression metrics (error_distribution / prediction_interval_coverage / error_variance_correlation) and the existing trust_score.py. The core proposal: keep the interface identical (0–100, A/B/C/D, TrustScoreResult, weight-redistribution, blockers) but use three regression-specific dimensions, and — the one point I'd most like your read on — weight uncertainty reliability above raw accuracy, on the logic that a low-R² model with honest intervals is safer to deploy than a high-R² model with decorative ones.

Happy to iterate there before I touch any implementation. 🚀

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants