feat(viz): regression residual + error-distribution plots (#82)#143
Conversation
Phase 3 of the regression track (after metrics Khanz9664#141 and pipeline/report integration Khanz9664#142). Adds the two visualizations from the Khanz9664#82 checklist: - trustlens/visualization/regression_plots.py - plot_residuals(y_true, y_pred, prediction_intervals=None): residuals vs predicted, with a zero-error reference, an optional prediction-interval band (shown in residual space), and a bias / heteroscedasticity annotation (corr(|residual|, prediction)). - plot_error_distribution(y_true, y_pred): signed-error histogram with a fitted-normal overlay and MAE/RMSE annotation. Mirrors the existing plot modules: apply_style() theming, save_path/show contract, returns a matplotlib Figure; matplotlib imported lazily by report. - TrustReport.plot_residuals() / .plot_error_distribution(), guarded by a new _require_regression() (classification reports raise NotImplementedError, the mirror of the existing _require_classification guard). Updated the classification guard message to point at the new regression plots. - TrustReport now retains prediction_intervals / predicted_variance (optional, None for classification) so the interval band is available end-to-end from report.plot_residuals(); wired through the regression pipeline. Tests: tests/test_regression_visualization.py (16) - figure/axes assertions, the interval band, validation errors, report-method integration, the classification guard, and a no-display (Agg) smoke test. Full suite 358 -> 374. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
|
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: defaults Review profile: CHILL Plan: Pro Run ID: 📒 Files selected for processing (2)
🚧 Files skipped from review as they are similar to previous changes (2)
📝 WalkthroughWalkthroughAdds ChangesRegression Visualization Feature
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~20 minutes Possibly related issues
Possibly related PRs
Suggested reviewers
Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@trustlens/visualization/regression_plots.py`:
- Around line 114-116: Before indexing prediction_intervals at indices [0] and
[1] in the block where prediction_intervals is not None, add validation to
ensure that prediction_intervals has exactly 2 elements. If the length is not 2,
raise a ValueError with a clear message indicating that prediction_intervals
must be a sequence with 2 elements (lower and upper bounds). This validation
should occur immediately after the check for prediction_intervals is not None
and before attempting to access prediction_intervals[0] and
prediction_intervals[1] with the _as_1d function calls.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 6a74dd5b-d010-488c-b3f7-e576583e09ee
📒 Files selected for processing (4)
tests/test_regression_visualization.pytrustlens/core/pipeline.pytrustlens/report.pytrustlens/visualization/regression_plots.py
There was a problem hiding this comment.
AI Code Review by LlamaPReview
🎯 TL;DR & Recommendation
Recommendation: Request Changes
This PR adds regression visualization functions but breaks the returned Figure lifecycle by closing it before return, rendering the object unusable after retrieval.
📄 Documentation Diagram
This diagram documents the regression visualization workflow from data analysis to plot generation.
sequenceDiagram
participant User
participant Analyze as analyze()
participant Report as TrustReport
participant Viz as regression_plots
User->>Analyze: call with regression data, intervals
Analyze->>Report: create TrustReport with intervals
Report->>User: return report
User->>Report: report.plot_residuals()
Report->>Viz: plot_residuals(..., prediction_intervals)
Note over Viz: PR #35;143: new plot with interval band
Viz-->>Report: return figure
Report-->>User: return figure (usable)
🌟 Strengths
- Solid test coverage with 16 comprehensive tests, including edge cases and integration.
| Priority | File | Category | Impact Summary (≤12 words) | Anchors |
|---|---|---|---|---|
| P1 | trustlens/.../regression_plots.py | Bug | Returned figure closed; breaks contract | |
| P2 | trustlens/.../regression_plots.py | Maintainability | Boilerplate duplicate; DRY violation | |
| P2 | tests/test_regression_visualization.py | Testing | Interval band test only checks legend | path:t... |
📈 Risk Diagram
This diagram illustrates the risk where plt.close(fig) destroys the returned figure.
sequenceDiagram
participant Report as TrustReport
participant Viz as plot_residuals
participant Caller
Report->>Viz: plot_residuals()
Viz->>Viz: plt.close(fig)
Viz-->>Caller: return fig (closed)
Note over Caller: R1(P1): Returned figure closed; axes operations fail
⚠️ **Unanchored Suggestions (Manual Review Recommended)**
The following suggestions could not be precisely anchored to a specific line in the diff. This can happen if the code is outside the changed lines, has been significantly refactored, or if the suggestion is a general observation. Please review them carefully in the context of the full file.
📁 File: trustlens/visualization/regression_plots.py
The boilerplate code for apply_style, figure/axes creation, axis labels, annotation box setup, legend, grid, save/show/close logic is duplicated verbatim between plot_residuals (approx. 90 lines) and plot_error_distribution (approx. 80 lines). This duplication violates the DRY principle, increases the surface for inconsistencies if the styling defaults ever change, and makes future similar plot functions harder to add. A shared private helper (e.g., _setup_axes, _finalize_figure) could encapsulate the common pattern.
Suggestion:
def _finalize_figure(
fig: plt.Figure,
ax: plt.Axes,
theme,
save_path: str | None,
show: bool,
) -> plt.Figure:
ax.legend(loc="best", fontsize=10)
ax.grid(True, alpha=theme.grid["alpha"])
if save_path:
fig.savefig(save_path, dpi=theme.fig_defaults["savefig_dpi"], bbox_inches="tight")
if show:
plt.show()
return figRelated Code:
with apply_style() as theme:
blue = theme.brand["blue"]
orange = theme.brand["orange"]
gray = theme.brand["muted_gray"]
neutral = theme.semantic["neutral"]
fig, ax = plt.subplots(figsize=(7, 5), constrained_layout=True)
# ... plot-specific drawing ...
ax.set_xlabel(...)
ax.set_ylabel(...)
ax.set_title(...)
ax.legend(loc="best", fontsize=10)
ax.grid(True, alpha=theme.grid["alpha"])
if save_path:
fig.savefig(save_path, dpi=theme.fig_defaults["savefig_dpi"], bbox_inches="tight")
if show:
plt.show()
plt.close(fig)
return fig💡 Have feedback? We'd love to hear it in our GitHub Discussions.
✨ This review was generated by LlamaPReview Advanced, which is free for all open-source projects. Learn more.
Addressing the automated review on Khanz9664#143: - plot_residuals: validate prediction_intervals arity before indexing, so a malformed value (e.g. a 1-element tuple) raises a clear ValueError instead of an IndexError (CodeRabbit). - test: the interval-band test now asserts the fill_between PolyCollection is actually drawn spanning the expected residual range (~[-2, +2]), not merely that a legend entry exists; plus a test for the new arity ValueError (LlamaPReview P2). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
|
Pushed Fixed
Looked at but kept as-is, with reasoning
Full suite 374 → 375; |
Khanz9664
left a comment
There was a problem hiding this comment.
Thanks for the continued work on the regression support roadmap.
I've reviewed the implementation, tests, and follow-up changes. The scope is well aligned with the progression established by #141 and #142, and the visualization layer integrates cleanly with the existing TrustReport and visualization architecture.
A few things I particularly appreciated:
- Regression-only visualization guards mirror the existing classification patterns cleanly.
- The prediction-interval overlay is implemented in residual space rather than raw prediction space, which makes the diagnostic substantially more useful.
- Input validation and shape handling are thorough.
- The visualization test coverage is strong, and I appreciate the follow-up enhancement to verify that the interval band is actually rendered rather than only checking legend entries.
I also reviewed the discussion around the plt.close(fig) pattern. Since this matches the existing visualization modules and the returned figure remains usable for downstream operations such as savefig(), I don't consider that a blocker for this PR.
Overall this is a solid addition that completes the visualization portion of the regression roadmap and keeps the implementation appropriately scoped.
Thanks again for the thoughtful incremental approach to the regression work. 🚀
|
Thanks again for the continued contributions and for helping move the regression roadmap forward. With the merge of #141 (Regression Metrics), #142 (Regression Pipeline & TrustReport Integration), and #143 (Regression Visualizations), the majority of the original scope from #82 is now complete: ✅ Regression metrics (Error Distribution, PICP, Error-Variance Correlation) ✅ Regression pipeline auto-dispatch ✅ Regression report rendering and serialization ✅ Residual analysis visualization ✅ Error distribution visualization The main remaining item is the Regression Trust Score. Unlike the previous pieces, this requires a bit more design work because the existing Trust Score framework was built around classification-oriented concepts (calibration, confidence behavior, deployment risk, etc.). For regression we need to determine:
Rather than jumping directly into implementation, I'd recommend opening a design/RFC discussion first so we can align on the scoring philosophy before introducing a public API. If you're interested in taking that on, feel free to open a proposal issue outlining a possible regression trust score framework and we can iterate on it together before implementation. Thanks again for the thoughtful incremental approach throughout this feature series. 🚀 |
|
Thanks for merging the series, and for the thoughtful breakdown of what the Regression Trust Score needs — agreed that the scoring philosophy should be settled before any public API. I've opened #145 as a design/RFC proposal that addresses each of your five questions, grounded in the merged regression metrics ( Happy to iterate there before I touch any implementation. 🚀 |
Regression visualizations (phase 3 of #82)
Follows the merged metrics (#141) and pipeline/report integration (#142). Implements the two visualizations from the #82 checklist, as agreed in #142.
What's added
trustlens/visualization/regression_plots.pyplot_residuals(y_true, y_pred, prediction_intervals=None, …)— residuals (y_true − y_pred) vs. predicted value: the canonical view for heteroscedasticity (a fan/curve in the cloud) and bias (the cloud drifting off zero). Includes a dashed zero-error reference, an optional prediction-interval band drawn in residual space (lower − predtoupper − pred), and abias/corr(|residual|, ŷ)annotation.plot_error_distribution(y_true, y_pred, …)— signed-error histogram with a fitted-normal overlay (so skew / heavy tails are obvious) and an MAE/RMSE annotation.Both follow the existing plot-module conventions:
apply_style()theming, thesave_path/showcontract, and a returnedmatplotlib.figure.Figure.matplotlibstays lazily imported by the report (the module itself is imported inside the report methods, as the other viz modules are).TrustReportmethodsTrustReport.plot_residuals()/.plot_error_distribution(), guarded by a new_require_regression()— the mirror of the existing_require_classification(); calling them on a classification report raisesNotImplementedError. The classification guard message now points at these new plots.prediction_intervals/predicted_variance(optional,Nonefor classification) so the interval band is available end-to-end fromreport.plot_residuals(); wired through the regression pipeline. Backward-compatible (new constructor params default toNone).Tests
tests/test_regression_visualization.py(16): figure/axes assertions, the interval band (present only when supplied), validation errors (shape mismatch, empty,bins < 1, mismatched bounds),TrustReportmethod integration, the classification guard, and a no-display (Agg) smoke test. Full suite 358 → 374, all green;ruff check,ruff format --check ., andmypy trustlens/ --follow-imports=skipclean.Scoped as a dedicated PR, separate from the regression trust score (which, as discussed, needs its own design pass on weighting/thresholds). Closing notes / screenshots happy to add if useful.
Summary by CodeRabbit
Release Notes
New Features
Tests