Enhance estimator validation and implement "fit-if-needed" logic #2

RektPunk · 2026-01-09T06:34:16Z

Description

This PR improves the Pearsonify class by implementing a standardized validation workflow. It checks that the input estimator is a valid scikit-learn estimator and a classifier that can produce probability estimates.

Key Changes

- Uses check_is_fitted and hasattr to verify both the estimator's fitted state and its required interface (predict_proba).
- Implements "fit-if-needed" logic so that an already fitted model is not retrained.
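
The fit-if-needed idea can be sketched roughly as follows. This is a minimal illustration, not the code in `pearsonify/wrapper.py`: the helper name `fit_if_needed`, its boolean return value, and the toy data are assumptions made for the example. Only `check_is_fitted` and `NotFittedError` come from scikit-learn.

```python
import numpy as np
from sklearn.exceptions import NotFittedError
from sklearn.linear_model import LogisticRegression
from sklearn.utils.validation import check_is_fitted


def fit_if_needed(estimator, X_train, y_train):
    """Hypothetical helper: fit only when scikit-learn reports 'not fitted'.

    Returns True if fitting happened, False if the estimator was reused.
    """
    try:
        check_is_fitted(estimator)
    except NotFittedError:
        estimator.fit(X_train, y_train)
        return True
    return False


# Toy, deterministic data with both classes present.
X = np.linspace(-1.0, 1.0, 40).reshape(-1, 1)
y = (X[:, 0] > 0).astype(int)

model = LogisticRegression()
first_call_fitted = fit_if_needed(model, X, y)   # unfitted, so it trains
coef_after_first = model.coef_.copy()
second_call_fitted = fit_if_needed(model, X, y)  # already fitted, so training is skipped
```

Keeping only the `check_is_fitted` call inside the `try` means no other exception type can be swallowed by accident.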

Summary by Sourcery

Validate estimators before use in Pearsonify and add conditional fitting behavior to avoid redundant training.

Bug Fixes:

Prevent usage of unfitted estimators by checking fit state before calibration.

Enhancements:

Enforce that the wrapped estimator implements probability predictions and raise clear errors when validation fails.
Introduce fit-if-needed logic that only trains the estimator when it is not already fitted.

sourcery-ai · 2026-01-09T06:34:22Z

Reviewer's Guide

Adds standardized estimator validation and conditional fitting logic to the Pearsonify wrapper, ensuring the wrapped estimator is a scikit-learn classifier with probability estimates and only fitting it when necessary.

Sequence diagram for Pearsonify fit-if-needed estimator validation

sequenceDiagram
    participant Pearsonify
    participant Estimator
    participant SklearnUtils

    Pearsonify->>SklearnUtils: check_is_fitted(estimator)
    alt estimator is fitted
        SklearnUtils-->>Pearsonify: return
        Pearsonify->>Estimator: hasattr(predict_proba)
        alt has predict_proba
            Pearsonify-->>Estimator: proceed without fitting
        else missing predict_proba
            Pearsonify-->>Pearsonify: raise TypeError(Estimator validation failed)
        end
    else estimator not fitted
        SklearnUtils-->>Pearsonify: raise NotFittedError
        Pearsonify->>Estimator: fit(X_train, y_train)
    end
    Pearsonify->>Estimator: predict_proba(X_cal)
    Estimator-->>Pearsonify: y_cal_pred_proba

Class diagram for Pearsonify wrapper with estimator validation

classDiagram
    class BaseEstimator

    class Estimator {
        fit(X_train, y_train)
        predict_proba(X)
    }

    BaseEstimator <|-- Estimator

    class Pearsonify {
        - estimator: BaseEstimator
        - alpha: float
        + __init__(estimator, alpha)
        + fit(X_train, y_train, X_cal, y_cal)
    }

    Pearsonify --> BaseEstimator: wraps

    class SklearnUtils {
        + check_is_fitted(estimator)
        + NotFittedError
    }

    Pearsonify ..> SklearnUtils: uses for validation
    Pearsonify ..> Estimator: calls fit, predict_proba

File-Level Changes

| Change | Details | Files |
| --- | --- | --- |
| Add validation of the wrapped estimator and conditional fitting in Pearsonify.fit. | Before training, call scikit-learn's check_is_fitted on the provided estimator to detect existing fitted state. Verify the estimator exposes a predict_proba method and raise a TypeError with a clear message when it does not. Wrap validation errors in a higher-level TypeError labeled as estimator validation failure. On NotFittedError from check_is_fitted, fall back to fitting the estimator on the provided training data instead of always refitting. | `pearsonify/wrapper.py` |


sourcery-ai

Hey - I've found 2 issues, and left some high-level feedback:

- The predict_proba interface check is only performed inside the check_is_fitted try block, so if the estimator is not fitted you skip this check entirely; consider validating the presence of predict_proba regardless of fitted status to avoid runtime errors later.
- Catching TypeError around the whole validation block and then re-raising a new TypeError can obscure the original source of the error; you might narrow the try scope or preserve the original exception type/message directly instead of wrapping it generically.
- If you truly want to enforce that the estimator is a scikit-learn estimator, consider explicitly checking isinstance(estimator, BaseEstimator) or a similar check early in __init__ rather than only relying on check_is_fitted in fit.
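
One possible way to act on the first and third points is to validate the estimator up front, independently of whether it has been fitted. The sketch below is an assumption about how that could look, not the PR's code: `validate_estimator` and `is_accepted` are hypothetical names, and the function could just as well be called from `__init__`.

```python
from sklearn.base import BaseEstimator
from sklearn.linear_model import LinearRegression, LogisticRegression


def validate_estimator(estimator):
    """Hypothetical check: raise TypeError unless `estimator` is a
    scikit-learn estimator with a callable `predict_proba`.

    The fitted state is deliberately not inspected here.
    """
    if not isinstance(estimator, BaseEstimator):
        raise TypeError(
            f"Expected a scikit-learn estimator, got {type(estimator).__name__}."
        )
    if not callable(getattr(estimator, "predict_proba", None)):
        raise TypeError("The estimator must have a callable 'predict_proba' method.")
    return estimator


def is_accepted(candidate):
    """Small helper for the demo: True if validation passes."""
    try:
        validate_estimator(candidate)
    except TypeError:
        return False
    return True


accepted_classifier = is_accepted(LogisticRegression())  # unfitted, but has predict_proba
accepted_regressor = is_accepted(LinearRegression())     # no predict_proba
accepted_plain_object = is_accepted(object())            # not a BaseEstimator
```

Because the check never touches fitted attributes, an unfitted classifier passes and an unsuitable estimator is rejected before any training is attempted.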

Prompt for AI Agents

Please address the comments from this code review:

## Overall Comments
- The `predict_proba` interface check is only performed inside the `check_is_fitted` try block, so if the estimator is not fitted you skip this check entirely; consider validating the presence of `predict_proba` regardless of fitted status to avoid runtime errors later.
- Catching `TypeError` around the whole validation block and then re‑raising a new `TypeError` can obscure the original source of the error; you might narrow the `try` scope or preserve the original exception type/message directly instead of wrapping it generically.
- If you truly want to enforce that the estimator is a scikit‑learn estimator, consider explicitly checking `isinstance(estimator, BaseEstimator)` or a similar check early in `__init__` rather than only relying on `check_is_fitted` in `fit`.

## Individual Comments

### Comment 1
<location> `pearsonify/wrapper.py:30-32` </location>
<code_context>
-        self.estimator.fit(X_train, y_train)
+        try:
+            check_is_fitted(self.estimator)
+            if not hasattr(self.estimator, "predict_proba"):
+                raise TypeError("The estimator must have 'predict_proba' method.")
+        except TypeError as e:
</code_context>

<issue_to_address>
**suggestion (bug_risk):** Tighten the `predict_proba` check to ensure it is callable, not just present.

`hasattr` only checks for the presence of the attribute, which might be `None` or non-callable. To avoid runtime errors when invoking it, use `callable(getattr(self.estimator, "predict_proba", None))` instead.

```suggestion
            check_is_fitted(self.estimator)
            if not callable(getattr(self.estimator, "predict_proba", None)):
                raise TypeError("The estimator must have a callable 'predict_proba' method.")
```
</issue_to_address>
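
The difference between the two checks in Comment 1 can be illustrated with a small, library-free sketch. Both classes and the helper name `has_callable_predict_proba` are hypothetical and exist only for this example.

```python
class BrokenEstimator:
    # Hypothetical case: the attribute exists but is not a method.
    predict_proba = None


class ProbabilisticEstimator:
    # Hypothetical stand-in for a classifier with a real predict_proba.
    def predict_proba(self, X):
        return [[0.5, 0.5] for _ in X]


def has_callable_predict_proba(estimator):
    """True only if `predict_proba` exists and can be called."""
    return callable(getattr(estimator, "predict_proba", None))


hasattr_says = hasattr(BrokenEstimator(), "predict_proba")
callable_says = has_callable_predict_proba(BrokenEstimator())
good_estimator_ok = has_callable_predict_proba(ProbabilisticEstimator())
```

Here `hasattr` accepts `BrokenEstimator` even though calling its `predict_proba` would fail, while the `callable(getattr(...))` form rejects it.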

### Comment 2
<location> `pearsonify/wrapper.py:33-34` </location>
<code_context>
+            check_is_fitted(self.estimator)
+            if not hasattr(self.estimator, "predict_proba"):
+                raise TypeError("The estimator must have 'predict_proba' method.")
+        except TypeError as e:
+            raise TypeError(f"Estimator validation failed: {e}") from e
+        except NotFittedError:
+            # Attempt to fit the estimator if not already fitted
</code_context>

<issue_to_address>
**suggestion (bug_risk):** Narrow or simplify the `TypeError` wrapping to avoid redundant exception handling.

This `except TypeError` will also catch the `TypeError` you raise for a missing `predict_proba`, only to re-wrap it with a slightly changed message, and it may also hide unrelated `TypeError`s from inside `check_is_fitted` or the estimator. Consider either validating via explicit checks and not catching `TypeError` at all, or using/narrowing to a custom exception type you control for your own validation failure.
</issue_to_address>

sourcery-ai · 2026-01-09T06:35:13Z

pearsonify/wrapper.py

+        except TypeError as e:
+            raise TypeError(f"Estimator validation failed: {e}") from e


suggestion (bug_risk): Narrow or simplify the TypeError wrapping to avoid redundant exception handling.

This except TypeError will also catch the TypeError you raise for a missing predict_proba, only to re-wrap it with a slightly changed message, and it may also hide unrelated TypeErrors from inside check_is_fitted or the estimator. Consider either validating via explicit checks and not catching TypeError at all, or using/narrowing to a custom exception type you control for your own validation failure.
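
As a rough sketch of the "custom exception type you control" option, the wrapper's own checks could raise a dedicated subclass so that unrelated TypeErrors are never caught or re-wrapped. All names here (`EstimatorValidationError`, `require_predict_proba`, `classify_failure`, `WithProba`) are hypothetical and not part of Pearsonify.

```python
class EstimatorValidationError(TypeError):
    """Hypothetical exception raised only by the wrapper's own checks."""


def require_predict_proba(estimator):
    """Raise EstimatorValidationError if predict_proba is missing or not callable."""
    if not callable(getattr(estimator, "predict_proba", None)):
        raise EstimatorValidationError(
            "The estimator must have a callable 'predict_proba' method."
        )


def classify_failure(action):
    """Run `action` and report which kind of error, if any, it raised."""
    try:
        action()
    except EstimatorValidationError:
        return "validation"
    except TypeError:
        return "unrelated"
    return "ok"


class WithProba:
    # Hypothetical stand-in for a valid classifier.
    def predict_proba(self, X):
        return [[1.0, 0.0] for _ in X]


missing_method = classify_failure(lambda: require_predict_proba(object()))
unrelated_error = classify_failure(lambda: len(5))  # a TypeError from elsewhere
valid_estimator = classify_failure(lambda: require_predict_proba(WithProba()))
```

Subclassing TypeError means existing callers that catch TypeError keep working, while the wrapper itself can tell its own validation failures apart from other errors.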

add check fitted logic

511b03c

sourcery-ai bot reviewed Jan 9, 2026

follow the sourcery comment

ea7c0bb
