@RektPunk RektPunk commented Jan 9, 2026

Description

This PR adds a standardized validation workflow to the Pearsonify class. It checks that the input estimator is a valid scikit-learn estimator and a classifier capable of probability estimation.

Key Changes

  • Uses check_is_fitted and hasattr to verify both the state (fit) and the required interface (predict_proba) of the estimator.
  • Implements "fit-if-needed" logic to avoid redundant training if the model is already fitted.
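
The fit-if-needed behavior described above could be sketched roughly as follows. This is a minimal illustration, not the code in pearsonify/wrapper.py: the helper name `fit_if_needed`, the return value, and the toy data are assumptions made for the example. `check_is_fitted` and `NotFittedError` are the scikit-learn utilities this PR refers to.

```python
import numpy as np
from sklearn.exceptions import NotFittedError
from sklearn.linear_model import LogisticRegression
from sklearn.utils.validation import check_is_fitted


def fit_if_needed(estimator, X_train, y_train):
    """Hypothetical helper: fit only when the estimator is not fitted yet.

    Returns True if fit() was called, False if the estimator was reused.
    """
    try:
        check_is_fitted(estimator)  # raises NotFittedError when unfitted
    except NotFittedError:
        estimator.fit(X_train, y_train)
        return True
    return False


# Toy data, purely illustrative.
X = np.array([[0.0], [1.0], [2.0], [3.0]])
y = np.array([0, 0, 1, 1])

model = LogisticRegression()
first_call_fitted = fit_if_needed(model, X, y)   # unfitted, so fit() runs
second_call_fitted = fit_if_needed(model, X, y)  # already fitted, so fit() is skipped
```

Only `NotFittedError` is caught here, so any other exception raised by `check_is_fitted` propagates unchanged.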

Summary by Sourcery

Validate estimators before use in Pearsonify and add conditional fitting behavior to avoid redundant training.

Bug Fixes:

  • Prevent usage of unfitted estimators by checking fit state before calibration.

Enhancements:

  • Enforce that the wrapped estimator implements probability predictions and raise clear errors when validation fails.
  • Introduce fit-if-needed logic that only trains the estimator when it is not already fitted.

sourcery-ai bot commented Jan 9, 2026

Reviewer's Guide

Adds standardized estimator validation and conditional fitting logic to the Pearsonify wrapper, ensuring the wrapped estimator is a scikit-learn classifier with probability estimates and only fitting it when necessary.

Sequence diagram for Pearsonify fit-if-needed estimator validation

sequenceDiagram
    participant Pearsonify
    participant Estimator
    participant SklearnUtils

    Pearsonify->>SklearnUtils: check_is_fitted(estimator)
    alt estimator is fitted
        SklearnUtils-->>Pearsonify: return
        Pearsonify->>Estimator: hasattr(predict_proba)
        alt has predict_proba
            Pearsonify-->>Estimator: proceed without fitting
        else missing predict_proba
            Pearsonify-->>Pearsonify: raise TypeError(Estimator validation failed)
        end
    else estimator not fitted
        SklearnUtils-->>Pearsonify: raise NotFittedError
        Pearsonify->>Estimator: fit(X_train, y_train)
    end
    Pearsonify->>Estimator: predict_proba(X_cal)
    Estimator-->>Pearsonify: y_cal_pred_proba

Class diagram for Pearsonify wrapper with estimator validation

classDiagram
    class BaseEstimator

    class Estimator {
        fit(X_train, y_train)
        predict_proba(X)
    }

    BaseEstimator <|-- Estimator

    class Pearsonify {
        - estimator: BaseEstimator
        - alpha: float
        + __init__(estimator, alpha)
        + fit(X_train, y_train, X_cal, y_cal)
    }

    Pearsonify --> BaseEstimator: wraps

    class SklearnUtils {
        + check_is_fitted(estimator)
        + NotFittedError
    }

    Pearsonify ..> SklearnUtils: uses for validation
    Pearsonify ..> Estimator: calls fit, predict_proba

File-Level Changes

Change: Add validation of the wrapped estimator and conditional fitting in Pearsonify.fit.
Details:
  • Before training, call scikit-learn's check_is_fitted on the provided estimator to detect existing fitted state.
  • Verify the estimator exposes a predict_proba method and raise a TypeError with a clear message when it does not.
  • Wrap validation errors in a higher-level TypeError labeled as estimator validation failure.
  • On NotFittedError from check_is_fitted, fall back to fitting the estimator on the provided training data instead of always refitting.
Files: pearsonify/wrapper.py


@sourcery-ai sourcery-ai bot left a comment

Hey - I've found 2 issues, and left some high-level feedback:

  • The predict_proba interface check is only performed inside the check_is_fitted try block, so if the estimator is not fitted you skip this check entirely; consider validating the presence of predict_proba regardless of fitted status to avoid runtime errors later.
  • Catching TypeError around the whole validation block and then re‑raising a new TypeError can obscure the original source of the error; you might narrow the try scope or preserve the original exception type/message directly instead of wrapping it generically.
  • If you truly want to enforce that the estimator is a scikit‑learn estimator, consider explicitly checking isinstance(estimator, BaseEstimator) or a similar check early in __init__ rather than only relying on check_is_fitted in fit.
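
One way the first and third points could look in practice is sketched below. The class `ValidatedWrapper` and the helper `construction_error` are hypothetical stand-ins invented for this example, not the actual Pearsonify code. The sketch assumes the checks live in `__init__`, so they run regardless of whether the estimator is fitted.

```python
from sklearn.base import BaseEstimator
from sklearn.linear_model import LinearRegression, LogisticRegression


class ValidatedWrapper:
    """Hypothetical stand-in for a wrapper like Pearsonify (not the real class)."""

    def __init__(self, estimator, alpha=0.05):
        # Reject non-scikit-learn objects up front, before any fitting happens.
        if not isinstance(estimator, BaseEstimator):
            raise TypeError(
                f"Expected a scikit-learn estimator, got {type(estimator).__name__}."
            )
        # Interface check runs whether or not the estimator is fitted.
        if not callable(getattr(estimator, "predict_proba", None)):
            raise TypeError("The estimator must have a callable 'predict_proba' method.")
        self.estimator = estimator
        self.alpha = alpha


def construction_error(estimator):
    """Return the TypeError message raised at construction, or None if accepted."""
    try:
        ValidatedWrapper(estimator)
    except TypeError as exc:
        return str(exc)
    return None


accepted = construction_error(LogisticRegression())   # classifier with predict_proba
no_proba = construction_error(LinearRegression())     # regressor, no predict_proba
not_sklearn = construction_error(object())            # not an estimator at all
```

A trade-off worth noting: the `isinstance` check rejects duck-typed estimators from libraries that do not inherit from `BaseEstimator`, so it may be stricter than the wrapper needs.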
Prompt for AI Agents
Please address the comments from this code review:

## Individual Comments

### Comment 1
<location> `pearsonify/wrapper.py:30-32` </location>
<code_context>
-        self.estimator.fit(X_train, y_train)
+        try:
+            check_is_fitted(self.estimator)
+            if not hasattr(self.estimator, "predict_proba"):
+                raise TypeError("The estimator must have 'predict_proba' method.")
+        except TypeError as e:
</code_context>

<issue_to_address>
**suggestion (bug_risk):** Tighten the `predict_proba` check to ensure it is callable, not just present.

`hasattr` only checks for the presence of the attribute, which might be `None` or non-callable. To avoid runtime errors when invoking it, use `callable(getattr(self.estimator, "predict_proba", None))` instead.

```suggestion
            check_is_fitted(self.estimator)
            if not callable(getattr(self.estimator, "predict_proba", None)):
                raise TypeError("The estimator must have a callable 'predict_proba' method.")
```
</issue_to_address>
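
To illustrate the difference between the two checks in Comment 1, here is a small standalone sketch. The three classes are invented for the example and stand in for estimators with a missing, a non-callable, and a real `predict_proba`.

```python
class NoProba:
    """No predict_proba attribute at all."""


class NoneProba:
    """Attribute exists but is not callable."""

    predict_proba = None


class RealProba:
    """Attribute exists and is a method."""

    def predict_proba(self, X):
        return [[0.5, 0.5] for _ in X]


def has_attr_check(est):
    # Presence only: passes even when the attribute is None.
    return hasattr(est, "predict_proba")


def callable_check(est):
    # Presence and callability: a missing attribute falls back to None.
    return callable(getattr(est, "predict_proba", None))


# Maps class name -> (hasattr result, callable result).
results = {
    type(est).__name__: (has_attr_check(est), callable_check(est))
    for est in (NoProba(), NoneProba(), RealProba())
}
```

Only the `NoneProba` case distinguishes the two checks: `hasattr` accepts it, while the `callable` check rejects it.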

### Comment 2
<location> `pearsonify/wrapper.py:33-34` </location>
<code_context>
+            check_is_fitted(self.estimator)
+            if not hasattr(self.estimator, "predict_proba"):
+                raise TypeError("The estimator must have 'predict_proba' method.")
+        except TypeError as e:
+            raise TypeError(f"Estimator validation failed: {e}") from e
+        except NotFittedError:
+            # Attempt to fit the estimator if not already fitted
</code_context>

<issue_to_address>
**suggestion (bug_risk):** Narrow or simplify the `TypeError` wrapping to avoid redundant exception handling.

This `except TypeError` will also catch the `TypeError` you raise for a missing `predict_proba`, only to re-wrap it with a slightly changed message, and it may also hide unrelated `TypeError`s from inside `check_is_fitted` or the estimator. Consider either validating via explicit checks and not catching `TypeError` at all, or using/narrowing to a custom exception type you control for your own validation failure.
</issue_to_address>
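
A rough sketch of the custom-exception option from Comment 2 follows. The names `EstimatorValidationError`, `validate`, `classify_failure`, and the toy estimator classes are assumptions for illustration only. Subclassing `TypeError` is one possible design choice that keeps existing `except TypeError` callers working.

```python
class EstimatorValidationError(TypeError):
    """Hypothetical exception raised only by the wrapper's own checks."""


def validate(estimator):
    # getattr's default only covers AttributeError; other errors propagate as-is.
    if not callable(getattr(estimator, "predict_proba", None)):
        raise EstimatorValidationError(
            "The estimator must have a callable 'predict_proba' method."
        )


class GoodEstimator:
    def predict_proba(self, X):
        return [[0.5, 0.5] for _ in X]


class BrokenEstimator:
    """Attribute lookup itself raises an unrelated TypeError."""

    @property
    def predict_proba(self):
        raise TypeError("unrelated failure inside the estimator")


def classify_failure(estimator):
    """Report which kind of error, if any, validation produced."""
    try:
        validate(estimator)
    except EstimatorValidationError:
        return "validation"  # our own check failed
    except TypeError:
        return "unrelated"   # some other TypeError, left unwrapped
    return "ok"


outcomes = [classify_failure(e) for e in (GoodEstimator(), object(), BrokenEstimator())]
```

Because the wrapper's own failures have a dedicated type, a caller can tell them apart from unrelated `TypeError`s without any re-wrapping.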

