-
Notifications
You must be signed in to change notification settings - Fork 3
Enhance estimator validation and implement "fit-if-needed" logic #2
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
Reviewer's guide (collapsed on small PRs)Reviewer's GuideAdds standardized estimator validation and conditional fitting logic to the Pearsonify wrapper, ensuring the wrapped estimator is a scikit-learn classifier with probability estimates and only fitting it when necessary. Sequence diagram for Pearsonify fit-if-needed estimator validationsequenceDiagram
participant Pearsonify
participant Estimator
participant SklearnUtils
Pearsonify->>SklearnUtils: check_is_fitted(estimator)
alt estimator is fitted
SklearnUtils-->>Pearsonify: return
Pearsonify->>Estimator: hasattr(predict_proba)
alt has predict_proba
Pearsonify-->>Estimator: proceed without fitting
else missing predict_proba
Pearsonify-->>Pearsonify: raise TypeError(Estimator validation failed)
end
else estimator not fitted
SklearnUtils-->>Pearsonify: raise NotFittedError
Pearsonify->>Estimator: fit(X_train, y_train)
end
Pearsonify->>Estimator: predict_proba(X_cal)
Estimator-->>Pearsonify: y_cal_pred_proba
Class diagram for Pearsonify wrapper with estimator validationclassDiagram
class BaseEstimator
class Estimator {
fit(X_train, y_train)
predict_proba(X)
}
BaseEstimator <|-- Estimator
class Pearsonify {
- estimator: BaseEstimator
- alpha: float
+ __init__(estimator, alpha)
+ fit(X_train, y_train, X_cal, y_cal)
}
Pearsonify --> BaseEstimator: wraps
class SklearnUtils {
+ check_is_fitted(estimator)
+ NotFittedError
}
Pearsonify ..> SklearnUtils: uses for validation
Pearsonify ..> Estimator: calls fit, predict_proba
File-Level Changes
Tips and commandsInteracting with Sourcery
Customizing Your ExperienceAccess your dashboard to:
Getting Help
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hey - I've found 2 issues, and left some high level feedback:
- The
predict_probainterface check is only performed inside thecheck_is_fittedtry block, so if the estimator is not fitted you skip this check entirely; consider validating the presence ofpredict_probaregardless of fitted status to avoid runtime errors later. - Catching
TypeErroraround the whole validation block and then re‑raising a newTypeErrorcan obscure the original source of the error; you might narrow thetryscope or preserve the original exception type/message directly instead of wrapping it generically. - If you truly want to enforce that the estimator is a scikit‑learn estimator, consider explicitly checking
isinstance(estimator, BaseEstimator)or a similar check early in__init__rather than only relying oncheck_is_fittedinfit.
Prompt for AI Agents
Please address the comments from this code review:
## Overall Comments
- The `predict_proba` interface check is only performed inside the `check_is_fitted` try block, so if the estimator is not fitted you skip this check entirely; consider validating the presence of `predict_proba` regardless of fitted status to avoid runtime errors later.
- Catching `TypeError` around the whole validation block and then re‑raising a new `TypeError` can obscure the original source of the error; you might narrow the `try` scope or preserve the original exception type/message directly instead of wrapping it generically.
- If you truly want to enforce that the estimator is a scikit‑learn estimator, consider explicitly checking `isinstance(estimator, BaseEstimator)` or a similar check early in `__init__` rather than only relying on `check_is_fitted` in `fit`.
## Individual Comments
### Comment 1
<location> `pearsonify/wrapper.py:30-32` </location>
<code_context>
- self.estimator.fit(X_train, y_train)
+ try:
+ check_is_fitted(self.estimator)
+ if not hasattr(self.estimator, "predict_proba"):
+ raise TypeError("The estimator must have 'predict_proba' method.")
+ except TypeError as e:
</code_context>
<issue_to_address>
**suggestion (bug_risk):** Tighten the `predict_proba` check to ensure it is callable, not just present.
`hasattr` only checks for the presence of the attribute, which might be `None` or non-callable. To avoid runtime errors when invoking it, use `callable(getattr(self.estimator, "predict_proba", None))` instead.
```suggestion
check_is_fitted(self.estimator)
if not callable(getattr(self.estimator, "predict_proba", None)):
raise TypeError("The estimator must have a callable 'predict_proba' method.")
```
</issue_to_address>
### Comment 2
<location> `pearsonify/wrapper.py:33-34` </location>
<code_context>
+ check_is_fitted(self.estimator)
+ if not hasattr(self.estimator, "predict_proba"):
+ raise TypeError("The estimator must have 'predict_proba' method.")
+ except TypeError as e:
+ raise TypeError(f"Estimator validation failed: {e}") from e
+ except NotFittedError:
+ # Attempt to fit the estimator if not already fitted
</code_context>
<issue_to_address>
**suggestion (bug_risk):** Narrow or simplify the `TypeError` wrapping to avoid redundant exception handling.
This `except TypeError` will also catch the `TypeError` you raise for a missing `predict_proba`, only to re-wrap it with a slightly changed message, and it may also hide unrelated `TypeError`s from inside `check_is_fitted` or the estimator. Consider either validating via explicit checks and not catching `TypeError` at all, or using/narrowing to a custom exception type you control for your own validation failure.
</issue_to_address>Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.
| except TypeError as e: | ||
| raise TypeError(f"Estimator validation failed: {e}") from e |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
suggestion (bug_risk): Narrow or simplify the TypeError wrapping to avoid redundant exception handling.
This except TypeError will also catch the TypeError you raise for a missing predict_proba, only to re-wrap it with a slightly changed message, and it may also hide unrelated TypeErrors from inside check_is_fitted or the estimator. Consider either validating via explicit checks and not catching TypeError at all, or using/narrowing to a custom exception type you control for your own validation failure.
Description
This PR improves the Pearsonify class by implementing a standardized validation workflow. It ensures the input estimator is a valid Scikit-learn instance and a classifier capable of probability estimation.
Key Changes
check_is_fittedandhasattrto verify both the state (fit) and the required interface (predict_proba) of the estimator.Summary by Sourcery
Validate estimators before use in Pearsonify and add conditional fitting behavior to avoid redundant training.
Bug Fixes:
Enhancements: