[ENH] Wrapper to combine any Over Sampler and Under Sampler #787
I am thinking that we could somehow make such an estimator. @chkoar any thoughts?
Correct
Well, we prevented that to avoid fishy behaviors when creating nested pipelines and feature unions where one branch has a sampler and the other does not. But we could check this recursively and allow that kind of pipeline, no? In any case, I think that a combined sampler could be implemented using the `FunctionSampler`:

```python
from collections import Counter

from imblearn import FunctionSampler
from imblearn.datasets import fetch_datasets
from imblearn.over_sampling import RandomOverSampler as ROS
from imblearn.pipeline import make_pipeline
from imblearn.under_sampling import RandomUnderSampler as RUS

random_state = 0
samplers = [
    ROS(random_state=random_state, sampling_strategy=0.5),
    RUS(random_state=random_state),
]


def load_dataset():
    dataset_name = "abalone_19"
    dataset = fetch_datasets(filter_data=[dataset_name])[dataset_name]
    return dataset.data, dataset.target


def combined_sampler(X, y):
    # Apply each sampler in sequence: over-sample, then under-sample.
    for sampler in samplers:
        X, y = sampler.fit_resample(X, y)
    return X, y


# Two equivalent ways to chain the samplers:
pipelined = make_pipeline(*samplers)
functionized = FunctionSampler(func=combined_sampler)

X, y = load_dataset()
print(Counter(y))

Xs, ys = pipelined.fit_resample(X, y)
print(Counter(ys))

Xs, ys = functionized.fit_resample(X, y)
print(Counter(ys))
```
Hello, has there been any progress on this?
The transformers used may not be important, but the structure (nested
@GiuseppeMagazzu thanks for bringing this up. It is a known issue and we should push in this direction too.
Hello, here are the pipelines that I am trying to use for optimizing parameters:
and here is the custom transformer that I use to eliminate some samples.
The issue is that when I try to cross_validate over the whole pipeline using this code:
it throws this error:
Is your feature request related to a problem? Please describe
Most of the time the data that needs to be resampled consists of Nominal and Continuous data.
So SMOTENC is the proper solution for oversampling the data, however, it is not possible to use it on combination models.
The combination models (SMOTEENN & SMOTETomek) currently only support regular SMOTE.
Describe the solution you'd like
Instead of combination models, it would be better if we have some kind of wrapper that can combine any oversampling models with any undersampling models.
Examples:
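One possible shape for such a wrapper, sketched here without imbalanced-learn as a dependency: a small class (the name `SamplerChain` is made up for this example) that chains any sequence of objects exposing imbalanced-learn's `fit_resample(X, y)` contract, which all of its samplers, SMOTENC included, implement:

```python
class SamplerChain:
    """Hypothetical wrapper: chain any over-sampler with any under-sampler.

    Each step only needs to implement fit_resample(X, y), the interface
    shared by all imbalanced-learn samplers.
    """

    def __init__(self, *samplers):
        self.samplers = samplers

    def fit_resample(self, X, y):
        # Apply each sampler in order, feeding its output to the next.
        for sampler in self.samplers:
            X, y = sampler.fit_resample(X, y)
        return X, y


# Toy samplers standing in for a real over/under-sampler pair.
class DuplicateMinority:
    def fit_resample(self, X, y):
        minority = [i for i, label in enumerate(y) if label == 1]
        return X + [X[i] for i in minority], y + [y[i] for i in minority]


class DropFirst:
    def fit_resample(self, X, y):
        return X[1:], y[1:]


chain = SamplerChain(DuplicateMinority(), DropFirst())
X, y = [[0], [1], [2], [3]], [0, 0, 0, 1]
Xs, ys = chain.fit_resample(X, y)
print(ys)  # [0, 0, 1, 1]: minority duplicated, then the first sample dropped
```

With real samplers the same object would be constructed as, e.g., `SamplerChain(over_sampler, under_sampler)`.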