Well, shouldn't multi-label be:

[[0,1,1],
 [1,0,0],
 [0,1,0],
 [1,0,1],
 [1,0,1],
 ...]

Because the version mentioned by @glemaitre appears - as stated by @chkoar - to be a binarized version of a multi-class problem. But the difference between multi-class and multi-label is that multi-class only allows the assignment of a single class to the target instance, whereas in a multi-label case it can be an arbitrary amount of class assignments.

For an implementation one might consider the label powerset transformation of multi-label data into a multiclass data set. So e.g. for the data set above one might apply the following transformation:

[[1],
 [2],
 [3],
 [4],
 [4],
 ...]

For all people searching for a quick and dirty solution I appear to have some success with the following solution:

from skmultilearn.problem_transformation import LabelPowerset
from imblearn.over_sampling import RandomOverSampler

# Import a dataset with X and multi-label y

lp = LabelPowerset()
ros = RandomOverSampler(random_state=42)

# Applies the above stated multi-label (ML) to multi-class (MC) transformation.
yt = lp.transform(y)

X_resampled, y_resampled = ros.fit_sample(X, yt)

# Inverts the ML-MC transformation to recreate the ML set
y_resampled = lp.inverse_transform(y_resampled)

(Use of the skmultilearn package for convenience sake to avoid custom transformation!)

Add support to multilabel #340

Description

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions