Active semi-supervised clustering algorithms for scikit-learn.
Semi-supervised clustering:
- Seeded-KMeans
- Constrained-KMeans
- COP-KMeans
- Pairwise constrained K-Means (PCK-Means)
- Metric K-Means (MK-Means)
- Metric pairwise constrained K-Means (MPCK-Means)

Active learning of pairwise constraints:
- Explore & Consolidate
- Min-max
- Normalized point-based uncertainty (NPU) method
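Of the clustering algorithms above, COP-KMeans treats constraints as hard. During assignment, each point goes to the nearest centroid whose cluster would not break a must-link or cannot-link constraint with an already-assigned point, and the run fails if no such cluster exists. The following is a minimal pure-Python sketch of that assignment pass, for illustration only. It is not this package's implementation, and `violates_constraints` and `constrained_assign` are hypothetical names.

```python
from typing import Dict, List, Optional, Sequence, Tuple

Point = Sequence[float]
Pair = Tuple[int, int]


def squared_distance(a: Point, b: Point) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def violates_constraints(
    i: int, cluster: int, labels: Dict[int, int], ml: List[Pair], cl: List[Pair]
) -> bool:
    """Hypothetical helper: would putting point i in `cluster` break a constraint
    with a point that is already assigned in `labels`?"""
    for a, b in ml:
        other = b if a == i else a if b == i else None
        # Must-link violated: the partner already sits in a different cluster.
        if other is not None and other in labels and labels[other] != cluster:
            return True
    for a, b in cl:
        other = b if a == i else a if b == i else None
        # Cannot-link violated: the partner already sits in the same cluster.
        if other is not None and other in labels and labels[other] == cluster:
            return True
    return False


def constrained_assign(
    X: Sequence[Point], centroids: Sequence[Point], ml: List[Pair], cl: List[Pair]
) -> Optional[List[int]]:
    """One COP-KMeans-style assignment pass; returns None if some point has no feasible cluster."""
    labels: Dict[int, int] = {}
    for i, x in enumerate(X):
        # Try clusters from the nearest centroid to the farthest.
        order = sorted(range(len(centroids)), key=lambda k: squared_distance(x, centroids[k]))
        for k in order:
            if not violates_constraints(i, k, labels, ml, cl):
                labels[i] = k
                break
        else:
            return None  # no cluster satisfies the constraints for point i
    return [labels[i] for i in range(len(X))]
```

With `X = [[0.0], [0.1], [5.0], [5.1]]` and centroids `[[0.0], [5.0]]`, the unconstrained pass gives `[0, 0, 1, 1]`, while the must-link `(1, 2)` pulls point 2 into cluster 0 and gives `[0, 0, 0, 1]`.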
Install the package with pip:

```shell
pip install active-semi-supervised-clustering
```
```python
from sklearn import datasets, metrics

from active_semi_clustering.semi_supervised.pairwise_constraints import PCKMeans
from active_semi_clustering.active.pairwise_constraints import ExampleOracle, ExploreConsolidate, MinMax

# Iris: 150 samples, 4 features, 3 classes.
X, y = datasets.load_iris(return_X_y=True)
```
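First, obtain pairwise constraints (must-link and cannot-link pairs) from an oracle. Here the MinMax active learner chooses which pairs of points to ask about, and ExampleOracle stands in for a human by answering from the true labels `y`, with at most 10 queries (`max_queries_cnt=10`).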
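```python
# TODO: implement your own oracle that, for example, queries a domain expert via a GUI or CLI.
oracle = ExampleOracle(y, max_queries_cnt=10)

# The active learner decides which pairs to send to the oracle.
active_learner = MinMax(n_clusters=3)
active_learner.fit(X, oracle=oracle)

# Index 0 holds the must-link constraints, index 1 the cannot-link constraints.
pairwise_constraints = active_learner.pairwise_constraints_
```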
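The TODO in the code above asks for your own oracle. The sketch below assumes an oracle only needs a `query(i, j)` method that returns `True` when points `i` and `j` belong together (must-link) and `False` otherwise (cannot-link), plus a query budget; verify that assumption against `ExampleOracle` in the installed package before plugging it into an active learner. `BudgetedLabelOracle`, `QueryBudgetExceeded`, and `collect_constraints` are hypothetical names, and the helper simply returns the same `(ml, cl)` shape that the example indexes as `pairwise_constraints[0]` and `pairwise_constraints[1]`.

```python
from typing import List, Sequence, Tuple

Pair = Tuple[int, int]


class QueryBudgetExceeded(Exception):
    """Hypothetical: raised once the oracle has used up its query budget."""


class BudgetedLabelOracle:
    """Hypothetical oracle that answers from known labels, up to max_queries_cnt times.

    In practice, replace the label lookup in query() with a prompt to a human.
    """

    def __init__(self, labels: Sequence[int], max_queries_cnt: int = 10) -> None:
        self.labels = labels
        self.max_queries_cnt = max_queries_cnt
        self.queries_cnt = 0

    def query(self, i: int, j: int) -> bool:
        if self.queries_cnt >= self.max_queries_cnt:
            raise QueryBudgetExceeded()
        self.queries_cnt += 1
        # True means "must-link", False means "cannot-link".
        return self.labels[i] == self.labels[j]


def collect_constraints(
    oracle: BudgetedLabelOracle, pairs: Sequence[Pair]
) -> Tuple[List[Pair], List[Pair]]:
    """Ask about each pair until the budget runs out; return (ml, cl)."""
    ml: List[Pair] = []
    cl: List[Pair] = []
    for i, j in pairs:
        try:
            same = oracle.query(i, j)
        except QueryBudgetExceeded:
            break
        (ml if same else cl).append((i, j))
    return ml, cl
```

With labels `[0, 0, 1, 1]` and a budget of 2, asking about `(0, 1)`, `(1, 2)`, `(2, 3)` yields `ml == [(0, 1)]` and `cl == [(1, 2)]`; the third pair is never asked.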
Then, cluster with PCK-Means, passing the must-link pairs as `ml` and the cannot-link pairs as `cl`.
```python
clusterer = PCKMeans(n_clusters=3)
clusterer.fit(X, ml=pairwise_constraints[0], cl=pairwise_constraints[1])
```
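Unlike COP-KMeans, PCK-Means treats constraints as soft: a point may still join a cluster that breaks a constraint, but that choice costs a penalty. The sketch below illustrates this for a single point's assignment, using half the squared Euclidean distance to the centroid plus a penalty `w` per violated constraint. The single uniform weight `w` is an assumption made for illustration, this is not the package's code, and `pck_cost` and `pck_assign_point` are hypothetical names.

```python
from typing import Dict, List, Sequence, Tuple

Pair = Tuple[int, int]


def pck_cost(
    x: Sequence[float],
    centroid: Sequence[float],
    i: int,
    cluster: int,
    labels: Dict[int, int],
    ml: List[Pair],
    cl: List[Pair],
    w: float,
) -> float:
    """Hypothetical: half squared distance plus w for each constraint violated by (i -> cluster)."""
    cost = 0.5 * sum((a - b) ** 2 for a, b in zip(x, centroid))
    for a, b in ml:
        if i in (a, b):
            other = b if a == i else a
            # Must-link violated: the partner is in a different cluster.
            if other in labels and labels[other] != cluster:
                cost += w
    for a, b in cl:
        if i in (a, b):
            other = b if a == i else a
            # Cannot-link violated: the partner is in the same cluster.
            if other in labels and labels[other] == cluster:
                cost += w
    return cost


def pck_assign_point(
    X: Sequence[Sequence[float]],
    centroids: Sequence[Sequence[float]],
    i: int,
    labels: Dict[int, int],
    ml: List[Pair],
    cl: List[Pair],
    w: float = 1.0,
) -> int:
    """Return the cluster index with the lowest penalized cost for point i."""
    costs = [pck_cost(X[i], c, i, k, labels, ml, cl, w) for k, c in enumerate(centroids)]
    return min(range(len(centroids)), key=costs.__getitem__)
```

For a point at 1.5 with centroids at 0 and 4, and a must-link partner already in the second cluster, a small `w` keeps the point in the nearer first cluster, while a large `w` moves it to the partner's cluster; that trade-off is what distinguishes soft from hard constraints.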
Finally, evaluate the clustering against the true labels with the adjusted Rand index (`adjusted_rand_score`).
```python
print(metrics.adjusted_rand_score(y, clusterer.labels_))
```
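The adjusted Rand index is 1.0 when the clustering matches the reference labels up to a renaming of clusters, is near 0.0 for random labelings, and can be negative. For intuition only, here is a pure-Python sketch that computes it from the pair-counting contingency table; in practice, use scikit-learn's `adjusted_rand_score`. `adjusted_rand_index` is a hypothetical name.

```python
from collections import Counter
from math import comb
from typing import Hashable, Sequence


def adjusted_rand_index(labels_true: Sequence[Hashable], labels_pred: Sequence[Hashable]) -> float:
    """Adjusted Rand index: (index - expected) / (max_index - expected)."""
    n = len(labels_true)
    pairs_total = comb(n, 2)
    if pairs_total == 0:
        # Fewer than two points: there are no pairs to disagree on.
        return 1.0
    # n_ij: number of points in reference class i and predicted cluster j.
    contingency = Counter(zip(labels_true, labels_pred))
    index = sum(comb(c, 2) for c in contingency.values())
    sum_a = sum(comb(c, 2) for c in Counter(labels_true).values())
    sum_b = sum(comb(c, 2) for c in Counter(labels_pred).values())
    expected = sum_a * sum_b / pairs_total
    max_index = (sum_a + sum_b) / 2
    if max_index == expected:
        # Only when both labelings are all singletons or both a single cluster,
        # i.e. they agree perfectly.
        return 1.0
    return (index - expected) / (max_index - expected)
```

For example, `adjusted_rand_index([0, 0, 1, 1], [1, 1, 0, 0])` is 1.0 because only the cluster names differ, and `adjusted_rand_index([0, 0, 1, 1], [0, 0, 1, 2])` is 4/7 (about 0.571).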