Commit 315dfb3

Update examples for v2.0.0 (#4)
* Create contrib folder and move old examples there. Update remaining examples in root to work with Cleanlab 2.0. Update README with table of contents, description for each example, and instructions. Add requirements file.
* Update README
* Update v1 README
* Update requirements.txt
* Update requirements.txt
* Update README
* Update README
* Update notebooks and run_all_notebooks.py script
* Update README
* Cleanup docstring
* Update README. Change format of header for classifier_comparison.ipynb.
* Update README
* Rename LearningWithNoisyLabels to CleanLearning everywhere
* Update requirements
* Change cleanlab.filter.keep_at_least_n_per_class to _keep_at_least_n_per_class()
* Add example for cifar CNN and coteaching experimental modules
* Update README
* Update README
* Update README
* Update README
* Update README
* Raise ValueError if epochs < num_gradual for coteaching
* Add example for cleanlab.experimental.mnist_pytorch
* Update README.md
* Add example for fasttext
* Change cleanlab.noise_generation to cleanlab.benchmarking.noise_generation
* Rename cleanlab.util to cleanlab.internal.util
* Cleanup README
* Add relative links to table in README
* Update README.md
* Update README.md
* Update README.md
* Update README.md
* Update README.md
* Update README.md
* Change column name to "Example" in README table
* Create separate cuda_requirements.txt file for examples that require GPU for training
* Update all README
* Update all README
* Update README to recommend use of latest stable cleanlab release
* Update README
* Update all README
* Update docs link to use v2.0.0
* use stabler links

Co-authored-by: Jonas Mueller <[email protected]>
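
The renames listed in the commit message affect any user code written against cleanlab v1. As a rough illustration only, the sketch below rewrites the old dotted paths and the old class name in a source string. The helper `migrate_source` and the `RENAMES` table are hypothetical (they are not part of cleanlab), and the sketch assumes each rename is a pure name change. It does not handle forms such as `from cleanlab import util`, nor the `keep_at_least_n_per_class` function that the commit makes private.

```python
import re

# Old -> new names, copied from the commit message.
# Assumption: each rename is a pure name change with no change in behavior.
RENAMES = {
    "cleanlab.noise_generation": "cleanlab.benchmarking.noise_generation",
    "cleanlab.util": "cleanlab.internal.util",
    "LearningWithNoisyLabels": "CleanLearning",
}


def migrate_source(source: str) -> str:
    """Rewrite v1-style names in a source string to their v2.0.0 names.

    Hypothetical helper for illustration; not part of cleanlab.
    """
    for old, new in RENAMES.items():
        # The lookarounds prevent matches inside longer identifiers,
        # so "cleanlab.utility" is left untouched.
        pattern = r"(?<!\w)" + re.escape(old) + r"(?!\w)"
        source = re.sub(pattern, new, source)
    return source


old_code = "import cleanlab.util\nmodel = LearningWithNoisyLabels()"
print(migrate_source(old_code))
```

Running the rewrite twice gives the same result as running it once, because none of the new names contains an old name as a whole token.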
1 parent 4be2bc0 commit 315dfb3
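
The commit message also mentions raising `ValueError` if `epochs < num_gradual` for coteaching. The sketch below shows why such a guard is useful, assuming a co-teaching-style schedule in which the forget rate ramps up linearly over the first `num_gradual` epochs and then stays constant. The function `linear_forget_schedule` is a hypothetical stand-in written for this illustration, not the code in `cleanlab.experimental.coteaching`.

```python
def linear_forget_schedule(epochs, forget_rate, num_gradual):
    """Return one forget rate per epoch (hypothetical sketch).

    Assumption: the rate rises linearly from 0 to `forget_rate` over the
    first `num_gradual` epochs, then stays at `forget_rate`.
    """
    if epochs < num_gradual:
        # Without this guard the ramp would be longer than the schedule itself.
        raise ValueError(
            f"epochs ({epochs}) must be >= num_gradual ({num_gradual})"
        )
    schedule = [forget_rate] * epochs
    steps = max(num_gradual - 1, 1)  # avoid division by zero when num_gradual is 1
    for i in range(num_gradual):
        schedule[i] = forget_rate * i / steps
    return schedule


print(linear_forget_schedule(epochs=5, forget_rate=0.2, num_gradual=3))

try:
    linear_forget_schedule(epochs=2, forget_rate=0.2, num_gradual=3)
except ValueError as err:
    print("rejected:", err)
```

Failing early with a clear message is preferable to an obscure indexing or shape error partway through training.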

73 files changed (+7290, -1656 lines changed)

.gitignore

+6
@@ -0,0 +1,6 @@
+env/
+temp/
+output/
+data/
+.ipynb_checkpoints/
+cifar10-cnn-coteaching/data/

README.md

+46-13
@@ -1,21 +1,54 @@
-# ``cleanlab`` Examples
+# cleanlab Examples
 
-Not sure where to start? Try checking out how to find [ImageNet Label Errors](imagenet/imagenet_train_label_errors.ipynb).
+This repo contains code examples that demonstrate how to use [cleanlab](https://github.com/cleanlab/cleanlab) and how [confident learning](https://arxiv.org/abs/1911.00068) works to find label errors.
 
+To quickly learn the basics of running cleanlab on your own data, we recommend first starting [here](https://docs.cleanlab.ai/) before diving into the examples below.
 
-A brief description of the files and folders:
-* `imagenet`, 'cifar10', 'mnist' - code to find label errors in these datasets and reproduce the results in the [confident learning paper](https://arxiv.org/abs/1911.00068). You will also need to `git clone` [confidentlearning-reproduce](https://github.com/cgnorthcutt/confidentlearning-reproduce).
-- [imagenet_train_crossval.py](imagenet/imagenet_train_crossval.py) - a powerful script to train cross-validated predictions on ImageNet, combine cv folds, train with on masked input (train without label errors), etc.
-- [cifar10_train_crossval.py](cifar10/cifar10_train_crossval.py) - same as above, but for CIFAR.
-* `classifier_comparison.ipynb` - tutorial showing `cleanlab` performance across 10 classifiers and 4 dataset distributions.
-* `iris_simple_example.ipynb` - tutorial showing how to use `cleanlab` on the simple IRIS dataset.
-* `model_selection_demo.ipynb` - tutorial showing model selection on the cleanlab's parameter settings.
-* `simplifying_confident_learning_tutorial.ipynb` - tutorial implementing cleanlab as raw numpy code.
-* `visualizing_confident_learning.ipynb` - tutorial to demonstrate the noise matrix estimation performed by cleanlab.
+## Table of Contents
+
+Recommended order of examples to try:
+
+| | Example | Description |
+| --- | ---------------------------------------------------------------------------------------------- | ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
+| 1 | [iris_simple_example.ipynb](iris_simple_example.ipynb) | Use cleanlab to find synthetic label errors in the Iris dataset. |
+| 2 | [classifier_comparison.ipynb](classifier_comparison.ipynb) | Demonstrate how cleanlab can be used to train 10 different classifiers on 4 dataset distributions with label errors. |
+| 3 | [model_selection_demo.ipynb](model_selection_demo.ipynb) | Perform hyperparameter optimization to find the best settings of cleanlab's optional parameters. |
+| 4 | [simplifying_confident_learning_tutorial.ipynb](simplifying_confident_learning_tutorial.ipynb) | Implement cleanlab as raw numpy code. |
+| 5 | [visualizing_confident_learning.ipynb](visualizing_confident_learning.ipynb) | Demonstrate how cleanlab performs noise matrix estimation. |
+| 6 | [cifar10-cnn-coteaching](cifar10-cnn-coteaching) | Demonstrate the use of two experimental modules from cleanlab: [cifar_cnn.py](https://github.com/cleanlab/cleanlab/blob/master/cleanlab/experimental/cifar_cnn.py) and [coteaching.py](https://github.com/cleanlab/cleanlab/blob/master/cleanlab/experimental/coteaching.py) |
+| 7 | [mnist-cnn](mnist-cnn) | Demonstrate the use of the following experimental module from cleanlab: [mnist_pytorch.py](https://github.com/cleanlab/cleanlab/blob/master/cleanlab/experimental/mnist_pytorch.py) |
+| 8 | [amazon-reviews-fasttext](amazon-reviews-fasttext) | Demonstrate the use of the following experimental module from cleanlab: [fasttext.py](https://github.com/cleanlab/cleanlab/blob/master/cleanlab/experimental/fasttext.py) |
+
+## Instructions
+
+To run the latest example notebooks, execute the commands below which will install the required libraries in a virtual environment.
+
+It is recommended to run the examples with the latest stable cleanlab release. See `requirements.txt` file.
+
+```console
+$ python -m pip install virtualenv
+$ python -m venv env
+$ source env/bin/activate
+$ python -m pip install -r requirements.txt
+```
+
+For examples 1-5, you may run the notebooks individually or run the bash script below which will execute and save each notebook.
+
+Bash script:
+
+```console
+$ bash ./run_all_notebooks.sh
+```
+
+For examples 6-8, please follow the instructions in the `README` of each folder.
+
+## Older Examples
+
+See the `contrib` folder for examples from v1 of cleanlab which may be helpful for reproducing results from the [Confident Learning paper](https://arxiv.org/abs/1911.00068).
 
 ## License
 
-Copyright (c) 2017-2021 Cleanlab Inc.
+Copyright (c) 2017-2022 Cleanlab Inc.
 
 All files listed above and contained in this folder (<https://github.com/cleanlab/examples>) are part of cleanlab.
 
@@ -26,7 +59,7 @@ the Free Software Foundation, either version 3 of the License, or
 
 cleanlab is distributed in the hope that it will be useful,
 but WITHOUT ANY WARRANTY; without even the implied warranty of
-MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
 GNU Affero General Public License for more details.
 
 You should have received a copy of the GNU Affero General Public License in [LICENSE](LICENSE).
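
The README's instructions for examples 1-5 describe a script that executes and saves each notebook. The contents of that script are not shown on this page, so the following is only a sketch of one way to do it, assuming the standard `jupyter nbconvert` command line is installed. The names `NOTEBOOKS`, `build_command`, and `run_all` are hypothetical and chosen for this illustration.

```python
import subprocess
from typing import List

# Notebooks for examples 1-5, as listed in the README table in this commit.
NOTEBOOKS = [
    "iris_simple_example.ipynb",
    "classifier_comparison.ipynb",
    "model_selection_demo.ipynb",
    "simplifying_confident_learning_tutorial.ipynb",
    "visualizing_confident_learning.ipynb",
]


def build_command(notebook: str) -> List[str]:
    """Build an nbconvert command that executes a notebook and saves it in place."""
    return [
        "jupyter", "nbconvert",
        "--to", "notebook",
        "--execute",
        "--inplace",
        notebook,
    ]


def run_all(notebooks: List[str], dry_run: bool = True) -> None:
    """Execute each notebook in order; with dry_run=True, only print the commands."""
    for notebook in notebooks:
        command = build_command(notebook)
        if dry_run:
            print(" ".join(command))
        else:
            # check=True stops at the first notebook that fails to execute.
            subprocess.run(command, check=True)


# Print the commands without running them; pass dry_run=False to execute.
run_all(NOTEBOOKS, dry_run=True)
```

Stopping at the first failure keeps a broken notebook from being silently skipped, which matters when the run is used to confirm that all examples still work with a new cleanlab release.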

amazon-reviews-fasttext/README.md

+17
@@ -0,0 +1,17 @@
+# Training a Fasttext model on the amazon reviews dataset
+
+This example demonstrates the use of the following experimental module below from cleanlab:
+
+- [cleanlab.experimental.fasttext.py](https://github.com/cleanlab/cleanlab/blob/master/cleanlab/experimental/fasttext.py)
+
+The code is adapted from cleanlab v1 examples (see `contrib/v1` folder).
+
+## Instructions
+
+Run bash script below to download all the data.
+
+```console
+$ bash ./download_data.sh
+```
+
+Start Jupyter Lab and run the notebook: `amazon_pyx.ipynb`
