Supplementary Material and Code for the paper “Fair-OBNC: Correcting Label Noise for Fairer Datasets”.

feedzai/fair-obnc

Fair-OBNC

Description

This repository contains the code and instructions to reproduce the experiments and results presented in the paper Fair-OBNC: Correcting Label Noise for Fairer Datasets.

Replicating the conducted experiments

This section details how to replicate our experiments to obtain the results we present in the paper Fair-OBNC: Correcting Label Noise for Fairer Datasets.

The first step is to install the Aequitas Flow package:

pip install git+https://github.com/dssg/aequitas.git

Then, one can download the necessary data by running:

# To store the necessary data
>>> from generate_data import generate_data
>>> generate_data({"BankAccountFraud": ["TypeII"]})

Finally, this repository includes the configuration files we used in our experiments, so the only remaining step is to run the fairobnc_experiment.py script from a shell:

# To run the experiments with the multiple injected noise scenarios
$ python -m fairobnc_experiment baf typeii noise_injection_experiment --noise_injection

# To run the experiments without noise injection
$ python -m fairobnc_experiment baf typeii noise_injection_experiment

Running your own experiments

If you wish to test our method in additional scenarios, the same framework can be used to set up and run new cases.

Generating and loading data

The generate_data function loads the desired datasets from Aequitas, generates IID versions of them, and injects noise into the labels. It stores the files needed by the IIDDataset and NoisyDataset classes.

# To store the necessary data
>>> from generate_data import generate_data
>>> generate_data({"BankAccountFraud": ["TypeII"]})

# To load an IID dataset 
>>> from datasets import IIDDataset
>>> iid_dataset = IIDDataset("BankAccountFraud", "TypeII")
>>> iid_dataset.load_data()
>>> iid_dataset.create_splits()

# To load a noisy dataset in which noise is applied only to instances of the negative class (0), flipping 5% of those in the negative sensitive group (0) and 20% of those in the positive sensitive group (1)
>>> from datasets import NoisyDataset
>>> noisy_dataset = NoisyDataset("BankAccountFraud", "TypeII", {0:0.05, 1:0.20}, [0])
>>> noisy_dataset.load_data()
>>> noisy_dataset.create_splits()
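
To make the noise arguments above concrete, the following is a minimal standalone sketch of group-dependent label flipping. It is not the repository's implementation: the function name inject_group_noise and its parameters are hypothetical, and it assumes that the dictionary maps each sensitive-group value to a flip rate and that the list names the classes whose labels may be flipped.

```python
import random


def inject_group_noise(labels, groups, flip_rates, target_classes, seed=0):
    """Flip binary labels at group-specific rates (hypothetical helper).

    labels:         list of 0/1 class labels
    groups:         list of sensitive-group values, aligned with labels
    flip_rates:     dict mapping group value -> fraction of candidates to flip
    target_classes: classes whose instances are eligible for flipping
    """
    rng = random.Random(seed)
    noisy = list(labels)
    for group, rate in flip_rates.items():
        # Candidates are instances of this group whose clean label is eligible.
        candidates = [
            i
            for i, (y, g) in enumerate(zip(labels, groups))
            if g == group and y in target_classes
        ]
        n_flip = round(rate * len(candidates))
        for i in rng.sample(candidates, n_flip):
            noisy[i] = 1 - noisy[i]
    return noisy


# Toy data: 100 negatives per group, plus 25 positives per group.
labels = [0] * 200 + [1] * 50
groups = [0] * 100 + [1] * 100 + [0] * 25 + [1] * 25
noisy = inject_group_noise(labels, groups, {0: 0.05, 1: 0.20}, [0])
```

With these toy inputs, only instances whose clean label is 0 can change, and group 1 receives four times as many flips as group 0, which is the kind of group-dependent noise the NoisyDataset arguments describe.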

Generating config files

The configs folder is organized into two subfolders, following the Aequitas experiment logic:

  • methods contains the config files for each of the preprocessing methods being analyzed
  • datasets contains the config files for each noisy version of the datasets used. These configs can be generated automatically by calling the generate_dataset_configs function:
    >>> from generate_configs import generate_dataset_configs
    >>> generate_dataset_configs({"BankAccountFraud":["TypeII"]})

Each specific type of injected noise must be run as a separate experiment so that the same hyperparameters are sampled in each trial.
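
One plausible reading of this requirement is that each experiment restarts the hyperparameter sampler from the same seed, so trial k receives the same configuration under every noise scenario. The sketch below illustrates that idea only. The function sample_hyperparameters, the seed value, and the search space are hypothetical and do not reflect how Aequitas samples hyperparameters internally.

```python
import random


def sample_hyperparameters(seed, n_trials):
    """Draw n_trials configurations from a toy search space (hypothetical)."""
    rng = random.Random(seed)
    return [
        {
            # Log-uniform learning rate between 1e-3 and 1e-1.
            "learning_rate": 10 ** rng.uniform(-3, -1),
            "num_leaves": rng.randint(8, 128),
        }
        for _ in range(n_trials)
    ]


# Two separate "experiments" started from the same seed draw identical trials.
scenario_a_trials = sample_hyperparameters(seed=42, n_trials=3)
scenario_b_trials = sample_hyperparameters(seed=42, n_trials=3)
```

Because both runs start from the same seed, differences in results between the two scenarios can be attributed to the injected noise rather than to different hyperparameter draws.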

The experiment config files can be generated using the generate_experiment_files function:

>>> from generate_configs import generate_experiment_files
>>> generate_experiment_files(
...     methods = ["lightgbm", "OBNC", "Fair-OBNC", "PrevalenceSampling"],
...     variants = {"BankAccountFraud":["TypeII"]},
...     noise_injection = True,
...     n_trials = 50,
... )

Running experiments

After setting up all the data and config files, run the fairobnc_experiment.py script from a shell to execute the experiments:

$ python -m fairobnc_experiment baf typeii noise_injection_experiment --noise_injection

Analyzing results

The result_analysis.py file contains the definition of the functions used to analyze the obtained results and generate the plot presented in the paper.
