Thursday, April 28 from 1:30 PM - 4:30 PM MDT
Fairness in AI systems is an interdisciplinary field of research and practice that aims to understand and address some of the negative impacts of AI systems on society, with an emphasis on improving the impacts of such systems on historically underserved and marginalized communities.
In this tutorial, we will walk through the process of assessing and mitigating fairness-related harms in the context of the U.S. health care system. Specifically, we will consider a scenario involving patient health risk modeling that has demonstrated racial disparities (Obermeyer et al., 2019). This tutorial will consist of a mix of instructional content and hands-on demonstrations using Jupyter notebooks. Participants will use the Fairlearn library to assess an ML model for performance disparities across different racial groups and mitigate those disparities using a variety of algorithmic techniques. Participants will also learn how to explore, document, and communicate fairness issues, drawing on resources such as datasheets for datasets and model cards.
Participants are expected to have intermediate Python skills and familiarity with scikit-learn. To get the most out of the tutorial, participants should have some experience training and evaluating supervised models in Python.
For this tutorial, we encourage participants to run the tutorial notebook through Google Colab to avoid issues with local environment set-up. Click on the button above to launch a free compute environment for executing the Jupyter notebook and writing Python code.
If you want to follow along in this tutorial on your local machine, we recommend using the Anaconda Python distribution.
Participants will need to download the Jupyter notebook pycon-2022-students.ipynb.
If you are using Anaconda, install the necessary libraries by running the following command:
conda env create -f environment.yml
In a Python virtual environment (Python version >= 3.7), install the necessary libraries by running the following command:
pip install -r requirements.txt
If you are using pip 20.3, you may need to append the --use-deprecated=legacy-resolver flag to avoid long wait times due to dependency resolution:
pip install -r requirements.txt --use-deprecated=legacy-resolver
You can run the checkenv.py script to verify that the packages were installed correctly.
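The contents of checkenv.py are not reproduced here. As a rough illustration of what a check of this kind can look like, the sketch below uses only the standard library to report which packages cannot be imported. The function name missing_packages and the list of package names are assumptions made for this example, not the script's actual code.

```python
from importlib.util import find_spec


def missing_packages(names):
    """Return the top-level module names from `names` that cannot be found."""
    return [name for name in names if find_spec(name) is None]


# Hypothetical list for illustration; the real requirements live in
# requirements.txt. Note that scikit-learn is imported as "sklearn".
REQUIRED = ["pandas", "sklearn", "fairlearn"]

missing = missing_packages(REQUIRED)
if missing:
    print("Missing packages:", ", ".join(missing))
else:
    print("All listed packages are importable.")
```

This only confirms that each module can be located; it does not check installed versions.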
For this tutorial, we use a pre-processed version of the Diabetes 130-US hospitals dataset. The original data file can be found in data/diabetic_data.csv.
The processed dataset we use is located in data/diabetic_preprocessed.csv. If you want to further explore how we cleaned and processed the original dataset, you can refer to preprocess.py.
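Before assessing a model, it can help to see how many records each group contributes. The sketch below is a minimal, standard-library way to tabulate a CSV column. The helper name count_by_column is hypothetical, and the column name "race" in the commented usage is an assumption about the processed file rather than something stated here.

```python
import csv
from collections import Counter


def count_by_column(csv_lines, column):
    """Count rows per distinct value of `column`.

    `csv_lines` is any iterable of CSV text lines, such as an open file.
    """
    reader = csv.DictReader(csv_lines)
    return Counter(row[column] for row in reader)


# Usage with the tutorial data (the column name is an assumption):
# with open("data/diabetic_preprocessed.csv", newline="") as f:
#     print(count_by_column(f, "race"))
```

Very small groups are worth noting early, since metrics computed on them will be noisy.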
Fairlearn is an open-source, community-driven project to help data scientists improve the fairness of AI systems. It includes a Python library for assessing and mitigating fairness-related harms, as well as a range of educational resources.
Fairlearn is built on top of popular Python data science libraries, such as pandas and scikit-learn.
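Assessing a model for performance disparities comes down to computing a metric separately for each group and comparing the results. The library-free sketch below illustrates the idea with made-up labels and two placeholder groups. It does not use Fairlearn itself, which provides this kind of disaggregated evaluation through MetricFrame in fairlearn.metrics, and the function name accuracy_by_group is hypothetical.

```python
from collections import defaultdict


def accuracy_by_group(y_true, y_pred, groups):
    """Return a dict mapping each group to the accuracy within that group."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for truth, pred, group in zip(y_true, y_pred, groups):
        total[group] += 1
        correct[group] += int(truth == pred)
    return {group: correct[group] / total[group] for group in total}


# Toy, made-up data; "A" and "B" stand in for sensitive-feature values.
y_true = [1, 0, 1, 1, 0, 1, 0, 0]
y_pred = [1, 0, 1, 0, 0, 0, 1, 0]
groups = ["A", "A", "A", "A", "B", "B", "B", "B"]

per_group = accuracy_by_group(y_true, y_pred, groups)
# Largest difference in accuracy between any two groups.
gap = max(per_group.values()) - min(per_group.values())
print(per_group, gap)
```

A large gap is a signal to investigate further, not a verdict on its own; the choice of metric should reflect the harm being studied.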