Code release for the Cough Against COVID-19 project by the Wadhwani Institute for Artificial Intelligence, supported by USAID and the Gates Foundation.
Project page | Code | Paper (ArXiv) | Data
To use this code, follow the steps below. First check the prerequisites to confirm that your system is compatible.
- CPU-only machine OR GPU-enabled machine
- Docker installed on your machine
- OS: Linux/Mac OS
🏁 Note: This code has been tested on Mac OS and Ubuntu.
We use Docker to manage code dependencies. Please follow the steps here to set up all dependencies. The code runs on both CPU-only and GPU machines, but a GPU machine is recommended because CPU-only runs are very slow.
We release models trained to predict COVID-19 status from cough/contextual (symptoms etc.) metadata.
For the datasets used in this work, we create our own split files and release them publicly. Please run the following (from inside the Docker container) to download them to the assets/data/ folder.
python setup/download_data_splits.py
Broadly, we release trained checkpoints for three kinds of models:
- Cough-based ResNet-18 models for cough detection
- Cough-based ResNet-18 models for COVID detection
- Context-based TabNet models for COVID detection
Please run the following (from inside the Docker container) to download them to the assets/models/ folder.
aws s3 sync --no-sign-request --region=ap-south-1 s3://covid-ml-data/ assets/
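After the two download steps, it can help to confirm that files actually landed where the instructions say they should. The following is a minimal sketch, not part of the repository: the helper name `summarize_assets` is hypothetical, and it only assumes the `assets/data/` and `assets/models/` folder names mentioned above.

```python
from pathlib import Path


def summarize_assets(root="assets"):
    """Count files (recursively) under the data/ and models/ subfolders of root.

    The subfolder names come from the README; everything else is illustrative.
    """
    root = Path(root)
    summary = {}
    for sub in ("data", "models"):
        folder = root / sub
        if folder.is_dir():
            summary[sub] = sum(1 for p in folder.rglob("*") if p.is_file())
        else:
            # A missing folder is reported as zero files.
            summary[sub] = 0
    return summary


if __name__ == "__main__":
    for name, count in summarize_assets().items():
        status = "ok" if count else "missing or empty"
        print(f"assets/{name}/: {count} file(s) [{status}]")
```

Run it from the repository root inside the container; a folder reported as "missing or empty" suggests the corresponding download step did not complete.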
To try out our model(s) on sample data, please follow the instructions.
- Cough-based model: Follow the notebook here to predict COVID from cough using a pretrained model released with the repository. If you want to try it on your own cough samples, you can record them, store them in assets/data/, and run the notebook after changing the appropriate paths.
- Context-based model: Follow the notebook here to predict COVID from contextual features like age, symptoms, travel history etc. If you want to try it on your own contextual features, you can modify the relevant cells and run the notebook.
To use our datasets and other public datasets as part of this work, you first need to download and process the datasets, and then create your own configs to train models.
We use a combination of publicly available datasets and our own collected datasets. Please follow the steps here to download and process all datasets.
⚠️ Note: Our own dataset, wiai-facility, collected across 27 facilities in India, has not been released yet due to legal constraints that prevent us from sharing the data. We are trying to resolve these before we can release the dataset in any form.
To run training on the datasets downloaded in the previous step, please follow the steps here.
To train on your own dataset(s), you first need to set up the dataset following steps similar to those given here for the existing datasets. This includes downloading it or placing it in the right folder structure, processing it, and splitting it (train-validation-test).
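As an illustration of the splitting step, here is a minimal sketch of a train-validation-test split that keeps all recordings from one group (for example, one patient) in the same split. This is not the repository's splitting code: the function `split_by_group`, the `patient` key, and the default fractions are assumptions made for the example.

```python
import random
from collections import defaultdict


def split_by_group(records, group_key, fractions=(0.7, 0.15, 0.15), seed=0):
    """Split records into train/val/test so that no group spans two splits."""
    # Bucket the records by their group identifier.
    groups = defaultdict(list)
    for rec in records:
        groups[rec[group_key]].append(rec)

    # Sort first so the shuffle is reproducible for a given seed.
    group_ids = sorted(groups)
    random.Random(seed).shuffle(group_ids)

    # Fractions apply to the number of groups, not the number of records.
    n = len(group_ids)
    n_train = int(fractions[0] * n)
    n_val = int(fractions[1] * n)
    parts = {
        "train": group_ids[:n_train],
        "val": group_ids[n_train:n_train + n_val],
        "test": group_ids[n_train + n_val:],  # remainder goes to test
    }
    return {name: [r for g in ids for r in groups[g]] for name, ids in parts.items()}


# Hypothetical records: 10 patients with 3 cough files each.
records = [
    {"file": f"p{p}_c{c}.wav", "patient": f"p{p}"}
    for p in range(10)
    for c in range(3)
]
splits = split_by_group(records, "patient")
for name, items in splits.items():
    print(name, len(items))
```

Splitting at the group level rather than the file level is one way to avoid the same individual appearing in both training and evaluation data; whether it is the right choice depends on your dataset.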
Next, you need to create a new .yml config file (like this) and configure the dataset section:
dataset:
  name: classification_dataset
  config:
    - name: <name-of-your-dataset>
      version: <version-of-your-dataset>
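Before launching training, a quick structural check of the dataset section can catch indentation or missing-key mistakes. The sketch below is illustrative only: `check_dataset_section` is a hypothetical helper, the required keys are inferred solely from the fragment above, and the dictionary stands in for what a YAML parser (for example PyYAML's `yaml.safe_load`) would return for such a file.

```python
def check_dataset_section(cfg):
    """Return a list of problems found in the `dataset` section (empty if none)."""
    dataset = cfg.get("dataset")
    if not isinstance(dataset, dict):
        return ["missing 'dataset' mapping"]

    problems = []
    if "name" not in dataset:
        problems.append("dataset.name is missing")

    entries = dataset.get("config")
    if not isinstance(entries, list) or not entries:
        problems.append("dataset.config must be a non-empty list")
        return problems

    # Each list entry is expected to carry a dataset name and a version.
    for i, entry in enumerate(entries):
        for key in ("name", "version"):
            if not isinstance(entry, dict) or key not in entry:
                problems.append(f"dataset.config[{i}].{key} is missing")
    return problems


# Parsed form of the fragment above, with placeholder values filled in.
cfg = {
    "dataset": {
        "name": "classification_dataset",
        "config": [{"name": "my-dataset", "version": "v1.0"}],
    }
}
print(check_dataset_section(cfg))
```

An empty list means the section has the expected shape; it does not guarantee that the named dataset and version exist on disk.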
You can also play around with various other hyperparameters in the config like optimizer, scheduler, batch sampler method, random crop duration, network architecture etc.
🚧 More coming soon!
You can evaluate your own trained models or use released model checkpoints on a given dataset. Instructions for both of these are given here.
🚧 Coming soon!
🚧 Coming soon!
If you find this code useful, kindly consider citing our papers and starring our repository:
@misc{sharma2021impact,
title={Impact of data-splits on generalization: Identifying COVID-19 from cough and context},
author={Makkunda Sharma and Nikhil Shenoy and Jigar Doshi and Piyush Bagad and Aman Dalmia and Parag Bhamare and Amrita Mahale and Saurabh Rane and Neeraj Agrawal and Rahul Panicker},
year={2021},
eprint={2106.03851},
archivePrefix={arXiv},
primaryClass={cs.SD}
}
@misc{bagad2020cough,
title={Cough Against COVID: Evidence of COVID-19 Signature in Cough Sounds},
author={Piyush Bagad and Aman Dalmia and Jigar Doshi and Arsha Nagrani and Parag Bhamare and Amrita Mahale and Saurabh Rane and Neeraj Agarwal and Rahul Panicker},
year={2020},
eprint={2009.08790},
archivePrefix={arXiv},
primaryClass={cs.SD}
}
Code Contributors (in alphabetical order):
And Jigar Doshi, the Research Lead of the project.
Acknowledgements:
- Our codebase design has been inspired by the ML Codebase Management Guidelines compiled by Aditya Sarma and Jigar Doshi.
Reporting issues/bugs/suggestions: If you need to bring our attention to bugs or suggest modifications, kindly create an issue and we will try our best to address it. Please feel free to contact us if you have queries.