Commit 8ef3483

initial commit
1 parent 1536f4d commit 8ef3483

30 files changed: 3744 additions & 0 deletions

.gitignore

Lines changed: 135 additions & 0 deletions
@@ -0,0 +1,135 @@
# Byte-compiled / optimized / DLL files
__pycache__/
*.py[cod]
*$py.class

# C extensions
*.so

# Distribution / packaging
.Python
build/
develop-eggs/
dist/
downloads/
eggs/
.eggs/
lib/
lib64/
parts/
sdist/
var/
wheels/
data/
pip-wheel-metadata/
share/python-wheels/
*.egg-info/
.installed.cfg
*.egg
MANIFEST

# PyInstaller
# Usually these files are written by a python script from a template
# before PyInstaller builds the exe, so as to inject date/other infos into it.
*.manifest
*.spec

# Installer logs
pip-log.txt
pip-delete-this-directory.txt

# Unit test / coverage reports
.idea/
htmlcov/
.tox/
.nox/
.coverage
.coverage.*
.cache
nosetests.xml
coverage.xml
*.cover
*.py,cover
.hypothesis/
.pytest_cache/

# Translations
*.mo
*.pot

# Django stuff:
*.log
local_settings.py
db.sqlite3
db.sqlite3-journal

# Flask stuff:
instance/
.webassets-cache

# Scrapy stuff:
.scrapy

# Sphinx documentation
docs/_build/

# PyBuilder
target/

# Jupyter Notebook
.ipynb_checkpoints

# IPython
profile_default/
ipython_config.py

# pyenv
.python-version

# pipenv
# According to pypa/pipenv#598, it is recommended to include Pipfile.lock in version control.
# However, in case of collaboration, if having platform-specific dependencies or dependencies
# having no cross-platform support, pipenv may install dependencies that don't work, or not
# install all needed dependencies.
#Pipfile.lock

# PEP 582; used by e.g. github.com/David-OConnor/pyflow
__pypackages__/

# Celery stuff
celerybeat-schedule
celerybeat.pid

# SageMath parsed files
*.sage.py

# Environments
.env
.venv
env/
venv/
ENV/
env.bak/
venv.bak/

# Spyder project settings
.spyderproject
.spyproject

# Rope project settings
.ropeproject

# mkdocs documentation
/site

# mypy
.mypy_cache/
.dmypy.json
dmypy.json

# Pyre type checker
.pyre/

#plots
/plots
/.idea
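
Several of the patterns above rely on shell-style wildcards, for example `*.py[cod]`. As a rough illustration only, Python's standard `fnmatch` module implements similar wildcard matching. It does not reproduce Git's full ignore rules (directory-only patterns such as `build/`, anchored patterns such as `/plots`, or negation), and the helper name and file names below are hypothetical.

```python
from fnmatch import fnmatchcase

# A few wildcard patterns copied from the .gitignore above.
PATTERNS = ["*.py[cod]", "*.so", "*.log", ".coverage.*"]


def matching_patterns(filename, patterns=PATTERNS):
    """Return the wildcard patterns that match a bare file name.

    This is only an approximation of how Git evaluates ignore rules.
    """
    return [p for p in patterns if fnmatchcase(filename, p)]


# Hypothetical file names, used only to exercise the patterns.
for name in ["utils.pyc", "utils.py", "run.log", ".coverage.host1"]:
    print(name, matching_patterns(name))
```

The character class `[cod]` matches exactly one of `c`, `o`, or `d`, so compiled files such as `utils.pyc` match while the source file `utils.py` does not.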

README.md

Lines changed: 50 additions & 0 deletions
@@ -0,0 +1,50 @@
# Improving Screening Processes via Calibrated Subset Selection

This repo contains the code for the empirical evaluation in the paper
[Improving Screening Processes via Calibrated Subset Selection](https://arxiv.org/abs/2202.01147),
including an implementation of the Calibrated Subset Selection algorithm proposed in the paper.


### Create Environment
Make sure [conda](https://docs.conda.io/en/latest/) is installed. Run
```bash
conda env create -f environment.yml
source activate alg_screen
```

### Download and Prepare Data

Set `prepare_data = True` and `submit = False` in `params_exp_noise.py` and `params_exp_diversity_noise.py`.

Run
```bash
python ./scripts/run_exp_noise.py
python ./scripts/run_exp_diversity_noise.py
```

### Run Experiments
Set `prepare_data = False` and `submit = True` in `params_exp_noise.py` and `params_exp_diversity_noise.py`.

On a cluster with the [Slurm](https://slurm.schedmd.com/documentation.html) workload manager, run
```bash
python ./scripts/run_exp_noise.py
python ./scripts/run_exp_cal_size.py
python ./scripts/run_exp_diversity_noise.py
```
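
The data-preparation and experiment steps above differ only in two flags. Here is a minimal sketch of how the two settings pair up, assuming the params files are plain Python modules with top-level `prepare_data` and `submit` variables. The README names the flags, but the file layout and the helper `describe_phase` are assumptions made for illustration and are not part of the repository.

```python
def describe_phase(prepare_data: bool, submit: bool) -> str:
    """Map the two README flags to the workflow step they correspond to."""
    if prepare_data and not submit:
        return "download and prepare data"
    if submit and not prepare_data:
        return "run experiments"
    # The README does not describe the remaining combinations.
    return "not described in the README"


# Settings for the data-preparation step, as given in the README.
prepare_data, submit = True, False
print(describe_phase(prepare_data, submit))
```

Flipping both values gives the settings for the experiment step.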

### Plot Figures
Run
```bash
python ./scripts/plot_exp_normal.py
python ./scripts/plot_exp_diversity.py
```

### Bibtex
```bibtex
@article{wang2022improving,
  title={Improving Screening Processes via Calibrated Subset Selection},
  author={Wang, Lequn and Joachims, Thorsten and Rodriguez, Manuel Gomez},
  journal={arXiv preprint arXiv:2202.01147},
  year={2022}
}
```

environment.yml

Lines changed: 88 additions & 0 deletions
@@ -0,0 +1,88 @@
name: alg_screen
channels:
  - defaults
dependencies:
  - _libgcc_mutex=0.1=main
  - _openmp_mutex=4.5=1_gnu
  - blas=1.0=mkl
  - bottleneck=1.3.2=py39hdd57654_1
  - brotli=1.0.9=he6710b0_2
  - ca-certificates=2021.10.26=h06a4308_2
  - certifi=2021.10.8=py39h06a4308_0
  - cycler=0.11.0=pyhd3eb1b0_0
  - dbus=1.13.18=hb2f20db_0
  - expat=2.4.1=h2531618_2
  - fontconfig=2.13.1=h6c09931_0
  - fonttools=4.25.0=pyhd3eb1b0_0
  - freetype=2.11.0=h70c0345_0
  - giflib=5.2.1=h7b6447c_0
  - glib=2.69.1=h5202010_0
  - gst-plugins-base=1.14.0=h8213a91_2
  - gstreamer=1.14.0=h28cd5cc_2
  - icu=58.2=he6710b0_3
  - intel-openmp=2021.4.0=h06a4308_3561
  - joblib=1.1.0=pyhd3eb1b0_0
  - jpeg=9d=h7f8727e_0
  - kiwisolver=1.3.1=py39h2531618_0
  - lcms2=2.12=h3be6417_0
  - ld_impl_linux-64=2.35.1=h7274673_9
  - libffi=3.3=he6710b0_2
  - libgcc-ng=9.3.0=h5101ec6_17
  - libgfortran-ng=7.5.0=ha8ba4b0_17
  - libgfortran4=7.5.0=ha8ba4b0_17
  - libgomp=9.3.0=h5101ec6_17
  - libpng=1.6.37=hbc83047_0
  - libstdcxx-ng=9.3.0=hd4cf53a_17
  - libtiff=4.2.0=h85742a9_0
  - libuuid=1.0.3=h7f8727e_2
  - libwebp=1.2.0=h89dd481_0
  - libwebp-base=1.2.0=h27cfd23_0
  - libxcb=1.14=h7b6447c_0
  - libxml2=2.9.12=h03d6c58_0
  - lz4-c=1.9.3=h295c915_1
  - matplotlib=3.5.0=py39h06a4308_0
  - matplotlib-base=3.5.0=py39h3ed280b_0
  - mkl=2021.4.0=h06a4308_640
  - mkl-service=2.4.0=py39h7f8727e_0
  - mkl_fft=1.3.1=py39hd3c417c_0
  - mkl_random=1.2.2=py39h51133e4_0
  - munkres=1.1.4=py_0
  - ncurses=6.3=h7f8727e_2
  - numexpr=2.8.1=py39h6abb31d_0
  - numpy=1.21.2=py39h20f2e39_0
  - numpy-base=1.21.2=py39h79a1101_0
  - olefile=0.46=pyhd3eb1b0_0
  - openssl=1.1.1l=h7f8727e_0
  - packaging=21.3=pyhd3eb1b0_0
  - pcre=8.45=h295c915_0
  - pillow=8.4.0=py39h5aabda8_0
  - pip=21.2.4=py39h06a4308_0
  - pyparsing=3.0.4=pyhd3eb1b0_0
  - pyqt=5.9.2=py39h2531618_6
  - python=3.9.7=h12debd9_1
  - python-dateutil=2.8.2=pyhd3eb1b0_0
  - qt=5.9.7=h5867ecd_1
  - readline=8.1=h27cfd23_0
  - scikit-learn=1.0.1=py39h51133e4_0
  - scipy=1.7.1=py39h292c36d_2
  - setuptools=58.0.4=py39h06a4308_0
  - sip=4.19.13=py39h2531618_0
  - six=1.16.0=pyhd3eb1b0_0
  - sqlite=3.37.0=hc218d9a_0
  - threadpoolctl=2.2.0=pyh0d69192_0
  - tk=8.6.11=h1ccaba5_0
  - tornado=6.1=py39h27cfd23_0
  - tzdata=2021e=hda174b7_0
  - wheel=0.37.0=pyhd3eb1b0_1
  - xz=5.2.5=h7b6447c_0
  - zlib=1.2.11=h7f8727e_4
  - zstd=1.4.9=haebb681_0
  - pip:
    - charset-normalizer==2.0.9
    - folktables==0.0.11
    - idna==3.3
    - pandas==1.3.5
    - pytz==2021.3
    - requests==2.26.0
    - sklearn==0.0
    - urllib3==1.26.7
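
Each conda entry above pins a package as `name=version=build`, while the pip entries use `name==version`. The build strings (for example `py39h06a4308_0`) are tied to the platform the environment was exported from, so `conda env create` may fail to resolve them on a different platform. Here is a minimal sketch, not part of the repository, that splits a conda entry into its parts and drops the build string. The names `CondaSpec`, `parse_conda_spec`, and `without_build` are hypothetical.

```python
from typing import NamedTuple, Optional


class CondaSpec(NamedTuple):
    name: str
    version: str
    build: Optional[str]


def parse_conda_spec(spec: str) -> CondaSpec:
    """Split a conda 'name=version' or 'name=version=build' entry."""
    if "==" in spec:
        # pip-style pins use '==', which this sketch does not handle.
        raise ValueError(f"not a conda-style spec: {spec!r}")
    parts = spec.split("=")
    if len(parts) == 3:
        return CondaSpec(parts[0], parts[1], parts[2])
    if len(parts) == 2:
        return CondaSpec(parts[0], parts[1], None)
    raise ValueError(f"unexpected spec format: {spec!r}")


def without_build(spec: str) -> str:
    """Return the spec with the platform-specific build string removed."""
    parsed = parse_conda_spec(spec)
    return f"{parsed.name}={parsed.version}"


# Entry taken from the dependency list above.
print(without_build("numpy=1.21.2=py39h20f2e39_0"))
```

Applying `without_build` to every conda entry would give a looser environment file that keeps the version pins. Whether that still reproduces the reported results is not something this sketch can guarantee.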
