capstone-project

Clone the repository.
In a terminal, change into the project directory: cd capstone-project
Create the conda environment from the YAML file: conda env create -f environment.yml
Activate the environment: conda activate final-project
Preprocess the available data and build the datasets: python createDataset.py
Create label distribution plots: python visuals.py
Run experiments with the logistic regression model: python logisticRegression.py
Run experiments with the XGBoost model: python xgb.py
Run experiments with the neural network model: python nn.py
Run experiments with all models using leave-one-out cross-validation (LOOCV): python leaveOneOut.py. This takes a while because LOOCV refits each model once per sample.
Run the model explainer to see how much each feature contributes: python explainer.py
Run inference to get the top n most similar samples for each new sample: python inference.py. Remove the break in the loop to get results for all new samples.
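
To illustrate why the leave-one-out run is slow, here is a minimal, self-contained sketch of LOOCV. It is not the code in leaveOneOut.py: the nearest-centroid "model", the toy data, and every function name below are assumptions made for illustration only. The point it shows is that the number of model fits equals the number of samples.

```python
from typing import Dict, List, Sequence, Tuple

Sample = Tuple[Sequence[float], int]  # (features, label)


def fit_centroids(train: List[Sample]) -> Dict[int, List[float]]:
    """Toy stand-in for a model: the mean feature vector of each class."""
    sums: Dict[int, List[float]] = {}
    counts: Dict[int, int] = {}
    for features, label in train:
        acc = sums.setdefault(label, [0.0] * len(features))
        for i, value in enumerate(features):
            acc[i] += value
        counts[label] = counts.get(label, 0) + 1
    return {label: [v / counts[label] for v in acc] for label, acc in sums.items()}


def predict(centroids: Dict[int, List[float]], features: Sequence[float]) -> int:
    """Predict the class whose centroid is closest (squared Euclidean distance)."""
    def dist(centroid: List[float]) -> float:
        return sum((a - b) ** 2 for a, b in zip(centroid, features))
    return min(centroids, key=lambda label: dist(centroids[label]))


def leave_one_out(data: List[Sample]) -> Tuple[float, int]:
    """Return (accuracy, number_of_fits) for leave-one-out cross-validation."""
    correct = 0
    fits = 0
    for i, (features, label) in enumerate(data):
        train = data[:i] + data[i + 1:]  # every sample except sample i
        model = fit_centroids(train)     # one full fit per held-out sample
        fits += 1
        correct += predict(model, features) == label
    return correct / len(data), fits


# Hypothetical toy data: two well-separated classes in two dimensions.
toy_data: List[Sample] = [
    ((0.0, 0.1), 0), ((0.2, 0.0), 0), ((0.1, 0.3), 0),
    ((1.0, 0.9), 1), ((0.8, 1.1), 1), ((1.2, 1.0), 1),
]
accuracy, fits = leave_one_out(toy_data)
print(f"accuracy={accuracy:.2f}, model fits={fits}")
```

With n samples the loop trains n separate models, so the cost grows with both the dataset size and the training time of each model. This is presumably why the neural network and XGBoost runs dominate the runtime of the LOOCV step.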
