Text-to-Disease-Classification

Fine-tuned a pre-trained BERT transformer into a language model that accurately classifies the disease name from a short description of symptoms.

Deployed this system as a Flask-based web application, ensuring seamless accessibility and user-friendly interaction.

Installation

Create a conda environment from the environment file, activate it, and then run app.py. If it runs successfully, you should be able to access the web app at localhost.

Example:

# clone the repo and move into it
git clone git@github.com:0x1h0b/Text-to-Disease-Classification.git
cd Text-to-Disease-Classification

# create environment
conda env create --name <envname> --file=environment.yml

# activate env
conda activate <envname>

# run app.py
python app.py

Data

The following Kaggle dataset was used to fine-tune the BERT model:

Kaggle:// disease-symptom-description-dataset

Notes:

-  The data is evenly balanced: 120 symptom lists (entries) for each of the 43 diseases.
-  Every disease has a corresponding description and list of precautions.
-  Visualised a count plot of the individual symptoms.
    -  Symptom types are more balanced and diverse in the symptom1 and symptom2 feature columns.
-  Removed all non-alphanumeric characters and extra spaces.
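The cleaning step in the last note can be sketched roughly as follows. This is a minimal illustration and not the repository's actual preprocessing code: the function name `clean_symptom_text` is hypothetical, and replacing stripped characters (including underscores) with a space is an assumption.

```python
import re


def clean_symptom_text(text: str) -> str:
    """Keep only letters and digits, separated by single spaces."""
    # Replace each run of non-alphanumeric characters (punctuation,
    # underscores, repeated whitespace) with one space. Assumption:
    # stripped characters act as word separators.
    text = re.sub(r"[^A-Za-z0-9]+", " ", text)
    # Drop the leading/trailing space the substitution may leave behind.
    return text.strip()


print(clean_symptom_text("  skin_rash,,  high   fever!! "))
# prints: skin rash high fever
```

If the stripped characters should instead be deleted outright, the replacement string would be empty and whitespace would need to be collapsed in a second pass.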

Training & Evaluation

-  Used an object-oriented design: custom Dataset and Model classes load the dataset and fine-tune the BERT model.
-  Used k-fold cross-validation to train and evaluate the model.
-  Trained the model for 3 epochs on each of 5 folds.
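The fold and epoch structure described above can be sketched with the standard library alone. This is a hedged outline, not the code in train.py: `k_fold_indices` is a hypothetical helper, the sample count is a toy value, and the training call is left as a placeholder.

```python
import random


def k_fold_indices(n_samples, n_folds=5, seed=0):
    """Yield (train_idx, val_idx) pairs that partition range(n_samples)."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Fold i takes every n_folds-th shuffled index, so fold sizes
    # differ by at most one sample.
    folds = [indices[i::n_folds] for i in range(n_folds)]
    for i, val_idx in enumerate(folds):
        train_idx = [j for k, fold in enumerate(folds) if k != i for j in fold]
        yield train_idx, val_idx


# Toy sample count; 5 folds and 3 epochs as stated in the text.
N_SAMPLES, N_FOLDS, N_EPOCHS = 20, 5, 3

for fold, (train_idx, val_idx) in enumerate(k_fold_indices(N_SAMPLES, N_FOLDS), start=1):
    for epoch in range(N_EPOCHS):
        # Placeholder: a real run would train a freshly initialised model
        # on train_idx and evaluate it on val_idx here.
        pass
    print(f"fold {fold}: {len(train_idx)} train / {len(val_idx)} validation samples")
```

Each sample lands in exactly one validation fold, so every entry is used for evaluation once and for training in the remaining folds.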

Notes:

- Fold 1, epoch 2: loss 3.163252592086792, accuracy 0.991869918699187
- Fold 2, epoch 2: loss 3.164623737335205, accuracy 0.9956808943089431
- Fold 3, epoch 2: loss 3.1687734127044678, accuracy 0.9852642276422764
- Fold 4, epoch 2: loss 3.1720407009124756, accuracy 0.9933943089430894
- Fold 5, epoch 2: loss 3.1956565380096436, accuracy 0.9784044715447154
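To summarise the per-fold figures, one option is to average them. The snippet below only aggregates the accuracies listed above; the summary statistics are derived here for illustration and are not reported in the original notes.

```python
from statistics import mean, stdev

# Final-epoch accuracy for each fold, copied from the notes above.
fold_accuracy = [
    0.991869918699187,
    0.9956808943089431,
    0.9852642276422764,
    0.9933943089430894,
    0.9784044715447154,
]

mean_accuracy = mean(fold_accuracy)
spread = stdev(fold_accuracy)  # sample standard deviation across folds

print(f"mean accuracy: {mean_accuracy:.4f}, sample std: {spread:.4f}")
```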

Final Results

When you enter the top 7 symptoms in the Flask web app, it displays the 5 most likely diseases for those symptoms.

Here is a sample result from the Flask app: [screenshot]
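The top-5 ranking described in this section can be sketched as a softmax over the classifier's output scores followed by a sort. This is an assumption about how the ranking might work, not the code in inference.py: `top_k_diseases`, the class labels, and the scores are all hypothetical.

```python
import math


def top_k_diseases(logits, labels, k=5):
    """Return the k (label, probability) pairs with the highest softmax score."""
    # Subtract the largest logit before exponentiating for numerical stability.
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Sort labels by probability, highest first, and keep the first k.
    ranked = sorted(zip(labels, probs), key=lambda pair: pair[1], reverse=True)
    return ranked[:k]


# Hypothetical scores and placeholder class names, for illustration only.
logits = [0.1, 2.0, -1.0, 3.5, 0.7, 1.2, -0.3]
labels = [f"class_{i}" for i in range(len(logits))]

for name, prob in top_k_diseases(logits, labels):
    print(f"{name}: {prob:.3f}")
```

In the real app, the labels would presumably come from the saved label encoder classes (label_enc_classes.npy) and the scores from the fine-tuned model.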
