Training data creation:
Labeled Data: Create a folder named Labeled and copy the 11 training datasets into it.
Unlabeled Data: Create a folder named Unlabeled and copy the unlabeled data into it.
Test Data: Create a folder named Test and copy the test data into it.
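The folder layout above can also be created from a script. This is only a minimal sketch: the helper names make_data_folders and copy_files are hypothetical, and the location of the source files is whatever you pass in.

```python
from pathlib import Path
import shutil


def make_data_folders(root):
    """Create the Labeled, Unlabeled and Test folders under root."""
    root = Path(root)
    folders = {}
    for name in ("Labeled", "Unlabeled", "Test"):
        folder = root / name
        # exist_ok=True makes the call safe to repeat
        folder.mkdir(parents=True, exist_ok=True)
        folders[name] = folder
    return folders


def copy_files(sources, destination):
    """Copy each source file into destination and return the new paths."""
    destination = Path(destination)
    return [Path(shutil.copy2(src, destination)) for src in sources]
```

For example, copy_files(my_training_files, make_data_folders("data")["Labeled"]) would place the training datasets in data/Labeled, where my_training_files is your own list of file paths.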
Co_Training_2C
Input: Labeled and unlabeled data
Output: A classifier that takes an unlabeled document and predicts its class label
Steps to run the code:
Step1: Upload all the labeled, unlabeled and test data to Google Drive and set the corresponding path variables in the code to the locations of these files.
Step2: All results are written to a text file. Set the path of this results file in the code.
Step3: Run all the cells in the notebook.
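Before running all the cells, it can help to confirm that the configured paths exist and that the results file is writable. The sketch below is an assumption about how such a check could look. The helper names check_paths and append_result are hypothetical and are not taken from the notebooks.

```python
from pathlib import Path

# In Colab, Google Drive is normally mounted first, for example:
#   from google.colab import drive
#   drive.mount("/content/drive")


def check_paths(paths):
    """Return the names of configured paths that do not exist."""
    return [name for name, p in paths.items() if not Path(p).exists()]


def append_result(results_file, line):
    """Append one line of results to the text file, creating it if needed."""
    results_file = Path(results_file)
    results_file.parent.mkdir(parents=True, exist_ok=True)
    with results_file.open("a", encoding="utf-8") as fh:
        fh.write(line + "\n")
```

Calling check_paths with a dictionary of your own path variables returns an empty list when every path is valid.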
Co_Training_3C
Co-training algorithm with three classifiers: Random Forest, Support Vector Machine and LightGBM.
Input: Labeled and unlabeled data
Output: A classifier that takes an unlabeled document and predicts its class label
Steps to run the code:
Step1: Upload all the labeled, unlabeled and test data to Google Drive and set the corresponding path variables in the code to the locations of these files.
Step2: All results are written to a text file. Set the path of this results file in the code.
Step3: Run all the cells in the notebook.
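This README does not state how Co_Training_3C combines its three classifiers. One common approach, shown here purely as an assumption, is to accept a pseudo-label for an unlabeled document only when at least two of the three classifiers agree on it. The function name majority_pseudo_labels and the min_votes parameter are hypothetical.

```python
from collections import Counter


def majority_pseudo_labels(predictions, min_votes=2):
    """Keep documents on which at least min_votes classifiers agree.

    predictions is a list with one list of predicted labels per classifier,
    all of the same length. Returns {document_index: agreed_label}.
    """
    agreed = {}
    for index, votes in enumerate(zip(*predictions)):
        label, count = Counter(votes).most_common(1)[0]
        if count >= min_votes:
            agreed[index] = label
    return agreed
```

The agreed documents would then be moved from the unlabeled pool into the labeled set before the classifiers are retrained. The notebook itself may use a different rule.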
Input: Labeled and unlabeled data
Output: The best model for predicting the class labels of unlabeled data
Steps to run the code:
Step1: Upload all the labeled, unlabeled and test data to Google Drive and set the corresponding path variables in the code to the locations of these files.
Step2: All results are written to a text file. Set the path of this results file in the code.
Step3: Run all the cells in the notebook.
Step1: Upload all the labeled, unlabeled and test data to Google Drive and set the corresponding path variables in the code to the locations of these files.
Step2: All results are written to a text file. Set the path of this results file in the code.
Step3: Change the variables 'Threshold' and 'unlabel_size_list' according to the experiment.
Step4: Run all the cells in the notebook.
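The meaning of 'Threshold' and 'unlabel_size_list' is not spelled out here. The sketch below assumes that Threshold is the minimum predicted probability required to accept a pseudo-label and that unlabel_size_list holds the unlabeled-pool sizes to try. The values and the helper names select_confident and experiment_grid are illustrative, not taken from the notebook.

```python
# Assumed meaning: minimum confidence needed to accept a pseudo-label.
Threshold = 0.9
# Assumed meaning: numbers of unlabeled documents to use, one run per entry.
unlabel_size_list = [100, 500, 1000]


def select_confident(probabilities, threshold):
    """Return (row_index, predicted_class) pairs whose top probability
    is at least threshold. Each row holds one probability per class."""
    selected = []
    for index, row in enumerate(probabilities):
        best = max(range(len(row)), key=row.__getitem__)
        if row[best] >= threshold:
            selected.append((index, best))
    return selected


def experiment_grid(thresholds, sizes):
    """List every (threshold, unlabeled_size) combination to run."""
    return [(t, n) for t in thresholds for n in sizes]
```

Under these assumptions, a higher Threshold admits fewer but more reliable pseudo-labels, so it is worth recording both values alongside each result in the output file.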
All the code related to supervised learning can be run using the steps below.
Step1: Upload all the labeled and test data to Google Drive and set the corresponding path variables in the code to the locations of these files.
Step2: Run all the cells in the notebook.
For Supervised_allData:
Comment and uncomment the classifiers according to the experiment.
For example: To run LightGBM, uncomment the LightGBM classifier and comment out all the other classifiers.
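As an alternative to commenting and uncommenting, the classifier could be chosen by name. This is a sketch only: it assumes the notebook uses the same three classifiers named for Co_Training_3C, and the names CLASSIFIERS and build_classifier are hypothetical. The imports sit inside each factory so that only the selected library needs to be installed.

```python
def make_random_forest():
    from sklearn.ensemble import RandomForestClassifier
    return RandomForestClassifier()


def make_svm():
    from sklearn.svm import SVC
    # probability=True enables predict_proba on the fitted model
    return SVC(probability=True)


def make_lightgbm():
    from lightgbm import LGBMClassifier
    return LGBMClassifier()


# Maps an experiment name to a function that builds the classifier.
CLASSIFIERS = {
    "RandomForest": make_random_forest,
    "SVM": make_svm,
    "LightGBM": make_lightgbm,
}


def build_classifier(name):
    """Build the classifier registered under name."""
    if name not in CLASSIFIERS:
        raise ValueError(f"Unknown classifier {name!r}; choose from {sorted(CLASSIFIERS)}")
    return CLASSIFIERS[name]()
```

With this in place, switching experiments means changing one string, for example build_classifier("LightGBM").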