This notebook demonstrates fine-tuning pretrained models from Hugging Face using text classification datasets from the Hugging Face Datasets catalog or a custom dataset. The IMDb Large Movie Review dataset is loaded from the Hugging Face Datasets catalog, and the SMS Spam Collection dataset serves as an example of a custom dataset loaded from a CSV file.
The notebook uses Intel® Extension for PyTorch*, which extends PyTorch with optimizations for an extra performance boost on Intel hardware.
The notebook performs the following steps:
- Import dependencies and set up parameters
- Prepare the dataset
- Prepare the model for fine-tuning and evaluation
- Export the model
- Reload the model and make predictions
To run the notebook, follow the instructions to set up the PyTorch notebook environment.
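As a rough illustration of the custom-dataset path described above, the following sketch reads labeled text from a delimited file into parallel lists of texts and integer labels, which is the shape a text classification pipeline typically expects. It is not the notebook's own loading code. The file layout (a label, a tab, then the message, with no header row), the `ham`/`spam` label mapping, the sample rows, and the helper name `load_labeled_text` are all assumptions made for this example.

```python
import csv
import io

# Hypothetical two-row sample in the assumed layout: label<TAB>message, no header row.
SAMPLE = "ham\tSee you at lunch?\nspam\tWIN a free prize now, reply YES\n"

# Assumed label mapping; the notebook may use a different one.
LABEL_TO_ID = {"ham": 0, "spam": 1}


def load_labeled_text(handle, delimiter="\t"):
    """Read (label, text) rows and return parallel lists of texts and integer labels."""
    texts, labels = [], []
    # QUOTE_NONE keeps quote characters inside messages as literal text.
    reader = csv.reader(handle, delimiter=delimiter, quoting=csv.QUOTE_NONE)
    for row in reader:
        if len(row) != 2:
            continue  # skip blank or malformed rows
        label, text = row
        texts.append(text)
        labels.append(LABEL_TO_ID[label])
    return texts, labels


texts, labels = load_labeled_text(io.StringIO(SAMPLE))
```

To read a real file instead of the in-memory sample, pass a handle opened with `open(path, newline="", encoding="utf-8")`, where `path` points to your own copy of the data.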
Dataset Citations
@InProceedings{maas-EtAl:2011:ACL-HLT2011,
author = {Maas, Andrew L. and Daly, Raymond E. and Pham, Peter T. and Huang, Dan and Ng, Andrew Y. and Potts, Christopher},
title = {Learning Word Vectors for Sentiment Analysis},
booktitle = {Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies},
month = {June},
year = {2011},
address = {Portland, Oregon, USA},
publisher = {Association for Computational Linguistics},
pages = {142--150},
url = {http://www.aclweb.org/anthology/P11-1015}
}
@misc{misc_sms_spam_collection_228,
author = {Almeida, Tiago},
title = {{SMS Spam Collection}},
year = {2012},
howpublished = {UCI Machine Learning Repository}
}
Please see this dataset's applicable license for terms and conditions. Intel Corporation does not own the rights to this data set and does not confer any rights to it.