The BIP! Citation Classifier is a Python library that classifies citations by intent using a range of state-of-the-art algorithms. It analyses the citation context, that is, the text surrounding a reference in a scientific publication, to determine the intent (or purpose) behind the citation. Following a well-established citation classification ontology, the library assigns each citation to a specific class, such as whether it supports, uses, or extends the cited work. Its outputs are particularly useful for tasks such as citation network analysis, where knowing the nature of each citation can improve the accuracy of the analysis.
This project implements various text mining techniques based on neural networks, focusing on citation intent classification at different semantic levels. It also includes the modification and calculation of Relative Performance Indicators (RPIs) to observe how they are influenced by citation intent. All code is run on Google Colab.
Create six folders to store the datasets, notebooks, and results of the zero-shot classification models:
- Folder 1 – ACT:
Dataset: datasets/ACL_ATC/ATC/train.csv
Notebook: Inference_ZeroShotClassification/ZeroShotClassification_ACL_ATC_Classes6.ipynb
Contents: Inference results from four ZeroShotClassification models will be stored as .csv files. Model performance will be documented in the notebook.
- Folder 2 – ACT_INFLUENCE:
Dataset: datasets/ACL_ATC/ATC_INFLUENCE/train.csv
Notebook: Inference_ZeroShotClassification/ZeroShotClassification_ACL_ATC_Classes2.ipynb
Contents: Inference results from four ZeroShotClassification models will be stored as .csv files. Model performance will be documented in the notebook.
- Folder 3 – SciCite_Model1:
Datasets: datasets/SciCite/ATC/train.csv datasets/SciCite/ATC/dev.csv datasets/SciCite/ATC/test.csv
Notebook: Inference_ZeroShotClassification/ZeroShotClassification_SciCite_model1.ipynb
Contents: Model1 inference results stored as .csv files for train, dev, and test datasets. Model performance will be presented in the notebook.
- Folder 4 – SciCite_Model2:
Datasets: datasets/SciCite/ATC/train.csv datasets/SciCite/ATC/dev.csv datasets/SciCite/ATC/test.csv
Notebook: Inference_ZeroShotClassification/ZeroShotClassification_SciCite_model2.ipynb
Contents: Model2 inference results stored as .csv files for train, dev, and test datasets. Model performance will be presented in the notebook.
- Folder 5 – SciCite_Model3:
Datasets: datasets/SciCite/ATC/train.csv datasets/SciCite/ATC/dev.csv datasets/SciCite/ATC/test.csv
Notebook: Inference_ZeroShotClassification/ZeroShotClassification_SciCite_model3.ipynb
Contents: Model3 inference results stored as .csv files for train, dev, and test datasets. Model performance will be presented in the notebook.
- Folder 6 – SciCite_Model4:
Datasets: datasets/SciCite/ATC/train.csv datasets/SciCite/ATC/dev.csv datasets/SciCite/ATC/test.csv
Notebook: Inference_ZeroShotClassification/ZeroShotClassification_SciCite_model4.ipynb
Contents: Model4 inference results stored as .csv files for train, dev, and test datasets. Model performance will be presented in the notebook.
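The six folders listed above can be created in one step. The following is a minimal sketch, assuming they all live under a common base directory; `create_result_folders` and its `base_dir` parameter are illustrative names and not part of the repository.

```python
from pathlib import Path

# Folder names taken from the list above.
ZERO_SHOT_FOLDERS = [
    "ACT",
    "ACT_INFLUENCE",
    "SciCite_Model1",
    "SciCite_Model2",
    "SciCite_Model3",
    "SciCite_Model4",
]


def create_result_folders(base_dir="."):
    """Create the six zero-shot result folders under base_dir (hypothetical helper)."""
    base = Path(base_dir)
    created = []
    for name in ZERO_SHOT_FOLDERS:
        folder = base / name
        # exist_ok=True makes the call safe to repeat on an existing layout.
        folder.mkdir(parents=True, exist_ok=True)
        created.append(folder)
    return created
```

Calling `create_result_folders()` from the working directory of a notebook returns the list of folder paths, so the same values can be reused when writing the .csv results.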
The SciBERT model is reproduced using PyTorch, following the study’s guidelines. This includes citation intent classification with three and four labeled classes to evaluate the model's ability to capture more granular semantic intent.
Folder Name: SciBERT_classes3
Datasets: datasets/SciCite/ATC/train.csv datasets/SciCite/ATC/dev.csv datasets/SciCite/ATC/test.csv
Notebook: SciBERT_Reproduction/SciBERT_Reproduction_3Classes.ipynb
Contents: Model checkpoints will be stored to allow the best model to be used for inference after validation.
Folder Name: SciBERT_classes4
Datasets: datasets/SciCite/train.csv datasets/SciCite/dev.csv datasets/SciCite/test.csv
Notebook: SciBERT_Reproduction/SciBERT_4Classes.ipynb
Contents: Model checkpoints will be stored for inference after validation.
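One common way to keep the best model for inference is to save a checkpoint only when the validation score improves. The sketch below illustrates that idea under stated assumptions: `BestCheckpointTracker`, the `best_model.pt` file name, and the "higher score is better" convention are all hypothetical and are not taken from the notebooks. The actual saving is delegated to a callable so the example stays independent of any framework; in a PyTorch notebook that callable would typically wrap `torch.save(model.state_dict(), path)`.

```python
import math
from pathlib import Path


class BestCheckpointTracker:
    """Save a checkpoint only when the validation score improves (hypothetical helper)."""

    def __init__(self, folder, save_fn, filename="best_model.pt"):
        # Assumed layout: one checkpoint file inside e.g. SciBERT_classes3.
        self.path = Path(folder) / filename
        # save_fn receives the target path,
        # e.g. lambda path: torch.save(model.state_dict(), path)
        self.save_fn = save_fn
        self.best_score = -math.inf

    def update(self, val_score):
        """Return True and save if val_score beats the best score seen so far."""
        if val_score > self.best_score:
            self.best_score = val_score
            self.path.parent.mkdir(parents=True, exist_ok=True)
            self.save_fn(self.path)
            return True
        return False
```

In a training loop, `update` would be called once per epoch with the validation metric; after training, the file at `tracker.path` holds the weights to load for inference.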
The RPIs (Relative Performance Indicators) will be calculated based on the citation intent semantics.
Folder Name: RPIs
Datasets: datasets/SciCite/train.csv datasets/SciCite/dev.csv datasets/SciCite/test.csv
Notebook: Citation_Intent_in_RPIs/RPIs.ipynb
Contents: This notebook will calculate RPIs based on the semantics of citation intent.
To replicate the results:
Clone this repository.
Download the datasets from the mentioned paths.
Run the notebooks in Google Colab with a V100 GPU, or in your local environment.
Make sure to install all required dependencies listed in each notebook before running them.
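Before running the notebooks, it can help to confirm that the downloaded datasets sit at the paths this README refers to. The following is a minimal sketch, assuming the paths are relative to the repository root; `find_missing_datasets` and `repo_root` are illustrative names, not part of the repository.

```python
from pathlib import Path

# Dataset paths as listed in this README, relative to the repository root.
EXPECTED_DATASETS = [
    "datasets/ACL_ATC/ATC/train.csv",
    "datasets/ACL_ATC/ATC_INFLUENCE/train.csv",
    "datasets/SciCite/ATC/train.csv",
    "datasets/SciCite/ATC/dev.csv",
    "datasets/SciCite/ATC/test.csv",
    "datasets/SciCite/train.csv",
    "datasets/SciCite/dev.csv",
    "datasets/SciCite/test.csv",
]


def find_missing_datasets(repo_root="."):
    """Return the expected dataset paths that are not present under repo_root."""
    root = Path(repo_root)
    return [p for p in EXPECTED_DATASETS if not (root / p).is_file()]


if __name__ == "__main__":
    # Print each missing file so it can be downloaded before running a notebook.
    for path in find_missing_datasets():
        print(f"missing: {path}")
```

An empty list from `find_missing_datasets()` means every dataset file named above is in place.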