The implementation of TensorGCN from the paper:
Liu X, You X, Zhang X, et al. Tensor graph convolutional networks for text classification[C]//Proceedings of the AAAI Conference on Artificial Intelligence. 2020, 34(05): 8409-8416.
The PyTorch implementation of this paper: here.
Python 3.6
TensorFlow 1.15
Run TGCN_2layers/build_graph_tgcn.py
Run TGCN_2layers/train.py
- /data_tgcn/mr/build_train/mr.clean.txt indicates document names, training/test split, document labels. Each line is for a document.
- /data_tgcn/mr/build_train/mr.txt contains raw text of each document.
- /data_tgcn/mr/stanford/mr_pair_stan.pkl contains all syntactic relationship word pairs for the dataset.
- /data_tgcn/mr/build_train/mr_semantic_0.05.pkl contains all semantic relationship word pairs for the dataset.
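The internal layout of the two .pkl files is not described here, so a quick way to find out is to load one and look at its top-level object. The helper below is a minimal sketch: `describe_pickle` is a hypothetical name, not part of this repository, and the relative paths assume the script runs from the directory that contains `data_tgcn`. Only unpickle files you trust.

```python
import pickle
from pathlib import Path


def describe_pickle(path):
    """Load a pickle file and return (type name, length) of its top-level object.

    Hypothetical helper for inspection only; the length is None when the
    object has no __len__.
    """
    with Path(path).open("rb") as f:
        obj = pickle.load(f)
    size = len(obj) if hasattr(obj, "__len__") else None
    return type(obj).__name__, size


if __name__ == "__main__":
    # Paths from the file list above, written relative to the working
    # directory (an assumption about where the script is run).
    for p in ("data_tgcn/mr/stanford/mr_pair_stan.pkl",
              "data_tgcn/mr/build_train/mr_semantic_0.05.pkl"):
        if Path(p).exists():
            print(p, describe_pickle(p))
```

Knowing whether the object is a dict, list, or something else tells you how the graph-building script is likely to consume it.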
We propose an LSTM-based method to construct a semantic-based graph from text documents. There are three main steps:
- Step 1: Train an LSTM on the training data of the given task (e.g. text classification here).
- Step 2: Get semantic features/embeddings with LSTM for all words in each document/sentence of the corpus.
- Step 3: Calculate word-word edge weights based on word semantic embeddings over the corpus. The calculation is given by formula (3) in the paper.
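Step 3 can be sketched in a few lines of Python. This is not the repository's code: it assumes the edge weight of a word pair is the number of documents in which the pair's embeddings have cosine similarity above a threshold, divided by the number of documents that contain both words. Check this against formula (3) in the paper before relying on it. The function names, the input layout (one list of `(word, embedding)` tuples per document), the default threshold, and the choice to keep only the first embedding of a repeated word are all illustrative assumptions.

```python
import math
from collections import Counter
from itertools import combinations


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm > 0 else 0.0


def semantic_edge_weights(docs, threshold=0.5):
    """Return {(w_i, w_j): weight} for word pairs with at least one semantic hit.

    docs: list of documents, each a list of (word, embedding) tuples, where the
    embedding is the LSTM output for that word (assumed layout).
    weight = (#docs where cosine > threshold) / (#docs containing both words).
    """
    n_semantic = Counter()
    n_total = Counter()
    for doc in docs:
        # Keep the first embedding seen for each distinct word (a simplification).
        vectors = {}
        for word, emb in doc:
            vectors.setdefault(word, emb)
        # Sorting makes each unordered pair appear under a single key.
        for w_i, w_j in combinations(sorted(vectors), 2):
            n_total[(w_i, w_j)] += 1
            if cosine(vectors[w_i], vectors[w_j]) > threshold:
                n_semantic[(w_i, w_j)] += 1
    return {pair: n_semantic[pair] / n_total[pair]
            for pair in n_total if n_semantic[pair] > 0}


if __name__ == "__main__":
    toy_docs = [
        [("good", [1.0, 0.0]), ("great", [0.9, 0.1]), ("bad", [-1.0, 0.0])],
        [("good", [1.0, 0.0]), ("great", [0.0, 1.0])],
    ]
    print(semantic_edge_weights(toy_docs))
```

Because the LSTM embedding of a word depends on its context, the same pair can be similar in one document and dissimilar in another, which is why the sketch counts per document rather than comparing a single global vector per word.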
- Step 1: We use the Stanford CoreNLP parser to extract dependencies between words. You can learn how to use the toolkit through this website.
- Step 2: Change one line of code in the "dependency_parse" function:
  before: return [(dep['dep'], dep['governor'], dep['dependent']) for s in r_dict['sentences'] for dep in s['basicDependencies']]
  after: return [(dep['governorGloss'], dep['dependentGloss']) for s in r_dict['sentences'] for dep in s['basicDependencies']]
- Step 3: Get the syntactic relationship word pairs for the dataset by running TGCN1_2layers/get_syntactic_relationship.py.
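To make the effect of the Step 2 change concrete, the sketch below applies the modified return expression to a hand-written dictionary shaped like a CoreNLP JSON response, then counts the resulting word pairs. It is an illustration only, not the contents of get_syntactic_relationship.py: `dependency_word_pairs`, `count_pairs`, and the `skip_root` option are hypothetical names, and dropping the artificial ROOT governor is an assumption about what a word-pair graph would want.

```python
from collections import Counter


def dependency_word_pairs(r_dict):
    """Return (governor word, dependent word) pairs from a parsed response.

    Uses the same expression as the modified line in "dependency_parse".
    """
    return [(dep['governorGloss'], dep['dependentGloss'])
            for s in r_dict['sentences']
            for dep in s['basicDependencies']]


def count_pairs(responses, skip_root=True):
    """Count how often each (governor, dependent) pair occurs across responses."""
    counts = Counter()
    for r_dict in responses:
        for gov, dep in dependency_word_pairs(r_dict):
            # CoreNLP marks the sentence head with the pseudo-governor "ROOT";
            # skipping it is an assumption made for this sketch.
            if skip_root and gov == 'ROOT':
                continue
            counts[(gov, dep)] += 1
    return counts


if __name__ == "__main__":
    # Hand-written stand-in for the parser output of the sentence "It works".
    sample = {'sentences': [{'basicDependencies': [
        {'dep': 'ROOT', 'governor': 0, 'governorGloss': 'ROOT',
         'dependent': 2, 'dependentGloss': 'works'},
        {'dep': 'nsubj', 'governor': 2, 'governorGloss': 'works',
         'dependent': 1, 'dependentGloss': 'It'},
    ]}]}
    print(count_pairs([sample]))
```

The original line returned the relation label plus token indices; the modified line returns the words themselves, which is what a word-level graph needs.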