Semantic Extraction

Requirements

This project requires PDM (Python Development Master). Install pdm before continuing; see the PDM installation instructions.

Running the code

Running the code takes only a few make commands:

$ make init
You can now run any of the train, validate, or test-emerging targets.

Training takes a long time, but you can run just the first couple of training iterations to check that it works.

The validate and test-emerging targets work right away because pre-trained models are included.
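
The quick check described above (init first, then the two targets that rely on the pre-trained models) can also be scripted. This is a minimal sketch, assuming make is on the PATH and the script is started from the repository root. The helper names build_commands and run_targets are hypothetical and not part of this project.

```python
import subprocess

# Targets named in this README. init comes first because it downloads the
# datasets and pre-trained models and installs the dependencies.
QUICK_CHECK_TARGETS = ["init", "validate", "test-emerging"]


def build_commands(targets):
    """Return one `make <target>` argument list per target, in order."""
    return [["make", target] for target in targets]


def run_targets(targets):
    """Run each make target in order, stopping at the first failure."""
    for command in build_commands(targets):
        # check=True raises CalledProcessError on a non-zero exit status.
        subprocess.run(command, check=True)
```

Calling run_targets(QUICK_CHECK_TARGETS) runs the three targets in sequence. The train target is left out on purpose because it is the slow one.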

Make targets

The Makefile provides targets for most of the main ways of running the project:

make download: Download datasets and trained models
make install: Install dependencies with pdm
make init: Run download and install targets
make train: Train month-to-month word2vec models
make validate: Run the validation script on the model trained on all channels
make test-emerging: Run the emerging-terms script on two models
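
This README does not describe how the emerging-terms script compares two models. As a rough illustration only, and not the project's actual method, one simple notion of an emerging term is a word that is absent from the older model's vocabulary, or much more frequent in the newer one. The function name, the thresholds, and the toy counts below are all assumptions made for this sketch.

```python
def emerging_terms(old_counts, new_counts, min_count=5, min_growth=2.0):
    """Return terms that are new or sharply more frequent in new_counts.

    old_counts and new_counts map each term to its frequency in the
    corpus behind the older and the newer model, respectively.
    """
    emerging = []
    for term, new_count in new_counts.items():
        if new_count < min_count:
            continue  # too rare in the newer corpus to be meaningful
        old_count = old_counts.get(term, 0)
        # Unseen terms count as emerging; seen terms must grow enough.
        if old_count == 0 or new_count / old_count >= min_growth:
            emerging.append(term)
    return sorted(emerging)


# Toy vocabularies for two consecutive months (invented for illustration).
old_month = {"invoice": 40, "wire": 30, "giftcard": 2}
new_month = {"invoice": 42, "wire": 31, "giftcard": 12, "airdrop": 9, "typo": 1}
print(emerging_terms(old_month, new_month))  # ['airdrop', 'giftcard']
```

Comparing raw frequencies ignores the word vectors themselves; a comparison based on the trained embeddings would need the models loaded, which this sketch deliberately avoids.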

Run any script with pdm run python <script.py> --help to see its options.


NOTE: Training on the full dataset consumed over 110 GB of RAM and took 7+ hours.

Authors: Jonathan Frees ([email protected]), Jan Foksinski ([email protected])
