lexiconenrichment

Pipeline script to clean and enrich a word list.

Installation

Clone this repository: $ git clone https://github.com/LanguageMachines/lexiconenrichment
Build the image: $ docker build -t lexiconenrichment lexiconenrichment

Start an interactive shell in a new container: $ docker run -t -i lexiconenrichment
Invoke clean_run.sh with two parameters: inputfile outputname

The input should be a list with one word per line in the format: word,STATUSCODE. The status codes are numbers.

Only words with status code 0 or 10 are included.
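
The input filter described above can be sketched in Python. This is only an illustration of the documented rule, not the code that clean_run.sh actually runs; the function name filter_words and the sample words are hypothetical, and it assumes the status code is the last comma-separated field on each line.

```python
# Hypothetical helper: illustrates the documented input rule,
# not the pipeline's actual implementation.
ACCEPTED_STATUS_CODES = {0, 10}


def filter_words(lines):
    """Return the words whose status code is 0 or 10."""
    kept = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        # Split on the last comma so the status code is always the final field.
        word, sep, status = line.rpartition(",")
        if not sep:
            continue  # no comma: line is not in word,STATUSCODE format
        try:
            code = int(status)
        except ValueError:
            continue  # status code is not a number
        if code in ACCEPTED_STATUS_CODES:
            kept.append(word)
    return kept


if __name__ == "__main__":
    # Sample words and codes are made up for illustration.
    print(filter_words(["huis,0", "hond,5", "kat,10"]))
```

Lines that are blank, lack a comma, or carry a non-numeric status code are skipped in this sketch; how the real pipeline treats such lines is not documented here.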

The output has one word per line, followed by its spell-checked version, its lemma, and its compound parts, separated by commas.

Example output:

binnnenkomststempel,binnenkomststempel,binnenkomststempel,binnen komst stempel
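
For downstream processing, an output line like the one above could be split into its fields as follows. This is a minimal sketch that assumes exactly four comma-separated fields with space-separated compound parts, as in the example; the names EnrichedWord and parse_output_line are hypothetical and not part of this repository.

```python
from typing import List, NamedTuple


class EnrichedWord(NamedTuple):
    """One output record (hypothetical type, for illustration only)."""
    word: str
    spellchecked: str
    lemma: str
    compound_parts: List[str]


def parse_output_line(line):
    """Parse 'word,spellchecked,lemma,part1 part2 ...' into an EnrichedWord."""
    # Assumes four comma-separated fields, as in the example output.
    word, spellchecked, lemma, parts = line.rstrip("\n").split(",", 3)
    # Compound parts are assumed to be separated by spaces.
    return EnrichedWord(word, spellchecked, lemma, parts.split())


if __name__ == "__main__":
    record = parse_output_line(
        "binnnenkomststempel,binnenkomststempel,binnenkomststempel,binnen komst stempel"
    )
    print(record.compound_parts)
```

A line with fewer than four fields raises a ValueError in this sketch, which makes malformed output easy to spot.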
