lm-query

lm-query queries a language model specified in an ARPA file.

Given an ARPA file and some test data, it will output the following:

For each word it outputs a line of the format
[word]=[known word?] [n-gram length] [log10 probability]
- [word] is the current word.
- [known word?] is 0 for unseen words (OOVs) and 1 if the word was in the training set.
- [n-gram length] is the length of the longest n-gram that was found for this word, that is, how many words of the context (the word itself plus its history) could be used to determine the probability.
- [log10 probability] is the log10 probability that the language model assigns to this word in this context.
At the end of each sentence, the total log10 probability of the sentence is given, as well as the number of OOV words.
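
If the per-word lines need to be processed further, they can be split back into their four fields. The following is a minimal sketch, assuming the fields are separated by single spaces exactly as in the format above; the WordScore type, the parse_word_line helper, and the sample line are hypothetical and not part of lm-query.

```python
from typing import NamedTuple


class WordScore(NamedTuple):
    """One per-word output line, split into its four fields (hypothetical type)."""
    word: str
    known: bool
    ngram_length: int
    log10_prob: float


def parse_word_line(line: str) -> WordScore:
    # Split from the right so that a word containing spaces or '=' stays intact.
    head, ngram_length, log10_prob = line.strip().rsplit(" ", 2)
    # The last '=' separates the word from the known-word flag.
    word, _, known = head.rpartition("=")
    return WordScore(word, known == "1", int(ngram_length), float(log10_prob))


# Made-up sample line, for illustration only.
print(parse_word_line("house=1 2 -1.5"))
```

Splitting from the right is a deliberate choice: the word is the only field that can contain arbitrary characters, so the three numeric fields are peeled off first.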
At the end of the querying process, the program will output the following to stderr:
- Perplexity, including and excluding OOVs. Excluding OOVs means that unknown words are simply ignored.
- Number of OOVs in the test data.
- Number of tokens (words) in the test data.
- The execution time for querying the language model.
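
To make the two perplexity figures concrete, the sketch below uses the standard definition, 10 raised to the negative average log10 probability per token. It is an illustration under assumptions, not lm-query's own code: the perplexity function and the sample scores are hypothetical, and whether lm-query counts end-of-sentence tokens is not covered here.

```python
def perplexity(scores, include_oov=True):
    """Perplexity from (known, log10_prob) pairs, one pair per token.

    With include_oov=False, tokens whose known flag is False are ignored.
    """
    kept = [log10_prob for known, log10_prob in scores if include_oov or known]
    if not kept:
        raise ValueError("no tokens to score")
    # Standard definition: 10 ** (-(sum of log10 probabilities) / token count).
    return 10 ** (-sum(kept) / len(kept))


# Made-up scores: two known words and one OOV.
sample = [(True, -1.0), (True, -2.0), (False, -3.0)]
print(perplexity(sample))
print(perplexity(sample, include_oov=False))
```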

Requirements

Python 3.x

Installation

For Windows: You can download the zip folder from GitHub and unzip the file. Alternatively, you can download the program via GitHub Windows.

For Linux: In the terminal, run git clone https://github.com/benti/lm-query.git. Alternatively, you can download the zip folder from GitHub and unzip the file.

How to use it

Make sure the test data file separates words by spaces and sentences by line breaks.
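
If the test data contains tabs, repeated spaces, or empty lines, it can be normalized first. The following is a minimal sketch, assuming the text already has one sentence per line; normalize_test_data is a hypothetical helper and not part of lm-query.

```python
def normalize_test_data(text: str) -> str:
    """Return text with single spaces between words and one sentence per line."""
    # split() without arguments collapses any run of whitespace, including tabs.
    lines = (" ".join(line.split()) for line in text.splitlines())
    # Drop lines that are empty after normalization.
    return "".join(line + "\n" for line in lines if line)


print(normalize_test_data("the\tcat  sat \n\n hello world\n"), end="")
```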

For Windows: Open the command prompt, go to the lm-query folder, and run

python bin\lm-query.py [arpa file] < [test data file] > [probability output file] 2> [perplexity output file]

For Linux: In the terminal, run

lm-query/bin/lm-query.py [arpa file] < [test data file] > [probability output file] 2> [perplexity output file]
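
The same redirections can also be set up from a Python script. This is a minimal sketch, assuming the script is started with the current Python interpreter as in the Windows command above; build_command and run_query are hypothetical helpers, and all paths are placeholders supplied by the caller.

```python
import subprocess
import sys


def build_command(script_path, arpa_path):
    # Equivalent to: python <script> <arpa file>
    return [sys.executable, script_path, arpa_path]


def run_query(script_path, arpa_path, test_path, prob_path, ppl_path):
    """Run the script with stdin, stdout and stderr redirected to files."""
    with open(test_path, "rb") as stdin, \
            open(prob_path, "wb") as stdout, \
            open(ppl_path, "wb") as stderr:
        result = subprocess.run(
            build_command(script_path, arpa_path),
            stdin=stdin,    # test data file
            stdout=stdout,  # per-word probability output
            stderr=stderr,  # perplexity summary
            check=False,
        )
    return result.returncode


# Example call (placeholder file names):
# run_query("bin/lm-query.py", "model.arpa", "test.txt", "probs.txt", "ppl.txt")
```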

Licence

The MIT License (MIT)
