Automatically find and fix spelling errors in your READMEs and other documents that you write without a word processor. spell-check is a fast command line application to spell-check large text files (books, Github files, assignments etc.) and autocorrect misspelled words based on a probabilistic model. The program is optimized for speed and can check over 1 million words in less than 1 second.
$ wget -P ~/Downloads https://github.com/madhav-datt/spell-check/archive/v2.0.zip
$ unzip ~/Downloads/spell-check-2.0.zip
$ mv ~/Downloads/spell-check-2.0 ~/Downloads/spell-check
$ chmod +x spell-check/install
$ sudo spell-check/install
$ spellcheck /path/to/file/file_to_be_checked
The program supports spellchecking and auto-correct for txt files and PDF files. You could also batch process multiple files inside a directory.
The program will output a list of all the misspelled words along with suggested corrections, and file checking benchmarks.
oficiel
, which was intended to be official
has no suggested correction because it has an edit distance of more than 1 from a correctly spelled word. Read more about this here.
spell-check
will fix spelling errors due to missing spaces using a segmentation algorithm. Read more here.
Both speed and accuracy benchmarks give an approximate value that has been averaged over multiple input text files and documents.
Optimized for speed - can spellcheck over 1 Million words in less than 1 second.
Misspelled Words | Words in Dictionary | Words in Text Document | Time in Loading Data |
---|---|---|---|
122.5 | 368895 | 10237257.5 | 0.34 seconds |
Time in Checking Text | Time in Correcting Text | Time in Unloading Data | Total Time |
---|---|---|---|
0.56 seconds | 0.01 seconds | 0.15 seconds | 0.98 seconds |
Calculated on inputs from Roger Mitton's Birkbeck spelling error corpus from the Oxford Text Archive. On a development set of 250 test cases (including context based mistakes for correctly spelled words) the spell-check program has an accuracy of around 66 % and close to 80 % for misspelled words with an edit distance equal to one.
Read about the data, sources, processing raw word data, word frequency, probabilistic model for word correction etc. here.
-
No context based/grammar checking -
Their is nothing to be done here.
will be treated as a correct sentence and not be changed to
There is nothing to be done here.
-
Words with edit distances greater than 1 cannot be corrected -
oficiel
, won't be corrected toofficial
. -
Doesn't fix spelling errors due to missing spaces.historicaldatawill be found as misspelled, but won't be corrected tohistorical data -
Please report bugs and issues here.