HunspellPy

A wrapper and REST API implemented in Python for Hunspell spellchecker and morphological analyzer

WARNING: Hunspell 1.6.2 (Ubuntu 18.04 Bionic) is broken! It yields UnicodeDecodeError occasionally and messes up the analyses of compound words!

Requirements

libhunspell-dev: On Ubuntu 18.04 LTS or higher just sudo apt install libhunspell-dev
A dictionary package, eg. hunspell-hu: On Ubuntu 18.04 LTS or higher just sudo apt install hunspell-hu (or use the included version)
Python 3 (>=3.5, tested with 3.6)
Pip to install the additional requirements in requirements.txt
(Optional) a cloud service like Heroku for hosting the API

Features

Spellchecking, stemming and returning the detailed morphological analyses with the proper .dic and .aff files
Handling of removing individual words from the dictionary (if a word is added with affixes affixed words could not be removed)
Can be used through REST API, or from Python as a library (see usage examples below)

Install on local machine

Clone the repository
Run: sudo pip3 install -r requirements.txt
Use from Python

Install to Heroku

WARNING: Heroku free tier is discontinued.

Register
Download Heroku CLI
Login to Heroku from the CLI
Create an app
Clone the repository
Add Heroku as remote origin
Add APT buildpack: heroku buildpacks:add --index 1 https://github.com/heroku/heroku-buildpack-apt
Add Python buildpack: heroku buildpacks:add --index 2 heroku/python
Push the repository to Heroku
Enjoy!

Build docker image and run as container

docker build . -t hunspellpyRESTAPI
docker run -p8000:80 hunspellpyRESTAPI
Rest API is running on: http://localhost
To use with https://render.com one must define the environment variable PORT to 8000

Usage

From browser or anyhow through the REST API:

Lemmatization: https://hunspellpy.onrender.com/stem/működik
Detailed analysis: https://hunspellpy.onrender.com/analyze/működik
Spellcheck: https://hunspellpy.onrender.com/spell/működik
Lemmatisation with the corresponding detailed analysis: https://hunspellpy.onrender.com/dstem/működik
The library also support HTTP POST requests to handle multiple words at once. (See examples for details.)

>>> import requests
>>> import json
>>> word = 'működik'
>>> json.loads(requests.get('https://hunspellpy.onrender.com/spell/' + word).text)[word]
{'spell': True}
>>> json.loads(requests.get('https://hunspellpy.onrender.com/stem/' + word).text)[word]
{'stem': ['működik']}
>>> json.loads(requests.get('https://hunspellpy.onrender.com/analyze/' + word).text)[word]
{'anas': [[['st', 'működik'], ['po', 'vrb'], ['ts', 'PRES_INDIC_INDEF_SG_3']]]}
>>> json.loads(requests.get('https://hunspellpy.onrender.com/dstem/' + word).text)[word]
{'stem': ['működik'], 'anas': [[['st', 'működik'], ['po', 'vrb'], ['ts', 'PRES_INDIC_INDEF_SG_3']]], 'spell': True}
>>> words = '\n'.join(('form', word, 'word2', ''))  # One word per line (first line is header, trailing newline is needed!)
>>> words_out = requests.post('https://hunspellpy.onrender.com/spell', files={'file': words}).text.split('\n')
>>> print(words_out[1].split('\t'))
['működik', '{"spell": true}']
>>> words_out = requests.post('https://hunspellpy.onrender.com/stem', files={'file': words}).text.split('\n')
>>> print(words_out[1].split('\t'))
['működik', '{"stem": ["működik"]}']
>>> words_out = requests.post('https://hunspellpy.onrender.com/analyze', files={'file': words}).text.split('\n')
>>> print(words_out[1].split('\t'))
['működik', '{"anas": [[["st", "működik"], ["po", "vrb"], ["ts", "PRES_INDIC_INDEF_SG_3"]]]}']
>>> words_out = requests.post('https://hunspellpy.onrender.com/dstem', files={'file': words}).text.split('\n')
>>> print(words_out[1].split('\t'))
['működik', 'true', '{"stem": ["működik"], "anas": [[["st", "működik"], ["po", "vrb"], ["ts", "PRES_INDIC_INDEF_SG_3"]]]}']

From Python:

>>> import hunspellpy.hunspellpy as hunspell
>>> h = hunspell.HunspellPy()
>>> h.spell('működik')    # Returns if word spelleed corretly or not?
True
>>> h.stem('működik')     # Returns list of stem
['működik']
>>> h.analyze('működik')  # Returns list of detailed analyzes
[' st:működik po:vrb ts:PRES_INDIC_INDEF_SG_3']
>>> h.dstem('működik')    # Returns list of lemmatisations with the corresponding detailed analyzes (stem, tag and detailed analyzes triples)
{'anas': [[('st', 'működik'), ('po', 'vrb'), ('ts', 'PRES_INDIC_INDEF_SG_3')]], 'stem': ['működik'], 'spell': True}
>>> # Add new word to the lexicon
>>> h.add('működik')
>>> # Add new word with paradigm (from example) to the lexicon
>>> h.add('zsíííír', example='zsír')
>>> # Remove word from the lexicon (with paradigm)
>>> h.remove('zsíííír')
>>> # Remove word from the runtime (!) lexicon (without paradigm)
>>> h.blacklist('zsíííír')
>>> h.generate('körte', example='almával')
['körtéjével', 'körtével']
>>> h.generate('dolgozhat', flags=' st:működik po:vrb ts:PRES_INDIC_INDEF_SG_3')
['dolgoz', 'dolgozik', 'dolgoz']
>>> h.suggest('amla')
['alma', 'ama', 'akla', 'ampa', 'aula', 'pamlag']
>>> h.add_dic('.../extra.dic')
>>> h.get_dic_encoding()
'UTF-8'

From CLI:

```bash
$ python3 -m hunspellpy --raw  # Interactive mode (currently dstem mode only)
Type one word per line, Ctrl+D or empty word to exit
--> működik
működik	true	['működik']	[[('st', 'működik'), ('po', 'vrb'), ('ts', 'PRES_INDIC_INDEF_SG_3')]]
$ python3 -m hunspellpy --raw -i input.txt  # Batch mode
működik	true	['működik']	[[('st', 'működik'), ('po', 'vrb'), ('ts', 'PRES_INDIC_INDEF_SG_3')]]

a	true	['a']	[[('st', 'a'), ('po', 'noun'), ('ts', 'NOM'), ('al', 'a-vá'), ('al', 'a-val'), ('al', 'a-'), ('al', 'A-s')], [('st', 'a'), ('po', 'det_def'), ('al', 'a-vá'), ('al', 'a-val'), ('al', 'a-'), ('al', 'A-s')]]

program	true	['program']	[[('st', 'program'), ('po', 'noun'), ('ts', 'NOM')]]

```

License

This Python wrapper (around pyhunspell package) is licensed under the LGPL 3.0 license. xtsv, PyHunspell, Hunspell and the dictionaries has their own license.

Name		Name	Last commit message	Last commit date
Latest commit History 15 Commits
docs		docs
hunspellpy		hunspellpy
scripts		scripts
tests		tests
.dockerignore		.dockerignore
.gitignore		.gitignore
.travis.yml		.travis.yml
Aptfile		Aptfile
Dockerfile		Dockerfile
LICENSE		LICENSE
MANIFEST.in		MANIFEST.in
Makefile		Makefile
README.md		README.md
requirements-dev.txt		requirements-dev.txt
requirements.txt		requirements.txt
setup.py		setup.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

HunspellPy

Requirements

Features

Install on local machine

Install to Heroku

Build docker image and run as container

Usage

License

About

Releases 1

Packages

Languages

License

nytud/hunspellpy

Folders and files

Latest commit

History

Repository files navigation

HunspellPy

Requirements

Features

Install on local machine

Install to Heroku

Build docker image and run as container

Usage

License

About

Resources

License

Stars

Watchers

Forks

Releases 1

Packages 0

Languages

Packages