Commit

Initial commit
pierotofy committed Oct 30, 2023
1 parent 701084d commit 89287ad
Showing 54 changed files with 412,017 additions and 1 deletion.
2 changes: 2 additions & 0 deletions .gitignore
@@ -158,3 +158,5 @@ cython_debug/
# and can be added to the global gitignore or merged into this file. For a more nuclear
# option (not recommended) you can uncomment the following to ignore the entire idea folder.
#.idea/

lexilang/data/*
84 changes: 83 additions & 1 deletion README.md
@@ -1 +1,83 @@
# DictLang
# LexiLang

Simple, fast dictionary-based language detector for short texts.

## Installation

```bash
pip install lexilang
```

## Usage

```python
from lexilang.detector import detect

print(detect("bonjour")) # ('fr', 0.45)
print(detect("学中文")) # ('zh', 0.45)
print(detect("ciao mondo")) # ('it', 0.9)
print(detect("El gato doméstico")) # ('es', 0.45)

# Optionally, specify a subset of languages to consider
print(detect("ciao", languages=["de", "ro"])) # ('de', 0.45)
```

`detect(text, languages=[])` -> tuple (`iso_639_1`, `confidence`)
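
Because `detect` returns a `(language, confidence)` tuple, callers can unpack it and apply their own cutoff. The following is a minimal sketch, not part of LexiLang: the helper `detect_or_default`, the 0.5 threshold, and the `"en"` fallback are all assumptions made for illustration. The detector is passed in as a callable so the helper runs without the package installed.

```python
from typing import Callable, Tuple


# Hypothetical helper for illustration; not part of LexiLang.
def detect_or_default(
    text: str,
    detect_fn: Callable[[str], Tuple[str, float]],
    min_confidence: float = 0.5,  # arbitrary cutoff, chosen for this sketch
    default: str = "en",          # arbitrary fallback language code
) -> str:
    """Return the detected language code, or `default` below the cutoff."""
    lang, confidence = detect_fn(text)
    return lang if confidence >= min_confidence else default
```

In practice `detect_fn` would be `lexilang.detector.detect`. Going by the confidence values shown in the usage example, "ciao mondo" (0.9) would pass a 0.5 cutoff, while "bonjour" (0.45) would fall back to the default.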

## Supported Languages

* Afrikaans
* Albanian
* Arabic
* Bengali
* Bulgarian
* Catalan
* Chinese
* Czech
* Danish
* Dutch
* English
* Esperanto
* Estonian
* Finnish
* French
* German
* Greek
* Hebrew
* Hindi
* Hungarian
* Indonesian
* Italian
* Japanese
* Kazakh
* Korean
* Latvian
* Lithuanian
* Macedonian
* Norwegian
* Polish
* Portuguese
* Romanian
* Russian
* Serbian
* Slovak
* Slovenian
* Spanish
* Swedish
* Thai
* Turkish
* Ukrainian
* Vietnamese
* Farsi

## Limitations

This detector is designed for short texts (fewer than 20 characters) and will probably not work reliably on longer text. Because it relies on dictionaries, detection fails when a word is missing from the dictionary or misspelled.
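
To make the dictionary limitation concrete, here is a toy sketch of dictionary-based detection. It is not LexiLang's implementation: the word lists, the `toy_detect` function, and the scoring rule (fraction of words matched) are assumptions for illustration only, and the scores do not correspond to LexiLang's confidence values.

```python
from collections import Counter
from typing import Dict, Optional, Set, Tuple

# Toy word lists for illustration only; these are not LexiLang's dictionaries.
TOY_DICTIONARIES: Dict[str, Set[str]] = {
    "fr": {"bonjour", "monde", "chat"},
    "it": {"ciao", "mondo", "gatto"},
    "es": {"hola", "mundo", "gato"},
}


def toy_detect(text: str) -> Tuple[Optional[str], float]:
    """Return (language, fraction of words matched), or (None, 0.0) if no word matches."""
    words = text.lower().split()
    if not words:
        return None, 0.0

    # Count, per language, how many input words appear in that language's word list.
    hits: Counter = Counter()
    for word in words:
        for lang, vocab in TOY_DICTIONARIES.items():
            if word in vocab:
                hits[lang] += 1

    if not hits:
        # No word was found in any dictionary, so no guess is possible.
        return None, 0.0

    lang, count = hits.most_common(1)[0]
    return lang, count / len(words)
```

With these toy lists, `toy_detect("ciao mondo")` matches both words, `toy_detect("ciao mnodo")` matches only one, and `toy_detect("caio")` matches nothing: a single misspelling is enough to lose the signal.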

## Contributing

If you want to add a new language, or improve an existing one, add more words to the respective dictionary in the `dictionaries` folder.

## License

AGPLv3