A multi-source synonym discovery tool with frequency-band filtering. Combines WordNet and fastText to surface candidates across the full Zipf range. Pick a corpus, pick a frequency band, get synonyms.
Live at synonymicon.xyz.
- Claude Opus 4.5
- Claude Opus 4.6
- Claude Opus 4.7
- Perplexity Computer
- Xiaomi MiMo-V2-Pro
- Xiaomi MiMo-V2.5-Pro
- MiniMax M2.7
- KAT-Coder-Pro V2
- Python 3.12 + Flask (synchronous, single-process, no database)
- wordfreq for default frequency
- NLTK WordNet for primary synonyms
- fastText (`fasttext-wiki-news-subwords-300` via gensim) for secondary candidates
- Included frequency corpora: wordfreq, SUBTLEX-US, BNC, Google 1-grams, Wikipedia, Kaggle, OpenSubtitles, Project Gutenberg, Leipzig News 2025, Leipzig Web COM 2018, Leipzig Web UK 2018
- Definition fallback chain: Wiktionary REST API → Webster's 1913 (local) → WordNet gloss → `[undefined]`
- Vanilla single-page frontend (no build step, no framework)
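The definition fallback chain works as "first source that answers wins". Below is a minimal sketch, assuming each source is wrapped in a function that returns definition text or `None`. The three lookups are hypothetical stubs, not the functions in `definitions.py`, and treating any exception as a miss is an assumed design choice.

```python
from typing import Callable, Optional

# A lookup takes a word and returns definition text, or None on a miss.
Lookup = Callable[[str], Optional[str]]

PLACEHOLDER = "[undefined]"


def first_definition(word: str, sources: list[Lookup]) -> str:
    """Return the first non-empty definition from `sources`, tried in order."""
    for lookup in sources:
        try:
            definition = lookup(word)
        except Exception:
            # A failing source (e.g. a timeout on a remote API) is treated
            # as a miss, so the chain falls through to the next source.
            continue
        if definition:
            return definition
    return PLACEHOLDER


# Hypothetical stubs standing in for Wiktionary, Webster's 1913 and WordNet.
def wiktionary_stub(word: str) -> Optional[str]:
    raise TimeoutError("simulated network failure")


def websters_stub(word: str) -> Optional[str]:
    return {"glad": "stub Webster's entry for glad"}.get(word)


def wordnet_stub(word: str) -> Optional[str]:
    return {"glad": "stub WordNet gloss for glad",
            "blithe": "stub WordNet gloss for blithe"}.get(word)


CHAIN: list[Lookup] = [wiktionary_stub, websters_stub, wordnet_stub]

print(first_definition("glad", CHAIN))    # stub Webster's entry for glad
print(first_definition("blithe", CHAIN))  # stub WordNet gloss for blithe
print(first_definition("qwxz", CHAIN))    # [undefined]
```

Ordering the list from richest to sparsest source means the local Webster's and WordNet data only get consulted when the remote API misses or fails.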
```bash
python -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt
python scripts/setup_nltk.py
```

The fastText model (~1 GB) downloads on first run via gensim and is cached under `~/gensim-data/`.

Set `SYNONYMICON_FASTTEXT=0` to skip the fastText load entirely (WordNet-only candidates, instant startup). This is useful for development and required by the test suite (`SYNONYMICON_FASTTEXT=0 pytest`).
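One way to gate a module-scope model load on that variable is sketched below. The helper names and the rule that only the literal `"0"` disables fastText are assumptions; `gensim.downloader.load` and `KeyedVectors.most_similar` are real gensim calls.

```python
import os
from typing import Any, Optional

MODEL_NAME = "fasttext-wiki-news-subwords-300"


def fasttext_enabled() -> bool:
    # Assumed rule: only the literal value "0" disables fastText.
    return os.environ.get("SYNONYMICON_FASTTEXT", "1") != "0"


def load_fasttext() -> Optional[Any]:
    """Load the gensim KeyedVectors, or return None when disabled."""
    if not fasttext_enabled():
        return None
    # Imported lazily so WordNet-only runs never import gensim at all.
    import gensim.downloader as api

    # First call downloads ~1 GB into ~/gensim-data/; later calls load from there.
    return api.load(MODEL_NAME)


def fasttext_neighbours(kv: Optional[Any], word: str, topn: int = 20) -> list[tuple[str, float]]:
    """Nearest neighbours by cosine similarity; empty when fastText is off."""
    if kv is None or word not in kv:
        # Disabled model, or a word missing from the model's vocabulary.
        return []
    return kv.most_similar(word, topn=topn)
```

A module would call `load_fasttext()` once at import time. That import-time cost is why the reloader and each extra gunicorn worker multiply the startup time and memory.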
```bash
flask run --no-reload
```

Use `--no-reload` because fastText loads at module scope, and the reloader would spawn two processes that both load it. Startup takes ~2.5–3 minutes. The server listens on `localhost:5000`.
```bash
gunicorn -w 1 -t 120 -b 127.0.0.1:5000 app:app
```

- `-w 1` (one worker) is intentional; each worker loads ~1.5 GB of model + corpus data.
- `-t 120` keeps gunicorn from killing the worker during the long startup.
- Run behind a reverse proxy (nginx, Caddy) for TLS.
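If you prefer a config file to command-line flags, the same settings fit in a `gunicorn.conf.py`. This is a sketch, not a file shipped in the repository. It assumes gunicorn 20.1 or later, which is needed for the `wsgi_app` setting.

```python
# gunicorn.conf.py: equivalent of `gunicorn -w 1 -t 120 -b 127.0.0.1:5000 app:app`
bind = "127.0.0.1:5000"  # loopback only; the reverse proxy terminates TLS
workers = 1              # each worker holds its own ~1.5 GB copy of model + corpora
timeout = 120            # seconds a silent worker may go before gunicorn kills and restarts it
wsgi_app = "app:app"     # same target as the positional argument on the command line
```

Start it with `gunicorn -c gunicorn.conf.py`.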
- Resident memory: ~1.5–2 GB (fastText ~1 GB, corpora ~200 MB, runtime).
- Cold start: ~2.5–3 minutes.
- Not compatible with serverless or sleep-on-idle hosting: every cold start repeats the multi-minute model load.
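To check the memory figure on your own host, you can read the process's peak resident set size once startup finishes. The helper below is hypothetical and Unix-only, because the standard-library `resource` module is not available on Windows.

```python
import resource
import sys


def peak_rss_mb() -> float:
    """Peak resident set size of the current process, in MiB (Unix only)."""
    peak = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    # ru_maxrss is reported in kilobytes on Linux but in bytes on macOS.
    divisor = 1024 * 1024 if sys.platform == "darwin" else 1024
    return peak / divisor


print(f"peak RSS: {peak_rss_mb():.0f} MiB")
```

Calling it right after the models and corpora load shows how close the process comes to the ~1.5–2 GB estimate.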
```
GET /synonyms?word=<x>&tier=<t>&pos=<p>&corpus=<c>&rank=<r>
```

Returns a JSON object:

```json
{
  "senses": [{"id": "happy.a.01", "gloss": "...", "pos": "adj"}],
  "query_in_corpus": true,
  "results": [{"word": "...", "zipf": 3.4, "definition": "...", "band": "uncommon"}]
}
```

- `senses` is WordNet sense metadata for the query (capped at 8, filtered by `pos`, empty for 2-word phrases).
- `results` is the blended, sorted, frequency-filtered candidate list.
- `query_in_corpus` indicates whether the queried word exists in the selected corpus's frequency table (`null` for phrases).
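Below is a minimal standard-library client sketch for the endpoint. The function names are hypothetical, and `BASE_URL` assumes the local development server. Parameters left as `None` are omitted so the server applies its own defaults.

```python
import json
import urllib.parse
import urllib.request
from typing import Optional, Union

BASE_URL = "http://localhost:5000"  # assumed: local dev server

Multi = Union[str, list[str], None]


def build_synonyms_url(word: str, tier: Multi = None, pos: Multi = None,
                       corpus: Optional[str] = None, rank: Optional[str] = None,
                       base: str = BASE_URL) -> str:
    """Build a /synonyms URL; list values become comma-separated strings."""
    params: dict[str, str] = {"word": word}
    for key, value in (("tier", tier), ("pos", pos), ("corpus", corpus), ("rank", rank)):
        if value is None:
            continue  # omitted parameters fall back to the server's defaults
        params[key] = ",".join(value) if isinstance(value, list) else value
    return f"{base}/synonyms?{urllib.parse.urlencode(params)}"


def fetch_synonyms(url: str, timeout: float = 30.0) -> dict:
    """GET the URL and decode the JSON body."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)


url = build_synonyms_url("happy", tier=["rare", "exotic"], pos="adj", corpus="bnc")
# data = fetch_synonyms(url)  # requires the server to be running
# for r in data["results"]:
#     print(r["word"], r["zipf"], r["band"])
```

`urlencode` percent-encodes the commas (`rare%2Cexotic`), and the server decodes them back before splitting.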
| Param | Values |
|---|---|
| `word` | required; up to 2 words for phrase queries |
| `tier` | `all`, `common`, `uncommon`, `rare`, `exotic`, `absurd` (or comma-separated) |
| `pos` | `all`, `noun`, `verb`, `adj`, `adv` (or comma-separated) |
| `corpus` | `wordfreq` (default), `subtlex`, `bnc`, `google_1grams`, `wikipedia`, `kaggle`, `opensubtitles`, `gutenberg`, `leipzig_news`, `leipzig_web_com`, `leipzig_web_uk` |
| `rank` | result sort order: `common` (default), `rare`, `relevance` |
| `min`, `max` | optional Zipf floats; raw API-only override of `tier` (no UI surface) |
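Validating these parameters could look like the sketch below. The allowed values and the `corpus` and `rank` defaults come from the table. Everything else is assumed rather than taken from `app.py`: the function names, treating a missing `tier` or `pos` as `all`, and the sort semantics (`common` = highest Zipf first, `rare` = lowest first).

```python
from typing import Optional

# Allowed values, copied from the parameter table.
TIERS = ("common", "uncommon", "rare", "exotic", "absurd")
POS = ("noun", "verb", "adj", "adv")
CORPORA = ("wordfreq", "subtlex", "bnc", "google_1grams", "wikipedia", "kaggle",
           "opensubtitles", "gutenberg", "leipzig_news", "leipzig_web_com", "leipzig_web_uk")
RANKS = ("common", "rare", "relevance")


def parse_word(raw: Optional[str]) -> str:
    """Required; at most two whitespace-separated words."""
    if raw is None or not raw.strip():
        raise ValueError("word is required")
    tokens = raw.split()
    if len(tokens) > 2:
        raise ValueError("at most 2 words are supported")
    return " ".join(tokens)


def parse_multi(raw: Optional[str], allowed: tuple[str, ...]) -> list[str]:
    """Comma-separated values; "all" (or, by assumption, a missing param) means every value."""
    if raw is None or not raw.strip():
        return list(allowed)
    values = [v.strip() for v in raw.split(",") if v.strip()]
    if "all" in values:
        return list(allowed)
    unknown = [v for v in values if v not in allowed]
    if unknown:
        raise ValueError(f"unknown value(s): {', '.join(unknown)}")
    return list(dict.fromkeys(values))  # drop duplicates, keep order


def parse_choice(raw: Optional[str], allowed: tuple[str, ...], default: str) -> str:
    """Single-valued parameter with a documented default."""
    if raw is None or not raw.strip():
        return default
    if raw not in allowed:
        raise ValueError(f"unknown value: {raw}")
    return raw


def parse_zipf_bounds(raw_min: Optional[str],
                      raw_max: Optional[str]) -> tuple[Optional[float], Optional[float]]:
    """Optional min/max Zipf floats; when present they override the tier bands."""
    lo = float(raw_min) if raw_min is not None else None  # float() raises ValueError on junk
    hi = float(raw_max) if raw_max is not None else None
    if lo is not None and hi is not None and lo > hi:
        raise ValueError("min must not exceed max")
    return lo, hi


def sort_results(results: list[dict], rank: str) -> list[dict]:
    """Assumed semantics: common = highest Zipf first, rare = lowest first."""
    if rank == "common":
        return sorted(results, key=lambda r: r["zipf"], reverse=True)
    if rank == "rare":
        return sorted(results, key=lambda r: r["zipf"])
    return list(results)  # relevance: keep the upstream blended-score order
```

Raising `ValueError` keeps the parsers framework-agnostic; a Flask route would catch it and turn it into a 400 response.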
```
app.py                  Flask app: routes, validation, response assembly
config.py               Constants (tiers, valid params, scoring/limits)
candidates.py           WordNet + fastText candidates, scoring, senses, bands
corpora.py              Corpus loaders (count aggregation) + get_zipf dispatch
definitions.py          Wiktionary -> Webster's -> WordNet -> [undefined] chain
data/                   Corpus files + Webster's 1913
static/index.html       Single-page frontend (HTML + inline CSS + inline JS)
scripts/setup_nltk.py   One-time NLTK data download
requirements.txt        Pinned dependencies
CLAUDE.md               Architecture and design rationale
```
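For corpora that ship raw counts, the usual conversion is the Zipf scale that wordfreq also uses: log10 of occurrences per billion tokens. The sketch below shows count aggregation plus a `get_zipf`-style dispatch under that assumption. The `CountTable` class, the lowercase merging, the toy corpus, and treating the table's sum as the corpus size are all hypothetical. The real loaders may smooth or aggregate differently. `wordfreq.zipf_frequency` is the library's real function.

```python
import math
from collections import Counter
from typing import Optional


class CountTable:
    """Hypothetical per-corpus table of raw counts, aggregated by lowercase form."""

    def __init__(self, counts: dict[str, int]):
        agg: Counter[str] = Counter()
        for word, n in counts.items():
            agg[word.lower()] += n  # "Happy" and "happy" share one entry
        self.counts = agg
        # Assumes the table covers the whole corpus, so its sum is the token total.
        self.total = sum(agg.values())

    def zipf(self, word: str) -> Optional[float]:
        n = self.counts.get(word.lower(), 0)
        if n == 0 or self.total == 0:
            return None  # word absent from this corpus
        # Zipf scale: log10 of occurrences per billion tokens.
        return math.log10(n * 1e9 / self.total)


def _wordfreq_zipf(word: str) -> Optional[float]:
    from wordfreq import zipf_frequency  # lazy import; only the default corpus needs it

    z = zipf_frequency(word, "en")
    return z if z > 0 else None  # wordfreq reports 0.0 for unknown words


# Hypothetical toy corpus of exactly 1,000,000 tokens.
TABLES: dict[str, CountTable] = {
    "toy": CountTable({"happy": 600, "Happy": 400, "glad": 100, "the": 998_900}),
}


def get_zipf(word: str, corpus: str) -> Optional[float]:
    """Dispatch to the right frequency source for `corpus`."""
    if corpus == "wordfreq":
        return _wordfreq_zipf(word)
    table = TABLES.get(corpus)
    if table is None:
        raise ValueError(f"unknown corpus: {corpus}")
    return table.zipf(word)


print(get_zipf("happy", "toy"))   # 6.0 (1,000 per million = 10^6 per billion)
print(get_zipf("glad", "toy"))    # 5.0
print(get_zipf("blithe", "toy"))  # None
```

Returning `None` rather than `0.0` for absent words keeps "not in this corpus" distinct from "very rare", which is the distinction `query_in_corpus` reports.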
MIT — see LICENSE.
Frequency corpora are credited in-app under the "corpora" link in the footer.