Synonymicon

A multi-source synonym discovery tool with frequency-band filtering. Combines WordNet and fastText to surface candidates across the full Zipf range. Pick a corpus, pick a frequency band, get synonyms.

Live at synonymicon.xyz.

Built with assistance from

Claude Opus 4.5
Claude Opus 4.6
Claude Opus 4.7
Perplexity Computer
Xiaomi MiMo-V2-Pro
Xiaomi MiMo-V2.5-Pro
MiniMax M2.7
KAT-Coder-Pro V2

Stack

Python 3.12 + Flask (synchronous, single-process, no database)
wordfreq as the default frequency source
NLTK WordNet for primary synonyms
fastText (fasttext-wiki-news-subwords-300 via gensim) for secondary candidates
Included frequency corpora: wordfreq, SUBTLEX-US, BNC, Google 1-grams, Wikipedia, Kaggle, OpenSubtitles, Project Gutenberg, Leipzig News 2025, Leipzig Web COM 2018, Leipzig Web UK 2018
Definition fallback chain: Wiktionary REST API → Webster's 1913 (local) → WordNet gloss → [undefined]
Vanilla single-page frontend (no build step, no framework)
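
The definition fallback chain listed above amounts to trying each source in order and returning the first hit. Here is a minimal sketch, assuming each source is a callable that returns a string or None. The function names and stub lookups are hypothetical stand-ins, not the contents of definitions.py:

```python
from typing import Callable, Iterable, Optional

Lookup = Callable[[str], Optional[str]]

UNDEFINED = "[undefined]"  # final placeholder named in the chain above


def first_definition(word: str, sources: Iterable[Lookup]) -> str:
    """Return the first non-empty definition, or the placeholder."""
    for lookup in sources:
        try:
            definition = lookup(word)
        except Exception:
            # Treat a failing source (e.g. a network error) as a miss.
            definition = None
        if definition:
            return definition
    return UNDEFINED


# Hypothetical stand-ins for the real sources, in chain order.
def wiktionary_stub(word: str) -> Optional[str]:
    raise TimeoutError("simulated network failure")


def webster_stub(word: str) -> Optional[str]:
    return {"glad": "Pleased; joyous."}.get(word)


def wordnet_stub(word: str) -> Optional[str]:
    return {"happy": "enjoying or showing joy"}.get(word)


CHAIN = [wiktionary_stub, webster_stub, wordnet_stub]
```

Calling first_definition("glad", CHAIN) skips the failing remote stub and falls through to the local one. A word that no source knows ends at the [undefined] placeholder.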

Setup

python -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt
python scripts/setup_nltk.py

The fastText model (~1 GB) downloads on first run via gensim and is cached under ~/gensim-data/.

Set SYNONYMICON_FASTTEXT=0 to skip the fastText load entirely (WordNet-only candidates, instant startup) — useful for development and required by the test suite (SYNONYMICON_FASTTEXT=0 pytest).
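
One way such a switch could be wired up is sketched below. It assumes only the literal value "0" disables the load and that an unset variable means enabled. The helper names are hypothetical, and the real check in the app may differ:

```python
import os


def fasttext_enabled(environ=os.environ) -> bool:
    """True unless SYNONYMICON_FASTTEXT is set to "0"."""
    # Assumption: only the literal "0" disables the load; unset means enabled.
    return environ.get("SYNONYMICON_FASTTEXT", "1") != "0"


def load_vectors(environ=os.environ):
    """Return the fastText vectors, or None in WordNet-only mode."""
    if not fasttext_enabled(environ):
        return None
    # Imported lazily so WordNet-only runs never import gensim.
    import gensim.downloader as api

    return api.load("fasttext-wiki-news-subwords-300")
```

Keeping the gensim import inside the function means a WordNet-only run never imports gensim, which is what keeps that mode fast to start.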

Run (development)

flask run --no-reload

Use --no-reload because fastText loads at module scope and the reloader would spawn two processes that both load it. Startup takes ~2.5–3 minutes.

Server on localhost:5000.

Run (production)

gunicorn -w 1 -t 120 -b 127.0.0.1:5000 app:app

-w 1 (one worker) is intentional; each worker loads ~1.5 GB of model + corpus data.
-t 120 keeps gunicorn from killing the worker during the long startup.
Run behind a reverse proxy (nginx, Caddy) for TLS.
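
The same three flags can also be kept in a gunicorn configuration file, which is plain Python. This is an optional sketch. The file name and its presence in the repository are assumptions, while bind, workers, and timeout are standard gunicorn settings:

```python
# gunicorn.conf.py (hypothetical file; not part of the listed layout)
bind = "127.0.0.1:5000"  # equivalent to -b 127.0.0.1:5000
workers = 1              # equivalent to -w 1; one copy of the model in memory
timeout = 120            # equivalent to -t 120, in seconds
```

With that file in place, the command would become gunicorn -c gunicorn.conf.py app:app.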

Memory & startup

Resident memory: ~1.5–2 GB (fastText ~1 GB, corpora ~200 MB, runtime).
Cold start: ~2.5–3 minutes.
Not compatible with serverless or sleep-on-idle hosting.

API

GET /synonyms?word=<x>&tier=<t>&pos=<p>&corpus=<c>&rank=<r>

Returns a JSON object:

{
  "senses":          [{"id": "happy.a.01", "gloss": "...", "pos": "adj"}],
  "query_in_corpus": true,
  "results":         [{"word": "...", "zipf": 3.4, "definition": "...", "band": "uncommon"}]
}

senses holds WordNet sense metadata for the query: capped at 8 entries, filtered by pos, and empty for 2-word phrases. results is the blended, sorted, frequency-filtered candidate list. query_in_corpus indicates whether the queried word exists in the selected corpus's frequency table; it is null for phrases.
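
To make the response shape concrete, the sketch below parses a payload of the form shown above and pulls out a few fields. The sample words, Zipf values, and band labels are invented for illustration, and the helper names are hypothetical. The per-million conversion assumes the usual Zipf definition (log10 of occurrences per billion words), which is what wordfreq uses. Whether every corpus here is scaled the same way is an assumption.

```python
import json

# Invented sample payload in the documented shape.
SAMPLE = """
{
  "senses": [{"id": "happy.a.01", "gloss": "enjoying or showing joy", "pos": "adj"}],
  "query_in_corpus": true,
  "results": [
    {"word": "glad", "zipf": 3.4, "definition": "pleased", "band": "uncommon"},
    {"word": "blithe", "zipf": 2.0, "definition": "carefree", "band": "rare"}
  ]
}
"""


def zipf_to_per_million(zipf: float) -> float:
    """Convert a Zipf value to occurrences per million words."""
    # Zipf is log10(per-billion frequency), so subtract 3 for per-million.
    return 10 ** (zipf - 3)


def summarize(payload: dict) -> list:
    """Return (word, band, per-million frequency) for each result."""
    return [
        (r["word"], r["band"], round(zipf_to_per_million(r["zipf"]), 2))
        for r in payload["results"]
    ]


payload = json.loads(SAMPLE)
sense_ids = [s["id"] for s in payload["senses"]]
```

A caller handling phrase queries should expect senses to be an empty list and query_in_corpus to be None after parsing.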

| Param | Values |
| --- | --- |
| `word` | required; up to 2 words for phrase queries |
| `tier` | `all`, `common`, `uncommon`, `rare`, `exotic`, `absurd` (or comma-separated) |
| `pos` | `all`, `noun`, `verb`, `adj`, `adv` (or comma-separated) |
| `corpus` | `wordfreq` (default), `subtlex`, `bnc`, `google_1grams`, `wikipedia`, `kaggle`, `opensubtitles`, `gutenberg`, `leipzig_news`, `leipzig_web_com`, `leipzig_web_uk` |
| `rank` | `common` (default), `rare`, `relevance`; sets the result sort order |
| `min`, `max` | optional Zipf floats; raw API-only override of `tier` (no UI surface) |
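
The parameters in the table above can be assembled into a request URL with the standard library. This is a client-side sketch only. The function name and its keyword arguments are hypothetical, and the base URL assumes the development server on localhost:5000:

```python
from urllib.parse import urlencode


def build_query(
    word,
    tiers=("all",),
    pos=("all",),
    corpus="wordfreq",
    rank="common",
    zipf_min=None,
    zipf_max=None,
    base="http://localhost:5000",
):
    """Build a /synonyms URL from the documented query parameters."""
    params = {
        "word": word,
        "tier": ",".join(tiers),  # multiple tiers are comma-separated
        "pos": ",".join(pos),     # same convention for parts of speech
        "corpus": corpus,
        "rank": rank,
    }
    # min/max are optional Zipf bounds that override tier when present.
    if zipf_min is not None:
        params["min"] = zipf_min
    if zipf_max is not None:
        params["max"] = zipf_max
    return f"{base}/synonyms?{urlencode(params)}"


url = build_query("happy", tiers=("rare", "exotic"), pos=("adj",), corpus="bnc")
```

The resulting URL can be fetched with urllib.request.urlopen and decoded with json.load. urlencode percent-encodes the commas, which is still a valid query string.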

Layout

app.py                  Flask app: routes, validation, response assembly
config.py               Constants (tiers, valid params, scoring/limits)
candidates.py           WordNet + fastText candidates, scoring, senses, bands
corpora.py              Corpus loaders (count aggregation) + get_zipf dispatch
definitions.py          Wiktionary -> Webster's -> WordNet -> [undefined] chain
data/                   Corpus files + Webster's 1913
static/index.html       Single-page frontend (HTML + inline CSS + inline JS)
scripts/setup_nltk.py   One-time NLTK data download
requirements.txt        Pinned dependencies
CLAUDE.md               Architecture and design rationale

License

MIT — see LICENSE.

Credits

Frequency corpora are credited in-app under the "corpora" link in the footer.
