python aliasgen.py --config aliasgen.yaml --themes orbit flux ember -n 30 --seed 7
AliasGen + Local Word Vectors (no LLM)
This project generates aliases from user-provided theme words.
It supports:
• Local word vectors (vectors.kv) trained from your own text (offline, private)
• Optional gensim downloader models (internet + large download)
• A small Tkinter GUI inside aliasgen.py
No LLMs are used.
⸻
Requirements
Python 3.10+
Install dependencies:
pip install numpy pyyaml gensim
Tkinter is usually included with Python. On some Linux systems (e.g. Debian/Ubuntu) it must be installed separately:
sudo apt install python3-tk
⸻
Files
aliasgen.py Alias generator (CLI + GUI). Reads configuration from YAML. Can use local or downloaded embeddings.
train_vectors.py Trains FastText word vectors locally from text files. Can also extract a vocabulary and update aliasgen.yaml.
aliasgen.yaml Configuration file (vocab, styling, embedding settings).
vectors.kv Local word vectors generated by train_vectors.py.
⸻
Quick Start (offline, local vectors)
1. Prepare text data
Place any text files into a folder, e.g.:
mytexts/
The trainer scans common text/code file types automatically.
2. Train vectors + extract vocabulary
python train_vectors.py mytexts/ --extract-vocab --train --update-alias-yaml aliasgen.yaml
This will:
• build a corpus from your text
• extract frequent vocabulary
• update aliasgen.yaml (creates backup)
• write vectors.kv
3. Configure aliasgen.yaml
Ensure these fields exist:
use_embeddings: true
embedding_model: "local"
vectors_path: "vectors.kv"
4. Run generator
CLI:
python aliasgen.py --config aliasgen.yaml --themes alpha orbit forge -n 30
GUI:
python aliasgen.py --config aliasgen.yaml --gui
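Before running the generator it can help to confirm that the three embedding fields from step 3 are present. The following is a minimal sketch, not code from aliasgen.py: the function name check_embedding_config and the REQUIRED table are hypothetical, and it assumes the configuration has already been loaded into a dict (for example with PyYAML's yaml.safe_load).

```python
# Hypothetical helper: validates the embedding fields named in step 3.
# None means "any non-empty string is accepted" for that field.
REQUIRED = {
    "use_embeddings": True,
    "embedding_model": "local",
    "vectors_path": None,
}


def check_embedding_config(cfg: dict) -> list[str]:
    """Return a list of human-readable problems; empty means the config looks fine."""
    problems = []
    for key, expected in REQUIRED.items():
        if key not in cfg:
            problems.append(f"missing field: {key}")
        elif expected is not None and cfg[key] != expected:
            problems.append(f"{key} should be {expected!r}, got {cfg[key]!r}")
        elif expected is None and not (isinstance(cfg[key], str) and cfg[key]):
            problems.append(f"{key} must be a non-empty string")
    return problems


# Example: a config that disables embeddings and omits the vectors path.
for problem in check_embedding_config({"use_embeddings": False, "embedding_model": "local"}):
    print(problem)
```

An empty result means the fields match the values shown in step 3. Whether aliasgen.py itself reports these problems is not stated here.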
⸻
Training Options
Extract vocabulary only:
python train_vectors.py mytexts/ --extract-vocab --vocab-out vocab.yaml
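Vocabulary extraction can be pictured as tokenizing every line and keeping the most frequent words. The sketch below is an illustration under stated assumptions, not the code in train_vectors.py: the names tokenize, iter_corpus and extract_vocab, the lowercase ASCII tokenizer, the file suffixes, and the min_count/top_n defaults are all hypothetical.

```python
import re
from collections import Counter
from pathlib import Path

TOKEN_RE = re.compile(r"[a-z]+")  # assumption: lowercase ASCII words only


def tokenize(line: str) -> list[str]:
    """Split one line of text into lowercase word tokens."""
    return TOKEN_RE.findall(line.lower())


def iter_corpus(folder: str, suffixes=(".txt", ".md")):
    """Yield one token list per non-empty line of every matching file."""
    for path in sorted(Path(folder).rglob("*")):
        if path.is_file() and path.suffix.lower() in suffixes:
            with path.open(encoding="utf-8", errors="ignore") as fh:
                for line in fh:
                    tokens = tokenize(line)
                    if tokens:
                        yield tokens


def extract_vocab(sentences, min_count: int = 2, top_n: int = 500) -> list[str]:
    """Return up to top_n tokens, most frequent first, seen at least min_count times."""
    counts = Counter(tok for sent in sentences for tok in sent)
    return [word for word, count in counts.most_common(top_n) if count >= min_count]
```

With these hypothetical helpers, extract_vocab(iter_corpus("mytexts/")) would produce a word list of the kind written to vocab.yaml.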
Train vectors only:
python train_vectors.py mytexts/ --train --vectors-out vectors.kv
Quick test with limited data:
python train_vectors.py mytexts/ --max-lines 20000 --extract-vocab --train
Tune training parameters:
python train_vectors.py mytexts/ --train --dim 200 --window 7 --epochs 30 --min-count-train 3
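The tuning flags plausibly correspond to gensim's FastText keyword arguments. The sketch below shows one such mapping, assuming gensim 4.x. The functions fasttext_params and train_and_save are hypothetical, their default values are placeholders, and the mapping is a guess at what train_vectors.py does rather than a description of it.

```python
def fasttext_params(dim=100, window=5, epochs=5, min_count=5) -> dict:
    """Map the CLI-style flags to gensim 4.x FastText keyword arguments."""
    return {
        "vector_size": dim,      # --dim
        "window": window,        # --window
        "epochs": epochs,        # --epochs
        "min_count": min_count,  # --min-count-train
    }


def train_and_save(sentences, out_path="vectors.kv", **flags):
    """Train FastText on tokenized sentences and save only the word vectors.

    `sentences` must be a list (or other re-iterable) of token lists,
    not a one-shot generator, because training makes several passes.
    """
    from gensim.models import FastText  # lazy import; requires gensim 4.x

    model = FastText(sentences=sentences, **fasttext_params(**flags))
    model.wv.save(out_path)  # KeyedVectors only, without the full training state
    return model.wv


# The tuned command above would then correspond to:
print(fasttext_params(dim=200, window=7, epochs=30, min_count=3))
```

Larger values for dim and epochs increase training time. A higher min_count drops rare words from the trained vocabulary.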
⸻
How It Works
1. Text corpus is read from files.
2. Tokens are extracted (words).
3. FastText learns word vectors.
4. aliasgen.py mixes theme words in vector space.
5. Similar words are selected from vocab.
6. Aliases are composed (mash, prefix, suffix, style).
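Steps 4 to 6 can be sketched with plain Python and a toy vector table. This is an illustration only: it assumes "mixing" means averaging the theme vectors and that similarity is cosine similarity, and the mash rule shown is one hypothetical way to combine two words. None of the names below are taken from aliasgen.py.

```python
import math


def mean_vector(vectors):
    """Average a list of equal-length vectors component by component."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def nearest_words(themes, vectors, vocab, top_k=3):
    """Mix the theme vectors, then rank vocab words by similarity to the mix."""
    mix = mean_vector([vectors[t] for t in themes])
    candidates = [w for w in vocab if w in vectors and w not in themes]
    candidates.sort(key=lambda w: cosine(mix, vectors[w]), reverse=True)
    return candidates[:top_k]


def mash(a, b):
    """Hypothetical mash rule: first half of a joined to second half of b."""
    return a[: (len(a) + 1) // 2] + b[len(b) // 2 :]


# Toy 2-dimensional vectors, invented for this example.
TOY = {
    "orbit": [1.0, 0.0],
    "flux": [0.0, 1.0],
    "comet": [0.9, 0.1],
    "stream": [0.1, 0.9],
    "nova": [0.7, 0.7],
    "anvil": [-1.0, -0.2],
}

best = nearest_words(["orbit", "flux"], TOY, list(TOY), top_k=1)
print(best, mash(best[0], "comet"))
```

In the toy table, "nova" lies between the two themes and therefore ranks first, while "anvil" points away from both and ranks last.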
⸻
Why FastText
FastText uses subword information. This allows:
• better handling of rare words
• handling of new invented words
• robust theme mixing
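The subword idea can be shown in a few lines. FastText represents a word by its character n-grams, taken after wrapping the word in "<" and ">" boundary markers, and the commonly used n-gram lengths are 3 to 6. The helper below is a simplified sketch with a hypothetical name. The real library also hashes n-grams into buckets and keeps a vector for the whole word.

```python
def char_ngrams(word: str, min_n: int = 3, max_n: int = 6) -> list[str]:
    """Return the character n-grams of a word wrapped in boundary markers."""
    wrapped = f"<{word}>"
    grams = []
    for n in range(min_n, max_n + 1):
        for i in range(len(wrapped) - n + 1):
            grams.append(wrapped[i : i + n])
    return grams


print(char_ngrams("where", 3, 3))
# ['<wh', 'whe', 'her', 'ere', 're>']

# An invented word still shares n-grams with a known one.
shared = set(char_ngrams("orbit")) & set(char_ngrams("orbix"))
print(sorted(shared))
```

Because "orbix" shares n-grams such as "orb" with "orbit", a vector can be assembled for it even though it never appeared in the corpus.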
⸻
Improving Results
More text → better vectors.
Best results when:
• corpus matches your domain
• theme words appear in corpus
• consistent language
If a theme word is unknown: add it to your corpus and retrain.
⸻
Privacy / Sharing
To share the tool safely:
Share:
• aliasgen.py
• train_vectors.py
• generic aliasgen.yaml
Do NOT share:
• private corpus text
• vectors.kv derived from private data
Word vectors can encode semantic information from the source text, and the vectors file also contains the vocabulary it was trained on.