
Steiynbrodt/Alias-generator


python aliasgen.py --config aliasgen.yaml --gui

python aliasgen.py --config aliasgen.yaml --themes orbit flux ember -n 30 --seed 7

AliasGen + Local Word Vectors (no LLM)

This project generates aliases from user-provided theme words.

It supports:
• Local word vectors (vectors.kv) trained from your own text (offline, private)
• Optional gensim downloader models (requires internet access and a large download)
• A small Tkinter GUI inside aliasgen.py

No LLMs are used.

Requirements

Python 3.10+

Install dependencies:

pip install numpy pyyaml gensim

Tkinter is usually included with Python. On some Linux distributions it is packaged separately; on Debian/Ubuntu:

sudo apt install python3-tk

Files

aliasgen.py: Alias generator (CLI + GUI). Reads its configuration from YAML and can use local or downloaded embeddings.

train_vectors.py: Trains FastText word vectors locally from text files. Can also extract a vocabulary and update aliasgen.yaml.

aliasgen.yaml: Configuration file (vocabulary, styling, embedding settings).

vectors.kv: Local word vectors generated by train_vectors.py.

Quick Start (offline, local vectors)

1. Prepare text data

Place any text files into a folder, e.g.:

mytexts/

The trainer scans common text/code file types automatically.

2. Train vectors + extract vocabulary

python train_vectors.py mytexts/ --extract-vocab --train --update-alias-yaml aliasgen.yaml

This will:
• build a corpus from your text
• extract frequent vocabulary
• update aliasgen.yaml (a backup is created first)
• write vectors.kv
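The preparation steps above (scan files, tokenize, count frequent words, back up the YAML before rewriting it) can be sketched in plain Python. This is a minimal sketch under stated assumptions, not train_vectors.py itself: the suffix list in TEXT_SUFFIXES, the letters-only tokenizer, the min_count/top_k defaults, the ".bak" backup naming and all helper names are hypothetical, and the max_lines parameter only assumes that --max-lines caps the number of lines read.

```python
import re
import shutil
from collections import Counter
from itertools import islice
from pathlib import Path

# Hypothetical suffix list standing in for "common text/code file types".
TEXT_SUFFIXES = {".txt", ".md", ".rst", ".py", ".js", ".ts", ".java", ".c", ".cpp", ".go", ".rs"}

# Assumed tokenizer: runs of two or more ASCII letters (lowercased in tokenize()).
TOKEN_RE = re.compile(r"[A-Za-z]{2,}")


def iter_corpus_lines(root, max_lines=None):
    """Yield lines from matching files under root; max_lines=None means no cap."""
    def all_lines():
        for path in sorted(Path(root).rglob("*")):
            if path.is_file() and path.suffix.lower() in TEXT_SUFFIXES:
                with path.open(encoding="utf-8", errors="ignore") as fh:
                    yield from fh
    return islice(all_lines(), max_lines)


def tokenize(line):
    """Lowercased word tokens from one line."""
    return [tok.lower() for tok in TOKEN_RE.findall(line)]


def build_sentences(root, max_lines=None):
    """One token list per non-empty line, the shape gensim's FastText accepts."""
    return [toks for line in iter_corpus_lines(root, max_lines) if (toks := tokenize(line))]


def extract_vocab(sentences, min_count=5, top_k=5000):
    """Most frequent tokens that occur at least min_count times."""
    counts = Counter(tok for sent in sentences for tok in sent)
    return [word for word, count in counts.most_common(top_k) if count >= min_count]


def backup_then_write(path, text):
    """Copy an existing file to '<name>.bak' before overwriting it."""
    path = Path(path)
    if path.exists():
        shutil.copy2(path, path.with_name(path.name + ".bak"))
    path.write_text(text, encoding="utf-8")
```

Sorting the file paths keeps the corpus order, and therefore the vocabulary ranking for equally frequent words, stable between runs.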

3. Configure aliasgen.yaml

Ensure these fields exist:

use_embeddings: true
embedding_model: "local"
vectors_path: "vectors.kv"
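A quick way to confirm this step before running the generator is to load the YAML and check the three fields. The sketch below assumes the keys sit at the top level of aliasgen.yaml; the function name is hypothetical and the check is not part of aliasgen.py. It works on an already-loaded dict, so it runs without PyYAML.

```python
from pathlib import Path

# The two fixed-value settings listed above (vectors_path is checked separately).
REQUIRED_LOCAL_SETTINGS = {
    "use_embeddings": True,
    "embedding_model": "local",
}


def check_local_embedding_config(cfg):
    """Return a list of problems; an empty list means local vectors are configured."""
    problems = []
    for key, expected in REQUIRED_LOCAL_SETTINGS.items():
        if cfg.get(key) != expected:
            problems.append(f"{key} should be {expected!r}, got {cfg.get(key)!r}")
    vectors_path = cfg.get("vectors_path")
    if not vectors_path:
        problems.append("vectors_path is missing")
    elif not Path(vectors_path).exists():
        problems.append(f"vectors_path {vectors_path!r} does not exist yet (train vectors first)")
    return problems
```

With PyYAML installed, the dict would come from `yaml.safe_load(Path("aliasgen.yaml").read_text())`.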

4. Run the generator

CLI:

python aliasgen.py --config aliasgen.yaml --themes alpha orbit forge -n 30

GUI:

python aliasgen.py --config aliasgen.yaml --gui
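The command lines in this README use --config, --themes, -n, --seed and --gui. One way such a CLI could be declared with argparse is sketched below; the flag names come from the examples, while the defaults, help text and the make_rng helper are assumptions rather than aliasgen.py's actual code.

```python
import argparse
import random


def build_parser():
    # Flag names follow the README examples; defaults and help text are assumptions.
    p = argparse.ArgumentParser(prog="aliasgen.py")
    p.add_argument("--config", default="aliasgen.yaml", help="path to the YAML config")
    p.add_argument("--themes", nargs="+", default=[], help="one or more theme words")
    p.add_argument("-n", type=int, default=20, help="number of aliases to generate")
    p.add_argument("--seed", type=int, default=None, help="seed for reproducible output")
    p.add_argument("--gui", action="store_true", help="open the Tkinter GUI")
    return p


def make_rng(seed):
    """A private RNG: the same --seed reproduces the same sampling sequence."""
    return random.Random(seed)
```

Using a dedicated random.Random instance (rather than the module-level functions) is a common way to make a run such as `--seed 7` reproducible without touching global random state.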

Training Options

Extract vocabulary only:

python train_vectors.py mytexts/ --extract-vocab --vocab-out vocab.yaml

Train vectors only:

python train_vectors.py mytexts/ --train --vectors-out vectors.kv

Quick test with limited data:

python train_vectors.py mytexts/ --max-lines 20000 --extract-vocab --train

Tune training parameters:

python train_vectors.py mytexts/ --train --dim 200 --window 7 --epochs 30 --min-count-train 3
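The tuning flags map naturally onto the constructor of gensim's FastText model (gensim 4.x parameters vector_size, window, min_count, epochs). The sketch below assumes that mapping (--dim to vector_size, --min-count-train to min_count); the defaults shown are gensim's own defaults, not necessarily train_vectors.py's, and the function names are hypothetical.

```python
import argparse


def parse_train_args(argv=None):
    """Parse the training flags shown in this README (defaults are assumptions)."""
    p = argparse.ArgumentParser(prog="train_vectors.py")
    p.add_argument("input_dir", help="folder of text files, e.g. mytexts/")
    p.add_argument("--train", action="store_true")
    p.add_argument("--extract-vocab", action="store_true")
    p.add_argument("--max-lines", type=int, default=None)
    p.add_argument("--dim", type=int, default=100)
    p.add_argument("--window", type=int, default=5)
    p.add_argument("--epochs", type=int, default=5)
    p.add_argument("--min-count-train", type=int, default=5)
    p.add_argument("--vectors-out", default="vectors.kv")
    return p.parse_args(argv)


def fasttext_kwargs(args):
    """Assumed mapping from CLI flags to gensim 4.x FastText parameters."""
    return {
        "vector_size": args.dim,
        "window": args.window,
        "min_count": args.min_count_train,
        "epochs": args.epochs,
    }


def train_and_save(sentences, args):
    """Train FastText on token lists and save only the word vectors."""
    from gensim.models import FastText  # imported lazily so parsing works without gensim

    model = FastText(sentences=sentences, **fasttext_kwargs(args))
    model.wv.save(args.vectors_out)
    return model
```

train_and_save expects sentences as a list of token lists and writes only the KeyedVectors (model.wv), which gensim can reload with KeyedVectors.load.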

How It Works

1. The text corpus is read from files.
2. Tokens (words) are extracted.
3. FastText learns word vectors.
4. aliasgen.py mixes the theme words in vector space.
5. Similar words are selected from the vocabulary.
6. Aliases are composed (mash, prefix, suffix, style).
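Steps 4 to 6 can be illustrated with toy vectors. The sketch averages unit-length theme vectors, ranks vocabulary words by cosine similarity to that mix, and composes an alias from two neighbours. The averaging rule, the mash/prefix/suffix rules, the affix lists and all names here are assumptions for illustration, not aliasgen.py's exact logic.

```python
import math
import random


def normalize(vec):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm else list(vec)


def mix_themes(vectors, themes):
    """Average the unit-length vectors of the theme words (one assumed mixing rule)."""
    units = [normalize(vectors[t]) for t in themes]
    return normalize([sum(col) / len(units) for col in zip(*units)])


def cosine(a, b):
    return sum(x * y for x, y in zip(normalize(a), normalize(b)))


def nearest(vectors, target, vocab, exclude=(), k=5):
    """Vocabulary words ranked by cosine similarity to the mixed theme vector."""
    candidates = [w for w in vocab if w in vectors and w not in exclude]
    return sorted(candidates, key=lambda w: cosine(vectors[w], target), reverse=True)[:k]


def compose(a, b, rng, prefixes=("neo", "xo"), suffixes=("ix", "on")):
    """Build one alias from two words; styles and affixes are hypothetical."""
    style = rng.choice(["mash", "prefix", "suffix"])
    if style == "mash":
        alias = a[: (len(a) + 1) // 2] + b[len(b) // 2 :]  # front of a + back of b
    elif style == "prefix":
        alias = rng.choice(prefixes) + a
    else:
        alias = b + rng.choice(suffixes)
    return alias.capitalize()


if __name__ == "__main__":
    toy = {"orbit": [1.0, 0.0, 0.0], "flux": [0.0, 1.0, 0.0],
           "comet": [0.9, 0.5, 0.0], "river": [0.0, 1.0, 0.1], "stone": [0.0, 0.0, 1.0]}
    rng = random.Random(7)
    mix = mix_themes(toy, ["orbit", "flux"])
    picks = nearest(toy, mix, toy, exclude={"orbit", "flux"}, k=2)
    print(picks, [compose(picks[0], picks[1], rng) for _ in range(3)])
```

Normalizing before averaging keeps a theme word with an unusually long vector from dominating the mix.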

Why FastText

FastText uses subword (character n-gram) information. This allows:
• better handling of rare words
• vectors for newly invented words that never appeared in the corpus
• robust theme mixing
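To make "subword information" concrete: FastText pads each word with "<" and ">" and breaks it into character n-grams (gensim's defaults are min_n=3, max_n=6). A word's vector is built from the vectors of its n-grams (plus its own vector if it is in the vocabulary), so an invented word such as "orbitix" still shares n-grams like "orb" with "orbit". The function below is a simplified sketch that only lists the n-grams; real implementations also hash them into a fixed number of buckets.

```python
def char_ngrams(word, min_n=3, max_n=6):
    """Character n-grams of '<word>', following FastText's boundary-marker scheme."""
    padded = f"<{word}>"
    grams = []
    for n in range(min_n, max_n + 1):
        for i in range(len(padded) - n + 1):
            grams.append(padded[i : i + n])
    return grams
```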

Improving Results

More text → better vectors.

Best results when:
• the corpus matches your domain
• the theme words appear in the corpus
• the corpus uses one consistent language

If a theme word is unknown, add text containing it to your corpus and retrain.
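Before retraining, it can help to list which theme words are actually missing. With gensim 4 KeyedVectors the trained vocabulary is available as kv.key_to_index; for FastText vectors, `word in kv` may return True even for unseen words (their vectors come from n-grams), so checking key_to_index is the stricter test. The helper name is hypothetical, and the lowercase comparison assumes the vocabulary was lowercased when it was extracted.

```python
def split_known(themes, vocab):
    """Partition theme words into (known, unknown) against the trained vocabulary."""
    vocab = set(vocab)  # e.g. kv.key_to_index for a loaded gensim KeyedVectors
    known = [t for t in themes if t.lower() in vocab]
    unknown = [t for t in themes if t.lower() not in vocab]
    return known, unknown
```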

Privacy / Sharing

To share the tool safely:

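Share:
• aliasgen.py
• train_vectors.py
• a generic aliasgen.yaml (one updated with --update-alias-yaml contains vocabulary extracted from your corpus)

Do NOT share:
• private corpus text
• vectors.kv derived from private data

Word vectors encode co-occurrence patterns from the source text, so nearest-neighbour queries against a vectors.kv file can reveal words and associations from a private corpus.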

About

generates aliases
