distilbert model trained on a subset of ~100k usernames (sampled from 1.5m real + 1.5m synthetic usernames generated by minecraft cheat clients) to detect and block suspicious usernames via binary classification.

this can flag bot accounts and cheat client usernames when players join 🤷‍♂️. the model analyzes username patterns and returns a real/generated prediction with a confidence score:
- 0 = real username (legitimate player)
- 1 = generated username (likely cheat client/bot)
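the readme doesn't show how the prediction and confidence score are derived from the model's raw outputs, but for a two-class classifier the standard approach is a softmax over the two logits followed by an argmax. a minimal sketch, assuming that convention (`interpret_logits` and its example logit values are hypothetical, not from the project):

```python
import math

def interpret_logits(logit_real: float, logit_generated: float):
    """map a pair of raw binary-classifier logits to the api-style
    prediction / confidence / label fields.

    hypothetical helper: the project's actual post-processing isn't shown
    in the readme; this assumes the usual softmax + argmax convention.
    """
    # softmax over the two logits
    exp_real = math.exp(logit_real)
    exp_gen = math.exp(logit_generated)
    total = exp_real + exp_gen
    probs = [exp_real / total, exp_gen / total]

    prediction = 0 if probs[0] >= probs[1] else 1  # argmax: 0 = real, 1 = generated
    confidence = probs[prediction]                 # probability of the chosen class
    label = "Real" if prediction == 0 else "Generated"
    return {"prediction": prediction, "confidence": round(confidence, 2), "label": label}

# a username the model strongly thinks is generated
print(interpret_logits(-1.2, 1.6))  # {'prediction': 1, 'confidence': 0.94, 'label': 'Generated'}
```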
install dependencies:

```shell
pip install -r requirements.txt
```

start the api server:

```shell
python api.py
```

classify a single username:
```shell
curl -X POST http://localhost:8000/classify \
  -H "Content-Type: application/json" \
  -d '{"username": "xX_bot_Xx"}'
```

response:
```json
{
  "username": "xX_bot_Xx",
  "prediction": 1,
  "confidence": 0.94,
  "label": "Generated",
  "processing_time": 0.02
}
```

batch classification (up to 1000 usernames):
```shell
curl -X POST http://localhost:8000/classify/batch \
  -H "Content-Type: application/json" \
  -d '{"usernames": ["player1", "xXx_bot_xXx", "normal_player"]}'
```

endpoints:

- `GET /health` - check api status
- `POST /classify` - classify single username
- `POST /classify/batch` - classify multiple usernames
- `GET /stats` - cache statistics
- `POST /clear-cache` - clear prediction cache
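since the batch endpoint caps out at 1000 usernames per request, a client classifying a larger list needs to chunk it first. a minimal sketch using only the stdlib (the `chunked`/`classify_batch` helpers are hypothetical; the endpoint path and json shape come from the examples above):

```python
import json
import urllib.request

API_URL = "http://localhost:8000"  # assumes the server from api.py is running locally

def chunked(items, size=1000):
    """split a username list into batches that respect the 1000-name api limit."""
    for i in range(0, len(items), size):
        yield items[i:i + size]

def classify_batch(usernames):
    """send one batch to POST /classify/batch and return the parsed json response."""
    body = json.dumps({"usernames": usernames}).encode()
    req = urllib.request.Request(
        f"{API_URL}/classify/batch",
        data=body,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())

if __name__ == "__main__":
    names = [f"player{i}" for i in range(2500)]
    batches = list(chunked(names))
    print([len(b) for b in batches])  # [1000, 1000, 500]
    # with the api server running, classify each batch:
    # for batch in batches:
    #     print(classify_batch(batch))
```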
run training:

```shell
python main.py
```

trained on ~100k usernames split 80/13/7 for train/val/test. the model learns patterns from the raw username text, so no manual feature engineering is needed. uses distilbert-base-uncased for fast inference (~20-50ms per username).
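the 80/13/7 train/val/test split can be sketched in plain python. a hypothetical helper (the actual main.py may use different tooling, e.g. sklearn's `train_test_split`):

```python
import random

def split_dataset(rows, seed=42):
    """shuffle labeled rows and split them 80/13/7 into train/val/test.

    hypothetical helper mirroring the split described above; a fixed seed
    keeps the split reproducible across runs.
    """
    rows = rows[:]                     # don't mutate the caller's list
    random.Random(seed).shuffle(rows)
    n = len(rows)
    n_train = int(n * 0.80)
    n_val = int(n * 0.13)
    train = rows[:n_train]
    val = rows[n_train:n_train + n_val]
    test = rows[n_train + n_val:]      # remaining ~7%
    return train, val, test

train, val, test = split_dataset([("user%d" % i, i % 2) for i in range(100_000)])
print(len(train), len(val), len(test))  # 80000 13000 7000
```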
dataset format - need two csvs, each with a `username` column:

- `real.csv` - legitimate usernames
- `synthetic.csv` - generated usernames from cheat clients
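given that format, loading the two csvs into labeled (username, label) pairs is straightforward with the stdlib. a minimal sketch (`load_dataset` is a hypothetical name; the real training script may read the files differently):

```python
import csv

def load_dataset(real_path, synthetic_path):
    """read the two csvs and return (username, label) pairs,
    using the labels from the readme: 0 = real, 1 = generated."""
    rows = []
    for path, label in [(real_path, 0), (synthetic_path, 1)]:
        with open(path, newline="", encoding="utf-8") as f:
            for record in csv.DictReader(f):
                rows.append((record["username"], label))
    return rows
```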
- python 3.8+
- 8gb+ ram (16gb recommended)
- gpu optional but faster for training/inference
LiquidBounce (copied and pasted their name generation algorithm for the synthetic data 🤣) - https://github.com/CCBlueX/LiquidBounce/blob/b5d982ad3968f70fe969fb61158fd704e15c4411/src/main/kotlin/net/ccbluex/liquidbounce/utils/client/NameGenerator.kt