Briiqn/username-classifier

username-classifier

distilbert binary classifier that detects and blocks suspicious usernames. trained on a subset of ~100k usernames drawn from a pool of 1.5m real + 1.5m synthetic usernames (the synthetic ones come from minecraft cheat clients)

overview

this can flag bot accounts and cheat client usernames when players join 🤷‍♂️. the model analyzes username patterns and returns a real/generated prediction with a confidence score.

  • 0 = real username (legitimate player)
  • 1 = generated username (likely cheat client/bot)

installation

pip install -r requirements.txt

usage

start the api server:

python api.py

classify a single username:

curl -X POST http://localhost:8000/classify \
  -H "Content-Type: application/json" \
  -d '{"username": "xX_bot_Xx"}'

response:

{
  "username": "xX_bot_Xx",
  "prediction": 1,
  "confidence": 0.94,
  "label": "Generated",
  "processing_time": 0.02
}
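
the same call can be made from python. this is a minimal sketch using only the standard library, assuming the api is running at the address from the curl example and that the response has the fields shown above. `should_block` and its 0.9 threshold are hypothetical, not part of the repo, and the sketch assumes `confidence` refers to the predicted label:

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000"  # assumption: address from the curl example


def build_classify_request(username, base_url=BASE_URL):
    """Build the same POST request as the curl example (nothing is sent yet)."""
    body = json.dumps({"username": username}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/classify",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def should_block(result, threshold=0.9):
    """Hypothetical policy: block only confident 'generated' (1) predictions."""
    return result["prediction"] == 1 and result["confidence"] >= threshold


def classify(username, base_url=BASE_URL):
    """Send the request and decode the json response (needs the api running)."""
    with urllib.request.urlopen(build_classify_request(username, base_url)) as resp:
        return json.load(resp)
```

with the server up, `should_block(classify("xX_bot_Xx"))` would decide whether to reject the join. tune the threshold to how many false positives you can tolerate.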

batch classification (up to 1000 usernames):

curl -X POST http://localhost:8000/classify/batch \
  -H "Content-Type: application/json" \
  -d '{"usernames": ["player1", "xXx_bot_xXx", "normal_player"]}'
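
since a batch is capped at 1000 usernames, longer lists have to be split before sending. a rough sketch of that chunking (the helper names are made up for illustration, and how the api reacts to an oversized batch is not documented here):

```python
def chunked(usernames, size=1000):
    """Yield consecutive slices of at most `size` usernames."""
    for start in range(0, len(usernames), size):
        yield usernames[start:start + size]


def batch_payloads(usernames, size=1000):
    """Build one /classify/batch json body per chunk."""
    return [{"usernames": chunk} for chunk in chunked(usernames, size)]
```

each payload can then be POSTed to /classify/batch the same way as the curl example.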

api endpoints

  • GET /health - check api status
  • POST /classify - classify single username
  • POST /classify/batch - classify multiple usernames
  • GET /stats - cache statistics
  • POST /clear-cache - clear prediction cache

training

python main.py

trained on ~100k usernames split 80/13/7 for train/val/test. the model learns patterns from raw username text, so no manual feature engineering is needed. uses distilbert-base-uncased for fast inference (~20-50ms per username).
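
for reference, an 80/13/7 split can be sketched like this. it is only an illustration of the ratios; main.py may shuffle or stratify differently, and the seed is an arbitrary assumption:

```python
import random


def split_dataset(rows, seed=42):
    """Shuffle rows and cut them 80/13/7 into train/val/test."""
    rows = list(rows)
    random.Random(seed).shuffle(rows)  # seeded so the split is repeatable
    n_train = len(rows) * 80 // 100
    n_val = len(rows) * 13 // 100
    train = rows[:n_train]
    val = rows[n_train:n_train + n_val]
    test = rows[n_train + n_val:]  # remainder, roughly 7%
    return train, val, test
```

integer arithmetic is used for the cut points so the sizes do not depend on float rounding.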

dataset format: training needs two csvs, each with a username column:

  • real.csv - legitimate usernames
  • synthetic.csv - generated usernames from cheat clients
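
a minimal sketch of turning the two csvs into labeled examples, using the 0 = real / 1 = generated convention from the overview. the function names are hypothetical and main.py may load the data differently:

```python
import csv


def read_usernames(csv_file):
    """Read the `username` column from an open csv file."""
    return [row["username"] for row in csv.DictReader(csv_file)]


def load_labeled(real_file, synthetic_file):
    """Return (username, label) pairs: 0 for real, 1 for generated."""
    real = [(name, 0) for name in read_usernames(real_file)]
    synthetic = [(name, 1) for name in read_usernames(synthetic_file)]
    return real + synthetic
```

usage would look like `load_labeled(open("real.csv", newline=""), open("synthetic.csv", newline=""))`.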

requirements

  • python 3.8+
  • 8gb+ ram (16gb recommended)
  • gpu optional but faster for training/inference

credits

LiquidBounce (copied and pasted their name generation algorithm for the synthetic data 🤣) - https://github.com/CCBlueX/LiquidBounce/blob/b5d982ad3968f70fe969fb61158fd704e15c4411/src/main/kotlin/net/ccbluex/liquidbounce/utils/client/NameGenerator.kt
