A modern, React-based web application for training FLUX LoRA models with integrated captioning, dataset management, and HuggingFace publishing.
Features • Quick Start • Installation • Usage • API • Contributing
- 🖼️ Advanced Image Captioning - Support for Florence-2 and Qwen2.5-VL models
- 📦 Automated Dataset Creation - Smart image processing with latent caching
- 🚀 FLUX LoRA Training - Full integration with kohya-ss/sd-scripts
- 📊 Real-time Monitoring - Live training metrics and progress tracking
- ☁️ HuggingFace Publishing - Direct upload to HuggingFace model hub
- 💾 Model Management - Download from CivitAI and HuggingFace with preview support
- Responsive Design - Works on desktop and mobile devices
- Dark Mode - Easy on the eyes during long training sessions
- Real-time Updates - WebSocket-based live training logs
- Drag & Drop - Intuitive file uploads
- Progress Tracking - Visual progress bars and metrics charts
```mermaid
graph TB
    subgraph "Frontend - React"
        UI[React UI]
        Store[Zustand State]
        API_Client[API Client]
    end

    subgraph "Backend - FastAPI"
        Server[FastAPI Server]
        Vision[Vision Models]
        Training[Training Manager]
        Downloads[Download Manager]
    end

    subgraph "External Services"
        HF[HuggingFace]
        Civit[CivitAI]
        SD[sd-scripts]
    end

    UI --> Store
    Store --> API_Client
    API_Client -->|HTTP/WS| Server
    Server --> Vision
    Server --> Training
    Server --> Downloads
    Training -->|subprocess| SD
    Downloads -->|API| HF
    Downloads -->|API| Civit

    style UI fill:#61dafb
    style Server fill:#009688
    style SD fill:#ff6b6b
```
```bash
# Clone the repository
git clone git@github.com:ComfyAssets/kiko-trainer.git
cd kiko-trainer

# Start with Docker Compose
docker compose up -d --build

# Access the application
# Frontend: http://localhost:8080
# API: http://localhost:8001
```

```bash
# Clone and setup
git clone git@github.com:ComfyAssets/kiko-trainer.git
cd kiko-trainer

# Setup backend
./setup.sh  # This clones sd-scripts and installs dependencies

# Start backend
source venv/bin/activate
uvicorn backend.server:app --host 0.0.0.0 --port 8001 --reload

# In a new terminal - setup frontend
cd web
npm install
npm run dev

# Access at http://localhost:5173
```

- Python 3.8+
- Node.js 16+
- CUDA-capable GPU (for training)
- 16GB+ VRAM recommended
- Git
- Clone the repository

  ```bash
  git clone git@github.com:ComfyAssets/kiko-trainer.git
  cd kiko-trainer
  ```

- Run the setup script

  ```bash
  ./setup.sh
  ```

  This will:

  - Clone kohya-ss/sd-scripts (required dependency)
  - Create a Python virtual environment
  - Install all Python dependencies

- GPU setup (if needed)

  ```bash
  # For CUDA 12.1
  pip install --pre torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121
  ```

- Install the frontend

  ```bash
  cd web
  npm install
  ```

```mermaid
graph LR
    A[Upload Images] --> B[Caption Images]
    B --> C[Create Dataset]
    C --> D[Configure Training]
    D --> E[Start Training]
    E --> F[Monitor Progress]
    F --> G[Publish to HF]

    style A fill:#4ade80
    style G fill:#60a5fa
```
- Navigate to the Models tab
- Enter your CivitAI API key (get from CivitAI Account)
- Download FLUX models and required components
- Go to Setup tab
- Upload your images via drag & drop
- Choose a captioning model:
  - Florence-2: Fast, good quality
  - Qwen2.5-VL: Slower, best quality
- Configure caption settings and generate
- Set output folder name
- Choose resolution (512, 768, 1024)
- Enable latent caching for faster training
- Create dataset
- Navigate to Training tab
- Select base model and dataset
- Configure hyperparameters:
  - Learning rate: 1e-4 (recommended)
  - Steps: 500-2000
  - Batch size: Based on VRAM
- Start training
- Go to Publish tab
- Select trained LoRA
- Enter HuggingFace credentials
- Configure repository settings
- Publish to HuggingFace
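The batch size guidance above ("based on VRAM") can be made concrete. This helper is a hypothetical heuristic, not part of kiko-trainer: FLUX is a large model, so batch size 1 is the safe default below roughly 24 GB, and larger batches only make sense on high-memory GPUs.

```python
def recommended_batch_size(vram_gb: int) -> int:
    """Rough starting batch size for FLUX LoRA training by GPU VRAM.

    Hypothetical heuristic (not part of the trainer): start small and
    increase only if training runs without out-of-memory errors.
    """
    if vram_gb < 16:
        raise ValueError("16GB+ VRAM is recommended for FLUX LoRA training")
    if vram_gb < 24:
        return 1
    if vram_gb < 48:
        return 2
    return 4

print(recommended_batch_size(16))  # -> 1
```

Treat the thresholds as starting points; actual headroom depends on resolution, optimizer, and whether latent caching and gradient checkpointing are enabled.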
The application supports multiple vision-language models:
| Model | Size | Quality | Speed | VRAM |
|---|---|---|---|---|
| Florence-2 | 0.7B | Good | Fast | 4GB |
| Qwen2.5-VL-3B | 3B | Better | Medium | 8GB |
| Qwen2.5-VL-7B | 7B | Best | Slow | 16GB |
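The table above implies a simple selection rule: pick the highest-quality model that fits your GPU. A minimal sketch (the VRAM numbers come from the table; the helper itself is a hypothetical convenience, not part of the kiko-trainer API):

```python
# VRAM requirements (GB) from the table above, ordered worst to best quality.
CAPTION_MODELS = {
    "Florence-2": 4,
    "Qwen2.5-VL-3B": 8,
    "Qwen2.5-VL-7B": 16,
}

def best_caption_model(vram_gb: int) -> str:
    """Pick the highest-quality captioning model that fits in VRAM."""
    fitting = [m for m, need in CAPTION_MODELS.items() if need <= vram_gb]
    if not fitting:
        raise ValueError("At least 4GB of VRAM is needed for Florence-2")
    # dicts preserve insertion order, so the last fitting entry is best
    return fitting[-1]

print(best_caption_model(8))  # -> Qwen2.5-VL-3B
```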
```yaml
# Example configuration
learning_rate: 1e-4
network_dim: 32
network_alpha: 16
batch_size: 1
max_train_steps: 1000
save_every_n_steps: 100
```

Real-time training metrics include:
- Loss curves
- Learning rate schedule
- Memory usage
- Training speed (it/s)
- Sample generation preview
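The reported speed (it/s) and step counter are enough to estimate remaining training time. A small sketch (the function name and signature are illustrative, not part of the trainer):

```python
def eta_seconds(current_step: int, max_steps: int, its_per_sec: float) -> float:
    """Estimate remaining training time from the reported speed in it/s."""
    if its_per_sec <= 0:
        raise ValueError("speed must be positive")
    return (max_steps - current_step) / its_per_sec

# e.g. 600 of 1000 steps done at 2.0 it/s -> 200 seconds remaining
print(eta_seconds(600, 1000, 2.0))
```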
| Endpoint | Method | Description |
|---|---|---|
| `/api/health` | GET | Health check |
| `/api/models` | GET | List available models |
| `/api/caption` | POST | Caption images |
| `/api/create_dataset` | POST | Create training dataset |
| `/api/train/prepare` | POST | Prepare training scripts |
| `/api/train/start` | POST | Start training |
| `/api/train/logs` | GET | Get training logs |
| `/api/loras` | GET | List trained LoRAs |
| `/api/publish` | POST | Publish to HuggingFace |

WebSocket endpoints:

- `/api/train/ws` - Real-time training logs
- `/api/metrics/stream` - Live metrics updates
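A minimal client sketch for the HTTP endpoints above, using only the standard library. The paths come from the table; request and response payload shapes are not documented here, so check the interactive FastAPI docs at `/docs` on a running server before relying on them.

```python
import json
import urllib.request

class KikoClient:
    """Minimal REST client sketch for the kiko-trainer API (assumed shapes)."""

    def __init__(self, base_url: str = "http://localhost:8001"):
        self.base_url = base_url.rstrip("/")

    def url(self, path: str) -> str:
        """Join the base URL with an endpoint path from the table above."""
        return self.base_url + path

    def get(self, path: str) -> dict:
        with urllib.request.urlopen(self.url(path)) as resp:
            return json.loads(resp.read())

    def health(self) -> dict:
        return self.get("/api/health")

client = KikoClient()
print(client.url("/api/health"))  # -> http://localhost:8001/api/health
```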
```
kiko-trainer/
├── backend/                  # FastAPI backend
│   ├── server.py             # Main API server
│   ├── train_utils.py        # Training utilities
│   └── civitai_downloader.py # Model downloads
├── web/                      # React frontend
│   ├── src/
│   │   ├── components/       # React components
│   │   ├── pages/            # Application pages
│   │   └── store/            # State management
│   └── package.json
├── sd-scripts/               # Training engine (cloned)
├── models/                   # Downloaded models
├── datasets/                 # Training datasets
├── outputs/                  # Trained LoRAs
├── docker-compose.yml        # Docker configuration
└── setup.sh                  # Setup script
```
```yaml
version: "3.9"
services:
  api:
    build: .
    ports:
      - "8001:8001"
    volumes:
      - ./models:/app/models
      - ./outputs:/app/outputs
    environment:
      - HF_HUB_ENABLE_HF_TRANSFER=1
  web:
    build: ./web
    ports:
      - "8080:80"
    depends_on:
      - api
```

```bash
# Build and start
docker compose up -d --build

# View logs
docker compose logs -f

# Stop services
docker compose down
```

```bash
# Backend tests
source venv/bin/activate
pytest

# Frontend tests
cd web
npm test
npm run type-check
```

- Backend: Black, Flake8, MyPy
- Frontend: ESLint, Prettier, TypeScript
We welcome contributions! Please see our Contributing Guide for details.
- Fork the repository
- Create your feature branch (`git checkout -b feature/AmazingFeature`)
- Commit your changes (`git commit -m 'Add some AmazingFeature'`)
- Push to the branch (`git push origin feature/AmazingFeature`)
- Open a Pull Request
This project is licensed under the MIT License - see the LICENSE file for details.
- kohya-ss/sd-scripts - Training engine
- FLUX - Base model architecture
- Florence-2 - Vision model
- Qwen2.5-VL - Advanced vision model
**CUDA not detected**

```bash
# Install CUDA-compatible PyTorch
pip install --pre torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121
```

**Out of memory**

- Reduce batch size to 1
- Enable gradient checkpointing
- Use 8-bit optimization
- Consider using a smaller vision model
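The first three memory-saving tips map onto sd-scripts command-line flags. The flag names below do exist in kohya-ss/sd-scripts, but treat the exact combination as a sketch and verify against the version `setup.sh` cloned:

```python
def low_vram_flags(batch_size=1):
    """sd-scripts flags implementing the memory-saving tips above.

    Flag names are from kohya-ss/sd-scripts; check your cloned version.
    """
    return [
        "--train_batch_size=%d" % batch_size,  # reduce batch size to 1
        "--gradient_checkpointing",            # trade compute for memory
        "--optimizer_type=AdamW8bit",          # 8-bit optimizer states
    ]

print(" ".join(low_vram_flags()))
```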
**Port already in use**

```bash
# Change backend port
uvicorn backend.server:app --port 8002

# Change frontend port
cd web && VITE_PORT=3001 npm run dev
```

- Issues: GitHub Issues
- Discussions: GitHub Discussions
- Discord: Join our Discord
- Multi-GPU training support
- SDXL LoRA training
- Automatic dataset augmentation
- Training queue management
- Mobile app for monitoring
- Integration with ComfyUI/A1111