Dataset Metadata Injection Tool

A lightweight Python tool for adding Kohya/A1111-compatible tag frequency metadata to LoRA (Low-Rank Adaptation) safetensors files. This tool fixes missing tag metadata in LoRA files trained with AI-Toolkit or similar programs, ensuring proper display in A1111/Forge/ForgeNeo.

🌟 Features

Two Operating Modes — Normal mode (with caption files) or Manual mode (trigger words only)
Automatic Tag Frequency Calculation — Scans caption files and counts tag occurrences across your dataset
Manual Trigger Word Input — Add metadata without needing the original training dataset
Native Folder Browser — Easy dataset selection with visual folder picker
Flexible Path Support — Use subfolders or any custom path on your system
Kohya/A1111 Compatible — Adds standard metadata fields (ss_tag_frequency, ss_dataset_dirs, etc.)
Non-Destructive Processing — Creates new files with metadata while preserving originals
Live Preview & Validation — Real-time feedback on manual tag input
Smart Dependency Management — Auto-installs required packages in isolated virtual environment
Portable & User-Friendly — Works from any folder with clear error messages

🚀 Quick Start

Prerequisites

Python 3.11+ — Download from python.org
Windows OS — Batch file automation (a Linux launch script is also included but untested; see Linux Users)
.safetensors LoRA file — Your trained model
Training dataset (optional) — Images with corresponding .txt caption files (only needed for Normal Mode)

Installation

Clone or download this repository:

git clone https://github.com/LindezaBlue/Dataset-Metadata-Injection.git
cd Dataset-Metadata-Injection

No additional setup required! The launch scripts install dependencies automatically on first run.

📖 Usage Guide

Gradio Web Interface

The easiest way to use this tool is through the visual web interface.

Step 1: Prepare Your Files

On first run, the program automatically creates two folders:

Model to Repair/ — Place your LoRA files here
Updated LoRA/ — Updated files will be saved here
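The folder creation step can be pictured as a small idempotent setup routine. This is a minimal sketch, assuming both folders sit in a base directory you pass in; the function name `ensure_folders` is hypothetical and is not taken from the project's code.

```python
from pathlib import Path

# Folder names as documented in this README; the base directory is an assumption.
INPUT_DIR_NAME = "Model to Repair"
OUTPUT_DIR_NAME = "Updated LoRA"


def ensure_folders(base_dir: Path) -> tuple[Path, Path]:
    """Create the input and output folders if they do not already exist."""
    input_dir = base_dir / INPUT_DIR_NAME
    output_dir = base_dir / OUTPUT_DIR_NAME
    # exist_ok=True makes this safe to call on every launch, not only the first.
    input_dir.mkdir(parents=True, exist_ok=True)
    output_dir.mkdir(parents=True, exist_ok=True)
    return input_dir, output_dir
```

Calling `ensure_folders(Path.cwd())` on every start is harmless, because existing folders are left untouched.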

For Normal Mode (with dataset):

Place your LoRA file (.safetensors format) in the Model to Repair/ folder
Place your dataset subfolder inside Model to Repair/ (or use the alternative below)
Example: Model to Repair/my_character/
- Place all your training images (.png, .jpg, .jpeg) in this folder
- Ensure each image has a matching .txt caption file with comma-separated tags
  Example: image1.png → image1.txt containing 1girl, blue hair, smiling, outdoors
OR keep the dataset folder anywhere on your system and browse to it in the UI
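Before scanning, it can help to confirm that every image has its matching caption. Here is a minimal sketch, assuming captions share the image's base name (image1.png → image1.txt) and using the extensions listed above. `find_caption_pairs` is a hypothetical helper, not part of the tool.

```python
from pathlib import Path

# Image extensions listed in this README; matching is case-insensitive here.
IMAGE_EXTENSIONS = {".png", ".jpg", ".jpeg"}


def find_caption_pairs(dataset_dir: Path) -> tuple[dict[Path, Path], list[Path]]:
    """Return (image -> caption) pairs and a list of images with no caption file."""
    pairs: dict[Path, Path] = {}
    missing: list[Path] = []
    for image in sorted(dataset_dir.iterdir()):
        # Skip subfolders and non-image files such as notes or configs.
        if not image.is_file() or image.suffix.lower() not in IMAGE_EXTENSIONS:
            continue
        caption = image.with_suffix(".txt")  # image1.png -> image1.txt
        if caption.is_file():
            pairs[image] = caption
        else:
            missing.append(image)
    return pairs, missing
```

Any path in `missing` points to an image whose tags will not be counted.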

For Manual Mode (without dataset):

Place your LoRA file (.safetensors format) in the Model to Repair/ folder
That's it! You'll enter trigger words directly in the UI

Step 2: Launch the Gradio Interface

Double-click Run Gradio UI.bat
The batch file will automatically:
- Detect your Python installation
- Create a virtual environment (first run only)
- Install required dependencies (gradio, safetensors, CPU-only torch, etc.; first run only)
- Launch the web interface in your default browser at http://127.0.0.1:7860

Step 3: Use the Web Interface

The interface guides you step-by-step with two modes:

Normal Mode (with dataset):

Select your LoRA file
- Choose your LoRA Filename from the dropdown
Select your dataset
- Choose a Dataset Subfolder from the dropdown, OR
- Click 📁 Select Dataset Folder to browse to any folder, OR
- Paste a custom path directly
Scan the dataset
- Click 🔍 Scan Dataset / Parse Tags
- The tool will automatically read all .txt caption files and count tag frequencies
- Results appear in the Tag Frequencies box with status feedback
Inject the metadata
- Click 💾 Inject Metadata & Save
- A new LoRA file with embedded metadata will be created
- Find it in the Updated LoRA/ folder (filename ends with _with_tags.safetensors)
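The Scan step boils down to splitting each caption on commas and tallying the tags. The sketch below shows that idea under stated assumptions: tags are trimmed, empty entries are dropped, and a tag repeated within one caption is counted each time. Whether the tool lowercases or deduplicates tags is not documented here. Both function names are hypothetical.

```python
from collections import Counter
from pathlib import Path


def parse_caption(text: str) -> list[str]:
    """Split a comma-separated caption into trimmed, non-empty tags."""
    return [tag.strip() for tag in text.split(",") if tag.strip()]


def count_tags(dataset_dir: Path) -> Counter:
    """Count how often each tag appears across all .txt captions in a folder."""
    counts: Counter = Counter()
    for caption_file in sorted(dataset_dir.glob("*.txt")):
        counts.update(parse_caption(caption_file.read_text(encoding="utf-8")))
    return counts
```

`count_tags(folder).most_common(10)` lists the ten most frequent tags, which is roughly what the Tag Frequencies box summarizes.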

Manual Mode (without dataset):

Select your LoRA file
- Choose your LoRA Filename from the dropdown
Enable manual mode
- Check the "Manual trigger word mode" checkbox
Enter trigger words
- Type your trigger words (comma-separated) in the text box
- Set the tag frequency (default: 1)
- Watch the live preview update in real-time
Inject the metadata
- When the status shows 🟢 Ready, click 💾 Inject Metadata & Save
- Find the updated LoRA in the Updated LoRA/ folder
When you're done, close the browser tab and the terminal window to exit the program.
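In Manual Mode, no captions are scanned. Each trigger word simply receives the frequency you set. A minimal sketch of that mapping follows, assuming the single frequency value applies to every word and that repeated words are collapsed. `manual_tag_frequency` is a hypothetical name, not the UI's actual handler.

```python
def manual_tag_frequency(trigger_words: str, frequency: int = 1) -> dict[str, int]:
    """Build a tag-frequency map from comma-separated trigger words.

    Every word gets the same count, mirroring the single frequency
    field in the UI (default 1).
    """
    if frequency < 1:
        raise ValueError("frequency must be a positive integer")
    tags = [word.strip() for word in trigger_words.split(",") if word.strip()]
    if not tags:
        raise ValueError("enter at least one trigger word")
    # dict.fromkeys keeps first-seen order and drops duplicate words.
    return dict.fromkeys(tags, frequency)
```

For example, `manual_tag_frequency("mychar, red dress", 3)` yields `{"mychar": 3, "red dress": 3}`.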

Linux Users

A launch script run_gradio_ui.sh is provided for Linux systems.
It mirrors the Windows batch file, creating a virtual environment, installing dependencies, and launching the Gradio UI.

⚠️ Linux users, please test this script and report any issues, especially with automatic browser opening or dependency installation. (I'm currently unable to test on Linux, so feedback is appreciated. Thank you!)

📂 Project Structure

Dataset-Metadata-Injection/
├── Run Gradio UI.bat           # Launch web interface (Windows)
├── run_gradio_ui.sh            # Launch web interface (Linux)
├── gradio_ui.py                # Gradio interface code
├── Metadata_Injection.py       # Backend code (where the magic happens)
├── requirements.txt            # Python dependencies
├── Model to Repair/            # Input folder (auto-created)
│   ├── your_lora.safetensors   # Your LoRA files
│   └── [dataset_folders]/      # Optional: dataset subfolders
└── Updated LoRA/               # Output folder (auto-created)

🔧 Technical Details

Metadata Fields Added

This tool injects the following Kohya SS / A1111-compatible fields:

| Field | Description |
|---|---|
| `ss_tag_frequency` | JSON object mapping tags to occurrence counts |
| `ss_dataset_dirs` | Dataset folder names and image counts |
| `ss_resolution` | Training resolution (default: `1024,1024`) |
| `ss_num_train_images` | Total number of training images |
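Safetensors metadata values must be plain strings, so structured fields are stored as JSON text. The exact JSON layout this tool writes isn't documented here. This sketch follows the layout used by Kohya's trainer, in which tag counts and image counts are nested under the dataset folder name. The `n_repeats` value of 1 is a placeholder assumption, and `build_kohya_metadata` is a hypothetical helper.

```python
import json


def build_kohya_metadata(
    tag_counts: dict[str, int],
    dataset_name: str,
    num_images: int,
    resolution: tuple[int, int] = (1024, 1024),
) -> dict[str, str]:
    """Assemble Kohya-style metadata; every value is a string, as safetensors requires."""
    return {
        # Tag counts are grouped under the dataset folder name (Kohya layout).
        "ss_tag_frequency": json.dumps({dataset_name: tag_counts}),
        # n_repeats=1 is an assumed placeholder; img_count is the image total.
        "ss_dataset_dirs": json.dumps(
            {dataset_name: {"n_repeats": 1, "img_count": num_images}}
        ),
        "ss_resolution": f"{resolution[0]},{resolution[1]}",
        "ss_num_train_images": str(num_images),
    }
```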

Dependencies

All dependencies are auto-installed in an isolated virtual environment:

safetensors — Safe model file loading/saving
torch (CPU-only) — Minimal PyTorch for tensor operations
gradio — Web-based UI framework
packaging — Version management utilities

System Requirements

Python: 3.11 or later (on Windows, auto-detected from %LOCALAPPDATA%\Programs\Python)
Disk Space: ~500 MB for the virtual environment
Platform: Windows (a Linux launch script is provided but untested)

❓ Troubleshooting

"Missing Dataset Folder/Safetensor Model"

Problem: Files not found in expected locations

Solutions:

Verify your LoRA file is in the Model to Repair/ folder
For Normal Mode: verify that the dataset folder exists (in Model to Repair/ or at your custom path)
For Manual Mode: no dataset needed, just the LoRA file
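The checks above can also be run as a quick pre-flight script. This is a minimal sketch, assuming the LoRA is any `.safetensors` file in the input folder and that a Normal Mode dataset needs at least one `.txt` caption. `check_inputs` is a hypothetical helper, and its messages are not the tool's own error text.

```python
from pathlib import Path
from typing import Optional


def check_inputs(input_dir: Path, dataset_dir: Optional[Path] = None) -> list[str]:
    """Return human-readable problems; an empty list means ready to go.

    Pass dataset_dir=None for Manual Mode, where no dataset is needed.
    """
    problems: list[str] = []
    # glob on a missing folder yields nothing, so this also covers that case.
    if not any(input_dir.glob("*.safetensors")):
        problems.append(f"No .safetensors file found in {input_dir}")
    if dataset_dir is not None:
        if not dataset_dir.is_dir():
            problems.append(f"Dataset folder not found: {dataset_dir}")
        elif not any(dataset_dir.glob("*.txt")):
            problems.append(f"No .txt caption files in {dataset_dir}")
    return problems
```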

"Python not found"

Problem: Python installation not detected

Solutions:

Install Python 3.11+ from python.org
Use "Install for current user" option during installation
The launch script automatically searches %LOCALAPPDATA%\Programs\Python\
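To see what the launcher is likely to find, you can check from any Python prompt. This is a minimal sketch. The `Python3*` folder pattern reflects the default per-user installer layout (e.g. `Python311`) and is an assumption about your machine, not a copy of the batch file's logic.

```python
import os
import sys
from pathlib import Path

MIN_VERSION = (3, 11)  # from this README's prerequisites


def python_is_supported(version_info=sys.version_info) -> bool:
    """True if the given (major, minor, ...) version meets the 3.11+ requirement."""
    return tuple(version_info[:2]) >= MIN_VERSION


def find_user_pythons() -> list[Path]:
    """List python.exe files under %LOCALAPPDATA%\\Programs\\Python (per-user installs)."""
    local_appdata = os.environ.get("LOCALAPPDATA")
    if not local_appdata:
        return []  # not on Windows, or the variable is unset
    root = Path(local_appdata) / "Programs" / "Python"
    # Assumed folder naming such as Python311, Python312.
    return sorted(root.glob("Python3*/python.exe"))
```

If `find_user_pythons()` returns an empty list, Python was probably installed system-wide or to a custom location.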

Virtual Environment Issues

Solution: Delete the venv/ folder and re-run the launch script (Run Gradio UI.bat or run_gradio_ui.sh) to recreate it cleanly

Caption File Format

Ensure your .txt files use comma-separated tags:

# Correct
1girl, blue hair, smiling, outdoors, school uniform

# Incorrect (will not parse properly)
1girl blue hair smiling outdoors school uniform
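Because captions are split on commas, the incorrect example above becomes a single long tag, `1girl blue hair smiling outdoors school uniform`. Below is a rough heuristic for flagging such files before scanning. The three-word cutoff is an arbitrary assumption, and `looks_comma_separated` is a hypothetical helper, not a check the tool performs.

```python
MAX_WORDS_WITHOUT_COMMA = 3  # arbitrary cutoff; tune for your tag style


def looks_comma_separated(caption: str) -> bool:
    """Heuristic: flag captions with several words but no commas."""
    text = caption.strip()
    if not text or "," in text:
        return True  # empty or already comma-separated: nothing to flag
    # No commas: plausible only if the whole caption is one short tag.
    return len(text.split()) <= MAX_WORDS_WITHOUT_COMMA
```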

🛣️ Roadmap

~~Implement referenced .json config file for easier editing~~
~~Create Gradio web interface for simplified user experience~~
~~Add manual trigger word mode (no dataset required)~~
~~Add native folder browser for easy path selection~~
~~Support for Linux platforms~~
Add batch processing for multiple LoRAs
Tag frequency visualization charts
Export/import metadata presets
Any suggestions from the community

🤝 Contributing

Contributions are welcome! Here's how you can help:

Report bugs via GitHub Issues
Submit pull requests for new features or fixes
Share feedback and suggestions for improvement
Star this repo if you find it useful!

📄 License

This project is licensed under the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0) license.
See license.md for full details.

You are free to use, modify, and share this project, but only for non-commercial purposes, and you must give proper credit to the original author.

If you remix or build upon this work, you must distribute your contributions under the same license!

💖 Support & Credits

Created by: LindezaBlue

Special thanks to the AI art community for feedback and testing!

If you find this tool helpful:

⭐ Star this repository
🐛 Report issues to help improve the tool
📢 Share with others who might benefit

📝 Notes

This tool modifies metadata only, not model weights
Designed for AI-Toolkit trained LoRAs missing standard metadata
Compatible with Automatic1111, Forge, and ForgeNeo interfaces
Non-destructive processing ensures your originals remain intact
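Changing only metadata works because a `.safetensors` file stores its metadata in a JSON header that sits ahead of the tensor data. A file starts with an 8-byte little-endian header length, followed by a JSON header whose optional `__metadata__` entry is a string-to-string map, followed by the raw tensor bytes. The stdlib-only sketch below relies on that layout. It is an illustration, not the project's Metadata_Injection.py (which lists safetensors and torch as dependencies), and `inject_metadata` is a hypothetical name.

```python
import json
import shutil
import struct
from pathlib import Path


def inject_metadata(src: Path, dst: Path, extra: dict[str, str]) -> None:
    """Write a copy of src to dst with `extra` merged into the header's __metadata__.

    Only the JSON header changes; tensor bytes are streamed across untouched.
    """
    if src.resolve() == dst.resolve():
        raise ValueError("dst must differ from src so the original stays intact")

    with src.open("rb") as fin:
        # Bytes 0-7: little-endian unsigned 64-bit length of the JSON header.
        (header_len,) = struct.unpack("<Q", fin.read(8))
        header = json.loads(fin.read(header_len))

        # __metadata__ is an optional string -> string map; merge into it.
        metadata = dict(header.get("__metadata__", {}))
        metadata.update({str(k): str(v) for k, v in extra.items()})
        header["__metadata__"] = metadata

        new_header = json.dumps(header, separators=(",", ":")).encode("utf-8")
        # Pad with spaces to a multiple of 8 bytes, keeping the data buffer aligned.
        new_header += b" " * (-len(new_header) % 8)

        with dst.open("wb") as fout:
            fout.write(struct.pack("<Q", len(new_header)))
            fout.write(new_header)
            # data_offsets are relative to the start of this buffer, so they stay valid.
            shutil.copyfileobj(fin, fout)
```

Streaming the tensor data with `shutil.copyfileobj` avoids loading the whole model into memory. Writing to a separate `dst` leaves the original file untouched.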

Questions? Open an issue or check existing discussions!
