MLWM v1 Branch
CEQ151 edited this page Apr 26, 2026 · 1 revision
codex/mlwm-v1 is the active neural image watermarking research and integration branch.
This branch adds a learning-assisted robust image watermark engine next to the existing legacy DCT/Reed-Solomon image watermark engine.
The branch should stay separate from master until a first trained model can be exported and benchmarked.
Implemented:
- engine='auto' | 'legacy' | 'neural' dispatch in the Python watermark engine
- Structured Python return values with:
  - engine_used
  - fallback_used
  - confidence
  - diagnostics
- Electron IPC and preload type support for image watermark engine selection
- Renderer UI for Auto, Legacy, and Neural
- MLWM modules for:
  - payload codec
  - attack simulation
  - dataset loading
  - model definitions
  - training
  - ONNX export
  - runtime inference
  - benchmarking
  - traceability
- GitHub Actions checks:
  - Test robust watermark engine
  - MLWM unit tests
  - Typecheck
- Dataset preparation tooling
- Unsplash Lite metadata downloader
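The engine dispatch and structured return value listed above can be pictured with a minimal sketch. Only the three engine names and the four result fields come from this page. The WatermarkResult class, the dispatch function, its callable parameters, and the rule that auto prefers the neural engine and falls back to legacy are assumptions made for illustration, not the branch's actual API.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, Optional


@dataclass
class WatermarkResult:
    """Hypothetical container for the four structured fields named on this page."""
    engine_used: str
    fallback_used: bool
    confidence: Optional[float]
    diagnostics: Dict[str, Any] = field(default_factory=dict)


def dispatch(
    engine: str,
    neural_ready: bool,
    run_neural: Callable[[], float],
    run_legacy: Callable[[], float],
) -> WatermarkResult:
    """Select an engine. Each callable returns a confidence value (assumed)."""
    if engine not in ("auto", "legacy", "neural"):
        raise ValueError(f"unknown engine: {engine!r}")

    diagnostics: Dict[str, Any] = {"requested": engine}

    if engine == "legacy":
        return WatermarkResult("legacy", False, run_legacy(), diagnostics)

    if engine == "neural":
        # An explicit request never falls back silently in this sketch.
        if not neural_ready:
            raise RuntimeError("neural engine requested but no model is ready")
        return WatermarkResult("neural", False, run_neural(), diagnostics)

    # engine == "auto": assumed policy is neural first, legacy as fallback.
    if neural_ready:
        try:
            return WatermarkResult("neural", False, run_neural(), diagnostics)
        except Exception as exc:
            diagnostics["neural_error"] = repr(exc)
    else:
        diagnostics["neural_error"] = "model not ready"
    return WatermarkResult("legacy", True, run_legacy(), diagnostics)
```

With this policy, a caller that passes engine='auto' while no promoted model exists gets engine_used='legacy' and fallback_used=True, plus a diagnostics entry explaining why.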
Not completed:
- Full main training run
- Promoted encoder.onnx and decoder.onnx
- Benchmark report against the release thresholds
- resources/models/neural_wm/model.json promotion from pending-training
Validated locally:
- Python 3.12 virtual environments:
  - .venv-ml
  - .venv-pack
- PyTorch CUDA: torch 2.11.0+cu128
- GPU detected: NVIDIA GeForce RTX 5060 Laptop GPU
- ONNX Runtime: onnxruntime 1.25.0
- Smoke training completed on prepared Unsplash Lite data.
- Temporary single-file ONNX export completed.
- Helper check can report neuralReady=true when pointed at a temporary exported model directory.
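The environment facts above (package versions and the detected GPU) can be re-checked with a short script. This is a sketch, not the helper check mentioned on this page: the function names installed_versions and cuda_device_name are made up for illustration, and the script assumes the distributions are published under the names torch and onnxruntime.

```python
from importlib import metadata
from typing import Dict, Optional, Sequence


def installed_versions(
    packages: Sequence[str] = ("torch", "onnxruntime"),
) -> Dict[str, Optional[str]]:
    """Return the installed version of each distribution, or None if missing."""
    versions: Dict[str, Optional[str]] = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


def cuda_device_name() -> Optional[str]:
    """Return the name of CUDA device 0, or None if torch or CUDA is unavailable."""
    try:
        import torch  # imported lazily so the script also runs without PyTorch
    except ImportError:
        return None
    if not torch.cuda.is_available():
        return None
    return torch.cuda.get_device_name(0)
```

Running print(installed_versions()) and print(cuda_device_name()) inside .venv-ml should report the versions and GPU listed above if the environment is unchanged.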
The current local training data was prepared from Unsplash Lite metadata.
Downloaded:
- Requested: 5000
- Successful: 4999
- Failed: 1
Prepared dataset:
- data/train_images: 4488
- data/val_images: 499
- Manifest: data/dataset_manifest.json
The data/ directory is intentionally ignored by Git.
Latest real-data smoke run:
- Run directory: artifacts/mlwm_v1/runs/20260426T074520+0000_6056931
- Best checkpoint: best.ckpt
- Best epoch: 2
- Best score: 0.680125
- Validation payload accuracy: about 0.680
- Exact match: 0.0
This confirms that the training pipeline is operational. The resulting checkpoint is not a usable production model.
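The gap between a payload accuracy near 0.68 and an exact match of 0.0 is easier to see with a small calculation. The sketch below assumes that payload accuracy means the fraction of payload bits decoded correctly and that exact match means the fraction of images whose whole payload is recovered. This page does not define either metric, and the function name payload_metrics is hypothetical.

```python
from typing import Dict, Sequence


def payload_metrics(
    predicted: Sequence[Sequence[int]],
    expected: Sequence[Sequence[int]],
) -> Dict[str, float]:
    """Compute per-bit accuracy and whole-payload exact match over a batch."""
    if len(predicted) != len(expected) or not expected:
        raise ValueError("batches must be non-empty and of equal size")

    total_bits = 0
    correct_bits = 0
    exact_payloads = 0
    for pred_bits, true_bits in zip(predicted, expected):
        if len(pred_bits) != len(true_bits):
            raise ValueError("payload lengths differ")
        matches = sum(1 for p, t in zip(pred_bits, true_bits) if p == t)
        correct_bits += matches
        total_bits += len(true_bits)
        # A payload counts as exact only when every bit matches.
        if matches == len(true_bits):
            exact_payloads += 1

    return {
        "bit_accuracy": correct_bits / total_bits,
        "exact_match": exact_payloads / len(expected),
    }
```

If bit errors were independent, a per-bit accuracy p over an n-bit payload would give an exact-match probability of p ** n. For an illustrative 64-bit payload (the real payload length is not stated here), 0.68 ** 64 is roughly 2e-11, so an exact match of 0.0 on 499 validation images is what one would expect at this accuracy.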
Run the first full training pass when the local GPU can be occupied for several hours:
    .\.venv-ml\Scripts\python.exe -m blind_watermark.mlwm.train --config configs\mlwm\main.yaml

After training:
- Export the best checkpoint to a temporary candidate directory.
- Run benchmark evaluation.
- Promote the model only if it meets the release thresholds.
- Update resources/models/neural_wm/model.json with hashes, commit, dataset manifest hash, and benchmark summary.
- Keep PR #1 as draft until the model promotion decision is clear.
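The promotion steps above can be sketched as a gate that builds the metadata record only when the benchmark passes. Everything specific here is an assumption: the real model.json schema, the metric names, and the release thresholds are not given on this page, so the field names and the functions sha256_file, meets_thresholds, and build_model_record are hypothetical.

```python
import hashlib
from pathlib import Path
from typing import Any, Dict, Optional, Union

PathLike = Union[str, Path]


def sha256_file(path: PathLike, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large ONNX files are not loaded into memory at once."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def meets_thresholds(summary: Dict[str, float], thresholds: Dict[str, float]) -> bool:
    """True only if every thresholded metric is present and at least its minimum."""
    return all(
        name in summary and summary[name] >= minimum
        for name, minimum in thresholds.items()
    )


def build_model_record(
    candidate_dir: PathLike,
    manifest_path: PathLike,
    commit: str,
    summary: Dict[str, float],
    thresholds: Dict[str, float],
) -> Optional[Dict[str, Any]]:
    """Return a metadata record for a passing candidate, or None to block promotion."""
    if not meets_thresholds(summary, thresholds):
        return None
    candidate = Path(candidate_dir)
    return {
        "status": "promoted",  # hypothetical replacement for pending-training
        "commit": commit,
        "hashes": {
            name: sha256_file(candidate / name)
            for name in ("encoder.onnx", "decoder.onnx")
        },
        "dataset_manifest_sha256": sha256_file(manifest_path),
        "benchmark_summary": summary,
    }
```

Returning None for a failing candidate keeps model.json untouched, which matches the rule that a model is promoted only if it meets the release thresholds.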