MLWM v1 Branch

codex/mlwm-v1 is the active neural image watermarking research and integration branch.

Purpose

This branch adds a learning-assisted robust image watermark engine alongside the existing legacy DCT/Reed-Solomon image watermark engine.

The branch should stay separate from master until a first trained model can be exported and benchmarked.

Implemented:

engine='auto' | 'legacy' | 'neural' dispatch in the Python watermark engine
Structured Python return values with:
- engine_used
- fallback_used
- confidence
- diagnostics
Electron IPC and preload type support for image watermark engine selection
Renderer UI for Auto, Legacy, and Neural
MLWM modules for:
- payload codec
- attack simulation
- dataset loading
- model definitions
- training
- ONNX export
- runtime inference
- benchmarking
- traceability
GitHub Actions checks:
- Test robust watermark engine
- MLWM unit tests
- Typecheck
Dataset preparation tooling
Unsplash Lite metadata downloader
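
The engine dispatch and structured return value listed above could look roughly like the sketch below. Only the engine names and the four result fields (engine_used, fallback_used, confidence, diagnostics) come from this note. The function name dispatch, the callables run_neural and run_legacy, the neural_ready flag, and the rule that 'auto' falls back to the legacy engine are hypothetical and chosen for illustration; the branch's real code may differ.

```python
from dataclasses import dataclass, field
from typing import Any, Callable


@dataclass
class WatermarkResult:
    # Field names follow the list above; the types are assumptions.
    engine_used: str
    fallback_used: bool
    confidence: float
    diagnostics: dict[str, Any] = field(default_factory=dict)


def dispatch(
    engine: str,
    run_neural: Callable[[], float],
    run_legacy: Callable[[], float],
    neural_ready: bool,
) -> WatermarkResult:
    """Hypothetical dispatcher: each run_* callable returns a confidence score."""
    if engine not in ("auto", "legacy", "neural"):
        raise ValueError(f"unknown engine: {engine!r}")

    if engine == "legacy":
        return WatermarkResult("legacy", False, run_legacy())

    if engine == "neural":
        # An explicit 'neural' request never falls back in this sketch.
        if not neural_ready:
            raise RuntimeError("neural engine requested but no model is ready")
        return WatermarkResult("neural", False, run_neural())

    # engine == "auto": prefer neural, fall back to legacy (assumed policy).
    if not neural_ready:
        return WatermarkResult(
            "legacy", True, run_legacy(), {"reason": "neural model not ready"}
        )
    try:
        return WatermarkResult("neural", False, run_neural())
    except Exception as exc:
        return WatermarkResult(
            "legacy", True, run_legacy(), {"neural_error": str(exc)}
        )
```

In this sketch fallback_used is set only when 'auto' ends up on the legacy engine, so a caller can tell a deliberate legacy run from a degraded one.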

Not completed:

Validated locally:

Python 3.12 virtual environments:
- .venv-ml
- .venv-pack
PyTorch CUDA:
- torch 2.11.0+cu128
- GPU detected: NVIDIA GeForce RTX 5060 Laptop GPU
ONNX Runtime:
- onnxruntime 1.25.0
Smoke training completed on prepared Unsplash Lite data.
Temporary single-file ONNX export completed.
Helper check can report neuralReady=true when pointed at a temporary exported model directory.
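
As a rough illustration of the helper check mentioned above, the sketch below reports neuralReady for a model directory. It is not the branch's actual helper. The function name check_neural_ready, the reasons field, and the readiness criteria (at least one .onnx file plus a parseable model.json) are assumptions; the real check may, for example, also load the model with ONNX Runtime.

```python
import json
from pathlib import Path


def check_neural_ready(model_dir: str) -> dict:
    """Hypothetical readiness check for an exported model directory."""
    root = Path(model_dir)
    reasons = []

    if not root.is_dir():
        reasons.append("model directory does not exist")
    else:
        # Assumed criterion 1: a single-file ONNX export is present.
        if not any(root.glob("*.onnx")):
            reasons.append("no .onnx file found")

        # Assumed criterion 2: a model.json manifest exists and parses.
        manifest = root / "model.json"
        if not manifest.is_file():
            reasons.append("model.json missing")
        else:
            try:
                json.loads(manifest.read_text(encoding="utf-8"))
            except ValueError:
                reasons.append("model.json is not valid JSON")

    return {"neuralReady": not reasons, "reasons": reasons}
```

Returning the list of reasons next to the flag makes a neuralReady=false result easier to diagnose from the UI or from logs.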

The current local training data was prepared from Unsplash Lite metadata.

Downloaded:

Prepared dataset:

The data/ directory is intentionally ignored by Git.

Latest real-data smoke run:

The smoke run confirms that the training pipeline is operational. The resulting checkpoint is not a usable production model.

Run the first full training pass when the local GPU can be occupied for several hours:

.\.venv-ml\Scripts\python.exe -m blind_watermark.mlwm.train --config configs\mlwm\main.yaml

After training:

Export the best checkpoint to a temporary candidate directory.
Run benchmark evaluation.
Promote the model only if it meets the release thresholds.
Update resources/models/neural_wm/model.json with hashes, commit, dataset manifest hash, and benchmark summary.
Keep PR #1 as draft until the model promotion decision is clear.
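
The promotion and model.json steps above could be scripted along the lines of this sketch. It is a minimal illustration under stated assumptions: the key names in the manifest entry, the function names, and the idea that every benchmark metric is "higher is better" with a minimum threshold are all hypothetical. The real release thresholds and the model.json schema are not defined in this note.

```python
import hashlib


def sha256_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA-256 of a file, read in chunks to bound memory use."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def meets_thresholds(summary: dict, thresholds: dict) -> bool:
    """True if every thresholded metric is present and at or above its minimum.

    Assumes all metrics are higher-is-better; a missing metric fails the gate.
    """
    return all(
        name in summary and summary[name] >= minimum
        for name, minimum in thresholds.items()
    )


def build_manifest_entry(
    model_path: str,
    dataset_manifest_path: str,
    commit: str,
    benchmark_summary: dict,
) -> dict:
    """Assemble the fields named above; the key names are illustrative only."""
    return {
        "model_sha256": sha256_file(model_path),
        "commit": commit,
        "dataset_manifest_sha256": sha256_file(dataset_manifest_path),
        "benchmark": benchmark_summary,
    }
```

A promotion script would call meets_thresholds first and write the manifest entry only when it returns True, which keeps a candidate that fails the benchmark out of resources/models/neural_wm.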