Goal: train high-accuracy object detectors for optical music recognition (OMR), where symbols are dense and tiny.
Planned model families:
- Deformable DETR
- RT-DETRv2 / RT-DETRv3
- Cascade R-CNN + FPN
Next steps:
- Add dataset pointers and tiling/preprocessing scripts for high-res scores.
- Create training configs for each model family.
- Run smoke tests, then full training/eval; track metrics and checkpoints here.
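
Tiling sketch (not the final preprocessing script): one possible way to cut a high-res score page into overlapping tiles and remap symbol boxes into tile coordinates. The function names, the 1024 px tile size, the 128 px overlap, and the 0.5 visibility threshold are all placeholder assumptions, and boxes are assumed to be (x_min, y_min, x_max, y_max) in pixels.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]   # (x_min, y_min, x_max, y_max), pixels
Window = Tuple[int, int, int, int]        # tile rectangle in page coordinates


def _starts(length: int, tile: int, stride: int) -> List[int]:
    """Start offsets along one axis; the last tile is shifted back to end at the border."""
    if length <= tile:
        return [0]
    starts = list(range(0, length - tile, stride))
    starts.append(length - tile)
    return starts


def tile_windows(width: int, height: int, tile: int = 1024, overlap: int = 128) -> List[Window]:
    """Row-major list of overlapping tile windows covering a width x height page."""
    if not 0 <= overlap < tile:
        raise ValueError("overlap must satisfy 0 <= overlap < tile")
    stride = tile - overlap
    return [
        (x, y, min(x + tile, width), min(y + tile, height))
        for y in _starts(height, tile, stride)
        for x in _starts(width, tile, stride)
    ]


def boxes_in_tile(boxes: List[Box], window: Window, min_visible: float = 0.5) -> List[Box]:
    """Clip boxes to a tile and shift them to tile-local coordinates.

    A box is kept only if at least `min_visible` of its area lies inside the tile.
    """
    x0, y0, x1, y1 = window
    kept = []
    for bx0, by0, bx1, by1 in boxes:
        area = (bx1 - bx0) * (by1 - by0)
        cx0, cy0 = max(bx0, x0), max(by0, y0)
        cx1, cy1 = min(bx1, x1), min(by1, y1)
        if area <= 0 or cx1 <= cx0 or cy1 <= cy0:
            continue  # degenerate box or no overlap with this tile
        if (cx1 - cx0) * (cy1 - cy0) / area >= min_visible:
            kept.append((cx0 - x0, cy0 - y0, cx1 - x0, cy1 - y0))
    return kept
```

The overlap should probably be at least as large as the biggest symbol, so that every symbol appears whole in some tile; predictions from overlapping tiles then need merging (for example with NMS) when mapped back to the full page.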