Vision Autoresearch

Multi-agent experiment loop for vision model finetuning. Adapted from multiautoresearch - same disciplined single-change methodology, but for vision tasks with YAML configs as the experiment surface instead of editing training scripts directly.

Agents propose a hypothesis, change one config knob, run a finetune, and auto-promote when the metric beats the current master. Works locally on consumer GPUs or on HF Jobs.

Tasks

Task	Script	Default Model	Dataset	Metric
Classify	`train_classify.py`	`google/vit-base-patch16-224`	food101	accuracy
Detect	`train_detect.py`	`ustc-community/dfine-small-coco`	cppe-5	mAP
Segment	`train_segment.py`	`facebook/sam2.1-hiera-small`	—	IoU

All metrics are higher-is-better.

Setup

uv sync

Run locally (single GPU)

uv run scripts/refresh_master.py
# edit configs/base_classify.yaml — one knob change
uv run prepare.py --dataset food101 --task classify --split train
CUDA_VISIBLE_DEVICES=0 uv run scripts/run_local.py --task classify --config configs/base_classify.yaml
uv run scripts/submit_patch.py --comment "classify: lr 1e-4"

Run on HF Jobs

uv run scripts/refresh_master.py
uv run scripts/hf_job.py launch --task classify --config configs/base_classify.yaml
uv run scripts/hf_job.py logs <JOB_ID> --follow --output /tmp/vision-run.log
uv run scripts/parse_metric.py /tmp/vision-run.log
uv run scripts/submit_patch.py --comment "classify: lr 1e-4"

Agent-driven (opencode)

Send a single prompt — the agent handles refresh, config edit, run, parse, and submit:

CUDA_VISIBLE_DEVICES=0 opencode run "
Finetune google/vit-base-patch16-224 on food101 using the classify task.
Read AGENTS.md for repo conventions. Follow the standard workflow end-to-end.
"

For parallel experiments on multi-GPU:

CUDA_VISIBLE_DEVICES=0 uv run scripts/opencode_worker.py run exp-01 &
CUDA_VISIBLE_DEVICES=1 uv run scripts/opencode_worker.py run exp-02 &

Each worker runs in an isolated git worktree under .runtime/worktrees/.

How it works

Config YAML is the experiment surface. Edit configs/*.yaml, never training scripts. Configs are parsed natively via HfArgumentParser.parse_yaml_file() — keys map 1:1 to TrainingArguments / dataclass fields.
One hypothesis per run. Change exactly one config knob per experiment.
Refresh before each experiment. refresh_master.py restores configs to the current promoted master.
Auto-promotion. submit_patch.py appends to research/results.tsv and promotes the config if the metric beats master.
research/live/master.json is the source of truth.

Config knobs

learning_rate, weight_decay, warmup_steps, lr_scheduler_type
per_device_train_batch_size, gradient_accumulation_steps
num_train_epochs, fp16
freeze_backbone, image_square_size (detect)
use_albumentations (detect), prompt_type (segment)

Repo layout

.
├── configs/
│   ├── base_classify.yaml
│   ├── base_detect.yaml
│   └── base_segment.yaml
├── train_classify.py          # stable — do not edit
├── train_detect.py            # stable — do not edit
├── train_segment.py           # stable — do not edit
├── prepare.py                 # dataset validation CLI (vision_lab.dataset_validation)
├── scripts/
│   ├── refresh_master.py      # restore config from promoted master
│   ├── run_local.py           # local GPU execution
│   ├── hf_job.py              # HF Jobs launcher
│   ├── parse_metric.py        # extract metrics from logs
│   ├── submit_patch.py        # record run + auto-promote
│   ├── opencode_worker.py     # agent worktree isolation
│   ├── worker_common.py       # shared worker utilities
│   ├── trackio_reporter.py    # experiment monitoring
│   ├── dataset_inspector.py   # dataset format validation
│   ├── estimate_cost.py       # cost estimation
│   └── local_results.py       # results ledger management
├── research/
│   ├── results.tsv            # append-only run ledger
│   ├── notes.md               # experiment notebook
│   ├── do-not-repeat.md       # failed experiment guidance
│   ├── paper-ideas.md         # literature-derived hypotheses
│   ├── live/                  # promoted master + DAG
│   ├── reference/             # seed master snapshots
│   ├── campaigns/             # active campaign docs
│   ├── experiments/           # per-experiment docs
│   └── templates/             # campaign/experiment templates
├── AGENTS.md                  # agent roles + rules
├── program.md                 # benchmark entrypoint
└── pyproject.toml

Acknowledgments

Based on multiautoresearch by @burtenshaw.

Name		Name	Last commit message	Last commit date
Latest commit History 36 Commits
.opencode/agent		.opencode/agent
configs		configs
research		research
scripts		scripts
.gitignore		.gitignore
.python-version		.python-version
AGENTS.md		AGENTS.md
README.md		README.md
main.py		main.py
opencode.json		opencode.json
prepare.py		prepare.py
program.md		program.md
pyproject.toml		pyproject.toml
train_classify.py		train_classify.py
train_detect.py		train_detect.py
train_segment.py		train_segment.py
train_ultralytics.py		train_ultralytics.py
uv.lock		uv.lock

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Vision Autoresearch

Tasks

Setup

Run locally (single GPU)

Run on HF Jobs

Agent-driven (opencode)

How it works

Config knobs

Repo layout

Acknowledgments

About

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

Vision Autoresearch

Tasks

Setup

Run locally (single GPU)

Run on HF Jobs

Agent-driven (opencode)

How it works

Config knobs

Repo layout

Acknowledgments

About

Resources

Uh oh!

Stars

Watchers

Forks

Contributors

Uh oh!

Languages