I would like to propose an additional research direction for autoresearch.
Right now the agent mainly improves the training loop by modifying train.py and validating the result through experiments. This is useful, but it keeps the optimization focused on the training code itself. A possible next step would be to also let the system iteratively refine its own research guidance in program.md.
The idea is simple:
- the agent runs experiments and collects outcomes;
- then, based on successful and failed changes, it updates the prompt and/or program that guides the next phase of research;
- the next iterations are therefore influenced not only by code changes in train.py, but also by improved search strategy, priorities, and heuristics in program.md.
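The loop above can be sketched in a few lines. This is a minimal illustration under assumed names (Outcome, refine_program, run_experiment are all hypothetical), not the actual autoresearch implementation:

```python
# Hypothetical sketch of the proposed dual loop: the inner loop runs
# train.py experiments; the outer loop folds the outcomes back into
# the guidance document (program.md).
from dataclasses import dataclass, field

@dataclass
class Outcome:
    change: str       # description of the train.py edit that was tried
    metric: float     # e.g. validation loss after the run
    improved: bool    # did the change beat the previous baseline?

@dataclass
class ResearchState:
    program: str                          # contents of program.md
    history: list = field(default_factory=list)

def run_experiment(change: str, baseline: float) -> Outcome:
    # Placeholder: the real system would apply the edit to train.py
    # and launch a training run; here we fake the resulting metric.
    metric = baseline - 0.01 if "tune lr" in change else baseline + 0.02
    return Outcome(change, metric, metric < baseline)

def refine_program(program: str, outcomes: list) -> str:
    # Outer loop: record what worked and what failed, so the next
    # round of mutations is generated under better guidance.
    wins = [o.change for o in outcomes if o.improved]
    fails = [o.change for o in outcomes if not o.improved]
    notes = []
    if wins:
        notes.append("## Promising directions\n" +
                     "\n".join(f"- {w}" for w in wins))
    if fails:
        notes.append("## Failed-pattern memory\n" +
                     "\n".join(f"- {f}" for f in fails))
    return program + "\n\n" + "\n\n".join(notes)

state = ResearchState(program="# program.md\nTry learning-rate changes first.")
baseline = 3.00
outcomes = [run_experiment(c, baseline)
            for c in ["tune lr warmup", "add dropout"]]
state.history.extend(outcomes)
state.program = refine_program(state.program, outcomes)
print("Failed-pattern memory" in state.program)  # → True
```

The point of the sketch is only the shape: one function mutates and evaluates code, a second function mutates the guidance, and the two alternate.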
The cause-and-effect here is direct:
- if the agent only edits train.py, it optimizes the object of research;
- if it can also refine program.md, it begins to optimize the method of research itself;
- this can reduce random local changes, improve search discipline, and make later iterations more informed.
In other words, the system could preserve the current train.py mutation loop, while adding a second loop where the agent improves the prompt – program.md – that drives future mutations. A constrained version may be especially practical: not rewriting the whole program freely, but updating only specific strategy sections such as hypothesis priorities, failed-pattern memory, search order, or mutation heuristics.

One important detail here is that the current constraints are explicit for train.py, prepare.py, dependencies, and the evaluation harness, but there is no equally explicit prohibition against modifying program.md. In that sense, program.md is paradoxically not just the instruction layer of the system, but also a potential research object itself. If train.py is the object-level mutation target, then program.md may be viewed as the meta-level mutation target, because it shapes how future mutations are generated and prioritized. This suggests a possible second optimization loop: not only improving the training code, but also iteratively refining the research program that guides the next experimental phase.
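The constrained variant could be enforced mechanically: the agent may rewrite only whitelisted strategy sections of program.md and nothing else. A minimal sketch, assuming program.md uses `##` section headings (the section names below are illustrative, not taken from the actual document):

```python
# Sketch of constrained program.md editing: only whitelisted strategy
# sections may be replaced; any other edit is rejected.
import re

EDITABLE = {"Hypothesis priorities", "Failed-pattern memory",
            "Search order", "Mutation heuristics"}

def update_section(program_md: str, section: str, new_body: str) -> str:
    if section not in EDITABLE:
        raise ValueError(f"section '{section}' is not editable")
    # Match the body between '## <section>' and the next '## ' heading
    # (or end of file), and swap in the new body.
    pattern = re.compile(
        rf"(## {re.escape(section)}\n)(.*?)(?=\n## |\Z)", re.DOTALL)
    if not pattern.search(program_md):
        raise ValueError(f"section '{section}' not found")
    return pattern.sub(lambda m: m.group(1) + new_body + "\n",
                       program_md, count=1)

doc = ("## Goal\nImprove val loss.\n"
       "## Search order\n1. lr\n2. batch size\n"
       "## Mutation heuristics\nPrefer small diffs.\n")
doc = update_section(doc, "Search order", "1. lr schedule\n2. weight decay")
print("weight decay" in doc and "Improve val loss." in doc)  # → True
```

Attempting `update_section(doc, "Goal", ...)` would raise, which is exactly the safety property the constrained version is after: the agent can tune its search strategy without rewriting its own objectives.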
This looks interesting because it turns the process from "edit training code and test" into "edit training code, learn from outcomes, and improve the research strategy for the next round". I think this could make the system stronger not only as an experiment runner, but as a frontier AI research system that incrementally improves its own research behaviour.
It would be interesting to hear whether this direction has already been considered.