I would like to propose an additional research direction for autoresearch.
Right now the agent mainly improves the training loop by modifying train.py and validating the result through experiments. This is useful, but it keeps the optimization focused on the training code itself. A possible next step would be to also let the system iteratively refine its own research guidance in program.md.
The idea is simple:
- the agent runs experiments and collects outcomes;
- then, based on successful and failed changes, it updates the prompt and/or program that guides the next phase of research;
- the next iterations are therefore influenced not only by code changes in train.py, but also by improved search strategy, priorities, and heuristics in program.md.
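The loop above can be sketched in a few lines. This is a minimal illustration under assumed names (Outcome, refine_program, run_experiment are all hypothetical), not the actual autoresearch implementation:

```python
# Hypothetical sketch of the proposed dual loop: the inner loop runs
# train.py experiments; the outer loop folds the outcomes back into
# the guidance document (program.md).
from dataclasses import dataclass, field

@dataclass
class Outcome:
    change: str       # description of the train.py edit that was tried
    metric: float     # e.g. validation loss after the run
    improved: bool    # did the change beat the previous baseline?

@dataclass
class ResearchState:
    program: str                          # contents of program.md
    history: list = field(default_factory=list)

def run_experiment(change: str, baseline: float) -> Outcome:
    # Placeholder: the real system would apply the edit to train.py
    # and launch a training run; here we fake the resulting metric.
    metric = baseline - 0.01 if "tune lr" in change else baseline + 0.02
    return Outcome(change, metric, metric < baseline)

def refine_program(program: str, outcomes: list) -> str:
    # Outer loop: record what worked and what failed, so the next
    # round of mutations is generated under better guidance.
    wins = [o.change for o in outcomes if o.improved]
    fails = [o.change for o in outcomes if not o.improved]
    notes = []
    if wins:
        notes.append("## Promising directions\n" +
                     "\n".join(f"- {w}" for w in wins))
    if fails:
        notes.append("## Failed-pattern memory\n" +
                     "\n".join(f"- {f}" for f in fails))
    return program + "\n\n" + "\n\n".join(notes)

state = ResearchState(program="# program.md\nTry learning-rate changes first.")
baseline = 3.00
outcomes = [run_experiment(c, baseline)
            for c in ["tune lr warmup", "add dropout"]]
state.history.extend(outcomes)
state.program = refine_program(state.program, outcomes)
print("Failed-pattern memory" in state.program)  # → True
```

The point of the sketch is only the shape: one function mutates and evaluates code, a second function mutates the guidance, and the two alternate.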
The cause-and-effect here is direct:
- if the agent only edits train.py, it optimizes the object of research;
- if it can also refine program.md, it begins to optimize the method of research itself;
- this can reduce random local changes, improve search discipline, and make later iterations more informed.
In other words, the system could preserve the current train.py mutation loop, while adding a second loop where the agent improves the prompt – program.md – that drives future mutations. A constrained version may be especially practical: not rewriting the whole program freely, but updating only specific strategy sections such as hypothesis priorities, failed-pattern memory, search order, or mutation heuristics.

One important detail here is that the current constraints are explicit for train.py, prepare.py, dependencies, and the evaluation harness, but there is no equally explicit prohibition against modifying program.md. In that sense, program.md is paradoxically not just the instruction layer of the system, but also a potential research object itself. If train.py is the object-level mutation target, then program.md may be viewed as the meta-level mutation target, because it shapes how future mutations are generated and prioritized. This suggests a possible second optimization loop: not only improving the training code, but also iteratively refining the research program that guides the next experimental phase.
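The constrained variant could be enforced mechanically: the agent may rewrite only whitelisted strategy sections of program.md and nothing else. A minimal sketch, assuming program.md uses `##` section headings (the section names below are illustrative, not taken from the actual document):

```python
# Sketch of constrained program.md editing: only whitelisted strategy
# sections may be replaced; any other edit is rejected.
import re

EDITABLE = {"Hypothesis priorities", "Failed-pattern memory",
            "Search order", "Mutation heuristics"}

def update_section(program_md: str, section: str, new_body: str) -> str:
    if section not in EDITABLE:
        raise ValueError(f"section '{section}' is not editable")
    # Match the body between '## <section>' and the next '## ' heading
    # (or end of file), and swap in the new body.
    pattern = re.compile(
        rf"(## {re.escape(section)}\n)(.*?)(?=\n## |\Z)", re.DOTALL)
    if not pattern.search(program_md):
        raise ValueError(f"section '{section}' not found")
    return pattern.sub(lambda m: m.group(1) + new_body + "\n",
                       program_md, count=1)

doc = ("## Goal\nImprove val loss.\n"
       "## Search order\n1. lr\n2. batch size\n"
       "## Mutation heuristics\nPrefer small diffs.\n")
doc = update_section(doc, "Search order", "1. lr schedule\n2. weight decay")
print("weight decay" in doc and "Improve val loss." in doc)  # → True
```

Attempting `update_section(doc, "Goal", ...)` would raise, which is exactly the safety property the constrained version is after: the agent can tune its search strategy without rewriting its own objectives.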
This looks interesting because it turns the process from "edit training code and test" into "edit training code, learn from outcomes, and improve the research strategy for the next round". I think this could make the system stronger not only as an experiment runner, but as a frontier AI research system that incrementally improves its own research behaviour.
It would be interesting to hear whether this direction has already been considered.