60% more efficient autoresearch via better training analysis by ottogin · Pull Request #353 · karpathy/autoresearch

ottogin · 2026-03-20T12:21:42Z

Hi! While experimenting with autoresearch, I noticed that the agent has very limited observability into the training process and rarely looks beyond the final validation loss.

I updated train.py to log more training statistics and added an analysis step in which the agent uses Python to inspect training dynamics. This consistently improves BPB (bits per byte; lower is better).
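To give a rough idea of what such an analysis step could look like, here is a minimal sketch. It is not the code from the fork: the JSONL log format, the field names `step` and `train_loss`, and the helpers `parse_log` and `summarize` are all assumptions made up for illustration.

```python
import json
import math
from statistics import mean


def parse_log(lines):
    """Parse JSONL records such as {"step": 10, "train_loss": 3.2}.

    The record layout is hypothetical; adapt it to whatever train.py logs.
    """
    records = []
    for line in lines:
        line = line.strip()
        if line:
            records.append(json.loads(line))
    return sorted(records, key=lambda r: r["step"])


def summarize(records, window=3, spike_ratio=1.5):
    """Return simple diagnostics an agent could read besides the final loss."""
    losses = [r["train_loss"] for r in records]
    if len(losses) < 2 * window:
        raise ValueError("need at least 2 * window records")
    early = mean(losses[:window])   # average loss over the first records
    late = mean(losses[-window:])   # average loss over the last records
    # A spike is a record whose loss jumps well above the previous record's loss.
    spikes = [
        records[i]["step"]
        for i in range(1, len(losses))
        if losses[i] > spike_ratio * losses[i - 1]
    ]
    return {
        "early_loss": early,
        "late_loss": late,
        "improvement": early - late,
        "spike_steps": spikes,
        "diverged": any(math.isnan(x) or math.isinf(x) for x in losses),
    }
```

An agent could call `summarize(parse_log(open("train_log.jsonl")))` after each run (the file name is again an assumption) and use the spike and divergence fields, not only the final loss, when deciding what to change next.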

I ran this comparison multiple times. There is some run-to-run noise, but extended logging plus analysis consistently leads to lower BPB. Experiments were run on an H100 with Claude Opus 4.6 via Claude Code.

I think this could be helpful for others working with autoresearch, so in this PR I’m adding a link to my code as a notable fork. I’m also happy to submit a PR with all the changes to the main repo if you think that makes sense.

Details: https://github.com/ottogin/auto-log-research

Link to a notable fork

reference 60% more efficient autoresearch via better training analysis

5d4481e


ottogin wants to merge 1 commit into karpathy:master from ottogin:patch-2
