write eval results to results.json for structured agent consumption by mvanhorn · Pull Request #79 · karpathy/autoresearch

mvanhorn · 2026-03-09T14:40:36Z

Fixes #64

Adds a results.json file written after evaluation with the same metrics already printed to stdout. This gives agents a structured, parseable results channel instead of relying on grepping free-form stdout from run.log.

train.py: writes results.json after the final summary (5 lines, uses stdlib json)
program.md: instructs agent to read results.json first, fall back to grep

Existing stdout output is unchanged.
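For illustration, the train.py side of this change could look roughly like the sketch below. It is not the actual diff: the `write_results` helper and the metric names in the usage comment are hypothetical, and the only things taken from the description above are the output file name and the use of stdlib json.

```python
import json


def write_results(metrics, path="results.json"):
    """Print the summary as before, then mirror the same values to a JSON file.

    `write_results` and the metric keys used with it are hypothetical names
    chosen for this sketch; they are not taken from train.py.
    """
    # Keep the human-readable stdout summary so existing log readers still work.
    for name, value in metrics.items():
        print(f"{name}: {value}")

    # Write the same values to a structured file the agent can parse directly.
    with open(path, "w") as f:
        json.dump(metrics, f, indent=2)


# Hypothetical call at the end of evaluation:
# write_results({"val_loss": 1.25, "steps": 100})
```

Because the file holds only the values the script chooses to serialize, nothing printed during training can end up in it.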

This contribution was developed with AI assistance (Claude Code).

The agent loop currently reads results by grepping stdout from run.log, which mixes trusted metrics with arbitrary training output. Writing a structured results.json gives agents a reliable, parseable results channel. Existing stdout output is unchanged.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
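On the consuming side, the "read results.json first, fall back to grep" behaviour could be sketched as follows. This is an assumption-laden illustration rather than what program.md prescribes: the `read_results` helper is hypothetical, and the `name: value` log line format used by the fallback is assumed, not confirmed by this PR.

```python
import json
import re
from pathlib import Path

# Assumed log format for the fallback: one "name: number" pair per line.
METRIC_LINE = re.compile(r"^(\w+):\s+(-?\d+(?:\.\d+)?(?:[eE][-+]?\d+)?)\s*$")


def read_results(results_path="results.json", log_path="run.log"):
    """Prefer the structured file; scan the log only if it is missing or invalid.

    `read_results` is a hypothetical helper written for this sketch.
    """
    try:
        data = json.loads(Path(results_path).read_text())
        if isinstance(data, dict):
            return data
    except (OSError, ValueError):
        # Missing file, unreadable file, or malformed JSON: use the fallback.
        pass

    metrics = {}
    try:
        text = Path(log_path).read_text(errors="replace")
    except OSError:
        return metrics

    # Fallback: accept only lines that consist entirely of "name: number".
    for line in text.splitlines():
        match = METRIC_LINE.match(line)
        if match:
            metrics[match.group(1)] = float(match.group(2))
    return metrics
```

The fallback still parses free-form log output, so it remains exposed to whatever the training run prints; the structured file is the channel intended to be trusted.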

mvanhorn · 2026-03-19T19:27:27Z

Closing in favor of #331 which rebases this cleanly on current main and adds crash diagnostic improvements.

mvanhorn mentioned this pull request Mar 9, 2026

Indirect prompt injection via training output fed back to agent #64

Open

shichangs mentioned this pull request Mar 18, 2026

write results.json for structured agent consumption, harden crash diagnostics #331

Open

4 tasks

mvanhorn closed this Mar 19, 2026

mvanhorn wants to merge 1 commit into karpathy:master from mvanhorn:osc/64-structured-results-json
