write eval results to results.json for structured agent consumption #79
Closed
mvanhorn wants to merge 1 commit into karpathy:master from
The agent loop currently reads results by grepping stdout from run.log, which mixes trusted metrics with arbitrary training output. Writing a structured results.json gives agents a reliable, parseable results channel. Existing stdout output is unchanged.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Author: Closing in favor of #331, which rebases this cleanly on current main and adds crash diagnostic improvements.
Fixes #64
Adds a `results.json` file written after evaluation with the same metrics already printed to stdout. This gives agents a structured, parseable results channel instead of relying on grepping free-form stdout from `run.log`.

- `train.py`: writes `results.json` after the final summary (5 lines, uses stdlib `json`)
- `program.md`: instructs agent to read `results.json` first, fall back to grep

Existing stdout output is unchanged.
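The write-then-read flow described above could look roughly like the following. This is a minimal sketch, not the code from this PR: the helper names `write_results` and `read_results`, the metric keys, and the `key: number` log-line format used for the grep fallback are all assumptions made for illustration.

```python
import json
import re


def write_results(metrics, path="results.json"):
    """Dump the final metrics dict to a JSON file (hypothetical helper)."""
    with open(path, "w") as f:
        json.dump(metrics, f, indent=2)


def read_results(json_path="results.json", log_path="run.log"):
    """Prefer the structured file; fall back to scanning the log.

    The fallback assumes metrics appear in the log as lines of the
    form `name: 1.234`, which is an assumption about the log format.
    """
    try:
        with open(json_path) as f:
            return json.load(f)
    except (FileNotFoundError, json.JSONDecodeError):
        pass  # no usable results.json, so try the log instead

    metrics = {}
    pattern = re.compile(r"^(\w+):\s+(-?\d+(?:\.\d+)?)\s*$")
    with open(log_path) as f:
        for line in f:
            match = pattern.match(line)
            if match:
                metrics[match.group(1)] = float(match.group(2))
    return metrics
```

An agent would call `read_results()` after a run finishes. Because the JSON file contains only the values passed to `write_results`, arbitrary training output in `run.log` cannot be mistaken for a metric unless the fallback path is taken.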
This contribution was developed with AI assistance (Claude Code).