Give an AI agent a codebase, a metric, and a time budget — let it run experiments autonomously overnight. You wake up to a log of what it tried, what worked, and what didn't.
Inspired by Karpathy's autoresearch, but general-purpose. Works for performance optimization, prompt engineering, config tuning — anything with a measurable score.
~1000 lines of Python. Zero external dependencies.
pip install overnight-experimenterRequires Python 3.10+ and a coding agent CLI (Claude Code or Codex).
# Create an experiment
overnight init my-experiment
# Edit the files:
# program.md — what to optimize + constraints
# evaluate.sh — script that outputs a numeric score
# workspace/ — the files the agent can modify
# Run it
overnight run my-experiment --budget 8h --agent claude --direction minimize
# Check progress
overnight status my-experiment
# Generate an HTML report
overnight report my-experimentexperiment/
├── program.md # Instructions for the agent
├── workspace/ # Files the agent modifies
├── evaluate.sh # Returns a numeric score
├── experiments.jsonl # Auto-generated log
└── best/ # Snapshot of best version
The loop:
- Agent reads
program.md+ experiment history - Agent makes ONE modification to
workspace/ evaluate.shruns → numeric score- If improved → snapshot to
best/, log success - If not → revert to
best/, log failure - Repeat until time budget is exhausted
The included example optimizes a naive prime-finding algorithm:
# Copy the example
cp -r examples/python-perf my-experiment
# Run for 30 minutes
overnight run my-experiment --budget 30m --agent claude --direction minimizeStarts with trial division (~0.11s), agent discovers sieve of Eratosthenes, bitwise tricks, etc.
| Command | Description |
|---|---|
overnight init <name> |
Create experiment from templates |
overnight run <name> [options] |
Run the experiment loop |
overnight status <name> |
Show progress summary |
overnight report <name> |
Generate HTML report |
| Flag | Default | Description |
|---|---|---|
--budget |
8h |
Time budget (e.g. 8h, 30m, 2h30m) |
--agent |
claude |
Coding agent (claude or codex) |
--direction |
maximize |
Optimization direction (minimize or maximize) |
--max-experiments |
— | Optional cap on number of experiments |
MIT