Overnight Experimenter

Give an AI agent a codebase, a metric, and a time budget — let it run experiments autonomously overnight. You wake up to a log of what it tried, what worked, and what didn't.

Inspired by Karpathy's autoresearch, but general-purpose. Works for performance optimization, prompt engineering, config tuning — anything with a measurable score.

~1000 lines of Python. Zero external dependencies.

Install

pip install overnight-experimenter

Requires Python 3.10+ and a coding agent CLI (Claude Code or Codex).

Quick Start

# Create an experiment
overnight init my-experiment

# Edit the files:
#   program.md   — what to optimize + constraints
#   evaluate.sh  — script that outputs a numeric score
#   workspace/   — the files the agent can modify

# Run it
overnight run my-experiment --budget 8h --agent claude --direction minimize

# Check progress
overnight status my-experiment

# Generate an HTML report
overnight report my-experiment
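evaluate.sh only has to output a numeric score. For a timing metric, one option is to have evaluate.sh call a small Python harness (for example `python3 bench.py`) like the sketch below. The file name `bench.py`, the `workload` function, and best-of-five timing are assumptions for illustration. They are not part of overnight-experimenter.

```python
import time
from typing import Callable


def measure_seconds(fn: Callable[[], object], repeats: int = 5) -> float:
    """Return the best (lowest) wall-clock time over `repeats` calls to fn."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        best = min(best, time.perf_counter() - start)
    return best


def workload() -> int:
    # Hypothetical stand-in for the code under test in workspace/.
    return sum(i * i for i in range(100_000))


if __name__ == "__main__":
    # Print exactly one number so the score is unambiguous to parse.
    print(f"{measure_seconds(workload):.6f}")
```

Taking the minimum of several runs reduces noise from other processes, which matters when the agent is chasing small improvements. Pair a time-based score like this with `--direction minimize`.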

How It Works

experiment/
├── program.md          # Instructions for the agent
├── workspace/          # Files the agent modifies
├── evaluate.sh         # Returns a numeric score
├── experiments.jsonl   # Auto-generated log
└── best/               # Snapshot of best version
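`overnight status` is the supported way to check progress. If you want to inspect the raw log yourself, a short script can summarize it. This sketch assumes experiments.jsonl uses the JSON Lines format its extension suggests (one JSON object per line). The `score` and `improved` keys are hypothetical placeholders, so check a real log for the actual field names.

```python
import json
from pathlib import Path
from typing import Iterable, Optional


def summarize(lines: Iterable[str], minimize: bool = False) -> dict:
    """Summarize JSON Lines records. The "score"/"improved" keys are hypothetical."""
    records = [json.loads(line) for line in lines if line.strip()]  # skip blank lines
    scores = [r["score"] for r in records if r.get("score") is not None]
    best: Optional[float] = None
    if scores:
        best = min(scores) if minimize else max(scores)
    return {
        "experiments": len(records),
        "improvements": sum(1 for r in records if r.get("improved")),
        "best_score": best,
    }


def summarize_file(path: Path, minimize: bool = False) -> dict:
    """Read a log file such as my-experiment/experiments.jsonl and summarize it."""
    with path.open(encoding="utf-8") as fh:
        return summarize(fh, minimize)
```

Usage: `summarize_file(Path("my-experiment/experiments.jsonl"), minimize=True)`.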

The loop:

1. Agent reads program.md + experiment history
2. Agent makes ONE modification to workspace/
3. evaluate.sh runs → numeric score
4. If improved → snapshot to best/, log success
5. If not → revert to best/, log failure
6. Repeat until the time budget is exhausted
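The loop can be sketched in Python as follows. This illustrates the keep-or-revert idea and is not overnight-experimenter's actual code. `propose_change` stands in for the coding-agent CLI and `evaluate` stands in for running evaluate.sh. Copying whole directories with `shutil` is one assumed way to snapshot and revert.

```python
import shutil
import time
from pathlib import Path
from typing import Callable, Optional


def is_improvement(new: float, best: float, minimize: bool) -> bool:
    """Compare scores according to --direction."""
    return new < best if minimize else new > best


def restore(src: Path, dst: Path) -> None:
    """Replace dst with a fresh copy of src."""
    if dst.exists():
        shutil.rmtree(dst)
    shutil.copytree(src, dst)


def run_loop(
    workspace: Path,
    best_dir: Path,
    propose_change: Callable[[Path, list], None],  # hypothetical: the coding agent
    evaluate: Callable[[Path], float],             # hypothetical: runs evaluate.sh
    budget_seconds: float,
    minimize: bool = False,
    max_experiments: Optional[int] = None,
) -> list:
    history: list = []
    best_score = evaluate(workspace)       # baseline score
    restore(workspace, best_dir)           # the baseline is the first "best"
    deadline = time.monotonic() + budget_seconds
    while time.monotonic() < deadline:
        if max_experiments is not None and len(history) >= max_experiments:
            break
        propose_change(workspace, history)  # agent makes ONE modification
        score = evaluate(workspace)
        improved = is_improvement(score, best_score, minimize)
        if improved:
            best_score = score
            restore(workspace, best_dir)    # snapshot to best/
        else:
            restore(best_dir, workspace)    # revert to best/
        history.append({"score": score, "improved": improved})
    return history
```

Reverting after every failed attempt means each experiment starts from the best known state, so one bad change never compounds into the next.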

Example: Python Performance

The included example optimizes a naive prime-finding algorithm:

# Copy the example
cp -r examples/python-perf my-experiment

# Run for 30 minutes
overnight run my-experiment --budget 30m --agent claude --direction minimize

The baseline uses trial division (~0.11s). From there, the agent discovers optimizations such as the sieve of Eratosthenes and bitwise tricks.
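The example's source isn't reproduced here, so the following is only an illustrative sketch of the kind of rewrite described: a trial-division baseline next to a sieve of Eratosthenes. The function names are hypothetical, and the ~0.11s figure refers to the shipped example, not to this code.

```python
import math


def primes_trial_division(n: int) -> list[int]:
    """Baseline: test each candidate against known primes up to its square root."""
    primes: list[int] = []
    for candidate in range(2, n + 1):
        is_prime = True
        for p in primes:
            if p * p > candidate:
                break
            if candidate % p == 0:
                is_prime = False
                break
        if is_prime:
            primes.append(candidate)
    return primes


def primes_sieve(n: int) -> list[int]:
    """Sieve of Eratosthenes: cross off multiples instead of testing divisors."""
    if n < 2:
        return []
    is_prime = bytearray([1]) * (n + 1)
    is_prime[0] = is_prime[1] = 0
    for i in range(2, math.isqrt(n) + 1):
        if is_prime[i]:
            # Start at i*i: smaller multiples were crossed off by smaller primes.
            is_prime[i * i :: i] = bytes(len(range(i * i, n + 1, i)))
    return [i for i, flag in enumerate(is_prime) if flag]


if __name__ == "__main__":
    assert primes_trial_division(10_000) == primes_sieve(10_000)
    print(len(primes_sieve(10_000)))  # 1229
```

Both functions return the same list. The sieve does roughly O(n log log n) work instead of testing every candidate against a list of primes, which is why it tends to be one of the first improvements an optimizer finds.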

CLI Reference

| Command | Description |
|---|---|
| `overnight init <name>` | Create experiment from templates |
| `overnight run <name> [options]` | Run the experiment loop |
| `overnight status <name>` | Show progress summary |
| `overnight report <name>` | Generate HTML report |

Run Options

| Flag | Default | Description |
|---|---|---|
| `--budget` | `8h` | Time budget (e.g. `8h`, `30m`, `2h30m`) |
| `--agent` | `claude` | Coding agent (`claude` or `codex`) |
| `--direction` | `maximize` | Optimization direction (`minimize` or `maximize`) |
| `--max-experiments` | none | Optional cap on number of experiments |
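The `--budget` values combine hours and minutes. A hedged sketch of how strings like `8h`, `30m`, and `2h30m` map to seconds follows. The name `parse_budget` is hypothetical, and the tool's real parser may accept more formats than the `h`/`m` units shown in the table.

```python
import re

_PART = re.compile(r"(\d+)([hm])")  # a number followed by an h or m unit
_UNIT_SECONDS = {"h": 3600, "m": 60}


def parse_budget(text: str) -> int:
    """Convert a budget like '8h', '30m', or '2h30m' to seconds."""
    text = text.strip().lower()
    # Reject empty input and anything left over after removing valid parts.
    if not text or _PART.sub("", text) != "":
        raise ValueError(f"invalid budget: {text!r}")
    return sum(int(num) * _UNIT_SECONDS[unit] for num, unit in _PART.findall(text))
```

For example, `parse_budget("2h30m")` returns `9000` (7200 + 1800 seconds).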

License

MIT
