Generic orchestration framework for running autonomous Claude Code research agents over extended periods (days, weeks, months).
Domain-agnostic: swap `CLAUDE.md` and `BENCHMARK.md`, and the same infrastructure works for any deep research problem.
- Token limit resilience: Sessions end with clean checkpoints. Next session resumes exactly where the last left off.
- State persistence: All research state lives in Git as structured markdown.
- Human-in-the-loop without blocking: Agent runs autonomously. Flags when stuck. You review when you choose to.
```
cron (every 6h) → deep-research run → claude --print → state files → git commit+push
```
The orchestrator:
- Pulls latest state from Git
- Reads STATUS.md (current objective + phase)
- Checks HUMAN_REVIEW.md (pivot instructions)
- Builds prompt with full context
- Runs one Claude Code session (90 min max)
- Commits and pushes all state changes
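The loop above can be sketched in Python. This is a hypothetical illustration, not the real `deep-research` source; it assumes the Claude Code CLI's `--print`, `--model`, and `--max-turns` flags and a simple prompt built from the two state files:

```python
# Minimal sketch of one orchestration cycle (illustrative; the real
# deep-research implementation may structure this differently).
import subprocess
from pathlib import Path

def build_session_command(model: str, max_turns: int) -> list[str]:
    """Assemble a non-interactive Claude Code invocation."""
    return ["claude", "--print", "--model", model, "--max-turns", str(max_turns)]

def run_session(repo: Path, model: str, max_turns: int = 60) -> None:
    subprocess.run(["git", "pull"], cwd=repo, check=True)      # pull latest state
    status = (repo / "STATUS.md").read_text()                  # current objective + phase
    review = (repo / "HUMAN_REVIEW.md").read_text()            # pivot instructions
    prompt = f"{status}\n\n{review}"                           # full context for the session
    subprocess.run(build_session_command(model, max_turns),
                   input=prompt, text=True, cwd=repo,
                   timeout=90 * 60, check=True)                # one session, 90 min max
    subprocess.run(["git", "add", "-A"], cwd=repo, check=True)
    subprocess.run(["git", "commit", "-m", "checkpoint: session state"],
                   cwd=repo, check=True)
    subprocess.run(["git", "push"], cwd=repo, check=True)      # persist state to Git
```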
```bash
pip install -e ".[dev]"
deep-research scaffold my-research-project
```

This creates a complete project at `~/Github repos/my-research-project/` with all state files, directory structure, and an initial git commit.
Edit two files in the new project:
- `CLAUDE.md` -- Research instructions (what to investigate)
- `BENCHMARK.md` -- Evaluation criteria (how to measure success)
```bash
cd ~/Github\ repos/my-research-project
gh repo create EduardPetraeus/my-research-project --public --source .
git push -u origin main
```

Run a single session manually:

```bash
deep-research run --repo ~/Github\ repos/my-research-project
```

Check progress:

```bash
deep-research status --repo ~/Github\ repos/my-research-project
```

To schedule sessions, open your crontab:

```bash
crontab -e
```

Add (4 sessions per day):

```
0 0,6,12,18 * * * cd ~/Github\ repos/my-research-project && deep-research run >> logs/cron.log 2>&1
```
Return in 30 days. Check:
- `ORCHESTRATION_LOG.md` -- session success rates
- `STATUS.md` -- current phase and benchmark score
- `HUMAN_REVIEW.md` -- any stuck flags
```
deep-research scaffold <name> [--path PATH] [--model MODEL]
deep-research run [--repo PATH] [--model MODEL] [--max-turns N]
deep-research status [--repo PATH]
deep-research --version
```
| File | Purpose |
|---|---|
| `STATUS.md` | Session checkpoint: phase, objective, resume flag |
| `RESEARCH_STATE.md` | Accumulated validated findings (append-only) |
| `EXPLORATION_LOG.md` | Per-session research log (append-only) |
| `HUMAN_REVIEW.md` | Async communication channel |
| `BENCHMARK.md` | Evaluation criteria (locked after init) |
| `ORCHESTRATION_LOG.md` | Automated session tracking table |
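As an illustration of the checkpoint format, a `STATUS.md` file could look like the following (field names are hypothetical; the scaffold defines the real layout):

```markdown
## Phase
2 - validating candidate methods

## Current objective
Reproduce the baseline benchmark score before testing variants

## Resume
true
```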
Override via environment variables or CLI flags:
| Variable | Default | Description |
|---|---|---|
| `DEEP_RESEARCH_REPO` | `~/Github repos/deep-research-project` | Project path |
| `DEEP_RESEARCH_MODEL` | `claude-sonnet-4-5-20250929` | Claude model |
| `DEEP_RESEARCH_MAX_MINUTES` | `90` | Max session duration |
| `DEEP_RESEARCH_STUCK_THRESHOLD` | `3` | Sessions before stuck flag |
| `DEEP_RESEARCH_MAX_TURNS` | `60` | Max Claude turns per session |
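For scheduled runs, environment assignments can sit at the top of the crontab so every entry picks them up (the `MAX_TURNS` value here is illustrative):

```
DEEP_RESEARCH_MAX_TURNS=40
0 0,6,12,18 * * * cd ~/Github\ repos/my-research-project && deep-research run >> logs/cron.log 2>&1
```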
Write directives in `HUMAN_REVIEW.md` under `## Active Instructions`:

```
PIVOT: Switch to investigating X instead of Y
INSTRUCTION: Run benchmark evaluation on current methods
APPROVED: Delete data/raw/old_dataset/
CONTINUE:
```
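A parser for this directive format can be sketched as follows. This is a hypothetical illustration of how an orchestrator could read the section; the actual `deep-research` parser may differ:

```python
# Hypothetical sketch: extract directives from the '## Active Instructions'
# section of HUMAN_REVIEW.md. Not the real deep-research implementation.
import re

DIRECTIVES = ("PIVOT", "INSTRUCTION", "APPROVED", "CONTINUE")
_PATTERN = re.compile(rf"^({'|'.join(DIRECTIVES)}):\s*(.*)$")

def parse_active_instructions(markdown: str) -> list[tuple[str, str]]:
    """Return (directive, payload) pairs found under '## Active Instructions'."""
    parts = markdown.split("## Active Instructions", 1)
    if len(parts) < 2:
        return []
    # Stop at the next H2 heading, if any.
    body = parts[1].split("\n## ", 1)[0]
    pairs = []
    for line in body.splitlines():
        match = _PATTERN.match(line.strip())
        if match:
            pairs.append((match.group(1), match.group(2)))
    return pairs
```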
```bash
/opt/homebrew/bin/python3.12 -m venv .venv
source .venv/bin/activate
pip install -e ".[dev]"
pytest tests/ -v
ruff check . && ruff format --check .
```

- macOS (designed for Mac Mini M4)
- Claude Code CLI (`claude` in PATH)
- Claude Code Max subscription (x20 recommended for autonomous sessions)
- Python 3.9+
- Git
MIT