eduardpetraeus-lab/deep-research-framework

Deep Research Framework

Generic orchestration framework for running autonomous Claude Code research agents over extended periods (days, weeks, months).

Domain-agnostic: swap CLAUDE.md and BENCHMARK.md, and the same infrastructure works for any deep research problem.

What It Solves

  1. Token limit resilience: Sessions end with clean checkpoints. Next session resumes exactly where the last left off.
  2. State persistence: All research state lives in Git as structured markdown.
  3. Human-in-the-loop without blocking: The agent runs autonomously, flags itself when stuck, and you review whenever you choose.

Architecture

cron (every 6h) → deep-research run → claude --print → state files → git commit+push

The orchestrator:

  1. Pulls latest state from Git
  2. Reads STATUS.md (current objective + phase)
  3. Checks HUMAN_REVIEW.md (pivot instructions)
  4. Builds prompt with full context
  5. Runs one Claude Code session (90 min max)
  6. Commits and pushes all state changes
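The six steps above can be sketched as a single session function. This is a minimal sketch with simplified prompt building; the function names are illustrative, not the package's actual API:

```python
import subprocess
from pathlib import Path

def build_prompt(status: str, review: str) -> str:
    """Step 4: combine the checkpoint and any human directives into one prompt."""
    return f"## STATUS.md\n{status}\n\n## HUMAN_REVIEW.md\n{review}"

def run_session(repo: Path, model: str, max_turns: int = 60) -> None:
    """One orchestrated session, mirroring steps 1-6 above."""
    subprocess.run(["git", "pull"], cwd=repo, check=True)        # 1. pull latest state
    status = (repo / "STATUS.md").read_text()                    # 2. current objective + phase
    review = (repo / "HUMAN_REVIEW.md").read_text()              # 3. pivot instructions
    prompt = build_prompt(status, review)                        # 4. full context
    subprocess.run(                                              # 5. one Claude Code session
        ["claude", "--print", "--model", model,
         "--max-turns", str(max_turns), prompt],
        cwd=repo, timeout=90 * 60, check=True,
    )
    for cmd in (["git", "add", "-A"],                            # 6. commit and push
                ["git", "commit", "-m", "chore: session checkpoint"],
                ["git", "push"]):
        subprocess.run(cmd, cwd=repo, check=True)
```

Because every step reads from and writes to plain files in the repo, a crashed or token-limited session loses at most one run's uncommitted work.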

Quick Start

1. Install

pip install -e ".[dev]"

2. Create a new research project

deep-research scaffold my-research-project

This creates a complete project at ~/Github repos/my-research-project/ with all state files, directory structure, and an initial git commit.

3. Configure the project

Edit two files in the new project:

  • CLAUDE.md -- Research instructions (what to investigate)
  • BENCHMARK.md -- Evaluation criteria (how to measure success)

4. Create GitHub repo

cd ~/Github\ repos/my-research-project
gh repo create EduardPetraeus/my-research-project --public --source .
git push -u origin main

5. Run first session manually

deep-research run --repo ~/Github\ repos/my-research-project

6. Check status

deep-research status --repo ~/Github\ repos/my-research-project

7. Set up cron for autonomous operation

crontab -e

Add (4 sessions per day):

0 0,6,12,18 * * * cd ~/Github\ repos/my-research-project && deep-research run >> logs/cron.log 2>&1

8. Walk away

Return in 30 days. Check:

  • ORCHESTRATION_LOG.md -- session success rates
  • STATUS.md -- current phase and benchmark score
  • HUMAN_REVIEW.md -- any stuck flags

CLI Reference

deep-research scaffold <name> [--path PATH] [--model MODEL]
deep-research run [--repo PATH] [--model MODEL] [--max-turns N]
deep-research status [--repo PATH]
deep-research --version

State Files

| File | Purpose |
| --- | --- |
| STATUS.md | Session checkpoint: phase, objective, resume flag |
| RESEARCH_STATE.md | Accumulated validated findings (append-only) |
| EXPLORATION_LOG.md | Per-session research log (append-only) |
| HUMAN_REVIEW.md | Async communication channel |
| BENCHMARK.md | Evaluation criteria (locked after init) |
| ORCHESTRATION_LOG.md | Automated session tracking table |
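As an illustration of how ORCHESTRATION_LOG.md accumulates append-only rows, here is a hypothetical row appender; the scaffolded file defines the actual column layout:

```python
from datetime import datetime, timezone
from pathlib import Path

def log_session(repo: Path, outcome: str) -> None:
    """Append one row to ORCHESTRATION_LOG.md's session tracking table
    (hypothetical two-column layout: timestamp and outcome)."""
    ts = datetime.now(timezone.utc).strftime("%Y-%m-%d %H:%M")
    with (repo / "ORCHESTRATION_LOG.md").open("a") as f:
        f.write(f"| {ts} | {outcome} |\n")
```

Append-only writes keep the Git history linear: each session adds lines, so concurrent pulls never conflict on earlier rows.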

Configuration

Override via environment variables or CLI flags:

| Variable | Default | Description |
| --- | --- | --- |
| DEEP_RESEARCH_REPO | ~/Github repos/deep-research-project | Project path |
| DEEP_RESEARCH_MODEL | claude-sonnet-4-5-20250929 | Claude model |
| DEEP_RESEARCH_MAX_MINUTES | 90 | Max session duration (minutes) |
| DEEP_RESEARCH_STUCK_THRESHOLD | 3 | Sessions without progress before stuck flag |
| DEEP_RESEARCH_MAX_TURNS | 60 | Max Claude turns per session |
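Precedence is: CLI flag over environment variable over built-in default. A minimal sketch of that resolution order (the helper name is illustrative, not the package's API):

```python
import os
from typing import Optional

# Defaults from the table above (DEEP_RESEARCH_REPO omitted for brevity).
DEFAULTS = {
    "DEEP_RESEARCH_MODEL": "claude-sonnet-4-5-20250929",
    "DEEP_RESEARCH_MAX_MINUTES": "90",
    "DEEP_RESEARCH_STUCK_THRESHOLD": "3",
    "DEEP_RESEARCH_MAX_TURNS": "60",
}

def setting(name: str, cli_value: Optional[str] = None) -> str:
    """Resolve one setting: CLI flag beats env var beats default."""
    if cli_value is not None:
        return cli_value
    return os.environ.get(name, DEFAULTS[name])
```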

Human-in-the-Loop

Write directives in HUMAN_REVIEW.md under ## Active Instructions:

PIVOT: Switch to investigating X instead of Y
INSTRUCTION: Run benchmark evaluation on current methods
APPROVED: Delete data/raw/old_dataset/
CONTINUE:
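A minimal sketch of how directives in that format can be parsed; the real parser ships with the package, and this only illustrates the keyword-prefix convention:

```python
def parse_directives(review_md: str):
    """Extract (KEYWORD, payload) pairs from the '## Active Instructions'
    section of HUMAN_REVIEW.md. Lines outside that section are ignored."""
    keywords = ("PIVOT", "INSTRUCTION", "APPROVED", "CONTINUE")
    directives = []
    in_section = False
    for line in review_md.splitlines():
        if line.startswith("## "):
            in_section = line.strip() == "## Active Instructions"
            continue
        if in_section:
            for kw in keywords:
                if line.startswith(kw + ":"):
                    directives.append((kw, line[len(kw) + 1:].strip()))
    return directives
```

A bare CONTINUE: (empty payload) simply tells the next session to proceed without changes.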

Development

/opt/homebrew/bin/python3.12 -m venv .venv
source .venv/bin/activate
pip install -e ".[dev]"
pytest tests/ -v
ruff check . && ruff format --check .

Requirements

  • macOS (designed for Mac Mini M4)
  • Claude Code CLI (claude in PATH)
  • Claude Max subscription (20x tier recommended for autonomous sessions)
  • Python 3.9+
  • Git

License

MIT
