pinchbench/scripts

Vultr Benchmark Orchestration

⚠️ Fair warning: This was vibe-coded from a plane using OpenClaw. It works, but don't expect enterprise-grade polish. PRs welcome, flames not so much. 🦀✈️

Instances are created with a model list baked into their userdata, run benchmarks autonomously, and self-destruct when done. Your laptop only needs to be running long enough to fire the vultr instance create calls (~10 seconds).
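As a rough illustration of "baking" a model list into userdata, the sketch below packs model IDs into a base64 string and unpacks them again. The newline-separated payload and the helper names `encode_userdata` / `decode_userdata` are assumptions made for this example; the format that orchestrate_vultr.py actually uses is not described here. The Vultr API documents user data as base64-encoded, but whether the script or the CLI handles that encoding is also not stated.

```python
import base64


def encode_userdata(models):
    """Pack model IDs (one per line) into a base64 string.

    Hypothetical payload format, chosen only for illustration.
    """
    payload = "\n".join(models)
    return base64.b64encode(payload.encode("utf-8")).decode("ascii")


def decode_userdata(blob):
    """Reverse of encode_userdata: return the list of model IDs."""
    text = base64.b64decode(blob).decode("utf-8")
    return [line for line in text.splitlines() if line.strip()]


models = ["anthropic/claude-opus-4.5", "openai/gpt-4o"]
blob = encode_userdata(models)
# The instance side would read the blob back and recover the same list.
assert decode_userdata(blob) == models
```

A line-based payload keeps the instance-side parsing trivial in a shell script, which is one plausible reason to prefer it over JSON.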

How It Works

Your laptop                    Vultr API              Vultr Instance
    |                              |                        |
    |-- instance create x N ------>|                        |
    |   (userdata = model list)    |                        |
    |<-- instance IDs -------------|                        |
    |  [laptop can close now]      |                        |
                                   |-- boots from snapshot ->|
                                                            |-- reads models from
                                                            |   metadata API
                                                            |-- uv run benchmark.py --register
                                                            |-- [Slack: started + claim URL]
                                                            |-- uv run benchmark.py --model ...
                                                            |-- uv run benchmark.py --model ...
                                                            |-- [Slack: done + result URLs]
                                                            |-- vultr instance delete $SELF

Each instance also schedules a safety-net self-destruct at startup using the at command (at now + 5 hours), so orphaned instances are cleaned up even if the runner crashes.
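The on-instance sequence in the diagram can be written out as an ordered plan. This is only a sketch of the ordering: the real logic lives in bench_runner.sh, the function name `plan_runner_steps` is hypothetical, and placing the safety net first is an assumption based on "at startup".

```python
def plan_runner_steps(models):
    """Return (kind, description) pairs in the order the diagram shows."""
    steps = [
        ("safety-net", "at now + 5 hours"),  # assumed to be scheduled first
        ("register", "uv run benchmark.py --register"),
        ("notify", "Slack: started + claim URL"),
    ]
    # One benchmark run per model assigned to this instance.
    steps += [("benchmark", f"uv run benchmark.py --model {m}") for m in models]
    steps += [
        ("notify", "Slack: done + result URLs"),
        ("self-destruct", "vultr instance delete $SELF"),
    ]
    return steps


for kind, description in plan_runner_steps(["openai/gpt-4o"]):
    print(f"{kind:14} {description}")
```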

Running Benchmarks

# Use the model list from default-models.yml
uv run orchestrate_vultr.py --count 10

# Or override with an explicit list
uv run orchestrate_vultr.py --count 10 \
  --models \
  anthropic/claude-opus-4.5 \
  openai/gpt-4o \
  google/gemini-2.5-flash \
  ...

Models are distributed round-robin across instances (e.g. 30 models across 10 instances = 3 models per instance). The script exits as soon as all instances are created.
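A minimal sketch of round-robin distribution, assuming model `i` goes to instance `i % count`. The function name `distribute_round_robin` is hypothetical and the actual code in orchestrate_vultr.py may differ, for example in how it treats instances that end up with no models.

```python
def distribute_round_robin(models, count):
    """Split models across `count` buckets, one model at a time."""
    if count < 1:
        raise ValueError("count must be at least 1")
    buckets = [[] for _ in range(count)]
    for i, model in enumerate(models):
        buckets[i % count].append(model)
    return buckets


# 30 models across 10 instances gives 3 models per instance.
models = [f"provider/model-{n}" for n in range(30)]
buckets = distribute_round_robin(models, 10)
assert all(len(bucket) == 3 for bucket in buckets)
```

When the model count is not a multiple of the instance count, the earlier instances in this sketch receive one extra model each.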

Options:

| Option | Default | Description |
|---|---|---|
| --models | (optional) | Model IDs to benchmark (space-separated) |
| --models-file | default-models.yml | YAML file used when --models is not provided |
| --count | 1 | Number of instances; models distributed across them |
| --region | atl | Vultr region |
| --plan | vc2-1c-2gb | Vultr instance plan |
| --snapshot | (see VultrConfig in orchestrate_vultr.py) | Vultr snapshot ID; update after re-bootstrapping |
| --ssh-keys | a4b8f6d9-... | Vultr SSH key ID |

Monitoring:

# Watch instances disappear as they finish
watch vultr instance list

# Tail logs on a running instance
ssh root@<ip> tail -f /var/log/bench-runner.log

# View systemd service output
ssh root@<ip> journalctl -u bench-runner -f

Bootstrapping a New Vultr Snapshot

See docs/bootstrapping-snapshot.md


Files

| File | Purpose |
|---|---|
| orchestrate_vultr.py | Fire-and-forget launcher: creates instances and exits |
| bench_runner.sh | Runs on each instance; reads models from metadata, benchmarks, self-destructs |
| bench-runner.service | systemd unit that starts bench_runner.sh on first boot |
| bootstrap_instance.sh | One-shot setup script for building a new snapshot image |
| setup_snapshot.sh | Lighter alternative to bootstrap: installs just the runner files onto an existing instance |
| create_instance.sh | Convenience shell script with the full model list pre-filled |
| utilities/delete_instances.sh | Emergency cleanup: delete instances by ID |
| utilities/reaper.sh | Automated cleanup: delete stale instances older than TTL |

Automated Cleanup (Reaper)

While instances self-destruct on completion, sometimes things go wrong. The reaper script provides an external safety net by deleting any bench-* instances older than a configurable TTL (default: 6 hours).

# Preview what would be deleted
./utilities/reaper.sh --dry-run

# Delete stale instances (default: older than 6 hours)
./utilities/reaper.sh

# Custom TTL (2 hours = 7200 seconds)
./utilities/reaper.sh --ttl 7200

Recommended cron setup (run every hour):

0 * * * * /path/to/pinchbench-scripts/utilities/reaper.sh >> /var/log/reaper.log 2>&1

This catches any instances that:

  • Failed to self-destruct due to crashes
  • Got stuck in a hung state
  • Had their at-scheduled safety net fail for any reason
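The selection rule the reaper applies (bench-* instances older than the TTL) can be sketched as a pure function. This is not the code in utilities/reaper.sh, which is a shell script; the function name `find_stale` and the record fields `id`, `label`, and `date_created` are assumptions about what an instance listing provides, as is matching on the label rather than the hostname.

```python
from datetime import datetime, timedelta, timezone


def find_stale(instances, ttl_seconds=6 * 3600, now=None, prefix="bench-"):
    """Return IDs of instances whose label starts with `prefix`
    and whose creation time is more than `ttl_seconds` before `now`."""
    now = now or datetime.now(timezone.utc)
    cutoff = now - timedelta(seconds=ttl_seconds)
    stale = []
    for inst in instances:
        if not inst["label"].startswith(prefix):
            continue  # never touch instances outside the benchmark fleet
        created = datetime.fromisoformat(inst["date_created"])
        if created < cutoff:
            stale.append(inst["id"])
    return stale


now = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
instances = [
    {"id": "a", "label": "bench-1", "date_created": "2024-01-01T05:00:00+00:00"},
    {"id": "b", "label": "bench-2", "date_created": "2024-01-01T07:00:00+00:00"},
    {"id": "c", "label": "web-1", "date_created": "2024-01-01T00:00:00+00:00"},
]
print(find_stale(instances, now=now))                    # default 6-hour TTL
print(find_stale(instances, ttl_seconds=7200, now=now))  # 2-hour TTL
```

Filtering on the prefix before checking age is what keeps a tool like this from deleting unrelated instances in the same account.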
