Vultr Benchmark Orchestration

⚠️ Fair warning: This was vibe-coded from a plane using OpenClaw. It works, but don't expect enterprise-grade polish. PRs welcome, flames not so much. 🦀✈️

Instances are created with a model list baked into their userdata, run benchmarks autonomously, and self-destruct when done. Your laptop only needs to stay on long enough to fire the `vultr instance create` calls (~10 seconds).

How It Works

Your laptop                    Vultr API              Vultr Instance
    |                              |                        |
    |-- instance create x N ------>|                        |
    |   (userdata = model list)    |                        |
    |<-- instance IDs -------------|                        |
    |  [laptop can close now]      |                        |
                                   |-- boots from snapshot ->|
                                                            |-- reads models from
                                                            |   metadata API
                                                            |-- uv run benchmark.py --register
                                                            |-- [Slack: started + claim URL]
                                                            |-- uv run benchmark.py --model ...
                                                            |-- uv run benchmark.py --model ...
                                                            |-- [Slack: done + result URLs]
                                                            |-- vultr instance delete $SELF

Each instance also schedules a safety-net self-destruct via `at now + 5 hours` at startup, so orphaned instances are cleaned up even if the runner crashes.
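
The userdata handoff in the diagram above could look roughly like the following sketch. It assumes the model list travels as newline-separated text and is base64-encoded before being attached to the instance (Vultr's API documents its `user_data` field as base64, but treat that as an assumption here). The helper names `encode_userdata` and `decode_userdata` are hypothetical, and the payload format actually used by `orchestrate_vultr.py` and `bench_runner.sh` may differ.

```python
import base64


def encode_userdata(models: list[str]) -> str:
    """Pack a model list into a base64 string to attach as instance userdata."""
    payload = "\n".join(models)  # assumed format: one model ID per line
    return base64.b64encode(payload.encode("utf-8")).decode("ascii")


def decode_userdata(blob: str) -> list[str]:
    """Reverse of encode_userdata; blank lines are skipped."""
    text = base64.b64decode(blob).decode("utf-8")
    return [line.strip() for line in text.splitlines() if line.strip()]


if __name__ == "__main__":
    blob = encode_userdata(["openai/gpt-4o", "google/gemini-2.5-flash"])
    print(decode_userdata(blob))
```

Keeping the payload to plain text, one model per line, means the instance side can read it with nothing more than a shell loop.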

Running Benchmarks

# Use the model list from default-models.yml
uv run orchestrate_vultr.py --count 10

# Or override with an explicit list
uv run orchestrate_vultr.py --count 10 \
  --models \
  anthropic/claude-opus-4.5 \
  openai/gpt-4o \
  google/gemini-2.5-flash \
  ...

Models are distributed round-robin across instances (e.g. 30 models across 10 instances = 3 models per instance). The script exits as soon as all instances are created.
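
A minimal sketch of the round-robin split described above, assuming models are dealt out one at a time in list order. The function name `distribute_round_robin` is hypothetical and not taken from `orchestrate_vultr.py`.

```python
def distribute_round_robin(models: list[str], count: int) -> list[list[str]]:
    """Deal models out to `count` buckets, one at a time, in order."""
    if count < 1:
        raise ValueError("count must be at least 1")
    buckets: list[list[str]] = [[] for _ in range(count)]
    for index, model in enumerate(models):
        # Model 0 goes to bucket 0, model 1 to bucket 1, ... then wraps around.
        buckets[index % count].append(model)
    return buckets


if __name__ == "__main__":
    models = [f"model-{n}" for n in range(30)]
    for i, bucket in enumerate(distribute_round_robin(models, 10)):
        print(i, bucket)
```

With 30 models and 10 instances every bucket gets 3 models. When the counts do not divide evenly, the first buckets get one extra model, and with fewer models than instances some buckets stay empty; how the real script treats that case is not specified here.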

Options:

Option	Default	Description
`--models`	(none)	Model IDs to benchmark (space-separated); overrides `--models-file`
`--models-file`	`default-models.yml`	YAML file used when `--models` is not provided
`--count`	`1`	Number of instances; models distributed across them
`--region`	`atl`	Vultr region
`--plan`	`vc2-1c-2gb`	Vultr instance plan
`--snapshot`	(see VultrConfig in orchestrate_vultr.py)	Vultr snapshot ID — update after re-bootstrapping
`--ssh-keys`	`a4b8f6d9-...`	Vultr SSH key ID
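
For orientation, the options table could map onto an `argparse` setup along these lines. This is a sketch under assumptions, not the script's actual parser: `build_parser` is a hypothetical name, and the `--snapshot` and `--ssh-keys` defaults are left as `None` because their real values live in `VultrConfig` or are truncated in the table.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser mirroring the options table (illustrative only)."""
    parser = argparse.ArgumentParser(prog="orchestrate_vultr.py")
    parser.add_argument("--models", nargs="*", default=None,
                        help="Model IDs to benchmark (space-separated)")
    parser.add_argument("--models-file", default="default-models.yml",
                        help="YAML file used when --models is not provided")
    parser.add_argument("--count", type=int, default=1,
                        help="Number of instances")
    parser.add_argument("--region", default="atl", help="Vultr region")
    parser.add_argument("--plan", default="vc2-1c-2gb", help="Vultr instance plan")
    # Real defaults are not reproduced here; see VultrConfig in orchestrate_vultr.py.
    parser.add_argument("--snapshot", default=None, help="Vultr snapshot ID")
    parser.add_argument("--ssh-keys", default=None, help="Vultr SSH key ID")
    return parser


if __name__ == "__main__":
    print(build_parser().parse_args())
```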

Monitoring:

# Watch instances disappear as they finish
watch vultr instance list

# Tail logs on a running instance
ssh root@<ip> tail -f /var/log/bench-runner.log

# View systemd service output
ssh root@<ip> journalctl -u bench-runner -f

Bootstrapping a New Vultr Snapshot

See docs/bootstrapping-snapshot.md

Files

File	Purpose
`orchestrate_vultr.py`	Fire-and-forget launcher — creates instances and exits
`bench_runner.sh`	Runs on each instance; reads models from metadata, benchmarks, self-destructs
`bench-runner.service`	systemd unit that starts `bench_runner.sh` on first boot
`bootstrap_instance.sh`	One-shot setup script for building a new snapshot image
`setup_snapshot.sh`	Lighter alternative to bootstrap — installs just the runner files onto an existing instance
`create_instance.sh`	Convenience shell script with the full model list pre-filled
`utilities/delete_instances.sh`	Emergency cleanup: delete instances by ID
`utilities/reaper.sh`	Automated cleanup: delete stale instances older than TTL

Automated Cleanup (Reaper)

Instances normally self-destruct on completion, but that can fail. The reaper script (`utilities/reaper.sh`) provides an external safety net by deleting any `bench-*` instances older than a configurable TTL (default: 6 hours).

# Preview what would be deleted
./utilities/reaper.sh --dry-run

# Delete stale instances (default: older than 6 hours)
./utilities/reaper.sh

# Custom TTL (2 hours = 7200 seconds)
./utilities/reaper.sh --ttl 7200

Recommended cron setup (run every hour):

0 * * * * /path/to/pinchbench-scripts/utilities/reaper.sh >> /var/log/reaper.log 2>&1

This catches any instances that:

Failed to self-destruct due to crashes
Got stuck in a hung state
Had their `at` safety net fail for any reason
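
The selection rule the reaper applies can be sketched as below. This is an illustration in Python rather than the shell script itself. It assumes each instance record carries `id`, `label`, and `date_created` (an ISO 8601 timestamp with a UTC offset), modeled on the fields in Vultr's instance listing, and that `bench-*` is matched against the label. The function name `find_stale` is hypothetical.

```python
from datetime import datetime, timedelta, timezone


def find_stale(instances, ttl_seconds=6 * 3600, prefix="bench-", now=None):
    """Return IDs of instances whose label starts with `prefix` and are older than the TTL."""
    now = now or datetime.now(timezone.utc)
    cutoff = now - timedelta(seconds=ttl_seconds)
    stale = []
    for inst in instances:
        if not inst["label"].startswith(prefix):
            continue  # never touch instances that are not benchmark runners
        # Timestamps must include an offset (e.g. +00:00) so the comparison is timezone-aware.
        created = datetime.fromisoformat(inst["date_created"])
        if created < cutoff:
            stale.append(inst["id"])
    return stale


if __name__ == "__main__":
    sample = [
        {"id": "a", "label": "bench-1", "date_created": "2025-01-01T05:00:00+00:00"},
        {"id": "b", "label": "bench-2", "date_created": "2025-01-01T07:00:00+00:00"},
        {"id": "c", "label": "web-1", "date_created": "2024-01-01T00:00:00+00:00"},
    ]
    fixed_now = datetime(2025, 1, 1, 12, 0, tzinfo=timezone.utc)
    print(find_stale(sample, now=fixed_now))
```

Filtering on the prefix before checking age keeps the reaper from deleting unrelated instances in the same account, which matters for a script that runs unattended from cron.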
