Vultr Benchmark Orchestration

⚠️ Fair warning: This was vibe-coded from a plane using OpenClaw. It works, but don't expect enterprise-grade polish. PRs welcome, flames not so much. 🦀✈️

Instances are created with a model list baked into their userdata, run benchmarks autonomously, and self-destruct when done. Your laptop only needs to stay on long enough to fire the `vultr instance create` calls (~10 seconds).

How It Works

Your laptop                    Vultr API              Vultr Instance
    |                              |                        |
    |-- instance create x N ------>|                        |
    |   (userdata = model list)    |                        |
    |<-- instance IDs -------------|                        |
    |  [laptop can close now]      |                        |
                                   |-- boots from snapshot ->|
                                                            |-- reads models from
                                                            |   metadata API
                                                            |-- uv run benchmark.py --register
                                                            |-- [Slack: started + claim URL]
                                                            |-- uv run benchmark.py --model ...
                                                            |-- uv run benchmark.py --model ...
                                                            |-- [Slack: done + result URLs]
                                                            |-- vultr instance delete $SELF
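The userdata hand-off in the diagram can be sketched roughly as follows. This is an illustration only: the newline-separated payload, the base64 transport, and the helper names `encode_userdata` and `decode_userdata` are assumptions, not the format `orchestrate_vultr.py` or `bench_runner.sh` actually use.

```python
import base64


def encode_userdata(models):
    """Launcher side: pack a model list into a base64 string (assumed format)."""
    payload = "\n".join(models)
    return base64.b64encode(payload.encode("utf-8")).decode("ascii")


def decode_userdata(blob):
    """Instance side: recover the model list from the userdata blob."""
    text = base64.b64decode(blob).decode("utf-8")
    # Skip blank lines so a trailing newline does not yield an empty model ID.
    return [line.strip() for line in text.splitlines() if line.strip()]


models = ["anthropic/claude-opus-4.5", "openai/gpt-4o"]
blob = encode_userdata(models)
print(decode_userdata(blob))
```

The point is that everything the instance needs travels with the create call, which is why the laptop can disconnect immediately afterwards.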

At startup, each instance also schedules a safety-net self-destruct with `at now + 5 hours`, so orphaned instances are cleaned up even if the runner crashes.
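As a rough sketch of that safety net, here is how the `at` job could be assembled and submitted from Python. The real runner is a shell script, so this is a translation for illustration; `build_safety_net` and `schedule_safety_net` are hypothetical names, and the delete command is taken from the diagram above.

```python
import shlex
import subprocess


def build_safety_net(instance_id, hours=5):
    """Return the `at` argv and the job text it should read from stdin."""
    argv = ["at", "now", "+", str(hours), "hours"]
    # Quote the ID so an unexpected character cannot alter the command.
    job = f"vultr instance delete {shlex.quote(instance_id)}\n"
    return argv, job


def schedule_safety_net(instance_id, hours=5):
    """Submit the job; requires the `at` daemon to be running on the instance."""
    argv, job = build_safety_net(instance_id, hours)
    subprocess.run(argv, input=job, text=True, check=True)


# Show what would be scheduled without actually calling `at`.
argv, job = build_safety_net("INSTANCE_ID")
print(" ".join(argv))
print(job, end="")
```

Scheduling the job before any benchmark starts means the timer is already in place if a later step hangs or crashes.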

Running Benchmarks

# Use the models listed in default-models.yml
uv run orchestrate_vultr.py --count 10

# Or override with an explicit list
uv run orchestrate_vultr.py --count 10 \
  --models \
  anthropic/claude-opus-4.5 \
  openai/gpt-4o \
  google/gemini-2.5-flash \
  ...

Models are distributed round-robin across instances (e.g. 30 models across 10 instances = 3 models per instance). The script exits as soon as all instances are created.
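A minimal sketch of round-robin distribution, assuming model `i` goes to instance `i % count`. The function name `distribute` is hypothetical, and the real script may handle edge cases (such as more instances than models) differently.

```python
def distribute(models, count):
    """Deal models out to `count` buckets in round-robin order."""
    if count < 1:
        raise ValueError("count must be at least 1")
    buckets = [[] for _ in range(count)]
    for i, model in enumerate(models):
        buckets[i % count].append(model)
    return buckets


# 30 placeholder model IDs across 10 instances.
models = [f"vendor/model-{n}" for n in range(30)]
buckets = distribute(models, 10)
print([len(b) for b in buckets])
print(buckets[0])
```

With 30 models and 10 instances every bucket holds 3 models; when the count does not divide evenly, the first few instances each get one extra model.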

Options:

| Option | Default | Description |
| --- | --- | --- |
| `--models` | (optional) | Model IDs to benchmark (space-separated) |
| `--models-file` | `default-models.yml` | YAML file used when `--models` is not provided |
| `--count` | `1` | Number of instances; models are distributed across them |
| `--region` | `atl` | Vultr region |
| `--plan` | `vc2-1c-2gb` | Vultr instance plan |
| `--snapshot` | (see `VultrConfig` in `orchestrate_vultr.py`) | Vultr snapshot ID; update after re-bootstrapping |
| `--ssh-keys` | `a4b8f6d9-...` | Vultr SSH key ID |

Monitoring:

# Watch instances disappear as they finish
watch vultr instance list

# Tail logs on a running instance
ssh root@<ip> tail -f /var/log/bench-runner.log

# View systemd service output
ssh root@<ip> journalctl -u bench-runner -f

Bootstrapping a New Vultr Snapshot

See docs/bootstrapping-snapshot.md

Files

| File | Purpose |
| --- | --- |
| `orchestrate_vultr.py` | Fire-and-forget launcher: creates instances and exits |
| `bench_runner.sh` | Runs on each instance; reads models from metadata, benchmarks, self-destructs |
| `bench-runner.service` | systemd unit that starts `bench_runner.sh` on first boot |
| `bootstrap_instance.sh` | One-shot setup script for building a new snapshot image |
| `setup_snapshot.sh` | Lighter alternative to bootstrap: installs just the runner files onto an existing instance |
| `create_instance.sh` | Convenience shell script with the full model list pre-filled |
| `utilities/delete_instances.sh` | Emergency cleanup: delete instances by ID |
| `utilities/reaper.sh` | Automated cleanup: delete stale instances older than TTL |

Automated Cleanup (Reaper)

While instances self-destruct on completion, sometimes things go wrong. The reaper script provides an external safety net by deleting any bench-* instances older than a configurable TTL (default: 6 hours).

# Preview what would be deleted
./utilities/reaper.sh --dry-run

# Delete stale instances (default: older than 6 hours)
./utilities/reaper.sh

# Custom TTL (2 hours = 7200 seconds)
./utilities/reaper.sh --ttl 7200

Recommended cron setup (run every hour):

0 * * * * /path/to/pinchbench-scripts/utilities/reaper.sh >> /var/log/reaper.log 2>&1

This catches any instances that:

- Failed to self-destruct due to crashes
- Got stuck in a hung state
- Had their `at` safety net fail for any reason
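The reaper's selection rule can be sketched as a pure function. This is a guess at the logic rather than a port of `utilities/reaper.sh`: the record fields `id`, `label`, and `date_created` (ISO 8601 with a UTC offset) are assumptions about what the instance listing returns, and `find_stale` is a hypothetical name.

```python
from datetime import datetime, timedelta, timezone


def find_stale(instances, ttl_seconds=6 * 3600, now=None, prefix="bench-"):
    """Return IDs of instances whose label starts with `prefix` and that are older than the TTL."""
    now = now or datetime.now(timezone.utc)
    cutoff = now - timedelta(seconds=ttl_seconds)
    stale = []
    for inst in instances:
        if not inst["label"].startswith(prefix):
            continue  # never touch instances that are not benchmark runners
        created = datetime.fromisoformat(inst["date_created"])
        if created < cutoff:
            stale.append(inst["id"])
    return stale


# Made-up sample data for illustration only.
now = datetime(2025, 1, 1, 12, 0, tzinfo=timezone.utc)
instances = [
    {"id": "a", "label": "bench-1", "date_created": "2025-01-01T03:00:00+00:00"},
    {"id": "b", "label": "bench-2", "date_created": "2025-01-01T10:00:00+00:00"},
    {"id": "c", "label": "web-prod", "date_created": "2024-12-01T00:00:00+00:00"},
]
print(find_stale(instances, now=now))
```

Filtering on the name prefix before checking age keeps unrelated instances in the same account safe, however old they are.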

