Hi, thanks for creating this interesting project! I'm trying to run the MetaClaw benchmark but ran into a few problems:
### Wrong bench directory and extra argument
`benchmark/README.md` suggests

```shell
metaclaw-bench check data/metaclaw-bench/all_tests.json
```

but actually I need to run

```shell
metaclaw-bench check -p benchmark/data/metaclaw-bench/all_tests.json
```

The base directory of the input path is not the current `MetaClaw/benchmark`, but `MetaClaw`.
The `metaclaw-bench run` usage has also changed, but the doc is stale:

```shell
➜ metaclaw-bench run data/metaclaw-bench-small/all_tests.json --output results/
usage: metaclaw-bench run [-h] -i INPUT -o OUTPUT [-w WORKERS] [-n RETRY] [--scene-per-train SCENE_PER_TRAIN] [--memory] [--memory-proxy-port MEMORY_PROXY_PORT]
metaclaw-bench run: error: the following arguments are required: -i/--input
```
### Hardcoded directory
Under `MetaClaw/benchmark/scripts`, many scripts are hardcoded to run under `/home/xkaiwen`:

```shell
➜ grep -lrn "xkaiwen"
./openclaw_customize/llm-prompt-logger/index.ts
./scripts/dummy_run.py
./scripts/rl_only_run.py
./scripts/rl_only_memory_run.py
./scripts/memory_run.py
./scripts/baseline_run.py
./scripts/skills_only_run.py
./scripts/proxy_run.py
./scripts/rl_run.py
./scripts/skills_memory_run.py
./scripts/madmax_memory_run.py
```
Users currently have to replace all of these by hand. It would be better to read a `BASE_DIR` environment variable, or to default to the current directory plus the subdirectory path.
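For reference, here is a minimal sketch of the path-resolution pattern I have in mind for the scripts. The `BASE_DIR` variable name and the `MetaClaw/benchmark/scripts` layout fallback are my assumptions, not something the project defines today:

```python
import os

def resolve_base_dir(script_path: str) -> str:
    """Resolve the repo root instead of hardcoding /home/xkaiwen.

    Prefer an explicit BASE_DIR override from the environment; otherwise
    fall back to the directory two levels above the script, i.e.
    MetaClaw/benchmark/scripts/foo.py -> MetaClaw.
    (BASE_DIR and this layout are assumptions for illustration.)
    """
    env_dir = os.environ.get("BASE_DIR")
    if env_dir:
        return os.path.abspath(env_dir)
    scripts_dir = os.path.dirname(os.path.abspath(script_path))
    return os.path.dirname(os.path.dirname(scripts_dir))
```

Each script could then build its paths from `resolve_base_dir(__file__)` rather than a fixed home directory.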
### Dummy run
To run `dummy_run.py`, the env var `BENCHMARK_API_KEY` is required, but the README's instructions as written never set it:

```shell
# Use pre-built script
python scripts/dummy_run.py

# Manually run full pipeline (infer → score → report)
metaclaw start  # start metaclaw proxy first
export BENCHMARK_BASE_URL=http://127.0.0.1:30000/v1
export BENCHMARK_MODEL=GPT-5.2
metaclaw-bench run data/metaclaw-bench/all_tests.json --output results/
```
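A small guard at the top of `dummy_run.py` would surface the missing key immediately instead of failing mid-run. This is just a sketch; the exact set of required variables in `REQUIRED_ENV` is my guess from the README, not confirmed from the code:

```python
import os
import sys

# Variables the benchmark pipeline appears to need; this exact set is an
# assumption based on the README, not confirmed from the source.
REQUIRED_ENV = ("BENCHMARK_API_KEY", "BENCHMARK_BASE_URL", "BENCHMARK_MODEL")

def missing_env(environ=os.environ):
    """Return the names of required variables that are unset or empty."""
    return [name for name in REQUIRED_ENV if not environ.get(name)]

def check_env_or_exit(environ=os.environ) -> None:
    """Exit with a clear message instead of a confusing late failure."""
    missing = missing_env(environ)
    if missing:
        sys.exit("Missing required environment variables: " + ", ".join(missing))
```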
### Confusing env var
I'm confused about what `BENCHMARK_MODEL` should be set to, and how it's used.
### Can this benchmark serve as an RL training set?
The benchmark looks like a comprehensive set of coding tasks. I wonder how it was generated, and whether it could be used as an RL training set.
### Slow interrupt stop
It gets stuck for a while after Ctrl-C:

```shell
2026-04-07 20:47:07,031 | INFO | metaclaw.launcher | [Launcher] signal 2 received — stopping …
```
I'd be happy to fix some of the easy problems if that's okay with you.