Summary
benchflow eval create --agent <js-agent> --sandbox docker fails at the agent-install step for any task whose container image has dash as /bin/sh (e.g. an ubuntu:24.04 base). The Node/JS agent bootstrap script begins with set -o pipefail, which dash does not support, so the script aborts on line 1.
Repro
benchflow eval create --tasks-dir <dir with an ubuntu-based task> --agent gemini --sandbox docker
Result:
benchflow.models.AgentInstallError: Agent gemini install failed (rc=2)
agent/install-stdout.txt in the job dir shows:
$ set -o pipefail; export DEBIAN_FRONTEND=noninteractive; BF_NODE_DIR=/opt/benchflow/node; ...
sh: 1: set: Illegal option -o pipefail
Root cause
src/benchflow/agents/registry.py — the _NODE_INSTALL template starts with "set -o pipefail; ". The install command is executed in the sandbox under /bin/sh, which on Debian/Ubuntu images is dash; dash's set has no -o pipefail, so the whole script exits non-zero immediately.
Notably src/benchflow/sandbox/setup.py:108 already handles exactly this — shell = "/bin/bash" if "pipefail" in body else "/bin/sh" — but the agent-install path (agents/install.py) does not apply that bash-detection.
Suggested fix
Either:
- run the agent-install script with
/bin/bash when it contains bash-isms (reuse the setup.py:108 pattern), or
- drop
set -o pipefail from _NODE_INSTALL — the script already uses && chaining for error propagation, so pipefail is not load-bearing.
Environment
benchflow @ main, Docker 29.1.3, task base image ubuntu:24.04, host Ubuntu 22.04.
Summary
benchflow eval create --agent <js-agent> --sandbox dockerfails at the agent-install step for any task whose container image hasdashas/bin/sh(e.g. anubuntu:24.04base). The Node/JS agent bootstrap script begins withset -o pipefail, which dash does not support, so the script aborts on line 1.Repro
Result:
agent/install-stdout.txtin the job dir shows:Root cause
src/benchflow/agents/registry.py— the_NODE_INSTALLtemplate starts with"set -o pipefail; ". The install command is executed in the sandbox under/bin/sh, which on Debian/Ubuntu images isdash;dash'ssethas no-o pipefail, so the whole script exits non-zero immediately.Notably
src/benchflow/sandbox/setup.py:108already handles exactly this —shell = "/bin/bash" if "pipefail" in body else "/bin/sh"— but the agent-install path (agents/install.py) does not apply that bash-detection.Suggested fix
Either:
/bin/bashwhen it contains bash-isms (reuse thesetup.py:108pattern), orset -o pipefailfrom_NODE_INSTALL— the script already uses&&chaining for error propagation, sopipefailis not load-bearing.Environment
benchflow @
main, Docker 29.1.3, task base imageubuntu:24.04, host Ubuntu 22.04.