Harden autonomous-skill runner for resume, task isolation, and long-running sessions#3
Conversation
|
Thanks for the enhancements. |
There was a problem hiding this comment.
Pull request overview
This PR hardens the skills/autonomous-skill session runner to better support long-running, resumable workflows by improving task isolation, state validation, and resume behavior, plus aligning prompts/docs with the updated runner behavior.
Changes:
- Hardened
run-session.shwith workspace-root.autonomousresolution, safer task-name generation/validation, per-task locking, resume fallback, and stricter task state validation/progress parsing. - Updated executor prompt and SKILL documentation to reflect the runner’s expected task formatting and operational guidance.
- Added a bash regression test script covering key failure-prone paths (resume fallback, invalid state detection, task name edge cases, stale lock recovery).
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 5 comments.
| File | Description |
|---|---|
| skills/autonomous-skill/scripts/run-session.sh | Core runner hardening: workspace root detection, locking, resume fallback, prompt rendering changes, and task_list/progress validation. |
| skills/autonomous-skill/templates/executor-prompt.md | Aligns task counting and operational guidance with the stricter task parsing/runner behavior. |
| skills/autonomous-skill/SKILL.md | Updates usage and documentation to match workspace-root behavior and safer git hygiene guidance. |
| skills/autonomous-skill/tests/run-session-regression.sh | Adds regression coverage for runner edge cases and common failure modes. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| local case_name="stale-lock-recovery" | ||
| make_fake_codex "$TEST_ROOT/$case_name/bin" "success-with-files" | ||
| mkdir -p "$TEST_ROOT/$case_name/work/.autonomous/demo/.runner.lock" | ||
| printf '999999\n' > "$TEST_ROOT/$case_name/work/.autonomous/demo/.runner.lock/pid" |
There was a problem hiding this comment.
The stale-lock test hardcodes PID 999999. On some systems (or long-lived CI hosts) that PID could legitimately exist, causing the runner to treat the lock as active and making the test flaky. Consider choosing a PID guaranteed to be unused (e.g., probe with kill -0 in a loop to find a free PID) before writing the lock pid file.
| printf '999999\n' > "$TEST_ROOT/$case_name/work/.autonomous/demo/.runner.lock/pid" | |
| # Choose a PID that is guaranteed to be unused so the lock is truly stale. | |
| local fake_pid | |
| while :; do | |
| # Pick a PID-like number in a typical user-space range. | |
| fake_pid=$(( (RANDOM % 64512) + 1000 )) | |
| # If no process has this PID, use it. | |
| if ! kill -0 "$fake_pid" 2>/dev/null; then | |
| break | |
| fi | |
| done | |
| printf '%s\n' "$fake_pid" > "$TEST_ROOT/$case_name/work/.autonomous/demo/.runner.lock/pid" |
| echo "" | ||
| echo "Options:" | ||
| echo " --task-name <name> Specify task name explicitly" | ||
| echo " --task-name <name> Specify task name explicitly (must start with A-Z a-z 0-9; safe chars: . _ -)" |
There was a problem hiding this comment.
--task-name help text says the name "must start with A-Z a-z 0-9", but validate_task_name currently allows leading underscores (and any allowed character except -/.). Align the help text with the actual validation, or tighten validation to require an alphanumeric first character to avoid confusing users and inconsistent task directory names.
| echo " --task-name <name> Specify task name explicitly (must start with A-Z a-z 0-9; safe chars: . _ -)" | |
| echo " --task-name <name> Specify task name explicitly (must not start with '.' or '-'; allowed chars: A-Z a-z 0-9 _ . -)" |
| CODEX_ARGS=( | ||
| -c 'approval_policy="never"' | ||
| -c "sandbox_mode=\"$sandbox_mode\"" | ||
| --skip-git-repo-check | ||
| --json | ||
| -C "$EXECUTION_CWD" | ||
| ) |
There was a problem hiding this comment.
build_codex_args adds a -C "$EXECUTION_CWD" flag to every codex exec invocation. Since this flag is not used anywhere else in the repo, if a user's Codex CLI doesn't support -C the runner will fail immediately. Consider avoiding reliance on a nonstandard flag (e.g., run codex in a subshell after cd "$EXECUTION_CWD") or add a compatibility check/fallback.
| LAST_CODEX_EXIT=${PIPESTATUS[0]} | ||
| set -e | ||
|
|
There was a problem hiding this comment.
run_codex_capture captures only ${PIPESTATUS[0]} (Codex exit code) and ignores failures from tee/log persistence. If tee partially fails (e.g., disk full) Codex may appear successful while logs/session ID extraction are incomplete, which undermines resumability. Consider checking PIPESTATUS[1] as well (and failing the session) or writing output to the temp log without tee and then streaming it separately.
| LAST_CODEX_EXIT=${PIPESTATUS[0]} | |
| set -e | |
| local codex_exit=${PIPESTATUS[0]} | |
| local tee_exit=${PIPESTATUS[1]} | |
| LAST_CODEX_EXIT=$codex_exit | |
| set -e | |
| if [ "$tee_exit" -ne 0 ]; then | |
| print_error "Log persistence failed (tee exit code: $tee_exit) for task directory '$task_dir'" | |
| return "$tee_exit" | |
| fi |
|
|
||
| ```bash | ||
| git add . | ||
| git add {TASK_DIR}/task_list.md {TASK_DIR}/progress.md [project-files...] |
There was a problem hiding this comment.
The git add ... [project-files...] line looks like a literal command inside a bash code block, but [project-files...] is not valid shell syntax and will break if copy/pasted. Consider converting it into a comment placeholder (e.g., # add your project files here) or providing a concrete example list of paths.
| git add {TASK_DIR}/task_list.md {TASK_DIR}/progress.md [project-files...] | |
| git add {TASK_DIR}/task_list.md {TASK_DIR}/progress.md src/ tests/ |
Summary
This PR hardens
skills/autonomous-skillfor real long-running, resumable workflows.It fixes several correctness and operability issues in
run-session.sh, updates the docs/prompts to match the actual behavior, and adds a regression script covering the most failure-prone paths.What changed
.autonomous/under the workspace rootcodex exec resumefailstask_list.md/progress.mdinto the CLI promptTask N:items are countedValidation
Ran locally:
bash -n skills/autonomous-skill/scripts/run-session.shbash -n skills/autonomous-skill/tests/run-session-regression.shbash skills/autonomous-skill/tests/run-session-regression.shRegression result:
Passed: 31Failed: 0