Harden autonomous-skill runner for resume, task isolation, and long-running sessions by jiangqilong41-ops · Pull Request #3 · feiskyer/codex-settings

jiangqilong41-ops · 2026-03-22T08:39:52Z

Summary

This PR hardens skills/autonomous-skill for real long-running, resumable workflows.

It fixes several correctness and operability issues in run-session.sh, updates the docs/prompts to match the actual behavior, and adds a regression script covering the most failure-prone paths.

What changed

resolve .autonomous/ under the workspace root
make task name generation safe for non-ASCII descriptions
validate runner arguments early
add per-task runner locking
add resume fallback when codex exec resume fails
stop inlining full task_list.md / progress.md into the CLI prompt
verify initializer output before declaring success
fix progress counting so only real Task N: items are counted
improve recent-task selection to ignore invalid entries
update docs and executor prompt to match actual runner behavior
add a regression script for the key failure paths
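To illustrate the "safe for non-ASCII descriptions" item, here is a minimal sketch of one way to derive a directory-safe task name. The function name `generate_task_name`, the 40-character limit, and the checksum fallback are assumptions made for this example; the PR's actual implementation is not shown in this thread.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not the PR's code: derive a directory-safe task name.
generate_task_name() {
    local description=$1
    local slug
    # LC_ALL=C makes tr work byte-wise, so every byte of a multi-byte
    # UTF-8 character is replaced; -s squeezes runs of '-' into one.
    slug=$(printf '%s' "$description" \
        | LC_ALL=C tr 'A-Z' 'a-z' \
        | LC_ALL=C tr -cs 'a-z0-9' '-')
    slug=${slug:0:40}   # keep names short (arbitrary limit)
    slug=${slug#-}      # drop a leading dash
    slug=${slug%-}      # drop a trailing dash
    if [ -z "$slug" ]; then
        # No ASCII letters or digits survived: fall back to a checksum
        # of the description so the name is still stable and non-empty.
        slug="task-$(printf '%s' "$description" | cksum | cut -d ' ' -f 1)"
    fi
    printf '%s\n' "$slug"
}
```

With this sketch, a description made only of non-ASCII characters still yields a usable name of the form `task-<checksum>` instead of an empty string.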

Validation

Ran locally:

bash -n skills/autonomous-skill/scripts/run-session.sh
bash -n skills/autonomous-skill/tests/run-session-regression.sh
bash skills/autonomous-skill/tests/run-session-regression.sh

Regression result:

Passed: 31
Failed: 0

feiskyer · 2026-03-22T13:38:29Z

Thanks for the enhancements. rm -rf is super dangerous; is it possible to avoid using it?
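One possible answer, sketched under assumptions: where the script only needs to delete a directory whose contents are known in advance (for example the `.runner.lock` directory with its single `pid` file that appears in the regression test), `rm -f` on the known file followed by `rmdir` can replace `rm -rf`. Where the PR actually calls `rm -rf` is not shown in this thread, and `release_lock` is a hypothetical name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: remove a lock directory that should hold only a pid file.
release_lock() {
    local lock_dir=$1
    # An empty or missing path is a no-op rather than an error.
    [ -d "$lock_dir" ] || return 0
    # Remove only the one file we expect to find.
    rm -f -- "$lock_dir/pid"
    # rmdir refuses to delete a non-empty directory, so anything
    # unexpected inside the lock directory is left untouched.
    rmdir -- "$lock_dir"
}
```

The design point is that `rmdir` fails loudly when the directory holds something unexpected, whereas `rm -rf` would silently delete it, including when a variable expands to the wrong path.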

Copilot

Pull request overview

This PR hardens the skills/autonomous-skill session runner to better support long-running, resumable workflows by improving task isolation, state validation, and resume behavior, plus aligning prompts/docs with the updated runner behavior.
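The resume behavior mentioned above can be pictured roughly as follows. This is only a sketch: `run_with_resume_fallback` is a hypothetical name, and the resume and fresh-start steps are passed in as caller-supplied function names so that the example makes no claim about the exact codex exec resume arguments.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper: try to resume a session, start a new one if that fails.
# resume_cmd and fresh_cmd are names of functions or commands supplied by the caller.
run_with_resume_fallback() {
    local session_id=$1
    local resume_cmd=$2
    local fresh_cmd=$3
    if [ -n "$session_id" ]; then
        # A saved session ID exists, so resuming is attempted first.
        if "$resume_cmd" "$session_id"; then
            return 0
        fi
        echo "resume of session $session_id failed; starting a new session" >&2
    fi
    # No session ID, or the resume attempt failed.
    "$fresh_cmd"
}
```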

Changes:

Hardened run-session.sh with workspace-root .autonomous resolution, safer task-name generation/validation, per-task locking, resume fallback, and stricter task state validation/progress parsing.
Updated executor prompt and SKILL documentation to reflect the runner’s expected task formatting and operational guidance.
Added a bash regression test script covering key failure-prone paths (resume fallback, invalid state detection, task name edge cases, stale lock recovery).
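For readers unfamiliar with directory-based locks, the per-task locking and stale lock recovery could look roughly like this. The layout (`<task_dir>/.runner.lock/pid`) follows the regression test in this PR; the function name `acquire_lock` and the retry logic are assumptions, not the PR's code.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of a per-task lock with stale-lock recovery.
acquire_lock() {
    local lock_dir=$1/.runner.lock
    local owner attempt
    for attempt in 1 2; do
        # mkdir is atomic: only one runner can create the directory.
        if mkdir "$lock_dir" 2>/dev/null; then
            printf '%s\n' "$$" > "$lock_dir/pid"
            return 0
        fi
        owner=$(cat "$lock_dir/pid" 2>/dev/null || true)
        case $owner in
            ''|*[!0-9]*) owner='' ;;   # missing or malformed pid file
        esac
        # kill -0 sends no signal; it only tests whether the PID can be signalled.
        # Caveat: it also fails for a live process owned by another user.
        if [ -n "$owner" ] && kill -0 "$owner" 2>/dev/null; then
            echo "task is locked by running PID $owner" >&2
            return 1
        fi
        # Owner is gone: clear the stale lock and retry once.
        rm -f -- "$lock_dir/pid"
        rmdir -- "$lock_dir" 2>/dev/null || return 1
    done
    return 1
}
```

A known limitation of this sketch: a second runner that inspects the lock between the `mkdir` and the write of the `pid` file sees no owner and treats the lock as stale.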

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 5 comments.

File	Description
skills/autonomous-skill/scripts/run-session.sh	Core runner hardening: workspace root detection, locking, resume fallback, prompt rendering changes, and task_list/progress validation.
skills/autonomous-skill/templates/executor-prompt.md	Aligns task counting and operational guidance with the stricter task parsing/runner behavior.
skills/autonomous-skill/SKILL.md	Updates usage and documentation to match workspace-root behavior and safer git hygiene guidance.
skills/autonomous-skill/tests/run-session-regression.sh	Adds regression coverage for runner edge cases and common failure modes.
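As an illustration of the stricter task parsing, the sketch below counts only checklist lines of the form `- [ ] Task N: ...` or `- [x] Task N: ...`. The checkbox format of task_list.md is an assumption made for this example (the thread only says that "real Task N: items" are counted), and `count_tasks` is a hypothetical name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print "<completed>/<total>" for numbered task items only.
count_tasks() {
    local task_list=$1
    local total completed
    [ -f "$task_list" ] || return 1
    # grep -c prints 0 but exits 1 when nothing matches, hence "|| true".
    total=$(grep -cE '^[[:space:]]*- \[[ xX]\] Task [0-9]+:' "$task_list" || true)
    completed=$(grep -cE '^[[:space:]]*- \[[xX]\] Task [0-9]+:' "$task_list" || true)
    printf '%s/%s\n' "$completed" "$total"
}
```

Checklist lines without a `Task N:` prefix, and `Task N:` lines without a checkbox, are ignored by both patterns.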


Copilot · 2026-03-22T13:41:05Z

skills/autonomous-skill/tests/run-session-regression.sh

+    local case_name="stale-lock-recovery"
+    make_fake_codex "$TEST_ROOT/$case_name/bin" "success-with-files"
+    mkdir -p "$TEST_ROOT/$case_name/work/.autonomous/demo/.runner.lock"
+    printf '999999\n' > "$TEST_ROOT/$case_name/work/.autonomous/demo/.runner.lock/pid"


The stale-lock test hardcodes PID 999999. On some systems (or long-lived CI hosts) that PID could legitimately exist, causing the runner to treat the lock as active and making the test flaky. Consider choosing a PID guaranteed to be unused (e.g., probe with kill -0 in a loop to find a free PID) before writing the lock pid file.

Suggested change

-    printf '999999\n' > "$TEST_ROOT/$case_name/work/.autonomous/demo/.runner.lock/pid"
+    # Choose a PID that is guaranteed to be unused so the lock is truly stale.
+    local fake_pid
+    while :; do
+        # Pick a PID-like number in a typical user-space range.
+        fake_pid=$(( (RANDOM % 64512) + 1000 ))
+        # If no process has this PID, use it.
+        if ! kill -0 "$fake_pid" 2>/dev/null; then
+            break
+        fi
+    done
+    printf '%s\n' "$fake_pid" > "$TEST_ROOT/$case_name/work/.autonomous/demo/.runner.lock/pid"
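Two caveats apply to the probing approach: `kill -0` also fails when the process exists but belongs to another user, and bash's `RANDOM` never exceeds 32767, so `RANDOM % 64512` does not reach the upper part of the intended range. An alternative, sketched here with the hypothetical helper name `dead_pid`, is to start a trivial background process, wait for it to exit, and reuse its PID. This is not a strict guarantee either, since the kernel may eventually recycle the PID, but immediate reuse is unlikely.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the PID of a process that has already exited.
dead_pid() {
    local pid
    true &          # start a trivial background process
    pid=$!
    wait "$pid"     # reap it, so the PID no longer refers to a live process
    printf '%s\n' "$pid"
}
```

In the stale-lock test this could be used as `printf '%s\n' "$(dead_pid)" > .../.runner.lock/pid`, with the path left as in the original test.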

Copilot · 2026-03-22T13:41:05Z

skills/autonomous-skill/scripts/run-session.sh

    echo ""
    echo "Options:"
-    echo "  --task-name <name>       Specify task name explicitly"
+    echo "  --task-name <name>       Specify task name explicitly (must start with A-Z a-z 0-9; safe chars: . _ -)"


The --task-name help text says the name "must start with A-Z a-z 0-9", but validate_task_name currently allows a leading underscore (any allowed character except '-' and '.' may come first). Align the help text with the actual validation, or tighten validation to require an alphanumeric first character, to avoid confusing users and producing inconsistent task directory names.

Suggested change

-    echo "  --task-name <name>       Specify task name explicitly (must start with A-Z a-z 0-9; safe chars: . _ -)"
+    echo "  --task-name <name>       Specify task name explicitly (must not start with '.' or '-'; allowed chars: A-Z a-z 0-9 _ . -)"
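For reference, a validation rule that matches the suggested help text could be sketched as below. This is an illustration of the rule as described in this comment, not the PR's actual validate_task_name body.

```shell
#!/usr/bin/env bash
# Sketch of a validator consistent with the suggested help text.
validate_task_name() {
    local name=$1
    # Use the C locale so A-Z, a-z and 0-9 mean exactly the ASCII ranges.
    local LC_ALL=C
    # First character: letter, digit, or underscore (so never '.' or '-').
    # Remaining characters: letters, digits, '_', '.', '-'.
    [[ $name =~ ^[A-Za-z0-9_][A-Za-z0-9_.-]*$ ]]
}
```

Because '/' and whitespace are outside the allowed set and a leading '.' is rejected, names such as `..` or `a/b` cannot escape the task directory under this rule.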

Copilot · 2026-03-22T13:41:05Z

skills/autonomous-skill/scripts/run-session.sh

    CODEX_ARGS=(
        -c 'approval_policy="never"'
        -c "sandbox_mode=\"$sandbox_mode\""
        --skip-git-repo-check
        --json
+        -C "$EXECUTION_CWD"
    )


build_codex_args adds a -C "$EXECUTION_CWD" flag to every codex exec invocation. Since this flag is not used anywhere else in the repo, if a user's Codex CLI doesn't support -C the runner will fail immediately. Consider avoiding reliance on a nonstandard flag (e.g., run codex in a subshell after cd "$EXECUTION_CWD") or add a compatibility check/fallback.
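The subshell alternative mentioned above can be sketched as follows; `run_in_dir` is a hypothetical helper name, and the commented usage line assumes the runner's existing `EXECUTION_CWD` and `CODEX_ARGS` variables.

```shell
#!/usr/bin/env bash
# Hypothetical helper: run a command in another directory without -C.
run_in_dir() {
    local dir=$1
    shift
    # The parentheses start a subshell, so the cd does not leak to the caller.
    # If cd fails, the command is not run and the failure status is returned.
    ( cd -- "$dir" && "$@" )
}

# Possible usage in the runner (not executed here):
#   run_in_dir "$EXECUTION_CWD" codex exec "${CODEX_ARGS[@]}"
```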

Copilot · 2026-03-22T13:41:06Z

skills/autonomous-skill/scripts/run-session.sh

+    LAST_CODEX_EXIT=${PIPESTATUS[0]}
+    set -e
+


run_codex_capture captures only ${PIPESTATUS[0]} (Codex exit code) and ignores failures from tee/log persistence. If tee partially fails (e.g., disk full) Codex may appear successful while logs/session ID extraction are incomplete, which undermines resumability. Consider checking PIPESTATUS[1] as well (and failing the session) or writing output to the temp log without tee and then streaming it separately.

Suggested change

-    LAST_CODEX_EXIT=${PIPESTATUS[0]}
-    set -e
+    # Snapshot PIPESTATUS in one statement; every later command overwrites it.
+    local pipe_status=("${PIPESTATUS[@]}")
+    local codex_exit=${pipe_status[0]}
+    local tee_exit=${pipe_status[1]}
+    LAST_CODEX_EXIT=$codex_exit
+    set -e
+    if [ "$tee_exit" -ne 0 ]; then
+        print_error "Log persistence failed (tee exit code: $tee_exit) for task directory '$task_dir'"
+        return "$tee_exit"
+    fi
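One detail matters when reading more than one PIPESTATUS entry: bash replaces the array after each command, including a `local` declaration, so both statuses have to be copied in a single statement. The self-contained sketch below shows the pattern; `run_and_log`, `LAST_CMD_EXIT`, and `LAST_TEE_EXIT` are hypothetical names used only for this example.

```shell
#!/usr/bin/env bash
# Sketch only: capture the exit codes of both sides of a "cmd | tee" pipeline.
# Assumes errexit is off while the pipeline runs (the runner uses set +e / set -e).
run_and_log() {
    local log_file=$1
    shift
    "$@" 2>&1 | tee "$log_file"
    # Copy both statuses at once; the next command would replace PIPESTATUS.
    local statuses=("${PIPESTATUS[@]}")
    LAST_CMD_EXIT=${statuses[0]}
    LAST_TEE_EXIT=${statuses[1]}
    if [ "$LAST_TEE_EXIT" -ne 0 ]; then
        echo "log persistence failed (tee exit code: $LAST_TEE_EXIT)" >&2
        return "$LAST_TEE_EXIT"
    fi
    return "$LAST_CMD_EXIT"
}
```

A failing log write is reported even when the wrapped command itself succeeds, which is the situation the review comment is concerned about.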

Copilot · 2026-03-22T13:41:06Z

skills/autonomous-skill/templates/executor-prompt.md


 ```bash
-git add .
+git add {TASK_DIR}/task_list.md {TASK_DIR}/progress.md [project-files...]


The git add ... [project-files...] line looks like a literal command inside a bash code block, but [project-files...] is a placeholder rather than a real path: the shell either passes it through literally or treats it as a glob pattern, so the command will break if copy/pasted. Consider converting it into a comment placeholder (e.g., # add your project files here) or providing a concrete example list of paths.

Suggested change

-git add {TASK_DIR}/task_list.md {TASK_DIR}/progress.md [project-files...]
+git add {TASK_DIR}/task_list.md {TASK_DIR}/progress.md src/ tests/

Harden autonomous-skill runner for resume and long-running sessions

75637e7

feiskyer requested a review from Copilot March 22, 2026 13:37

Copilot started reviewing on behalf of feiskyer March 22, 2026 13:37

Copilot AI reviewed Mar 22, 2026

jiangqilong41-ops wants to merge 1 commit into feiskyer:main from jiangqilong41-ops:codex/autonomous-skill-hardening

3 participants
