fix: --from-stage without --output now finds existing run directory #217

Closed
dennis-lynch-nv wants to merge 2 commits into aiming-lab:main from dennis-lynch-nv:fix/from-stage-run-dir

@dennis-lynch-nv commented Apr 6, 2026

Summary

When using --from-stage without --output, the CLI generates a new empty run directory via _generate_run_id(). The StageContract.input_files check then fails immediately because prior stage artifacts (e.g., exp_plan.yaml from stage 9) don't exist in the fresh directory.
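To make the failure mode concrete, here is a minimal sketch of an input-file check run against a fresh directory and against a directory that already holds prior artifacts. The helper `missing_inputs` and the directory names `run-old` and `run-new` are hypothetical stand-ins for illustration; only `exp_plan.yaml` comes from the description above, and the real `StageContract` check may look quite different.

```python
from pathlib import Path
import tempfile


def missing_inputs(run_dir: Path, input_files: list[str]) -> list[str]:
    """Return the required input files that are absent from run_dir.

    Hypothetical stand-in for a StageContract.input_files check.
    """
    return [name for name in input_files if not (run_dir / name).exists()]


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)

    # An earlier run that already produced the stage 9 artifact.
    existing_run = root / "run-old"  # hypothetical directory name
    existing_run.mkdir()
    (existing_run / "exp_plan.yaml").write_text("plan: {}\n")

    # What the CLI effectively did before the fix: start from an empty directory.
    fresh_run = root / "run-new"  # hypothetical directory name
    fresh_run.mkdir()

    required = ["exp_plan.yaml"]
    fresh_missing = missing_inputs(fresh_run, required)        # artifact is missing
    existing_missing = missing_inputs(existing_run, required)  # nothing is missing
```

The check only passes when the stage runs inside the directory that earlier stages wrote to, which is why picking a new run ID breaks --from-stage.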

Fix

Extend the BUG-119 checkpoint-search logic (already working for --resume) to also apply when --from-stage is specified. One-line condition change:

# Before:
if resume and not output:

# After:
if (resume or from_stage_name) and not output:
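The surrounding resolution logic might look roughly like the sketch below. This is not the project's actual code: `resolve_run_dir`, `find_latest_run_with_checkpoint`, the `checkpoint.json` file name, and the use of the checkpoint's modification time to decide which run is newest are all assumptions made for illustration. The PR also mentions a "matching" run directory; whatever criterion that refers to is omitted here.

```python
from pathlib import Path
from typing import Optional

CHECKPOINT_NAME = "checkpoint.json"  # assumed file name, not taken from the project


def find_latest_run_with_checkpoint(runs_root: Path) -> Optional[Path]:
    """Return the run directory whose checkpoint was modified most recently."""
    if not runs_root.is_dir():
        return None
    candidates = [
        d for d in runs_root.iterdir()
        if d.is_dir() and (d / CHECKPOINT_NAME).is_file()
    ]
    if not candidates:
        return None
    return max(candidates, key=lambda d: (d / CHECKPOINT_NAME).stat().st_mtime)


def resolve_run_dir(
    runs_root: Path,
    output: Optional[str],
    resume: bool,
    from_stage_name: Optional[str],
    new_run_id: str,
) -> Path:
    """Pick the run directory (hypothetical helper mirroring the PR's condition)."""
    if (resume or from_stage_name) and not output:
        found = find_latest_run_with_checkpoint(runs_root)
        if found is not None:
            return found
    if output:
        # An explicit --output always wins and skips the search.
        return Path(output)
    # No flag asked for an existing run, or none was found: start a new one.
    return runs_root / new_run_id
```

With this shape, --resume and --from-stage share one code path, so a fix to the search applies to both flags.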

Tests

5 new test cases in tests/test_from_stage_run_dir.py:

| Test | Description |
| --- | --- |
| test_from_stage_without_output_finds_existing_run | Core fix: --from-stage finds existing run dir |
| test_from_stage_without_output_old_behavior_fails | Proves the bug existed before fix |
| test_resume_still_works | No regression for --resume |
| test_explicit_output_skips_search | --output bypasses search |
| test_picks_newest_run_when_multiple_exist | Multiple runs: picks newest |

tests/test_from_stage_run_dir.py .....                     [100%]
tests/test_cli.py ...                                      [100%]
8 passed in 0.16s

Fixes #216

When using --from-stage without --output, the CLI generated a new empty
run directory. The StageContract input_files check then failed immediately
because prior stage artifacts (e.g., exp_plan.yaml) didn't exist in the
fresh directory.

This extends the BUG-119 checkpoint-search logic to also apply when
--from-stage is specified, so it finds the most recent matching run
directory with a checkpoint.

Fixes aiming-lab#216

5 test cases covering:
- --from-stage without --output finds existing run dir (the fix)
- old behavior would NOT find it (proving the bug)
- --resume still works (no regression)
- explicit --output skips search
- multiple runs: picks newest

All pass on Python 3.11.

Jiaaqiliu added a commit that referenced this pull request Apr 10, 2026
…217)

Extend BUG-119 checkpoint-search logic to also apply when --from-stage
is used without --output. Includes 5 new tests. Fixes #216.
Contributed by @dennis-lynch-nv.

@Jiaaqiliu closed this Apr 10, 2026

@Jiaaqiliu (Collaborator) commented

Merged manually via cherry-pick in commit 7b57457. Thank you @dennis-lynch-nv for the fix and tests!

Development

Successfully merging this pull request may close these issues.

Bug: --from-stage without --output creates empty run_dir, fails on input_files check