refactor(cli): migrate onboard-session.js to TypeScript#1272
Conversation
…ript modules Move ~210 lines of pure, side-effect-free functions from the 3,800-line onboard.js into five typed TypeScript modules under src/lib/: - gateway-state.ts: gateway/sandbox state classification - validation.ts: failure classification, API key validation, model ID checks - url-utils.ts: URL normalization, text compaction, env formatting - build-context.ts: Docker build context filtering, recovery hints - dashboard.ts: dashboard URL resolution and construction Infrastructure: - tsconfig.src.json compiles src/ → dist/ as CJS (dist/ already gitignored) - tsconfig.cli.json updated to type-check src/ - npm run build:cli added to package.json - Pre-commit test-cli hook builds TS and includes dist/lib/ in coverage onboard.js imports from compiled dist/lib/ output. All 542 CLI tests pass. No user-facing behavior changes. Closes #1237. Relates to #924 (shell consolidation). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
56 new tests across three co-located test files: - src/lib/validation.test.ts: classifyValidationFailure, classifyApplyFailure, classifySandboxCreateFailure, validateNvidiaApiKeyValue, isSafeModelId - src/lib/dashboard.test.ts: resolveDashboardForwardTarget, buildControlUiUrls - src/lib/url-utils.test.ts: compactText, stripEndpointSuffix, normalizeProviderBaseUrl, isLoopbackHostname, formatEnvAssignment, parsePolicyPresetEnv vitest.config.ts updated to include src/**/*.test.ts in the CLI project. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The existing callsite in printDashboard calls buildControlUiUrls() with no arguments when no token is available. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The tsc-js pre-push hook (jsconfig.json) type-checks bin/lib/onboard.js which requires dist/lib/ to exist. CI was not running npm run build:cli before the prek checks, causing TS2307 "Cannot find module" errors. - Add npm run build:cli step to .github/actions/basic-checks - Update tsc-js hook to run build:cli before tsc Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…law into cv/extract-onboard-pure-fns
## Summary - Convert `bin/lib/preflight.js` (357 lines) to `src/lib/preflight.ts` with full type definitions - Typed interfaces for all opts objects and return types: `PortProbeResult`, `MemoryInfo`, `SwapResult`, `CheckPortOpts`, `GetMemoryInfoOpts`, `EnsureSwapOpts` - Extract `parseLsofLines` helper to reduce duplication in `checkPortAvailable` - Incorporate #1227 fix: `sudo` → `sudo -n` (non-interactive) for lsof retry - `bin/lib/preflight.js` becomes a thin re-export shim — existing consumers unaffected - Co-locate tests: `test/preflight.test.js` → `src/lib/preflight.test.ts` - Add real net probe tests (EADDRINUSE detection on occupied ports) - Fix all co-located test imports to use `dist/` paths for coverage attribution - Add targeted dashboard/validation branch tests to maintain ratchet Stacked on #1240. Not touched by any #924 blocker PR. ## Test plan - [x] 612 CLI tests pass (601 existing + 11 new) - [x] `tsc -p tsconfig.src.json` compiles cleanly - [x] `tsc -p tsconfig.cli.json` type-checks cleanly - [x] `tsc -p jsconfig.json` type-checks cleanly (the pre-push check that caught the union issue) - [x] Coverage ratchet passes Relates to #924 (shell consolidation). Supersedes #1227. 🤖 Generated with [Claude Code](https://claude.com/claude-code) --------- Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Convert bin/lib/onboard-session.js (440 lines) to src/lib/onboard-session.ts with typed interfaces for Session, StepState, SessionFailure, LockResult, SessionUpdates, and related types. Full migration — all functions including fs I/O (session persistence, file-based locking) and pure helpers (redaction, validation, normalization). Co-locates tests. Cache-clearing pattern updated for dist/ path. 615 CLI tests pass. Coverage ratchet passes. Relates to #924 (shell consolidation). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
📝 WalkthroughWalkthroughReorganized session management code from a JavaScript implementation in Changes
Estimated code review effort🎯 4 (Complex) | ⏱️ ~50 minutes Poem
🚥 Pre-merge checks | ✅ 2 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Comment |
There was a problem hiding this comment.
Actionable comments posted: 4
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
src/lib/onboard-session.test.ts (1)
19-35:⚠️ Potential issue | 🟡 MinorThe shared temp HOME makes these lock tests order-dependent.
These hooks reuse one
tmpDirfor the whole file, butreleaseOnboardLock()intentionally leaves malformed lock files behind. After the malformed-lock cases, later tests inherit that state because the hooks only clear the module cache and the session file. RecreatetmpDirper test, or wipe it in the hooks.One simple cleanup option
beforeEach(() => { + fs.rmSync(tmpDir, { recursive: true, force: true }); + fs.mkdirSync(tmpDir, { recursive: true, mode: 0o700 }); process.env.HOME = tmpDir; delete require.cache[shimPath]; delete require.cache[distPath]; session = require("../../dist/lib/onboard-session"); session.clearSession(); session.releaseOnboardLock(); }); afterEach(() => { delete require.cache[shimPath]; delete require.cache[distPath]; + fs.rmSync(tmpDir, { recursive: true, force: true }); if (originalHome === undefined) {🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/lib/onboard-session.test.ts` around lines 19 - 35, The tests reuse a single tmpDir across the file which causes order-dependent failures because session.releaseOnboardLock() leaves malformed lock files; update the beforeEach/afterEach hooks to create a fresh temporary directory for each test (or delete all files inside tmpDir) and set process.env.HOME to that fresh directory, ensuring you still delete require.cache for shimPath and distPath and call session.clearSession() and session.releaseOnboardLock() as before; target the beforeEach/afterEach functions and symbols tmpDir, process.env.HOME, session.clearSession(), and session.releaseOnboardLock() when applying the change.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@bin/lib/onboard-session.js`:
- Line 7: Replace the shim export in bin/lib/onboard-session.js (currently
"module.exports = require('../../dist/lib/onboard-session')") with the actual
CommonJS implementation so the CLI can run without a build; locate the logic in
dist/lib/onboard-session (or the source TS/JS equivalent), convert/inline that
implementation into bin/lib/onboard-session.js as a plain CommonJS module
(exports or module.exports) and remove any dependency on dist/, ensuring
bin/nemoclaw.js can require('./lib/onboard-session') in a fresh checkout/npm
install.
In `@src/lib/onboard-session.ts`:
- Around line 204-251: normalizeSession is too complex and currently bypasses
the complexity rule; split its responsibilities into small helpers. Create
helper functions such as coerceSessionFields(data: Record<string, unknown>) ->
Partial<Session> to handle top-level field coercion and redaction (use
redactUrl, sanitizeFailure), normalizeMetadata(metadata: unknown) ->
SessionMetadata|undefined, and normalizeSteps(steps: unknown, currentSteps:
Session['steps']) -> Session['steps'] to validate and populate individual step
entries (use validateStep and redactSensitiveText). Replace the inline logic in
normalizeSession to call these helpers, remove the eslint-disable-next-line
complexity waiver, and keep behavior identical (set normalized.resumable and
normalized.status as before). Ensure the new helpers are exported or file-scoped
so tests/other code can be updated if needed.
- Around line 336-365: The current stale-lock cleanup unlinks LOCK_FILE directly
which allows a race where a concurrent process recreates the lock and the slower
process then deletes it; change to an atomic stale-claim approach: when you
detect a stale lock (parseLockFile(...) returned existing and
isProcessAlive(...) is false), write a temporary file (e.g.,
`${LOCK_FILE}.${process.pid}.tmp`) with your PID/startedAt/command and then
attempt to atomically create the real lock using fs.linkSync(tempPath,
LOCK_FILE) (this call will succeed only if LOCK_FILE does not exist); if
linkSync succeeds you have acquired the lock (remove tempPath and return
acquired=true), if it fails because LOCK_FILE now exists remove tempPath and
continue retrying; do not call fs.unlinkSync(LOCK_FILE) unconditionally—only
remove stale files after successfully claiming the lock or when you can safely
determine nobody else replaced it. Use parseLockFile, LOCK_FILE, isProcessAlive,
and fs.linkSync/fs.writeFileSync/fs.unlinkSync to implement this.
- Around line 227-229: The metadata assignment currently copies gatewayName
regardless of type, allowing non-string values to persist; update the
normalization in the metadata block (used by loadSession/createSession flows) to
only set metadata.gatewayName when isObject(d.metadata) is true and (d.metadata
as Record<string, unknown>).gatewayName is a string (typeof === 'string'),
otherwise leave metadata undefined or omit gatewayName; ensure you reference the
existing isObject check and the SessionMetadata type so only string gatewayName
values are propagated.
---
Outside diff comments:
In `@src/lib/onboard-session.test.ts`:
- Around line 19-35: The tests reuse a single tmpDir across the file which
causes order-dependent failures because session.releaseOnboardLock() leaves
malformed lock files; update the beforeEach/afterEach hooks to create a fresh
temporary directory for each test (or delete all files inside tmpDir) and set
process.env.HOME to that fresh directory, ensuring you still delete
require.cache for shimPath and distPath and call session.clearSession() and
session.releaseOnboardLock() as before; target the beforeEach/afterEach
functions and symbols tmpDir, process.env.HOME, session.clearSession(), and
session.releaseOnboardLock() when applying the change.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 356d71e7-e42a-411b-862f-93c6eab61708
📒 Files selected for processing (3)
bin/lib/onboard-session.jssrc/lib/onboard-session.test.tssrc/lib/onboard-session.ts
| summarizeForDebug, | ||
| updateSession, | ||
| }; | ||
| module.exports = require("../../dist/lib/onboard-session"); |
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
set -euo pipefail
echo "Tracked onboard-session entrypoints:"
git ls-files -- bin/lib/onboard-session.js dist/lib/onboard-session.js src/lib/onboard-session.ts
echo
echo "Packaging/build hooks that mention bin/dist:"
rg -n -C2 '"(bin|files|prepare|prepack|postinstall|build:cli)"|dist/lib/onboard-session|bin/lib/onboard-session' \
package.json README.md .npmignore .gitignore 2>/dev/null || trueRepository: NVIDIA/NemoClaw
Length of output: 1400
Remove the shim and keep the implementation in bin/lib/onboard-session.js.
The shim at line 7 delegates to dist/lib/onboard-session.js, which is not tracked in git and not included in the npm package. This breaks the bin/ contract that requires working without a build step—bin/nemoclaw.js will fail to load this module in a clean checkout or fresh npm install.
Either restore the implementation directly in bin/lib/onboard-session.js (as CommonJS) or restructure to avoid the dist/ dependency in the CLI path.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@bin/lib/onboard-session.js` at line 7, Replace the shim export in
bin/lib/onboard-session.js (currently "module.exports =
require('../../dist/lib/onboard-session')") with the actual CommonJS
implementation so the CLI can run without a build; locate the logic in
dist/lib/onboard-session (or the source TS/JS equivalent), convert/inline that
implementation into bin/lib/onboard-session.js as a plain CommonJS module
(exports or module.exports) and remove any dependency on dist/, ensuring
bin/nemoclaw.js can require('./lib/onboard-session') in a fresh checkout/npm
install.
| // eslint-disable-next-line complexity | ||
| export function normalizeSession(data: unknown): Session | null { | ||
| if (!isObject(data) || (data as Record<string, unknown>).version !== SESSION_VERSION) return null; | ||
| const d = data as Record<string, unknown>; | ||
| const normalized = createSession({ | ||
| sessionId: typeof d.sessionId === "string" ? d.sessionId : undefined, | ||
| mode: typeof d.mode === "string" ? d.mode : undefined, | ||
| startedAt: typeof d.startedAt === "string" ? d.startedAt : undefined, | ||
| updatedAt: typeof d.updatedAt === "string" ? d.updatedAt : undefined, | ||
| sandboxName: typeof d.sandboxName === "string" ? d.sandboxName : null, | ||
| provider: typeof d.provider === "string" ? d.provider : null, | ||
| model: typeof d.model === "string" ? d.model : null, | ||
| endpointUrl: typeof d.endpointUrl === "string" ? redactUrl(d.endpointUrl) : null, | ||
| credentialEnv: typeof d.credentialEnv === "string" ? d.credentialEnv : null, | ||
| preferredInferenceApi: | ||
| typeof d.preferredInferenceApi === "string" ? d.preferredInferenceApi : null, | ||
| nimContainer: typeof d.nimContainer === "string" ? d.nimContainer : null, | ||
| policyPresets: Array.isArray(d.policyPresets) | ||
| ? (d.policyPresets as unknown[]).filter((value) => typeof value === "string") as string[] | ||
| : null, | ||
| lastStepStarted: typeof d.lastStepStarted === "string" ? d.lastStepStarted : null, | ||
| lastCompletedStep: typeof d.lastCompletedStep === "string" ? d.lastCompletedStep : null, | ||
| failure: sanitizeFailure(d.failure as Record<string, unknown> | null), | ||
| metadata: isObject(d.metadata) | ||
| ? ({ gatewayName: (d.metadata as Record<string, unknown>).gatewayName } as SessionMetadata) | ||
| : undefined, | ||
| } as Partial<Session>); | ||
| normalized.resumable = d.resumable !== false; | ||
| normalized.status = typeof d.status === "string" ? d.status : normalized.status; | ||
|
|
||
| if (isObject(d.steps)) { | ||
| for (const [name, step] of Object.entries(d.steps as Record<string, unknown>)) { | ||
| if ( | ||
| Object.prototype.hasOwnProperty.call(normalized.steps, name) && | ||
| validateStep(step) | ||
| ) { | ||
| const s = step as Record<string, unknown>; | ||
| normalized.steps[name] = { | ||
| status: s.status as string, | ||
| startedAt: typeof s.startedAt === "string" ? s.startedAt : null, | ||
| completedAt: typeof s.completedAt === "string" ? s.completedAt : null, | ||
| error: redactSensitiveText(s.error), | ||
| }; | ||
| } | ||
| } | ||
| } | ||
|
|
||
| return normalized; |
There was a problem hiding this comment.
🛠️ Refactor suggestion | 🟠 Major
Split normalizeSession instead of waiving the complexity check.
This deserializer is already doing schema coercion, redaction, defaults, and step normalization in one place. Please extract smaller helpers rather than landing the new complexity waiver here.
As per coding guidelines, **/*.{js,ts,tsx}: Maintain cyclomatic complexity limit of 20 (target: ratchet down to 15) for all functions.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/lib/onboard-session.ts` around lines 204 - 251, normalizeSession is too
complex and currently bypasses the complexity rule; split its responsibilities
into small helpers. Create helper functions such as coerceSessionFields(data:
Record<string, unknown>) -> Partial<Session> to handle top-level field coercion
and redaction (use redactUrl, sanitizeFailure), normalizeMetadata(metadata:
unknown) -> SessionMetadata|undefined, and normalizeSteps(steps: unknown,
currentSteps: Session['steps']) -> Session['steps'] to validate and populate
individual step entries (use validateStep and redactSensitiveText). Replace the
inline logic in normalizeSession to call these helpers, remove the
eslint-disable-next-line complexity waiver, and keep behavior identical (set
normalized.resumable and normalized.status as before). Ensure the new helpers
are exported or file-scoped so tests/other code can be updated if needed.
| metadata: isObject(d.metadata) | ||
| ? ({ gatewayName: (d.metadata as Record<string, unknown>).gatewayName } as SessionMetadata) | ||
| : undefined, |
There was a problem hiding this comment.
Validate metadata.gatewayName before copying it.
A persisted payload like { "metadata": { "gatewayName": 123 } } currently survives normalization. Because createSession() copies truthy gatewayName values through, loadSession() can return a Session whose metadata.gatewayName is not a string.
Suggested fix
- metadata: isObject(d.metadata)
- ? ({ gatewayName: (d.metadata as Record<string, unknown>).gatewayName } as SessionMetadata)
- : undefined,
+ metadata:
+ isObject(d.metadata) &&
+ typeof (d.metadata as Record<string, unknown>).gatewayName === "string"
+ ? { gatewayName: (d.metadata as Record<string, unknown>).gatewayName as string }
+ : undefined,🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/lib/onboard-session.ts` around lines 227 - 229, The metadata assignment
currently copies gatewayName regardless of type, allowing non-string values to
persist; update the normalization in the metadata block (used by
loadSession/createSession flows) to only set metadata.gatewayName when
isObject(d.metadata) is true and (d.metadata as Record<string,
unknown>).gatewayName is a string (typeof === 'string'), otherwise leave
metadata undefined or omit gatewayName; ensure you reference the existing
isObject check and the SessionMetadata type so only string gatewayName values
are propagated.
| let existing: LockInfo | null; | ||
| try { | ||
| existing = parseLockFile(fs.readFileSync(LOCK_FILE, "utf8")); | ||
| } catch (readError: unknown) { | ||
| if ((readError as NodeJS.ErrnoException)?.code === "ENOENT") { | ||
| continue; | ||
| } | ||
| throw readError; | ||
| } | ||
| if (!existing) { | ||
| continue; | ||
| } | ||
| if (existing && isProcessAlive(existing.pid)) { | ||
| return { | ||
| acquired: false, | ||
| lockFile: LOCK_FILE, | ||
| stale: false, | ||
| holderPid: existing.pid, | ||
| holderStartedAt: existing.startedAt, | ||
| holderCommand: existing.command, | ||
| }; | ||
| } | ||
|
|
||
| try { | ||
| fs.unlinkSync(LOCK_FILE); | ||
| } catch (unlinkError: unknown) { | ||
| if ((unlinkError as NodeJS.ErrnoException)?.code !== "ENOENT") { | ||
| throw unlinkError; | ||
| } | ||
| } |
There was a problem hiding this comment.
The stale-lock cleanup can let two processes acquire the lock.
If two contenders read the same stale file, the slower one can reach Line 360 after the faster one has already recreated LOCK_FILE, unlink that fresh lock, and then win on its next retry. That breaks the mutual-exclusion guarantee. Please use an atomic stale-lock claim strategy here, or switch this path to a proven lockfile implementation.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/lib/onboard-session.ts` around lines 336 - 365, The current stale-lock
cleanup unlinks LOCK_FILE directly which allows a race where a concurrent
process recreates the lock and the slower process then deletes it; change to an
atomic stale-claim approach: when you detect a stale lock (parseLockFile(...)
returned existing and isProcessAlive(...) is false), write a temporary file
(e.g., `${LOCK_FILE}.${process.pid}.tmp`) with your PID/startedAt/command and
then attempt to atomically create the real lock using fs.linkSync(tempPath,
LOCK_FILE) (this call will succeed only if LOCK_FILE does not exist); if
linkSync succeeds you have acquired the lock (remove tempPath and return
acquired=true), if it fails because LOCK_FILE now exists remove tempPath and
continue retrying; do not call fs.unlinkSync(LOCK_FILE) unconditionally—only
remove stale files after successfully claiming the lock or when you can safely
determine nobody else replaced it. Use parseLockFile, LOCK_FILE, isProcessAlive,
and fs.linkSync/fs.writeFileSync/fs.unlinkSync to implement this.
ericksoa
left a comment
There was a problem hiding this comment.
Faithful migration of the largest module in the series. Pre-existing issues tracked separately:
- #1281 — stale-lock race condition in acquireOnboardLock
- #1282 — normalizeSession cyclomatic complexity
- #1283 — metadata.gatewayName not type-validated
- #1284 — lock tests share tmpDir causing order dependence
5 behind main — needs a merge.
## Summary - Convert `bin/lib/onboard-session.js` (440 lines) to `src/lib/onboard-session.ts` - Typed interfaces: `Session`, `StepState`, `SessionFailure`, `LockResult`, `SessionUpdates`, `LockInfo`, `SessionMetadata` - Full migration including fs I/O (session persistence, file locking) and pure helpers (redaction, validation, normalization) - Co-locate tests (14 tests) with updated cache-clearing for dist/ path Stacked on #1240. 615 CLI tests pass. Coverage ratchet passes. Relates to #924 (shell consolidation). 🤖 Generated with [Claude Code](https://claude.com/claude-code) <!-- This is an auto-generated comment: release notes by coderabbit.ai --> ## Summary by CodeRabbit * **Refactor** * Reorganized onboarding session management module structure and improved distribution of the codebase. Session tracking, step-level state management, lock handling, and sensitive data redaction functionality have been consolidated and optimized for better maintainability. <!-- end of auto-generated comment: release notes by coderabbit.ai --> --------- Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
## Summary - Convert `bin/lib/onboard-session.js` (440 lines) to `src/lib/onboard-session.ts` - Typed interfaces: `Session`, `StepState`, `SessionFailure`, `LockResult`, `SessionUpdates`, `LockInfo`, `SessionMetadata` - Full migration including fs I/O (session persistence, file locking) and pure helpers (redaction, validation, normalization) - Co-locate tests (14 tests) with updated cache-clearing for dist/ path Stacked on NVIDIA#1240. 615 CLI tests pass. Coverage ratchet passes. Relates to NVIDIA#924 (shell consolidation). 🤖 Generated with [Claude Code](https://claude.com/claude-code) <!-- This is an auto-generated comment: release notes by coderabbit.ai --> ## Summary by CodeRabbit * **Refactor** * Reorganized onboarding session management module structure and improved distribution of the codebase. Session tracking, step-level state management, lock handling, and sensitive data redaction functionality have been consolidated and optimized for better maintainability. <!-- end of auto-generated comment: release notes by coderabbit.ai --> --------- Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Summary
bin/lib/onboard-session.js(440 lines) tosrc/lib/onboard-session.tsSession,StepState,SessionFailure,LockResult,SessionUpdates,LockInfo,SessionMetadataStacked on #1240. 615 CLI tests pass. Coverage ratchet passes.
Relates to #924 (shell consolidation).
🤖 Generated with Claude Code
Summary by CodeRabbit