fix: harden inference onboarding auth and validation recovery #1136
Note: Reviews paused. It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review.
📝 Walkthrough

Adds credential normalization and stricter secret-prompt masking/backspace handling; introduces interactive back/exit navigation across onboarding prompts; centralizes endpoint probing with a curl probe helper and failure classification; refactors validators to structured `{ ok, retry }` results and implements interactive retry/recovery loops.
Sequence Diagram

```mermaid
sequenceDiagram
    participant User
    participant Onboard as Onboarding Flow
    participant Cred as Credentials Manager
    participant Probe as Endpoint Prober
    participant Validator as Validator
    participant Runner as Runner/Provider Apply
    User->>Onboard: start / select provider
    Onboard->>Cred: promptSecret() / prompt()
    Cred->>Cred: normalizeCredentialValue()
    Cred->>User: echo "*" per char, handle backspace
    User->>Cred: provide API key or "back"/"exit"
    Cred-->>Onboard: normalized key or navigation command
    Onboard->>Probe: runCurlProbe(endpoint, credential)
    Probe->>Validator: classifyFailure(result)
    Validator-->>Onboard: { ok: true } or { ok: false, retry: "credential|model|endpoint", helpUrl? }
    alt Validation fails
        Onboard->>User: promptValidationRecovery()
        User->>Onboard: retry / back / exit
        Onboard->>Cred: re-prompt or navigate
    else Validation succeeds
        Onboard->>Runner: upsertProvider / apply model
        Runner-->>Onboard: { ok / error }
        alt Runner error and recoverable
            Onboard->>User: promptValidationRecovery()
        end
    end
```
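The structured `{ ok, retry }` validator result named in the flow above might look like the following sketch. The field names mirror the sequence diagram; the concrete patterns and the `helpUrl` value are illustrative assumptions, not the PR's actual code.

```javascript
// Sketch of the structured { ok, retry } validator contract from the
// walkthrough. Field names follow the diagram; patterns are assumptions.
function classifyFailure({ httpStatus, body }) {
  const text = String(body || "").toLowerCase();
  if (httpStatus === 401 || /unauthorized|forbidden|invalid api key/.test(text)) {
    return { ok: false, retry: "credential", helpUrl: "https://example.com/api-keys" };
  }
  if (/model .*not found|unknown model/.test(text)) {
    return { ok: false, retry: "model" };
  }
  if (httpStatus === 404 || httpStatus === 405) {
    return { ok: false, retry: "endpoint" };
  }
  return { ok: true };
}
```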
Estimated Code Review Effort: 🎯 4 (Complex) | ⏱️ ~60 minutes
🚥 Pre-merge checks: ✅ 2 | ❌ 1
❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
Actionable comments posted: 4
🧹 Nitpick comments (1)
test/credentials.test.js (1)
48-64: Prefer behavior tests over source-string assertions.

These checks lock the suite to the current `bin/lib/credentials.js` implementation (`while (true)`, exact messages, and specific `output.write()` calls). A harmless refactor will fail the tests even if prompting, retrying, and masking still work.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@test/credentials.test.js` around lines 48 - 64, The tests currently assert exact source strings in bin/lib/credentials.js (e.g., matching "while (true)", exact prompt text, and specific output.write calls), which makes them brittle; instead, update the tests to exercise observable behavior: spawn or require the credentials module (use child_process.spawn or import exported functions like the prompt/validation and masking helpers if available) and assert that it (1) prompts and retries on invalid keys by sending invalid then valid input and observing repeated prompts, and (2) masks secret input by sending keystrokes and asserting the terminal output shows masking characters (or that the masking helper returns masked output) while the returned/collected secret remains the unmasked value. Reference the credentials module and its prompt/validation/masking entry points rather than matching source-code strings.
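A behavior-level version of such a test can drive the masking logic directly instead of grepping source. The helper below is a hypothetical stand-in for the credentials module's masking entry point (the real one isn't shown in this review): feed keystrokes in, assert on what is echoed and what is returned.

```javascript
// Hypothetical stand-in for the masking helper, illustrating behavior-level
// testing: the returned secret is unmasked, the echo never contains it.
function maskKeystrokes(keys) {
  const secret = [];
  let echoed = "";
  for (const key of keys) {
    if (key === "\x7f") {        // backspace: drop last char, erase one mask
      if (secret.length > 0) {
        secret.pop();
        echoed += "\b \b";
      }
    } else {
      secret.push(key);
      echoed += "*";             // echo "*" per character, never the secret
    }
  }
  return { secret: secret.join(""), echoed };
}
```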
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@bin/lib/onboard.js`:
- Around line 669-683: The retry branch for recovery.kind === "credential"
currently calls replaceNamedCredential(credentialEnv, ...) which only rejects
empty input and therefore lets malformed NVIDIA API keys (NVIDIA_API_KEY) be
accepted; update this path to preserve the NVIDIA-specific validation by calling
the existing ensureApiKey(...) (or performing the same 'nvapi-' prefix and
format checks) when credentialEnv corresponds to NVIDIA_API_KEY (or when label
indicates NVIDIA), otherwise keep using replaceNamedCredential; ensure the
function names replaceNamedCredential and ensureApiKey and the credentialEnv
check are used to locate and modify the logic so the NVIDIA retry path enforces
the same validation as the initial prompt.
- Around line 2672-2676: The recovery handling after calling
promptValidationRecovery(...) treats any result other than "credential" or
"retry" as terminal and calls process.exit, which makes a "back" selection abort
onboarding; update the conditional around the variable retry (from
promptValidationRecovery) so that "back" is also treated as a non-terminal
choice (e.g., include retry === "back" in the branch that continues) and ensure
the same fix is applied to the duplicate block that handles provider/apply
recovery (the similar code around the other providerResult handling at the
second occurrence).
- Around line 485-491: The Anthropic normalization leaves a bare "/v1" behind
causing duplicate segments; update the suffix list used when computing pathname
in onboard.js so the Anthropic branch includes "/v1" (i.e., add "/v1" to the
array assigned to suffixes when flavor === "anthropic") so that
stripEndpointSuffix(...) will remove a trailing "/v1" and pathname normalization
won't produce "/v1/v1" on later probes.
- Around line 605-639: In runCurlProbe, handle spawnSync errors by checking
result.error before treating result.status as a successful exit; if result.error
is present (e.g., ETIMEDOUT or spawn failure) set ok=false, curlStatus to a
non-zero value (use result.error.code or 1), httpStatus=0, stderr to
result.error.message, and set message via
summarizeCurlFailure(result.error.code||1,
result.error.message||String(result.error)); only use result.status for normal
completions (avoid coercing null with result.status||0) and keep
summarizeProbeFailure for successful spawns where result.status is non-null.
---
Nitpick comments:
In `@test/credentials.test.js`:
- Around line 48-64: The tests currently assert exact source strings in
bin/lib/credentials.js (e.g., matching "while (true)", exact prompt text, and
specific output.write calls), which makes them brittle; instead, update the
tests to exercise observable behavior: spawn or require the credentials module
(use child_process.spawn or import exported functions like the prompt/validation
and masking helpers if available) and assert that it (1) prompts and retries on
invalid keys by sending invalid then valid input and observing repeated prompts,
and (2) masks secret input by sending keystrokes and asserting the terminal
output shows masking characters (or that the masking helper returns masked
output) while the returned/collected secret remains the unmasked value.
Reference the credentials module and its prompt/validation/masking entry points
rather than matching source-code strings.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: e3e611ec-a1d4-4825-916d-32b66fc6b0ef
📒 Files selected for processing (5)
- bin/lib/credentials.js
- bin/lib/onboard.js
- test/credentials.test.js
- test/onboard-selection.test.js
- test/onboard.test.js
♻️ Duplicate comments (4)
bin/lib/onboard.js (4)
485-491: ⚠️ Potential issue | 🟠 Major

Strip a bare `/v1` from Anthropic-compatible base URLs.

A user-entered base ending in `/v1` survives this normalization, so later probes hit `.../v1/v1/messages` and `.../v1/v1/models`. Add `"/v1"` to the Anthropic suffix list here.

🐛 Minimal fix

```diff
-    ? ["/v1/messages", "/v1/models", "/messages", "/models"]
+    ? ["/v1/messages", "/v1/models", "/messages", "/models", "/v1"]
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@bin/lib/onboard.js` around lines 485 - 491, The Anthropic suffix list in the normalization logic (the suffixes array used before calling stripEndpointSuffix) is missing the bare "/v1", causing user-provided bases ending with "/v1" to remain and later produce duplicated segments; update the ternary branch that builds suffixes for flavor === "anthropic" (the variable named suffixes) to include "/v1" among ["/v1/messages", "/v1/models", "/v1", "/messages", "/models"] so stripEndpointSuffix can remove a trailing "/v1" and subsequent code (the pathname normalization and url.pathname assignment) will produce the correct base URL.
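The normalization under discussion can be sketched as follows. `stripEndpointSuffix` is named in the comment; its body below is an assumption, but it shows why the bare `/v1` entry matters.

```javascript
// Illustrative stripEndpointSuffix: the name comes from the review, the
// implementation is assumed. With "/v1" in the suffix list, a base like
// https://proxy.example.com/v1 normalizes cleanly instead of later
// producing /v1/v1/messages.
function stripEndpointSuffix(pathname, suffixes) {
  for (const suffix of suffixes) {
    if (pathname.endsWith(suffix)) {
      return pathname.slice(0, -suffix.length) || "/";
    }
  }
  return pathname;
}

const anthropicSuffixes = ["/v1/messages", "/v1/models", "/messages", "/models", "/v1"];
```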
672-685: ⚠️ Potential issue | 🟡 Minor

Preserve NVIDIA-specific key validation on credential retry.

This branch now always calls `replaceNamedCredential()`, which only rejects empty input. For `NVIDIA_API_KEY` that bypasses the `ensureApiKey()` format checks, so malformed values can be saved during recovery.

🛡️ Minimal fix

```diff
-      await replaceNamedCredential(credentialEnv, `${label} API key`, helpUrl);
+      if (credentialEnv === "NVIDIA_API_KEY") {
+        await ensureApiKey();
+      } else {
+        await replaceNamedCredential(credentialEnv, `${label} API key`, helpUrl);
+      }
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@bin/lib/onboard.js` around lines 672 - 685, When retrying credential entry we currently always call replaceNamedCredential(), which only rejects empty input and therefore lets malformed NVIDIA_API_KEY values bypass the original ensureApiKey validation; update the retry path in the recovery.kind === "credential" branch to preserve NVIDIA-specific validation by detecting the NVIDIA credential (e.g., credentialEnv === "NVIDIA_API_KEY" or label containing "NVIDIA") and invoking the same validation flow used elsewhere (call ensureApiKey or run the NVIDIA-specific validator before persisting) instead of the generic replaceNamedCredential, or pass a validator callback into replaceNamedCredential so it enforces the NVIDIA format and rejects malformed values.
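The validator-callback variant the prompt mentions could look like this sketch. The `nvapi-` prefix comes from the review; the non-empty fallback and the exact length check are assumptions.

```javascript
// Sketch of a validator-callback approach: NVIDIA keys keep their format
// check on retry, other credentials keep a generic non-empty check.
function isPlausibleNvidiaKey(value) {
  const key = String(value || "").trim();
  return key.startsWith("nvapi-") && key.length > "nvapi-".length;
}

function credentialValidator(credentialEnv) {
  if (credentialEnv === "NVIDIA_API_KEY") {
    return isPlausibleNvidiaKey;
  }
  return (value) => String(value || "").trim().length > 0;
}
```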
613-629: ⚠️ Potential issue | 🟠 Major

Check `result.error` before coercing `status`.

When `curl` times out or cannot spawn, `spawnSync()` reports that on `result.error` and leaves `result.status` null. The `result.status || 0` fallback turns those transport failures into `curlStatus: 0` / HTTP 0, so recovery falls through to the wrong branch.

(Verification question: in Node.js v22, when `child_process.spawnSync()` times out or cannot spawn the child, what values are set on `result.error`, `result.status`, and `result.signal`?)

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@bin/lib/onboard.js` around lines 613 - 629, The spawnSync call's result may have result.error set when curl fails to spawn or times out, so update the post-spawn handling around the spawnSync result (the result variable used to compute ok, httpStatus, curlStatus, body, stderr, and message) to first check result.error and treat that as a transport failure: set curlStatus to a distinct non-0 value (e.g., -1 or use result.error.code), set httpStatus to 0, and ensure ok is false; pass the error details into summarizeProbeFailure and stderr instead of coercing result.status || 0 or parsing stdout as an HTTP code when result.error is present. Ensure references to result.error, result.status, result.stdout, curlStatus, httpStatus, ok, and summarizeProbeFailure are updated accordingly.
2680-2684: ⚠️ Potential issue | 🟠 Major

`back` still aborts from provider/apply recovery.

`promptValidationRecovery()` maps `back` to `"selection"`, but both failure paths here still exit for anything except `"credential"`/`"retry"`. That makes the new navigation option terminal instead of letting onboarding return to provider selection.

Also applies to: 2702-2706
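The non-terminal routing this comment asks for can be sketched as below. The choice names come from the review; the dispatch shape is an assumption.

```javascript
// Route recovery choices: only truly terminal results should exit.
// "back"/"selection"/"model" re-enter the onboarding loop instead.
function routeRecovery(retry, exitStatus) {
  const nonTerminal = ["credential", "retry", "selection", "back", "model"];
  if (nonTerminal.includes(retry)) {
    return { action: "continue", target: retry };
  }
  return { action: "exit", status: exitStatus || 1 };
}
```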
🧹 Nitpick comments (2)
bin/lib/onboard.js (1)
2230-2450: Break the provider/recovery state machines into smaller helpers.

`setupNim()` and `setupInference()` are already well past the repo's complexity cap, and the added retry/back/auth branches make control-flow bugs much easier to miss. Extract the per-provider validation/recovery cases so these top-level functions only coordinate state transitions.

As per coding guidelines, `**/*.{js,ts,tsx}`: Maintain cyclomatic complexity limit of 20 (target: ratchet down to 15) for all functions.

Also applies to: 2667-2707
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@bin/lib/onboard.js` around lines 2230 - 2450, The provider selection loop in setupNim()/setupInference() is too large and must be split: extract per-provider validation and recovery logic into small helper functions (e.g., validateOpenAiLikeSelectionWithRetry, validateAnthropicSelectionWithRetry, validateCustomOpenAiLikeSelectionHelper, validateCustomAnthropicSelectionHelper) that each accept (remoteConfig, selected.key, endpointUrl, model, credentialEnv) and return a normalized { ok, api, retry } result; move navigation/back/exit handling into a tiny helper (e.g., handleNavigationChoice) so the top-level loop only orchestrates state transitions (select provider → call appropriate helper → handle retry/back/exit/continue), ensuring setupNim and setupInference drop below a cyclomatic complexity of 20 (target ~15) by delegating branches to these helpers and reusing shared pieces like getCredential/normalizeProviderBaseUrl/validateOpenAiLikeModel.

test/onboard.test.js (1)
907-927: Prefer behavioral checks over source-regex tests here.

These assertions only grep `onboard.js` text, so they break on harmless refactors and can still miss runtime regressions. The surrounding subprocess-based tests are the stronger pattern for validating curl timeout handling and provider-create noise suppression.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@test/onboard.test.js` around lines 907 - 927, Replace the fragile source-regex assertions in the two tests named "uses split curl timeout args and does not mislabel curl usage errors as timeouts" and "suppresses expected provider-create AlreadyExists noise when update succeeds" with behavioral checks that run the onboard.js binary and assert runtime behavior (e.g., spawn the subprocess/child_process for onboard.js, inspect stdout/stderr, exit codes, and subprocess error/exit properties) instead of grepping file text; remove the assert.match(...) checks that look for the literal strings and instead assert the observable outcomes (curl timeout handling and suppressed AlreadyExists noise) using the existing subprocess-based test patterns in the suite so refactors of onboard.js won't break the tests.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Duplicate comments:
In `@bin/lib/onboard.js`:
- Around line 485-491: The Anthropic suffix list in the normalization logic (the
suffixes array used before calling stripEndpointSuffix) is missing the bare
"/v1", causing user-provided bases ending with "/v1" to remain and later produce
duplicated segments; update the ternary branch that builds suffixes for flavor
=== "anthropic" (the variable named suffixes) to include "/v1" among
["/v1/messages", "/v1/models", "/v1", "/messages", "/models"] so
stripEndpointSuffix can remove a trailing "/v1" and subsequent code (the
pathname normalization and url.pathname assignment) will produce the correct
base URL.
- Around line 672-685: When retrying credential entry we currently always call
replaceNamedCredential(), which only rejects empty input and therefore lets
malformed NVIDIA_API_KEY values bypass the original ensureApiKey validation;
update the retry path in the recovery.kind === "credential" branch to preserve
NVIDIA-specific validation by detecting the NVIDIA credential (e.g.,
credentialEnv === "NVIDIA_API_KEY" or label containing "NVIDIA") and invoking
the same validation flow used elsewhere (call ensureApiKey or run the
NVIDIA-specific validator before persisting) instead of the generic
replaceNamedCredential, or pass a validator callback into replaceNamedCredential
so it enforces the NVIDIA format and rejects malformed values.
- Around line 613-629: The spawnSync call's result may have result.error set
when curl fails to spawn or times out, so update the post-spawn handling around
the spawnSync result (the result variable used to compute ok, httpStatus,
curlStatus, body, stderr, and message) to first check result.error and treat
that as a transport failure: set curlStatus to a distinct non-0 value (e.g., -1
or use result.error.code), set httpStatus to 0, and ensure ok is false; pass the
error details into summarizeProbeFailure and stderr instead of coercing
result.status || 0 or parsing stdout as an HTTP code when result.error is
present. Ensure references to result.error, result.status, result.stdout,
curlStatus, httpStatus, ok, and summarizeProbeFailure are updated accordingly.
---
Nitpick comments:
In `@bin/lib/onboard.js`:
- Around line 2230-2450: The provider selection loop in
setupNim()/setupInference() is too large and must be split: extract per-provider
validation and recovery logic into small helper functions (e.g.,
validateOpenAiLikeSelectionWithRetry, validateAnthropicSelectionWithRetry,
validateCustomOpenAiLikeSelectionHelper, validateCustomAnthropicSelectionHelper)
that each accept (remoteConfig, selected.key, endpointUrl, model, credentialEnv)
and return a normalized { ok, api, retry } result; move navigation/back/exit
handling into a tiny helper (e.g., handleNavigationChoice) so the top-level loop
only orchestrates state transitions (select provider → call appropriate helper →
handle retry/back/exit/continue), ensuring setupNim and setupInference drop
below a cyclomatic complexity of 20 (target ~15) by delegating branches to these
helpers and reusing shared pieces like
getCredential/normalizeProviderBaseUrl/validateOpenAiLikeModel.
In `@test/onboard.test.js`:
- Around line 907-927: Replace the fragile source-regex assertions in the two
tests named "uses split curl timeout args and does not mislabel curl usage
errors as timeouts" and "suppresses expected provider-create AlreadyExists noise
when update succeeds" with behavioral checks that run the onboard.js binary and
assert runtime behavior (e.g., spawn the subprocess/child_process for
onboard.js, inspect stdout/stderr, exit codes, and subprocess error/exit
properties) instead of grepping file text; remove the assert.match(...) checks
that look for the literal strings and instead assert the observable outcomes
(curl timeout handling and suppressed AlreadyExists noise) using the existing
subprocess-based test patterns in the suite so refactors of onboard.js won't
break the tests.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: e663db14-a5b2-4f4f-983d-f6a8efb59926
📒 Files selected for processing (4)
- bin/lib/credentials.js
- bin/lib/onboard.js
- test/credentials.test.js
- test/onboard.test.js
✅ Files skipped from review due to trivial changes (1)
- test/credentials.test.js
🚧 Files skipped from review as they are similar to previous changes (1)
- bin/lib/credentials.js
Actionable comments posted: 2
♻️ Duplicate comments (4)
bin/lib/onboard.js (4)
613-630: ⚠️ Potential issue | 🟠 Major

Handle `spawnSync().error` before deriving probe status.

Timeouts and spawn failures surface on `result.error` with `status === null`; `result.status || 0` collapses both cases to `curlStatus: 0`, so `ETIMEDOUT`/`ENOENT` probes get misreported as HTTP 0 and skip the transport-retry path.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@bin/lib/onboard.js` around lines 613 - 630, The probe result handling after the spawnSync("curl", ...) call must detect and surface process errors via result.error before deriving OK/http status: in the block that returns the object (referencing spawnSync call, result, curlStatus, httpStatus, ok, and summarizeProbeFailure), check if result.error exists and set curlStatus to result.error.code || result.error.errno (or a non-zero sentinel) and ensure ok is false when result.error is present; keep httpStatus as 0 in that case and include the result.error details (error.code/errno and result.stderr) in the message passed to summarizeProbeFailure so ETIMEDOUT/ENOENT are not collapsed into curlStatus 0 and will correctly trigger transport-retry logic.
2682-2686: ⚠️ Potential issue | 🟠 Major

Treat `selection`/`model` recovery results as non-terminal here.

`promptValidationRecovery()` can return `selection` (including `back`) and `model`, but both loops still exit for anything except `credential`/`retry`. That makes provider create/update and final `inference set` failures terminal again instead of routing back up the onboarding flow.

Also applies to: 2704-2708
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@bin/lib/onboard.js` around lines 2682 - 2686, The failure-handling branch for promptValidationRecovery currently only treats "credential" and "retry" as non-terminal, but promptValidationRecovery may return "selection", "back", or "model" which should also continue the onboarding flow; update the logic in the blocks that inspect retry (the branch using promptValidationRecovery(config.label, classifyApplyFailure(providerResult.message), resolvedCredentialEnv, config.helpUrl) and the equivalent block later) so that if retry is one of "credential", "retry", "selection", "back", or "model" you continue the loop, and only call process.exit(providerResult.status || 1) for other values.
485-491: ⚠️ Potential issue | 🟠 Major

Strip a bare `/v1` from Anthropic-compatible bases.

`https://proxy.example.com/v1` still survives normalization as `/v1`, and every Anthropic probe later appends `/v1/messages` / `/v1/models`, so validation hits `/v1/v1/...`.

Possible fix

```diff
-    ? ["/v1/messages", "/v1/models", "/messages", "/models"]
+    ? ["/v1/messages", "/v1/models", "/messages", "/models", "/v1"]
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@bin/lib/onboard.js` around lines 485 - 491, The Anthropic normalization misses a bare "/v1" path, causing later probes to double-append; update the suffixes used by stripEndpointSuffix so that when flavor === "anthropic" you include "/v1" (e.g., change the anthropic suffix array in the const suffixes assignment to include "/v1"), then ensure you still trim trailing slashes on url.pathname and set url.pathname = pathname || "/"; this will make stripEndpointSuffix(stripEndpointSuffix(url.pathname..., suffixes)) correctly normalize bases like "https://proxy.example.com/v1" before probes run.
672-685: ⚠️ Potential issue | 🟡 Minor

Preserve NVIDIA key validation on credential retry.

When `credentialEnv === "NVIDIA_API_KEY"`, this path now calls `replaceNamedCredential()`, which only rejects empty input. That bypasses the `nvapi-...` checks enforced by `ensureApiKey()` during the initial prompt.

Possible fix

```diff
-      await replaceNamedCredential(credentialEnv, `${label} API key`, helpUrl);
+      if (credentialEnv === "NVIDIA_API_KEY") {
+        await ensureApiKey();
+      } else {
+        await replaceNamedCredential(credentialEnv, `${label} API key`, helpUrl);
+      }
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@bin/lib/onboard.js` around lines 672 - 685, The retry path for credential recovery bypasses NVIDIA key format validation because when recovery.kind === "credential" and credentialEnv is set the code calls replaceNamedCredential(credentialEnv, ...) which only rejects empty input; modify this path so that when credentialEnv === "NVIDIA_API_KEY" you validate the entered key with the same nvapi- checks used by ensureApiKey (or call ensureApiKey) before accepting and returning "credential" (or pass a validator into replaceNamedCredential that enforces the nvapi- prefix and length checks); ensure you still allow empty-retry/back/exit flows and preserve label and helpUrl usage.
🧹 Nitpick comments (1)
bin/lib/onboard.js (1)
2184-2656: Split `setupNim()` by provider path.

The new recovery loops push this function even further past the repo's complexity limit. Extracting the hosted-provider, custom-endpoint, and local-provider branches into helpers would make the retry/navigation state machine much easier to reason about and test.

As per coding guidelines, `**/*.{js,ts,tsx}`: Maintain cyclomatic complexity limit of 20 (target: ratchet down to 15) for all functions.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@bin/lib/onboard.js` around lines 2184 - 2656, The setupNim function is too large/complex; split provider-specific paths into small helpers to reduce cyclomatic complexity: extract the hosted/provider branch (the logic under if (REMOTE_PROVIDER_CONFIG[selected.key]) including credential checks, model prompting and validateOpenAiLikeSelection) into a helper like handleHostedProviderSelection(selected, requestedModel, requestedProvider) that returns {model, provider, endpointUrl, credentialEnv, preferredInferenceApi} or navigation signals; extract the custom/anthropic-compatible endpoint handling into handleCustomEndpointSelection that normalizes endpointUrl and handles navigation back/exit; extract the local-provider branches (nim-local, ollama, install-ollama, vllm) into separate helpers handleNimLocal(gpu, requestedModel), handleOllamaFlow(requestedModel,gpu,runCapture), handleInstallOllamaFlow(...), and handleVllmFlow(...) that encapsulate their pulls/starts/validation and return the same selection result shape; keep selectionLoop and navigation intact by calling these helpers and acting on their return values (including signals like "back"/"retry"/"exit") so the main setupNim only orchestrates flow and logging while all validation and retry loops live in the new helper functions.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@bin/lib/onboard.js`:
- Around line 563-575: The error classification logic in onboard.js currently
checks for httpStatus 404/405 before testing the response string against
model-related patterns, causing model-miss responses to be misclassified as
endpoint issues; move the model regex check (/model.+not found|unknown
model|unsupported model|bad model/i) so it runs before the 404/405 branch (the
block that returns { kind: "endpoint", retry: "selection" }) and keep the
unauthorized/forbidden credential check as-is, ensuring that model-related
responses are classified as { kind: "model", retry: "model" } and only true
404/405 non-model endpoint errors fall through to endpoint selection.
- Around line 2503-2512: The local-provider branches only check validation.ok
and ignore the new recovery contract; update each place that calls
validateOpenAiLikeSelection (e.g., the "Local NVIDIA NIM" call that assigns
preferredInferenceApi) to first test validation.retry (and validation.back where
applicable) and, when set, jump back to the outer provider selector by executing
continue selectionLoop; only if not retry/back proceed to check validation.ok
and set preferredInferenceApi—apply the same change to the other call sites you
noted so validateOpenAiLikeSelection's retry/back paths are honored.
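The ordering described in the first item above can be sketched as follows. The names and regex patterns follow the review's description; this is an illustration, not the PR's exact code.

```javascript
// Classification order matters: check model-miss text before the 404/405
// endpoint fallback, so a 404 "model not found" re-prompts for the model
// rather than bouncing back to provider selection.
function classifyProbeFailure(httpStatus, body) {
  const normalized = String(body || "");
  if (httpStatus === 400) {
    return { kind: "model", retry: "model" };
  }
  if (/unauthorized|forbidden|invalid api key|invalid_auth|permission/i.test(normalized)) {
    return { kind: "credential", retry: "credential" };
  }
  if (/model.+not found|unknown model|unsupported model|bad model/i.test(normalized)) {
    return { kind: "model", retry: "model" };  // before the endpoint fallback
  }
  if (httpStatus === 404 || httpStatus === 405) {
    return { kind: "endpoint", retry: "selection" };
  }
  return { kind: "unknown", retry: "retry" };
}
```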
---
Duplicate comments:
In `@bin/lib/onboard.js`:
- Around line 613-630: The probe result handling after the spawnSync("curl",
...) call must detect and surface process errors via result.error before
deriving OK/http status: in the block that returns the object (referencing
spawnSync call, result, curlStatus, httpStatus, ok, and summarizeProbeFailure),
check if result.error exists and set curlStatus to result.error.code ||
result.error.errno (or a non-zero sentinel) and ensure ok is false when
result.error is present; keep httpStatus as 0 in that case and include the
result.error details (error.code/errno and result.stderr) in the message passed
to summarizeProbeFailure so ETIMEDOUT/ENOENT are not collapsed into curlStatus 0
and will correctly trigger transport-retry logic.
- Around line 2682-2686: The failure-handling branch for
promptValidationRecovery currently only treats "credential" and "retry" as
non-terminal, but promptValidationRecovery may return "selection", "back", or
"model" which should also continue the onboarding flow; update the logic in the
blocks that inspect retry (the branch using
promptValidationRecovery(config.label,
classifyApplyFailure(providerResult.message), resolvedCredentialEnv,
config.helpUrl) and the equivalent block later) so that if retry is one of
"credential", "retry", "selection", "back", or "model" you continue the loop,
and only call process.exit(providerResult.status || 1) for other values.
- Around line 485-491: The Anthropic normalization misses a bare "/v1" path,
causing later probes to double-append; update the suffixes used by
stripEndpointSuffix so that when flavor === "anthropic" you include "/v1" (e.g.,
change the anthropic suffix array in the const suffixes assignment to include
"/v1"), then ensure you still trim trailing slashes on url.pathname and set
url.pathname = pathname || "/"; this will make
stripEndpointSuffix(stripEndpointSuffix(url.pathname..., suffixes)) correctly
normalize bases like "https://proxy.example.com/v1" before probes run.
- Around line 672-685: The retry path for credential recovery bypasses NVIDIA
key format validation because when recovery.kind === "credential" and
credentialEnv is set the code calls replaceNamedCredential(credentialEnv, ...)
which only rejects empty input; modify this path so that when credentialEnv ===
"NVIDIA_API_KEY" you validate the entered key with the same nvapi- checks used
by ensureApiKey (or call ensureApiKey) before accepting and returning
"credential" (or pass a validator into replaceNamedCredential that enforces the
nvapi- prefix and length checks); ensure you still allow empty-retry/back/exit
flows and preserve label and helpUrl usage.
---
Nitpick comments:
In `@bin/lib/onboard.js`:
- Around line 2184-2656: The setupNim function is too large/complex; split
provider-specific paths into small helpers to reduce cyclomatic complexity:
extract the hosted/provider branch (the logic under if
(REMOTE_PROVIDER_CONFIG[selected.key]) including credential checks, model
prompting and validateOpenAiLikeSelection) into a helper like
handleHostedProviderSelection(selected, requestedModel, requestedProvider) that
returns {model, provider, endpointUrl, credentialEnv, preferredInferenceApi} or
navigation signals; extract the custom/anthropic-compatible endpoint handling
into handleCustomEndpointSelection that normalizes endpointUrl and handles
navigation back/exit; extract the local-provider branches (nim-local, ollama,
install-ollama, vllm) into separate helpers handleNimLocal(gpu, requestedModel),
handleOllamaFlow(requestedModel,gpu,runCapture), handleInstallOllamaFlow(...),
and handleVllmFlow(...) that encapsulate their pulls/starts/validation and
return the same selection result shape; keep selectionLoop and navigation intact
by calling these helpers and acting on their return values (including signals
like "back"/"retry"/"exit") so the main setupNim only orchestrates flow and
logging while all validation and retry loops live in the new helper functions.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 6cb4ce50-412e-47a0-8a3a-6fcf7b00f47c
📒 Files selected for processing (2)
- bin/lib/onboard.js
- test/onboard.test.js
🚧 Files skipped from review as they are similar to previous changes (1)
- test/onboard.test.js
♻️ Duplicate comments (6)
bin/lib/onboard.js (6)
742-755: ⚠️ Potential issue | 🟡 Minor
Keep NVIDIA key validation on the retry path.
When `credentialEnv === "NVIDIA_API_KEY"`, this branch bypasses `ensureApiKey()` and falls back to `replaceNamedCredential()`, which only rejects empty input. `retry` can now persist obviously malformed NVIDIA keys that the initial prompt would reject.
🐛 Minimal fix

```diff
 if (choice === "" || choice === "retry") {
-  await replaceNamedCredential(credentialEnv, `${label} API key`, helpUrl);
+  if (credentialEnv === "NVIDIA_API_KEY") {
+    await ensureApiKey();
+  } else {
+    await replaceNamedCredential(credentialEnv, `${label} API key`, helpUrl);
+  }
   return "credential";
 }
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@bin/lib/onboard.js` around lines 742 - 755, The retry branch for credential recovery currently calls replaceNamedCredential(credentialEnv, `${label} API key`, helpUrl) which only rejects empty input and skips ensureApiKey() validation for NVIDIA_API_KEY; update the retry path inside the recovery.kind === "credential" branch so that when credentialEnv === "NVIDIA_API_KEY" you call ensureApiKey() (or run the same NVIDIA key validation used on initial prompt) before persisting the credential, falling back to replaceNamedCredential() only after validation succeeds; target the logic around the choice === "" || choice === "retry" handling to add this conditional validation.
633-643: ⚠️ Potential issue | 🟠 Major
Classify model misses before the 404/405 endpoint fallback.
`getProbeRecovery(..., { allowModelRetry: true })` depends on model misses being tagged as `model`, but this ordering turns 404/405 "model not found" responses into `selection` first. The compatible-endpoint flows then bounce users back to provider selection instead of re-prompting for the model.
🐛 Minimal fix

```diff
 if (httpStatus === 400) {
   return { kind: "model", retry: "model" };
 }
-if (httpStatus === 404 || httpStatus === 405) {
-  return { kind: "endpoint", retry: "selection" };
-}
 if (/unauthorized|forbidden|invalid api key|invalid_auth|permission/i.test(normalized)) {
   return { kind: "credential", retry: "credential" };
 }
 if (/model.+not found|unknown model|unsupported model|bad model/i.test(normalized)) {
   return { kind: "model", retry: "model" };
 }
+if (httpStatus === 404 || httpStatus === 405) {
+  return { kind: "endpoint", retry: "selection" };
+}
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@bin/lib/onboard.js` around lines 633 - 643, The classification for "model not found" should run before the 404/405 endpoint fallback so model misses are returned as { kind: "model", retry: "model" } (which getProbeRecovery(..., { allowModelRetry: true }) expects); move the /model.+not found|unknown model|unsupported model|bad model/i test to appear before the if (httpStatus === 404 || httpStatus === 405) branch in the error-classification logic inside getProbeRecovery (or the surrounding function that contains httpStatus and normalized), ensuring the model-regex branch is evaluated first so model-related errors are classified as "model" rather than "endpoint".
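As a quick illustration of the ordering this finding asks for, here is a self-contained sketch; `classifyProbeFailure` is a hypothetical stand-in for the classification logic around these lines, not a function that exists in `bin/lib/onboard.js`:

```javascript
// Hypothetical sketch: model-specific body text must be classified before the
// generic 404/405 endpoint fallback, so a 404 "model not found" re-prompts for
// the model instead of bouncing users back to provider selection.
function classifyProbeFailure(httpStatus, bodyText) {
  const normalized = String(bodyText || "");
  if (httpStatus === 400) {
    return { kind: "model", retry: "model" };
  }
  if (/unauthorized|forbidden|invalid api key|invalid_auth|permission/i.test(normalized)) {
    return { kind: "credential", retry: "credential" };
  }
  // Checked BEFORE the 404/405 branch on purpose.
  if (/model.+not found|unknown model|unsupported model|bad model/i.test(normalized)) {
    return { kind: "model", retry: "model" };
  }
  if (httpStatus === 404 || httpStatus === 405) {
    return { kind: "endpoint", retry: "selection" };
  }
  return { kind: "transport", retry: "retry" };
}
```

With this ordering, a 404 whose body mentions an unknown model is tagged `model`, while a bare 404 still falls through to the endpoint/selection path.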
555-561: ⚠️ Potential issue | 🟠 Major
Strip a bare `/v1` from Anthropic-compatible base URLs.
If the user enters `https://proxy.example.com/v1`, this normalization leaves `/v1` intact, and the later Anthropic probes append another `/v1/...`, so validation goes to `/v1/v1/messages` and `/v1/v1/models`.
🐛 Minimal fix

```diff
-const suffixes = flavor === "anthropic"
-  ? ["/v1/messages", "/v1/models", "/messages", "/models"]
+const suffixes = flavor === "anthropic"
+  ? ["/v1/messages", "/v1/models", "/messages", "/models", "/v1"]
   : ["/responses", "/chat/completions", "/completions", "/models"];
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@bin/lib/onboard.js` around lines 555 - 561, The normalization fails to remove a bare "/v1" for Anthropic-style URLs, causing probes to hit "/v1/v1/…"; update the suffix list used by stripEndpointSuffix when flavor === "anthropic" to include "/v1" (in addition to "/v1/messages", "/v1/models", "/messages", "/models") so that a base URL ending in "/v1" is stripped to "/" and url.pathname becomes "/" (ensuring url.origin is returned and subsequent probes do not double-up "/v1").
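The effect of adding `/v1` to the suffix list can be shown with a small standalone sketch; `stripEndpointSuffix` here is a simplified assumption about the real normalization helper, not its actual body:

```javascript
// Simplified sketch of suffix stripping for provider base URLs. Including a
// bare "/v1" for the Anthropic flavor prevents later probes (which append
// "/v1/messages") from producing "/v1/v1/messages".
function stripEndpointSuffix(input, flavor) {
  const url = new URL(input);
  const suffixes = flavor === "anthropic"
    ? ["/v1/messages", "/v1/models", "/messages", "/models", "/v1"]
    : ["/responses", "/chat/completions", "/completions", "/models"];
  for (const suffix of suffixes) {
    if (url.pathname.endsWith(suffix)) {
      url.pathname = url.pathname.slice(0, -suffix.length) || "/";
      break;
    }
  }
  return url.pathname === "/" ? url.origin : url.origin + url.pathname;
}
```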
683-699: ⚠️ Potential issue | 🟠 Major
Handle `spawnSync()` failures before interpreting `status`.
On timeout or spawn failure, Node reports the problem on `result.error` and leaves `result.status` null. Coercing that with `result.status || 0` turns those failures into a fake HTTP-0 path, so transport recovery never sees the real timeout/ENOENT condition.
🐛 Minimal fix

```diff
 const body = fs.existsSync(bodyFile) ? fs.readFileSync(bodyFile, "utf8") : "";
+if (result.error) {
+  const stderr = result.error.message || String(result.error);
+  const curlStatus = result.error.code === "ETIMEDOUT" ? 28 : 1;
+  return {
+    ok: false,
+    httpStatus: 0,
+    curlStatus,
+    body,
+    stderr,
+    message: summarizeCurlFailure(curlStatus, stderr, body),
+  };
+}
 const status = Number(String(result.stdout || "").trim());
 return {
   ok: result.status === 0 && status >= 200 && status < 300,
   httpStatus: Number.isFinite(status) ? status : 0,
-  curlStatus: result.status || 0,
+  curlStatus: result.status ?? 0,
   body,
   stderr: String(result.stderr || ""),
-  message: summarizeProbeFailure(body, status || 0, result.status || 0, String(result.stderr || "")),
+  message: summarizeProbeFailure(body, status || 0, result.status ?? 0, String(result.stderr || "")),
 };
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@bin/lib/onboard.js` around lines 683 - 699, The spawnSync() result must be checked for result.error before coercing result.status; modify the probe logic around the spawnSync("curl", ...) call to detect if result.error is present and, in that case, set ok=false, httpStatus=0, and set curlStatus to a predictable sentinel (e.g., result.error.errno, result.error.code, or -1) instead of using result.status || 0; also include result.error (its code/message) in stderr and pass it into summarizeProbeFailure so transport/retry logic can see timeouts/ENOENT rather than a fake HTTP-0. Reference the existing variables result, bodyFile, summarizeProbeFailure and the spawnSync call when making the change.
2752-2756: ⚠️ Potential issue | 🟠 Major
Don't terminate non-terminal recovery results in Step 4.
`promptValidationRecovery()` can return `selection` and `model` here, but both branches still `process.exit()` unless the answer is `credential` or `retry`. That makes `back` from provider/apply recovery abort onboarding, and model-related apply failures still can't unwind to Step 3.
Also applies to: 2774-2778
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@bin/lib/onboard.js` around lines 2752 - 2756, The code currently calls process.exit for any promptValidationRecovery result other than "credential" or "retry", which aborts onboarding when promptValidationRecovery returns "selection" or "model"; change the logic around promptValidationRecovery (the call with args config.label, classifyApplyFailure(providerResult.message), resolvedCredentialEnv, config.helpUrl) so that when it returns "selection" or "model" you do NOT call process.exit but instead unwind the flow (e.g., loop back to the prior step/Step 3 or re-enter the selection/model handling logic) to allow recovery to proceed; preserve the existing behavior of continuing on "credential" or "retry" and only use providerResult.status (or exit) for truly terminal failures.
2573-2582: ⚠️ Potential issue | 🟠 Major
Honor `validation.retry` in the local-provider branches.
These call sites still collapse every failed validation into a fixed control flow, so the new recovery contract is lost: `retry` on local NIM/vLLM jumps back to provider selection, and `back` from the Ollama recovery prompt just reopens the model picker instead of returning to selection.
Also applies to: 2623-2633, 2665-2675, 2703-2712
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@bin/lib/onboard.js` around lines 2573 - 2582, The current local-provider call sites unconditionally do "continue selectionLoop" when validateOpenAiLikeSelection returns !ok, which ignores the recovery contract; update the validateOpenAiLikeSelection error handling (the blocks that call validateOpenAiLikeSelection for "Local NVIDIA NIM", vLLM and Ollama) to examine validation.retry (and any equivalent recovery flag) instead of always continuing: if validation.retry === 'retry' then continue selectionLoop to re-open provider selection, if validation.retry === 'back' then perform the "back" control flow (e.g. break/return to the previous menu or invoke the existing back-handling path) and otherwise follow the default retry behavior; apply the same change in the other similar call sites that set preferredInferenceApi after validation.
🧹 Nitpick comments (1)
bin/lib/onboard.js (1)
2271-2712: Split `setupNim()` into per-provider handlers.
The added back/retry paths push this dispatcher even further into "state machine in one function" territory, which is exactly how misses like the local retry handling slipped through. Keeping only provider selection/orchestration here and moving each provider branch into a helper would make the new recovery behavior much easier to reason about and test.
As per coding guidelines, `**/*.{js,ts,tsx}`: Maintain cyclomatic complexity limit of 20 (target: ratchet down to 15) for all functions.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@bin/lib/onboard.js` around lines 2271 - 2712, This function has grown too large and mixes provider selection with per-provider state/retry logic; extract each provider branch into its own handler (e.g., handleNimLocal, handleOllama, handleInstallOllama, handleVllm, handleRemoteProvider) and move the NIM/start/validation loops out of the main selectionLoop; each handler should accept the necessary inputs (options, requestedModel, gpu, runCapture, etc.), perform provider-specific prompts/validation using existing helpers like validateOpenAiLikeSelection, validateAnthropicSelectionWithRetryMessage, prepareOllamaModel, nim.listModels/ pullNimImage/ startNimContainer, and return a standardized result (status: "ok"|"retry"|"back"|"exit", plus provider, credentialEnv, endpointUrl, model, preferredInferenceApi, nimContainer) so the outer selectionLoop only dispatches to handlers and acts on the returned status (continue selectionLoop, exit, or accept result), reducing cyclomatic complexity of the main function (selectionLoop/ setupNim) below the guideline.
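The suggested shape is easier to see in miniature. Everything below (handler names, the `status` values) mirrors the review suggestion but is hypothetical; nothing with these names exists in the codebase yet:

```javascript
// Sketch: the outer loop only dispatches to per-provider handlers and acts on
// a standardized result status; all provider-specific prompts, pulls, and
// validation retry loops live inside the handlers themselves.
async function selectionLoop(selectProvider, handlers) {
  while (true) {
    const selected = await selectProvider();
    if (selected === "exit") return { status: "exit" };
    const result = await handlers[selected.key](selected);
    if (result.status === "back" || result.status === "retry") {
      continue; // reopen provider selection
    }
    return result; // "ok" (with provider/model/endpoint fields) or "exit"
  }
}
```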
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: f7a7da0c-04b2-43b2-9840-3e0c56600881
📒 Files selected for processing (2)
bin/lib/onboard.js, test/onboard.test.js
🚧 Files skipped from review as they are similar to previous changes (1)
- test/onboard.test.js
Actionable comments posted: 2
♻️ Duplicate comments (1)
bin/lib/onboard.js (1)
2794-2798: ⚠️ Potential issue | 🟠 Major
Bubble Step 4 recovery out of this loop.
Line 2795 and Line 2817 treat `"selection"`, `"back"`, and `"model"` like a plain retry by `continue`-ing the same `while (true)`. That never returns to `setupNim()`, so users still can't actually change provider/model after `openshell provider` or `openshell inference set` failures. Only `"credential"` and `"retry"` should stay local here.
Also applies to: 2816-2820
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@bin/lib/onboard.js` around lines 2794 - 2798, The loop handling promptValidationRecovery treats "selection", "back", and "model" the same as a local retry, preventing control from returning to setupNim; update the logic around promptValidationRecovery (the variables retry and providerResult.message used with classifyApplyFailure and promptValidationRecovery) so only "credential" and "retry" trigger a local continue, while "selection", "back", and "model" bubble out of this loop (e.g., break or return so setupNim can run again) and preserve the existing process.exit(providerResult.status||1) for other failures.
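A reduced sketch of the intended control flow, where `applyProvider` and `promptRecovery` are hypothetical stand-ins for the apply step and `promptValidationRecovery()`:

```javascript
// Sketch: only "credential" and "retry" stay in the local apply loop;
// "selection", "back", and "model" bubble out so the caller (setupNim) can
// rerun provider/model selection instead of looping here forever.
async function applyWithRecovery(applyProvider, promptRecovery) {
  while (true) {
    const result = await applyProvider();
    if (result.ok) return { status: "ok" };
    const choice = await promptRecovery(result.message);
    if (choice === "credential" || choice === "retry") continue;
    if (choice === "selection" || choice === "back" || choice === "model") {
      return { status: choice }; // let the caller unwind
    }
    return { status: "exit", code: result.status || 1 };
  }
}
```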
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@bin/lib/onboard.js`:
- Around line 678-726: The runCurlProbe function is over complexity limits;
refactor by extracting the spawn+file handling and the two response-shaping
branches into small helpers: create a helper spawnCurlAndReadBody(argv,
bodyFile) that runs spawnSync("curl", ...), reads bodyFile, and returns an
object { result, body } (normalize errors there), and create two responders
formatCurlError(resultError, body) and formatCurlSuccess(result, body) that
build the return payloads currently produced in the error branch and success
branch (use the existing symbols compactText, summarizeProbeFailure,
summarizeCurlFailure when composing messages). Replace the inline branches in
runCurlProbe with calls to spawnCurlAndReadBody and then one-line returns from
formatCurlError/formatCurlSuccess so runCurlProbe's cyclomatic complexity drops
below the limit without changing behavior.
- Around line 2373-2389: The onboarding flow currently calls process.exit(1)
when normalizeProviderBaseUrl returns falsy for a custom OpenAI-compatible
endpoint, which aborts interactive onboarding on accidental blank input; update
the logic around endpointUrl (and the similar block later) to only hard-exit
when isNonInteractive() is true, and for interactive mode either re-prompt the
user or route back to provider selection (use getNavigationChoice/selectionLoop
or call exitOnboardFromPrompt as appropriate) so the user can retry instead of
the process terminating. Ensure the checks reference endpointUrl,
isNonInteractive(), getNavigationChoice, normalizeProviderBaseUrl, and
selectionLoop so the behavior differs between interactive and non-interactive
runs.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 03fd22ad-3bf0-4e4e-bbef-99fc16e3ec45
📒 Files selected for processing (2)
bin/lib/onboard.js, test/onboard.test.js
🚧 Files skipped from review as they are similar to previous changes (1)
- test/onboard.test.js
```js
function runCurlProbe(argv) {
  const bodyFile = path.join(os.tmpdir(), `nemoclaw-curl-probe-${Date.now()}-${Math.random().toString(36).slice(2)}.json`);
  try {
    const args = [...argv];
    const url = args.pop();
    const result = spawnSync("curl", [...args, "-o", bodyFile, "-w", "%{http_code}", url], {
      cwd: ROOT,
      encoding: "utf8",
      timeout: 30_000,
      env: {
        ...process.env,
      },
    });
    const body = fs.existsSync(bodyFile) ? fs.readFileSync(bodyFile, "utf8") : "";
    if (result.error) {
      const errorCode = result.error.code || result.error.errno || 1;
      const errorMessage = compactText(
        `${result.error.message || String(result.error)} ${String(result.stderr || "")}`
      );
      return {
        ok: false,
        httpStatus: 0,
        curlStatus: errorCode,
        body,
        stderr: errorMessage,
        message: summarizeProbeFailure(body, 0, errorCode, errorMessage),
      };
    }
    const status = Number(String(result.stdout || "").trim());
    return {
      ok: result.status === 0 && status >= 200 && status < 300,
      httpStatus: Number.isFinite(status) ? status : 0,
      curlStatus: result.status || 0,
      body,
      stderr: String(result.stderr || ""),
      message: summarizeProbeFailure(body, status || 0, result.status || 0, String(result.stderr || "")),
    };
  } catch (error) {
    return {
      ok: false,
      httpStatus: 0,
      curlStatus: error?.status || 1,
      body: "",
      stderr: error?.message || String(error),
      message: summarizeCurlFailure(error?.status || 1, error?.message || String(error)),
    };
  } finally {
    fs.rmSync(bodyFile, { force: true });
  }
```
🛠️ Refactor suggestion | 🟠 Major
Split runCurlProbe() before merge.
Line 678 is still over the repo limit and is already failing CI (complexity 25). Extract the spawn/error normalization and response-shaping branches into a couple of small helpers so this clears lint without changing behavior.
As per coding guidelines, **/*.{js,ts,tsx}: Maintain cyclomatic complexity limit of 20 (target: ratchet down to 15) for all functions.
🧰 Tools
🪛 GitHub Actions: pr
[error] 678-678: ESLint (JavaScript) failed. 'runCurlProbe' has a complexity of 25. Maximum allowed is 20 (complexity).
🪛 GitHub Check: lint
[failure] 678-678:
Function 'runCurlProbe' has a complexity of 25. Maximum allowed is 20
## Summary

Smooth out inference configuration during `install.sh` / `nemoclaw onboard`, especially when provider authorization, credential formatting, endpoint probing, or final inference application fail.

This PR makes the hosted-provider onboarding path recoverable instead of brittle:

- normalize and safely handle credential input
- classify validation failures more accurately
- let users re-enter credentials in place
- make final `openshell inference set` failures recoverable
- normalize over-specified custom base URLs
- add lower-level `back` / `exit` navigation so users can move up a level without restarting the whole install
- clarify recovery prompts with explicit commands (`retry`, `back`, `exit`)

## What Changed

- refactored provider probe execution to use direct `curl` argv invocation instead of `bash -c`
- normalized credential values before use/persistence
- added structured auth / transport / model / endpoint failure classification
- added in-place credential re-entry for hosted providers:
  - NVIDIA Endpoints
  - OpenAI
  - Anthropic
  - Google Gemini
  - custom OpenAI-compatible endpoints
  - custom Anthropic-compatible endpoints
- wrapped final provider/apply failures in interactive recovery instead of hard abort
- added command-style recovery prompts: `retry`, `back`, `exit`
- allowed `back` from lower-level inference prompts (model entry, base URL entry, recovery prompts)
- normalized custom endpoint inputs to the minimum usable base URL
- removed stale `NVIDIA Endpoints (recommended)` wording
- secret prompts now show masked `*` feedback while typing/pasting

## Validation

```bash
npx vitest run test/credentials.test.js test/onboard-selection.test.js test/onboard.test.js
npx vitest run test/cli.test.js
npx eslint bin/lib/credentials.js bin/lib/onboard.js test/credentials.test.js test/onboard-selection.test.js test/onboard.test.js
npx tsc -p jsconfig.json --noEmit
```

## Issue Mapping

Fully addressed in this PR:

- Fixes #1099
- Fixes #1101
- Fixes #1130

Substantially addressed / partially addressed:

- #987 - improves NVIDIA validation behavior and failure classification so false/misleading connectivity failures are much less likely, but this PR is framed as onboarding recovery hardening rather than a WSL-specific networking fix
- #301 - improves graceful handling when validation/apply fails, especially for transport/upstream problems, but does not add provider auto-fallback or a broader cloud-outage fallback strategy
- #446 - improves recovery specifically for the inference-configuration step, but does not fully solve general resumability across all onboarding steps

Related implementation direction:

- #890 - this PR aligns with the intent of safer/non-shell probe execution and clearer validation reporting
- #380 - not implemented here; no automatic provider fallback was added in this branch

## Notes

- This PR intentionally does not weaken validation or reopen old onboarding shortcuts.
- Unrelated local `tmp/` noise was left out of the branch.

Signed-off-by: Kevin Jones <kejones@nvidia.com>

## Summary by CodeRabbit

- **New Features**
  - Interactive onboarding navigation (`back`/`exit`/`quit`) with credential re-prompting and retry flows.
  - Improved probe/validation flow with clearer recovery options and more robust sandbox build progress messages.
  - Secret input masks with reliable backspace behavior.
- **Bug Fixes**
  - Credential sanitization (trim/line-ending normalization) and API key validation now normalize and retry instead of exiting.
  - Better classification and messaging for authorization/validation failures; retries where appropriate.
- **Tests**
  - Expanded tests for credential prompts, masking, retry flows, validation classification, and onboarding navigation.
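The "normalized credential values" item in the summary can be pictured roughly like this; the body is an assumption about the behavior (strip pasted line endings, then trim), not the actual `normalizeCredentialValue` implementation:

```javascript
// Assumed behavior: remove line endings that sneak in when a key is pasted,
// then trim surrounding whitespace, so a trailing "\r\n" is never persisted.
function normalizeCredentialValue(raw) {
  return String(raw ?? "").replace(/\r\n|\r|\n/g, "").trim();
}
```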