Consolidate bundle respawn after credential change into one mutex-guarded helper#218
Open
mgoldsborough wants to merge 2 commits into
Open
Consolidate bundle respawn after credential change into one mutex-guarded helper#218mgoldsborough wants to merge 2 commits into
mgoldsborough wants to merge 2 commits into
Conversation
…le helper Both `set_user_config` and `configureBundle` were duplicating the remove-source / spawn-new-source sequence after a credential write. That duplication is the kind of thing that drifts: one site adds a log line or a check, the other doesn't, and the second self-service credential write produces a different runtime state than the first. Worse, the two paths could race against each other on the same `(serverName, wsId)`: two concurrent writes meant two concurrent removeSource / startBundleSource pairs with no ordering guarantee, and the surviving McpSource depended on whichever spawn finished last. Under load this manifested as the bundle "going dead" until the next manual restart. This PR introduces `BundleLifecycleManager.respawnBundle(serverName, wsId, registry)` as the single supported path, with: - A per-(serverName, wsId) promise-chain mutex so concurrent respawns serialize without blocking different keys. - Explicit guards: missing instance and protected bundle both return structured errors instead of throwing. - `transition(instance, 'starting' | 'running' | 'dead')` on every branch so the lifecycle state stays accurate. Callers now delegate: - `connector-tools.ts::respawnBundleAfterCredentialChange` becomes a one-liner over `lifecycle.respawnBundle`. - `system-tools.ts::configureBundle` drops its inline remove + start block and calls the same helper. The `eventSink` param is gone from the signature (events emit from inside the helper now). Tests (`test/unit/lifecycle-respawn.test.ts`): - Missing instance returns an error. - Protected bundle refuses respawn and source stays in registry. - `startBundleSource` failure marks instance `dead` and returns the error. - Two concurrent respawns for the same key serialize: a spy on `registry.removeSource` confirms `maxInFlight === 1`. - The lock map drops its key after the last queued op resolves (no leak under churn).
Follow-ups on PR review feedback: - Inject workDir through `BundleLifecycleManager`'s constructor instead of reading `process.env.NB_WORK_DIR` inside `runRespawn`. Runtime already has `resolveWorkDir(config)` at construction time; this keeps `lifecycle.ts` honest about its dependencies. The other pre-existing `process.env.NB_WORK_DIR` reads inside this file are unrelated to this PR and remain unchanged. - Inline `respawnBundleAfterCredentialChange` into its sole caller (`handleSetUserConfig`). The wrapper had one caller and an unused `_bundleName` parameter — it added indirection without leverage now that the real work lives in `lifecycle.respawnBundle`. - Improve the missing-instance error to be user-shaped rather than developer-shaped: "Bundle X is not installed in workspace Y. Install it first via manage_app install." instead of "No bundle instance for X in workspace Y". - Fix a stale comment: `runRespawn` transitions to `dead`, not `failed` (no `failed` state exists in `BundleState`). - Update the `protected` docstring on `BundleInstance` and its schema description to reflect that this PR makes `protected` gate respawn as well as uninstall (both are tear-down operations and the rationale — don't tear down mid-call — is identical). - Update the unit-test assertion for the new missing-instance message.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
BundleLifecycleManager.respawnBundle(serverName, wsId, registry)is now the single supported path to restart a running bundle after a workspace credential is written. Both credential-writing tool handlers (set_user_configviaconnector-tools.tsandconfigureBundleinsystem-tools.ts) delegate to it.(serverName, wsId)promise-chain mutex serializes concurrent respawns for the same key without blocking unrelated keys. The previous duplicated code paths could race onremoveSource/startBundleSourceand leave a workspace's bundle dead until manual restart.startBundleSourcefailure transitions todead; success transitions torunning.Why now
The lifecycle of "set credential → restart subprocess to pick up new env" was implemented twice with slightly different shapes. After diagnosing
tenant-ipsdi'sDATABASE_URLpropagation issue (the matching manifest fix shipped insynapse-db-query#7), the platform side was the remaining hygiene gap: same intent, two code paths, no concurrency guard.Test plan
bun run verify:static(format / lint / tsc strict / web tsc / cycles / bundle-transport / codegen) cleanbun run verify:test-unit— 246 unit + web tests pass, including 5 new tests intest/unit/lifecycle-respawn.test.ts:startBundleSourcefailure →deadremoveSourcespy confirmsmaxInFlight === 1database_urlforsynapse-db-querytwice rapidly; verify the bundle ends uprunningwith the latest credential applied (no need for a manual restart).