fix(rivetkit): isolate engine envoys and propagate startup failures #4643

Open
NathanFlurry wants to merge 1 commit into test/remove-wasm-sqlite-fixtures from fix/isolate-engine-envoys

@NathanFlurry
Member

Description

Please include a summary of the changes and the related issue. Please also include relevant motivation and context.

Type of change

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • This change requires a documentation update

How Has This Been Tested?

Please describe the tests that you ran to verify your changes.

Checklist:

  • My code follows the style guidelines of this project
  • I have performed a self-review of my code
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • My changes generate no new warnings
  • I have added tests that prove my fix is effective or that my feature works
  • New and existing unit tests pass locally with my changes

Member Author

NathanFlurry commented Apr 13, 2026

Warning

This pull request is not mergeable via GitHub because a downstack PR is open. Once all requirements are satisfied, merge this PR as a stack on Graphite.

This stack of pull requests is managed by Graphite. Learn more about stacking.

@claude

claude bot commented Apr 13, 2026

PR Review: fix(rivetkit): isolate engine envoys and propagate startup failures

Two focused fixes: isolated per-driver-instance envoys (notGlobal: true) and proper startup failure propagation across all three layers (Rust, NAPI bridge, TypeScript).

What the PR does

  1. Envoy isolation (notGlobal: true): Each EngineActorDriver instance now creates its own envoy instead of sharing the process-wide global singleton. Prevents state cross-contamination when multiple driver instances coexist.

  2. Startup failure propagation: handle.started() previously returned () and silently swallowed RecvError (sender dropped = background task exited before signaling startup). The error now flows correctly through all three layers.

Layer-by-layer analysis

handle.rs: Core fix is correct. watch::Receiver::changed() returns Err(RecvError) only when all senders are dropped, which happens exactly when the background envoy task exits before sending the startup signal. The previous let _ = changed().await discarded this. The new anyhow::Result<()> return is the right shape.

envoy.rs: start_envoy uses .expect() rather than returning Result<EnvoyHandle>. This is a process-level panic on startup failure. If a failed envoy start is always fatal (which appears to be the case at current callsites), the panic is defensible and the message is clear. A comment noting the intentional panic would help future readers.

envoy_handle.rs (NAPI bridge): The two-level map_err chain is now correct. Previously, because started() returned (), the inner .map_err had nothing to convert and was effectively a dead no-op next to the JoinError handling. With started() -> anyhow::Result<()>, the inner .map_err now correctly converts the anyhow::Error.

actor-driver.ts: The previous .then(onFulfilled) with no rejection handler left #envoyStarted permanently pending on failure. The new .then(resolve, reject) correctly handles both paths.
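
The difference between the two `.then` shapes can be sketched in isolation. This is a minimal illustration, not the code from actor-driver.ts: `deferred`, `startedFailing`, `wireOld`, `wireNew`, and `track` are hypothetical names, and the `.catch` in `wireOld` exists only so the demo does not trip an unhandled-rejection error.

```typescript
// Hypothetical helper: a promise settled from outside, in the style of #envoyStarted.
function deferred(): {
  promise: Promise<void>;
  resolve: () => void;
  reject: (reason: unknown) => void;
} {
  let resolve!: () => void;
  let reject!: (reason: unknown) => void;
  const promise = new Promise<void>((res, rej) => {
    resolve = () => res();
    reject = rej;
  });
  return { promise, resolve, reject };
}

// Hypothetical stand-in for a native started() call that fails.
function startedFailing(): Promise<void> {
  return Promise.reject(new Error("envoy task exited before startup"));
}

// Old shape: only a fulfillment handler, so a rejection never reaches the deferred.
function wireOld(started: Promise<void>): Promise<void> {
  const d = deferred();
  started.then(d.resolve).catch(() => {
    // Swallowed only to keep this demo free of unhandled rejections.
  });
  return d.promise;
}

// New shape: both outcomes are forwarded to the deferred.
function wireNew(started: Promise<void>): Promise<void> {
  const d = deferred();
  started.then(d.resolve, d.reject);
  return d.promise;
}

type State = "pending" | "fulfilled" | "rejected";

// Records how a promise settles, if it ever does.
function track(p: Promise<void>): { state: State } {
  const t: { state: State } = { state: "pending" };
  p.then(
    () => { t.state = "fulfilled"; },
    () => { t.state = "rejected"; },
  );
  return t;
}
```

Tracking `wireOld(startedFailing())` leaves the state at "pending" indefinitely, while `wireNew(startedFailing())` moves to "rejected" once the microtask queue drains, which is the behavior change described above.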

Potential issues

  • Envoy cleanup on driver destroy: notGlobal: true allocates a new envoy per driver instance. Confirm the envoy is shut down when the driver is destroyed (e.g., in a destroy/dispose lifecycle) to avoid task/thread leaks in long-running processes that create many driver instances.

  • Fake channel in actor.rs: Line 120 creates an EnvoyHandle with tokio::sync::watch::channel(()).1 -- a receiver from a channel whose sender is immediately dropped. Calling started().await on this handle would immediately return Err(RecvError). The comment 'Fake channel, don't care' suggests this is intentional, but it is a landmine if new code calls started() on actor-context handles after this signature change.

  • No tests added: A test verifying that a startup failure surfaces as a rejected promise on the TypeScript side (rather than hanging indefinitely) would validate the propagation chain end-to-end.
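
A test along the lines of the last point could look roughly like the sketch below. It assumes nothing about the real EngineActorDriver: `FakeDriver`, `expectRejectsWithin`, and `testStartupFailureRejects` are hypothetical, and the timeout is what turns a hang into a test failure instead of a stalled run.

```typescript
// Hypothetical minimal driver; the real EngineActorDriver is not modeled here.
class FakeDriver {
  readonly envoyStarted: Promise<void>;

  constructor(started: () => Promise<void>) {
    // Forward both outcomes, mirroring the .then(resolve, reject) fix.
    this.envoyStarted = new Promise<void>((resolve, reject) => {
      started().then(resolve, reject);
    });
  }
}

// Resolves with the rejection reason if `p` rejects within `ms` milliseconds.
// Rejects if `p` fulfills, or if it is still pending when the timer fires.
async function expectRejectsWithin(p: Promise<unknown>, ms: number): Promise<unknown> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<never>((_, reject) => {
    timer = setTimeout(() => reject(new Error(`still pending after ${ms} ms`)), ms);
  });
  try {
    return await Promise.race([
      p.then(
        () => {
          throw new Error("expected rejection, but promise fulfilled");
        },
        (reason: unknown) => reason,
      ),
      timeout,
    ]);
  } finally {
    clearTimeout(timer);
  }
}

// Test body: a failing startup must surface as a rejection, not a hang.
async function testStartupFailureRejects(): Promise<string> {
  const driver = new FakeDriver(() =>
    Promise.reject(new Error("envoy failed to start")),
  );
  const reason = await expectRejectsWithin(driver.envoyStarted, 100);
  return reason instanceof Error ? reason.message : String(reason);
}
```

In a real test the fake would be replaced by a driver configured so that envoy startup fails; how to induce that failure depends on the driver's configuration surface and is left open here.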

Minor

  • PR description and checklist are empty -- worth filling in before merge for traceability.

Summary

The changes are correct and fix real, silent failure modes. The main open question is envoy cleanup on driver destroy to avoid leaks with the new per-instance envoy allocation.
