Skip to content

Commit ab96a82

Browse files
NathanFlurryMasterPtato
authored andcommitted
chore: fix driver test suite
1 parent 7ab128e commit ab96a82

File tree

118 files changed

+3855
-991
lines changed

Some content is hidden

Large Commits have some content hidden by default. Use the searchbox below for content that may be hidden.

118 files changed

+3855
-991
lines changed
Lines changed: 190 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,190 @@
1+
# Driver Engine Static Test Order
2+
3+
This note breaks the `driver-engine.test.ts` suite into file-name groups for static-only debugging.
4+
5+
Scope:
6+
- `registry (static)` only
7+
- `client type (http)` only unless a specific bug points to inline client behavior
8+
- `encoding (bare)` only unless a specific bug points to CBOR or JSON
9+
- Exclude `agent-os` from the normal pass target
10+
- Exclude `dynamic-reload` from the static pass target
11+
12+
Checklist rules:
13+
- A checkbox is marked only when the entire `*.ts` file has been covered and is fully passing.
14+
- Do not check a file off just because investigation started.
15+
- Start with a single test name, not a whole file-group or suite label.
16+
- After one single test passes, grow scope within that same file until the entire file passes.
17+
- Do not start the next tracked file until the current file is fully passing.
18+
- If a widened file run fails, stop expanding scope and fix that same file before running anything from the next file.
19+
- Record average duration only after the full file is passing.
20+
- The filenames in this note are tracking labels only. `pnpm test ... -t` does not filter by `src/driver-test-suite/tests/<file>.ts`.
21+
- `driver-engine.test.ts` wires everything into nested `describe(...)` blocks, so filter by the description text from the suite, plus the static path text when needed: `registry (static)`, `client type (http)`, and `encoding (bare)`.
22+
23+
## How To Filter
24+
25+
Use `-t` against the `describe(...)` text, not the filename from this note.
26+
27+
Base command shape:
28+
29+
```bash
30+
cd rivetkit-typescript/packages/rivetkit
31+
pnpm test driver-engine.test.ts -t "registry \\(static\\).*client type \\(http\\).*encoding \\(bare\\).*<suite description text>"
32+
```
33+
34+
To narrow to one single test inside that suite, append a stable chunk of the test name:
35+
36+
```bash
37+
cd rivetkit-typescript/packages/rivetkit
38+
pnpm test driver-engine.test.ts -t "registry \\(static\\).*client type \\(http\\).*encoding \\(bare\\).*Actor Driver Tests.*should"
39+
```
40+
41+
Common suite-description mappings:
42+
- `actor-state.ts` -> `Actor State Tests`
43+
- `actor-schedule.ts` -> `Actor Schedule Tests`
44+
- `actor-sleep.ts` -> `Actor Sleep Tests`
45+
- `actor-sleep-db.ts` -> `Actor Sleep Database Tests`
46+
- `actor-lifecycle.ts` -> `Actor Lifecycle Tests`
47+
- `manager-driver.ts` -> `Manager Driver Tests`
48+
- `actor-conn.ts` -> `Actor Connection Tests`
49+
- `actor-conn-state.ts` -> `Actor Connection State Tests`
50+
- `conn-error-serialization.ts` -> `Connection Error Serialization Tests`
51+
- `access-control.ts` -> `access control`
52+
- `actor-vars.ts` -> `Actor Variables`
53+
- `actor-db.ts` -> `Actor Database (raw) Tests`, `Actor Database (drizzle) Tests`, or `Actor Database Lifecycle Cleanup Tests`
54+
- `raw-http.ts` -> `raw http`
55+
- `raw-http-request-properties.ts` -> `raw http request properties`
56+
- `raw-websocket.ts` -> `raw websocket`
57+
- `hibernatable-websocket-protocol.ts` -> `hibernatable websocket protocol`
58+
- `cross-backend-vfs.ts` -> `Cross-Backend VFS Compatibility Tests`
59+
- `actor-agent-os.ts` -> `Actor agentOS Tests`
60+
- `dynamic-reload.ts` -> `Dynamic Actor Reload Tests`
61+
- `actor-conn-status.ts` -> `Connection Status Changes`
62+
- `gateway-routing.ts` -> `Gateway Routing`
63+
- `lifecycle-hooks.ts` -> `Lifecycle Hooks`
64+
65+
Why this order:
66+
- The suite currently pays full per-test harness cost for every test:
67+
- fresh namespace
68+
- fresh runner config
69+
- fresh envoy/driver lifecycle
70+
- Cheap tests are mostly harness overhead
71+
- Slow tests are concentrated in sleep, sandbox, workflow, and DB stress categories
72+
- Wrapper suites that pull in sleep-heavy children should be treated as slow even if the wrapper filename looks generic
73+
- Files that use sleep/hibernation waits or `describe.sequential` should not stay in the fast block
74+
75+
## Fastest First
76+
77+
These are the best initial groups for static-only bring-up.
78+
79+
- [x] `manager-driver.ts` - avg ~10.3s/test over 16 tests, suite 15.1s
80+
- [x] `actor-conn.ts` - avg ~8.4s/test over 23 tests, suite 16.0s
81+
- [x] `actor-conn-state.ts` - avg ~9.3s/test over 8 tests, suite 9.9s
82+
- [x] `conn-error-serialization.ts` - avg ~8.2s/test over 2 tests, suite 8.2s
83+
- [x] `actor-destroy.ts` - avg ~9.8s/test over 10 tests, suite 10.2s
84+
- [x] `request-access.ts` - avg ~9.1s/test over 4 tests, suite 9.1s
85+
- [x] `actor-handle.ts` - avg ~7.7s/test over 12 tests, suite 8.3s
86+
- [x] `action-features.ts` - avg ~8.3s/test over 11 tests, suite 8.8s
87+
- [x] `access-control.ts` - avg ~8.5s/test over 8 tests, suite 8.8s
88+
- [x] `actor-vars.ts` - avg ~8.3s/test over 5 tests, suite 8.5s
89+
- [x] `actor-metadata.ts` - avg ~8.3s/test over 6 tests, suite 8.4s
90+
- [x] `actor-onstatechange.ts` - avg ~8.3s/test over 5 tests, suite 8.3s
91+
- [x] `actor-db.ts` - avg ~9.5s/test over 28 tests, suite 27.0s
92+
- [x] `actor-workflow.ts` - avg ~9.2s/test over 19 tests, suite 11.9s
93+
- [x] `actor-error-handling.ts` - avg ~8.5s/test over 7 tests, suite 8.5s
94+
- [x] `actor-queue.ts` - avg ~9.3s/test over 25 tests, suite 17.5s
95+
- [x] `actor-inline-client.ts` - avg ~9.0s/test over 5 tests, suite 9.8s
96+
- [x] `actor-kv.ts` - avg ~8.4s/test over 3 tests, suite 8.4s
97+
- [x] `actor-stateless.ts` - avg ~8.6s/test over 6 tests, suite 9.1s
98+
- [x] `raw-http.ts` - avg ~8.6s/test over 15 tests, suite 10.1s
99+
- [x] `raw-http-request-properties.ts` - avg ~8.5s/test over 16 tests, suite 9.9s
100+
- [x] `raw-websocket.ts` - avg ~8.9s/test over 13 tests, suite 11.1s
101+
- [x] `actor-inspector.ts` - avg ~9.6s/test over 20 tests, suite 12.1s
102+
- [x] `gateway-query-url.ts` - avg ~8.3s/test over 2 tests, suite 8.3s
103+
- [x] `actor-db-kv-stats.ts` - avg ~9.0s/test over 11 tests, suite 9.9s
104+
- [x] `actor-db-pragma-migration.ts` - avg ~8.8s/test over 4 tests, suite 9.0s
105+
- [x] `actor-state-zod-coercion.ts` - avg ~8.8s/test over 3 tests, suite 8.8s
106+
- [ ] `actor-conn-status.ts`
107+
- [ ] `gateway-routing.ts`
108+
- [ ] `lifecycle-hooks.ts`
109+
110+
## Slow End
111+
112+
These should be last because they are the most likely to dominate wall time.
113+
114+
- [x] `actor-state.ts` - avg ~9.0s/test over 3 tests, suite 9.1s
115+
- [x] `actor-schedule.ts` - avg ~9.9s/test over 4 tests, suite 9.9s
116+
- [ ] `actor-sleep.ts`
117+
- [ ] `actor-sleep-db.ts`
118+
- [ ] `actor-lifecycle.ts`
119+
- [ ] `actor-conn-hibernation.ts`
120+
- [ ] `actor-run.ts`
121+
- [ ] `actor-sandbox.ts`
122+
- [ ] `hibernatable-websocket-protocol.ts`
123+
- [ ] `cross-backend-vfs.ts`
124+
- [ ] `actor-db-stress.ts`
125+
126+
## Not In Static Pass
127+
128+
These should not block the static-only pass target.
129+
130+
- [ ] `actor-agent-os.ts`
131+
Explicitly allowed to skip for now.
132+
- [ ] `dynamic-reload.ts`
133+
Dynamic-only path.
134+
135+
## Files Present But Not Wired In `runDriverTests`
136+
137+
- [ ] `raw-http-direct-registry.ts` - intentionally commented out (blocked on gateway actor queries)
138+
- [ ] `raw-websocket-direct-registry.ts` - intentionally commented out (blocked on gateway actor queries)
139+
140+
## Suggested Static-Only Debugging Sequence
141+
142+
Use one single test at a time with `-t`, then grow scope within the same file only after that single test passes.
143+
144+
- [ ] Run one single test from the next unchecked file.
145+
- [ ] Fix the first failing single test before expanding scope.
146+
- [ ] After one test passes, widen to the rest of that file until the entire file passes.
147+
- [ ] Check the file off only after the entire file is passing.
148+
- [ ] After the fast block is clean, run the medium-cost block.
149+
- [ ] Run the slow-end block last.
150+
- [ ] Run `agent-os` separately only if explicitly needed.
151+
152+
## Example Commands
153+
154+
Run one tracked file-group by suite description:
155+
156+
```bash
157+
cd rivetkit-typescript/packages/rivetkit
158+
pnpm test driver-engine.test.ts -t "registry \\(static\\).*client type \\(http\\).*encoding \\(bare\\).*Actor Driver Tests"
159+
```
160+
161+
Run one single test inside that tracked file-group:
162+
163+
```bash
164+
cd rivetkit-typescript/packages/rivetkit
165+
pnpm test driver-engine.test.ts -t "registry \\(static\\).*client type \\(http\\).*encoding \\(bare\\).*Actor Driver Tests.*should create actors"
166+
```
167+
168+
Run a slow group explicitly by suite description:
169+
170+
```bash
171+
cd rivetkit-typescript/packages/rivetkit
172+
pnpm test driver-engine.test.ts -t "registry \\(static\\).*client type \\(http\\).*encoding \\(bare\\).*Actor Sleep Database Tests"
173+
```
174+
175+
Run sandbox only:
176+
177+
```bash
178+
cd rivetkit-typescript/packages/rivetkit
179+
pnpm test driver-engine.test.ts -t "registry \\(static\\).*client type \\(http\\).*encoding \\(bare\\).*Actor Sandbox Tests"
180+
```
181+
182+
## Evidence For Slow Ordering
183+
184+
Observed from the current full-run log:
185+
- cheap tests like raw HTTP property checks are roughly around 1 second end-to-end including teardown
186+
- sandbox tests are about 8.5 to 8.8 seconds each
187+
- sleep and sleep-db groups show repeated alarm/sleep cycles and are consistently the longest-running categories in the log
188+
- `actor-state.ts`, `actor-schedule.ts`, `actor-sleep.ts`, `actor-sleep-db.ts`, and `actor-lifecycle.ts` are all called directly from `mod.ts` and inherit the sleep-heavy cost profile
189+
- `actor-run.ts`, `actor-conn-hibernation.ts`, and `hibernatable-websocket-protocol.ts` all spend real time in sleep or hibernation waits
190+
- the suite-wide average is inflated by the repeated harness lifecycle and these slow categories

.gitattributes

Lines changed: 1 addition & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -17,6 +17,7 @@ engine/sdks/typescript/runner/** linguist-generated=false
1717
engine/sdks/typescript/test-runner/** linguist-generated=false
1818
engine/sdks/rust/data/** linguist-generated=false
1919
engine/sdks/rust/*-protocol/** linguist-generated=false
20+
engine/sdks/rust/envoy-client/** linguist-generated=false
2021
engine/sdks/schemas/** linguist-generated=false
2122
engine/docker/dev/** linguist-generated=true
2223
engine/docker/dev-host/** linguist-generated=true

CLAUDE.md

Lines changed: 6 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -64,6 +64,8 @@ cargo test -- --nocapture
6464
cargo clippy -- -W warnings
6565
```
6666

67+
- Ensure lefthook is installed and enabled for git hooks (`lefthook install`).
68+
6769
### Docker Development Environment
6870
```bash
6971
# Start the development environment with all services
@@ -292,7 +294,10 @@ let error_with_meta = ApiRateLimited { limit: 100, reset_at: 1234567890 }.build(
292294
- Connection pooling through `packages/common/pools/`
293295

294296
**Performance**
295-
- ALWAYS prefer a dedicated concurrency container like `scc::HashMap<_, _>` with its async api or `moka::Cache` over `Arc<Mutex<HashMap<_, _>>>`. `Arc<Mutex<_>>` is very slow for containers.
297+
- Never use `Mutex<HashMap<...>>` or `RwLock<HashMap<...>>`.
298+
- Use `scc::HashMap` (preferred), `moka::Cache` (for TTL/bounded), or `DashMap` for concurrent maps.
299+
- Use `scc::HashSet` instead of `Mutex<HashSet<...>>` for concurrent sets.
300+
- `scc` async methods do not hold locks across `.await` points. Use `entry_async` for atomic read-then-write.
296301

297302
### Code Style
298303
- Hard tabs for Rust formatting (see `rustfmt.toml`)

Cargo.toml

Lines changed: 3 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -490,6 +490,9 @@ members = [
490490
package = "rivet-util"
491491
path = "engine/packages/util"
492492

493+
[workspace.dependencies.rivet-util-serde]
494+
path = "engine/packages/util-serde"
495+
493496
[workspace.dependencies.rivet-util-id]
494497
path = "engine/packages/util-id"
495498

engine/CLAUDE.md

Lines changed: 0 additions & 4 deletions
Original file line numberDiff line numberDiff line change
@@ -33,10 +33,6 @@ When changing a versioned VBARE schema, follow the existing migration pattern.
3333
- When adding fields to epoxy workflow state structs, mark them `#[serde(default)]` so Gasoline can replay older serialized state.
3434
- Epoxy integration tests that spin up `tests/common::TestCtx` must call `shutdown()` before returning.
3535

36-
## Concurrent containers
37-
38-
Never use `Mutex<HashMap<...>>` or `RwLock<HashMap<...>>`. Use `scc::HashMap` (preferred), `moka::Cache` (for TTL/bounded), or `DashMap`. Same for sets: use `scc::HashSet` instead of `Mutex<HashSet<...>>`. Note that `scc` async methods do not hold locks across `.await` points. Use `entry_async` for atomic read-then-write.
39-
4036
## Test snapshots
4137

4238
Use `test-snapshot-gen` to generate and load RocksDB snapshots of the full UDB KV store for migration and integration tests. Scenarios produce per-replica RocksDB checkpoints stored under `engine/packages/test-snapshot-gen/snapshots/` (git LFS tracked). In tests, use `test_snapshot::SnapshotTestCtx::from_snapshot("scenario-name")` to boot a cluster from snapshot data. See `docs-internal/engine/TEST_SNAPSHOTS.md` for the full guide.

engine/packages/guard/src/routing/pegboard_gateway/resolve_actor_query.rs

Lines changed: 21 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -207,7 +207,27 @@ async fn resolve_query_target_dc_label(
207207
}
208208

209209
fn serialize_actor_key(key: &[String]) -> Result<String> {
210-
serde_json::to_string(key).context("failed to serialize actor key")
210+
const EMPTY_KEY: &str = "/";
211+
const KEY_SEPARATOR: char = '/';
212+
213+
if key.is_empty() {
214+
return Ok(EMPTY_KEY.to_string());
215+
}
216+
217+
let mut escaped_parts = Vec::with_capacity(key.len());
218+
for part in key {
219+
if part.is_empty() {
220+
escaped_parts.push(String::from("\\0"));
221+
continue;
222+
}
223+
224+
let escaped = part
225+
.replace('\\', "\\\\")
226+
.replace(KEY_SEPARATOR, "\\/");
227+
escaped_parts.push(escaped);
228+
}
229+
230+
Ok(escaped_parts.join(EMPTY_KEY))
211231
}
212232

213233
fn is_duplicate_key_error(err: &anyhow::Error) -> bool {

engine/sdks/rust/envoy-client/src/connection.rs

Lines changed: 11 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -1,4 +1,5 @@
11
use std::sync::Arc;
2+
use std::sync::atomic::Ordering;
23

34
use futures_util::{SinkExt, StreamExt};
45
use rivet_envoy_protocol as protocol;
@@ -22,6 +23,11 @@ async fn connection_loop(shared: Arc<SharedContext>) {
2223
let mut attempt = 0u32;
2324

2425
loop {
26+
if shared.shutting_down.load(Ordering::Acquire) {
27+
tracing::debug!("stopping reconnect loop because envoy is shutting down");
28+
return;
29+
}
30+
2531
let connected_at = std::time::Instant::now();
2632

2733
match single_connection(&shared).await {
@@ -51,6 +57,11 @@ async fn connection_loop(shared: Arc<SharedContext>) {
5157
attempt = 0;
5258
}
5359

60+
if shared.shutting_down.load(Ordering::Acquire) {
61+
tracing::debug!("skipping reconnect because envoy is shutting down");
62+
return;
63+
}
64+
5465
let delay = calculate_backoff(attempt, &BackoffOptions::default());
5566
tracing::info!(attempt, delay_ms = delay.as_millis() as u64, "reconnecting");
5667
tokio::time::sleep(delay).await;

engine/sdks/rust/envoy-client/src/envoy.rs

Lines changed: 2 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -1,6 +1,7 @@
11
use std::collections::HashMap;
22
use std::sync::Arc;
33
use std::sync::OnceLock;
4+
use std::sync::atomic::Ordering;
45

56
use rivet_envoy_protocol as protocol;
67
use tokio::sync::mpsc;
@@ -396,6 +397,7 @@ async fn handle_shutdown(ctx: &mut EnvoyContext) {
396397
return;
397398
}
398399
ctx.shutting_down = true;
400+
ctx.shared.shutting_down.store(true, Ordering::Release);
399401

400402
tracing::debug!("envoy received shutdown");
401403

engine/sdks/rust/envoy-client/src/handle.rs

Lines changed: 2 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -1,4 +1,5 @@
11
use std::sync::Arc;
2+
use std::sync::atomic::Ordering;
23

34
use rivet_envoy_protocol as protocol;
45

@@ -15,6 +16,7 @@ pub struct EnvoyHandle {
1516

1617
impl EnvoyHandle {
1718
pub fn shutdown(&self, immediate: bool) {
19+
self.shared.shutting_down.store(true, Ordering::Release);
1820
if immediate {
1921
let _ = self.shared.envoy_tx.send(ToEnvoyMessage::Stop);
2022
} else {

rivetkit-typescript/packages/rivetkit-native/index.d.ts

Lines changed: 24 additions & 5 deletions
Original file line numberDiff line numberDiff line change
@@ -3,6 +3,20 @@
33

44
/* auto-generated by NAPI-RS */
55

6+
export interface JsBindParam {
7+
kind: string
8+
intValue?: number
9+
floatValue?: number
10+
textValue?: string
11+
blobValue?: Buffer
12+
}
13+
export interface ExecuteResult {
14+
changes: number
15+
}
16+
export interface QueryResult {
17+
columns: Array<string>
18+
rows: Array<Array<any>>
19+
}
620
/** Open a native SQLite database backed by the envoy's KV channel. */
721
export declare function openDatabaseFromEnvoy(jsHandle: JsEnvoyHandle, actorId: string): Promise<JsNativeDatabase>
822
/** Configuration for starting the native envoy client. */
@@ -44,7 +58,12 @@ export declare function startEnvoySyncJs(config: JsEnvoyConfig, eventCallback: (
4458
/** Start the native envoy client asynchronously. */
4559
export declare function startEnvoyJs(config: JsEnvoyConfig, eventCallback: (event: any) => void): JsEnvoyHandle
4660
/** Native SQLite database handle exposed to JavaScript. */
47-
export declare class JsNativeDatabase { }
61+
export declare class JsNativeDatabase {
62+
run(sql: string, params?: Array<JsBindParam> | undefined | null): Promise<ExecuteResult>
63+
query(sql: string, params?: Array<JsBindParam> | undefined | null): Promise<QueryResult>
64+
exec(sql: string): Promise<QueryResult>
65+
close(): Promise<void>
66+
}
4867
/** Native envoy handle exposed to JavaScript via N-API. */
4968
export declare class JsEnvoyHandle {
5069
started(): Promise<void>
@@ -64,10 +83,10 @@ export declare class JsEnvoyHandle {
6483
kvDrop(actorId: string): Promise<void>
6584
restoreHibernatingRequests(actorId: string, requests: Array<HibernatingRequestEntry>): void
6685
sendHibernatableWebSocketMessageAck(gatewayId: Buffer, requestId: Buffer, clientMessageIndex: number): void
67-
startServerless(payload: Buffer): Promise<void>
68-
/** Send a message on an open WebSocket connection. */
69-
sendWsMessage(gatewayId: Buffer, requestId: Buffer, data: Buffer, binary: boolean): void
86+
/** Send a message on an open WebSocket connection identified by messageIdHex. */
87+
sendWsMessage(gatewayId: Buffer, requestId: Buffer, data: Buffer, binary: boolean): Promise<void>
7088
/** Close an open WebSocket connection. */
71-
closeWebsocket(gatewayId: Buffer, requestId: Buffer, code?: number | undefined | null, reason?: string | undefined | null): void
89+
closeWebsocket(gatewayId: Buffer, requestId: Buffer, code?: number | undefined | null, reason?: string | undefined | null): Promise<void>
90+
startServerless(payload: Buffer): Promise<void>
7291
respondCallback(responseId: string, data: any): Promise<void>
7392
}

0 commit comments

Comments
 (0)