fix(ingest): avoid re-ingesting 10s every cron cycle for single-instance jetstream #58
Merged
tompscanlan merged 4 commits on Jun 15, 2026
8adaebc to 2d0cc85
…nce jetstream

@atcute/jetstream rolls the cursor back 10s on the first connect when given an array url, to absorb clock skew across a pool of interchangeable instances that a resumed cursor may have crossed. That rollback is meant to fire once per session. Contrail's cron ingestion rebuilds the subscription every cycle, so for a single-instance config the rollback fired on every cycle and redundantly re-delivered the last 10s of events.

Add a jetstreamUrlOption helper that hands a one-element config to @atcute as a string (one fixed instance, no skew, no rollback) and leaves a real multi-instance pool as an array so its cross-instance rollback is preserved. Apply it at both subscription construction sites: the cron ingestEvents path and the persistent daemon.

A characterization test drives the real JetstreamSubscription and asserts the array-rolls-back / string-does-not behavior the fix depends on, so a future @atcute change surfaces here instead of silently breaking the fix.
The per-cycle reconnect warning claimed every reconnect 'picks a URL at random and rolls cursor back 10s' — true only for multi-instance pools. With a single fixed instance (jetstreamUrlOption collapses a one-element pool to a string), reconnects resume on the same instance from the saved cursor with no rollback, which the rolled_back=0us metric already shows. Gate the warning on pool size, report the actual rolled_back value, and log single-instance reconnects at info level.
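A minimal sketch of the gating this commit describes. The names (`describeReconnect`, `poolSize`, `rolledBackUs`) and the message wording are hypothetical and not taken from the Contrail source; only the rule itself (warn for pools, info for a single instance, always report the measured rollback) comes from the commit message.

```typescript
// Hypothetical shape for a reconnect log entry; not Contrail's actual logger API.
type ReconnectLog = { level: "info" | "warn"; message: string };

function describeReconnect(poolSize: number, rolledBackUs: number): ReconnectLog {
  if (poolSize > 1) {
    // Multi-instance pool: a reconnect may land on a different instance,
    // so the 10s rollback can apply. Report the measured value, not a fixed claim.
    return {
      level: "warn",
      message: `jetstream reconnect: pool of ${poolSize} instances, rolled_back=${rolledBackUs}us`,
    };
  }
  // Single fixed instance: resumes on the same instance from the saved cursor.
  return {
    level: "info",
    message: `jetstream reconnect: single instance, rolled_back=${rolledBackUs}us`,
  };
}

console.log(describeReconnect(1, 0));
console.log(describeReconnect(3, 10_000_000));
```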
3061a4b to 4d9402d
Problem
`@atcute/jetstream` rolls the cursor back 10s on the first connect when given an array `url`. That is deliberate (`subscription.js` seeds `#lastUsedUrl = ''` for arrays): a pooled set of instances is selected from at random on each connect, so a resumed cursor may have come from a different instance, and the 10s rollback absorbs clock skew between them. It is designed to fire once per session.

Contrail's cron ingestion rebuilds the `JetstreamSubscription` every cycle. So for a single-instance config (e.g. the default `["wss://jetstream1.us-east.bsky.network"]`), that "first-connect" rollback fires on every cycle and redundantly re-delivers the last 10s of events each tick. On a quiet stream this shows up as a duplicate-in-cycle flood; on a busy firehose it's a continuous ~10s re-ingest overlap.

Fix
Add a `jetstreamUrlOption(jetstreams)` helper in `contrail-base` that maps the configured list onto `@atcute`'s `url` shape by topology:

- one configured instance: a string (one fixed instance, no skew, no rollback)
- a multi-instance pool: an array (cross-instance rollback preserved)
- an empty list: left empty

Applied at both construction sites: the cron `ingestEvents` path and the `runPersistent` daemon.

Note: setting two jetstreams does not sidestep this. An array keeps the per-cycle rollback and, with genuine cross-instance skew, the rollback is then actually needed every cycle. A string for the single-instance case is the only shape that removes the redundant re-ingest.
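For illustration, the topology mapping could look roughly like the sketch below. This is a guess at the shape of the helper from its description, not the code in `contrail-base`: the `JetstreamUrl` alias, the defensive copy, and the second pool URL in the usage lines are assumptions.

```typescript
// Hypothetical alias for the two url shapes the description says @atcute accepts.
type JetstreamUrl = string | string[];

function jetstreamUrlOption(jetstreams: readonly string[]): JetstreamUrl {
  const [only] = jetstreams;
  // Exactly one configured instance: hand it over as a plain string,
  // so the array-only first-connect rollback does not apply.
  if (jetstreams.length === 1 && only !== undefined) {
    return only;
  }
  // Empty list or a real pool: keep the array shape (copied so the
  // caller's config cannot be mutated through the returned value).
  return [...jetstreams];
}

// Single instance collapses to a string.
console.log(jetstreamUrlOption(["wss://jetstream1.us-east.bsky.network"]));
// A pool stays an array, preserving the cross-instance rollback.
console.log(
  jetstreamUrlOption([
    "wss://jetstream1.us-east.bsky.network",
    "wss://jetstream2.us-east.bsky.network",
  ]),
);
```

Keeping the decision in one helper means both construction sites pick the same shape from the same rule.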
Tests
- `jetstream-url-option.test.ts`: the helper maps single→string, pool→array, empty→empty.
- `jetstream-single-url.test.ts`: `ingestEvents` hands a single configured jetstream to `@atcute` as a string and a multi-URL config as an array.
- `jetstream-rollback-behavior.test.ts`: a characterization test that drives the real `JetstreamSubscription` (mocking only the injected socket, not `@atcute`) and asserts array-url rolls the cursor back 10s while string-url does not. This pins the upstream behavior the fix relies on, so a future `@atcute` change fails here loudly instead of silently breaking the fix.

A changeset is included (patch).
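To make the asserted behavior concrete, here is a toy model of the rollback rule. It is not the `@atcute/jetstream` implementation and does not call it. `ToySubscription`, `overlapPerCycle`, and the microsecond cursor unit are assumptions, and the "compare the picked URL with the last used one" mechanism is inferred from the `#lastUsedUrl = ''` seeding mentioned in the Problem section.

```typescript
const ROLLBACK_US = 10_000_000; // 10s, assuming cursors are in microseconds

class ToySubscription {
  private readonly urls: string[];
  private lastUsedUrl: string;
  private readonly cursor: number | undefined;

  constructor(url: string | string[], cursor?: number) {
    this.urls = typeof url === "string" ? [url] : [...url];
    // Assumed seeding: arrays start from '' so the first pick always
    // looks like an instance change; a string starts from itself.
    this.lastUsedUrl = typeof url === "string" ? url : "";
    this.cursor = cursor;
  }

  // Returns the cursor this connect would resume from.
  connect(): number | undefined {
    const picked = this.urls[Math.floor(Math.random() * this.urls.length)];
    if (picked === undefined) throw new Error("no jetstream url configured");
    const changedInstance = picked !== this.lastUsedUrl;
    this.lastUsedUrl = picked;
    if (this.cursor === undefined) return undefined;
    return changedInstance ? Math.max(0, this.cursor - ROLLBACK_US) : this.cursor;
  }
}

// One cron cycle: rebuild the subscription from the saved cursor and
// measure how far back the first connect resumes.
function overlapPerCycle(url: string | string[], savedCursor: number): number {
  const resumeFrom = new ToySubscription(url, savedCursor).connect();
  return savedCursor - (resumeFrom ?? savedCursor);
}

const saved = 1_750_000_000_000_000;
console.log(overlapPerCycle(["wss://jetstream1.us-east.bsky.network"], saved)); // 10000000
console.log(overlapPerCycle("wss://jetstream1.us-east.bsky.network", saved)); // 0
```

In this model a one-element array pays the 10s overlap on every rebuilt subscription, while a string never does, which is the distinction the characterization test pins against the real library.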