fix(ingest): avoid re-ingesting 10s every cron cycle for single-instance jetstream #58

Merged
tompscanlan merged 4 commits into flo-bit:main from tompscanlan:fix/jetstream-cycle-reconnect
Jun 15, 2026
@tompscanlan

Collaborator

Problem

@atcute/jetstream rolls the cursor back 10s on the first connect when given an array url. That is deliberate (subscription.js seeds #lastUsedUrl = '' for arrays): a pooled set of instances is selected from at random each connect, so a resumed cursor may have come from a different instance, and the 10s rollback absorbs clock skew between them. It is designed to fire once per session.

Contrail's cron ingestion rebuilds the JetstreamSubscription every cycle. So for a single-instance config (e.g. the default ["wss://jetstream1.us-east.bsky.network"]), that "first-connect" rollback fires on every cycle and redundantly re-delivers the last 10s of events each tick. On a quiet stream this shows up as a duplicate-in-cycle flood; on a busy firehose it's a continuous ~10s re-ingest overlap.
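The effect described above can be modeled in a few lines. This is a simplified sketch of the behavior as the description states it, not @atcute/jetstream's actual code; `firstConnectCursor` and `ROLLBACK_US` are hypothetical names, and the cursor is assumed to be a Unix timestamp in microseconds.

```typescript
// Simplified model of the described behavior; NOT @atcute/jetstream's implementation.
// Assumption: the cursor is a Unix timestamp in microseconds.
const ROLLBACK_US = 10_000_000; // 10 s expressed in microseconds

// Cursor a freshly built subscription would effectively resume from on its first connect.
function firstConnectCursor(savedCursorUs: number, url: string | string[]): number {
  // Array url: the first connect rolls the cursor back by 10 s.
  // String url: the saved cursor is used as-is.
  return Array.isArray(url)
    ? Math.max(0, savedCursorUs - ROLLBACK_US)
    : savedCursorUs;
}

// A cron-driven ingester builds a new subscription each cycle, so every cycle
// is a "first connect" and pays the rollback again when the url is an array.
const saved = 1_750_000_000_000_000; // arbitrary example cursor
const asArray = firstConnectCursor(saved, ["wss://jetstream1.us-east.bsky.network"]);
const asString = firstConnectCursor(saved, "wss://jetstream1.us-east.bsky.network");
console.log({ overlapArrayUs: saved - asArray, overlapStringUs: saved - asString });
```

Under this model, a one-element array costs a 10 s overlap on every cycle, while the same URL passed as a string costs none.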

Fix

Add a jetstreamUrlOption(jetstreams) helper in contrail-base that maps the configured list onto @atcute's url shape by topology:

  • one instance → a string (one fixed instance, no skew, no rollback)
  • a real pool (2+) → an array (cross-instance rollback preserved)

Applied at both construction sites: the cron ingestEvents path and the runPersistent daemon.
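A minimal sketch of what such a helper could look like, based only on the mapping listed above. The name `jetstreamUrlOption` comes from this PR, but the body and the `JetstreamUrl` type alias are assumptions; the merged implementation in contrail-base may differ.

```typescript
// Hypothetical sketch; the merged helper in contrail-base may differ in detail.
type JetstreamUrl = string | string[];

function jetstreamUrlOption(jetstreams: readonly string[]): JetstreamUrl {
  const first = jetstreams[0];
  // Exactly one instance: hand over a plain string so the array-only
  // first-connect rollback never applies.
  if (jetstreams.length === 1 && first !== undefined) {
    return first;
  }
  // Empty list or a real pool (2+): pass an array through unchanged
  // (copied so callers cannot mutate the configured list).
  return [...jetstreams];
}

// Usage: the default single-instance config collapses to a string.
const url = jetstreamUrlOption(["wss://jetstream1.us-east.bsky.network"]);
console.log(typeof url); // "string"
```

Keeping the decision in one helper means both construction sites agree on the url shape, and the topology rule is testable without opening a socket.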

Note: setting two jetstreams does not sidestep this — an array keeps the per-cycle rollback and, with genuine cross-instance skew, the rollback is then actually needed every cycle. A string for the single-instance case is the only shape that removes the redundant re-ingest.

Tests

  • jetstream-url-option.test.ts: the helper maps single→string, pool→array, empty→empty.
  • jetstream-single-url.test.ts: ingestEvents hands a single configured jetstream to @atcute as a string and a multi-URL config as an array.
  • jetstream-rollback-behavior.test.ts: a characterization test that drives the real JetstreamSubscription (mocking only the injected socket, not @atcute) and asserts array-url rolls the cursor back 10s while string-url does not. This pins the upstream behavior the fix relies on, so a future @atcute change fails here loudly instead of silently breaking the fix.

A changeset is included (patch).

@tompscanlan force-pushed the fix/jetstream-cycle-reconnect branch from 8adaebc to 2d0cc85 on June 15, 2026 13:00
…nce jetstream

@atcute/jetstream rolls the cursor back 10s on the first connect when given an
array url, to absorb clock skew across a pool of interchangeable instances that a
resumed cursor may have crossed. That rollback is meant to fire once per session.

Contrail's cron ingestion rebuilds the subscription every cycle, so for a
single-instance config the rollback fired on every cycle and redundantly
re-delivered the last 10s of events.

Add a jetstreamUrlOption helper that hands a one-element config to @atcute as a
string (one fixed instance, no skew, no rollback) and leaves a real multi-instance
pool as an array so its cross-instance rollback is preserved. Apply it at both
subscription construction sites: the cron ingestEvents path and the persistent
daemon.

A characterization test drives the real JetstreamSubscription and asserts the
array-rolls-back / string-does-not behavior the fix depends on, so a future
@atcute change surfaces here instead of silently breaking the fix.

The per-cycle reconnect warning claimed every reconnect 'picks a URL at
random and rolls cursor back 10s' — true only for multi-instance pools.
With a single fixed instance (jetstreamUrlOption collapses a one-element
pool to a string), reconnects resume on the same instance from the saved
cursor with no rollback, which the rolled_back=0us metric already shows.
Gate the warning on pool size, report the actual rolled_back value, and
log single-instance reconnects at info level.
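
The gating described in this commit message could be sketched roughly as follows. All names here (`describeReconnect`, `ReconnectLog`) and the message wording are hypothetical; only the rule itself (warn for pools, info for a single instance, report the measured `rolled_back` value) comes from the text above.

```typescript
// Hypothetical sketch of the pool-size gate; not the code merged in this PR.
type LogLevel = "info" | "warn";

interface ReconnectLog {
  level: LogLevel;
  message: string;
}

function describeReconnect(poolSize: number, rolledBackUs: number): ReconnectLog {
  if (poolSize > 1) {
    // Multi-instance pool: a reconnect may land on a different instance,
    // so the rollback is expected and worth a warning.
    return {
      level: "warn",
      message: `reconnect across pool of ${poolSize}, rolled_back=${rolledBackUs}us`,
    };
  }
  // Single fixed instance: resume from the saved cursor; report the measured value.
  return {
    level: "info",
    message: `reconnect on single instance, rolled_back=${rolledBackUs}us`,
  };
}

console.log(describeReconnect(1, 0).message);
```

Reporting the measured value instead of a fixed "10s" keeps the log truthful if the upstream rollback ever changes.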
@tompscanlan force-pushed the fix/jetstream-cycle-reconnect branch from 3061a4b to 4d9402d on June 15, 2026 22:38
@tompscanlan merged commit 9894787 into flo-bit:main on Jun 15, 2026
1 check passed