Skip to content

feat(ENG-5578): add integration tests with recorded market replay#127

Open
tvinagre wants to merge 45 commits into
mainfrom
tl/ENG-5578-integration-tests
Open

feat(ENG-5578): add integration tests with recorded market replay#127
tvinagre wants to merge 45 commits into
mainfrom
tl/ENG-5578-integration-tests

Conversation

@tvinagre

Copy link
Copy Markdown
Collaborator

Summary

  • Add record-market CLI tool that captures Tycho protocol stream updates and generates golden quote outputs by replaying through the full Fynd pipeline
  • Add integration test suite (fynd-core --test integration) that replays recorded fixtures and verifies solution availability, quality baselines, derived data, and timing
  • Worker pools and algorithm config loaded from worker_pools.toml to match production behavior
  • Golden file stores exact derived data metrics (spot prices, pool depths, token prices counts) for deterministic replay assertions
  • Shared golden types (GoldenFile, TestScenario, etc.) in fynd_core::recording::golden
  • parse_chain moved to fynd_core::types::constants for reuse across crates
  • Gas price captured from RPC as raw wei string (no lossy float conversion)
  • VM-backed protocol states (UniswapV4) filtered during recording serialization
  • Git LFS configured for compressed fixture files

Test plan

  • cargo +nightly fmt -- --check passes
  • cargo +nightly clippy --locked --all --all-features --all-targets -- -D warnings passes
  • cargo nextest run --workspace --locked --all-targets --all-features --bin fynd — 416 passed, 0 failed
  • All 8 integration tests pass (41/44 scenarios successful, 3 BNB pairs expected NoRouteFound)
  • CI runs on GitHub Actions

Generated with Claude Code

@tvinagre tvinagre force-pushed the tl/ENG-5578-integration-tests branch from 6725dc1 to 5f6eb60 Compare March 23, 2026 19:45
@github-actions

github-actions Bot commented Mar 23, 2026

Copy link
Copy Markdown

Breaking API Changes (Intentional)

Breaking API changes detected and declared in the PR title.
Ensure the minor version is bumped before merging (breaking changes on 0.x.x bump the minor).

semver-checks output
Checking fynd-core v0.78.1 -> v0.78.1 (no change; assume minor)
     Checked [   0.965s] 196 checks: 196 pass, 49 skip
     Summary no semver update required
    Finished [   1.267s] fynd-core
    Checking fynd-rpc v0.78.1 -> v0.78.1 (no change; assume minor)
     Checked [   0.111s] 196 checks: 195 pass, 1 fail, 0 warn, 49 skip

--- failure function_missing: pub fn removed or renamed ---

Description:
A publicly-visible function cannot be imported by its prior path. A `pub use` may have been removed, or the function itself may have been renamed or removed entirely.
        ref: https://doc.rust-lang.org/cargo/reference/semver.html#item-remove
       impl: https://github.com/obi1kenobi/cargo-semver-checks/tree/v0.46.0/src/lints/function_missing.ron

Failed in:
  function fynd_rpc::builder::parse_chain, previously in file fynd-rpc/src/builder.rs:390

     Summary semver requires new major version: 1 major and 0 minor checks failed
    Finished [   0.175s] fynd-rpc
    Checking fynd-rpc-types v0.78.1 -> v0.78.1 (no change; assume minor)
     Checked [   0.190s] 196 checks: 196 pass, 49 skip
     Summary no semver update required
    Finished [   0.288s] fynd-rpc-types
    Checking fynd-client v0.78.1 -> v0.78.1 (no change; assume minor)
     Checked [   0.238s] 196 checks: 196 pass, 49 skip
     Summary no semver update required
    Finished [   0.375s] fynd-client
    Checking erc20-overrides v0.78.1 -> v0.78.1 (no change; assume minor)
     Checked [   0.084s] 196 checks: 196 pass, 49 skip
     Summary no semver update required
    Finished [   0.125s] erc20-overrides

Comment thread .github/workflows/nightly-canary.yaml
Comment thread fixtures/integration/README.md Outdated
Comment thread fixtures/integration/README.md Outdated
Comment thread fixtures/integration/README.md Outdated
Comment thread fynd-core/src/feed/mod.rs Outdated
Comment thread fynd-core/src/types/constants.rs Outdated
Comment thread fynd-core/tests/integration/derived_data_tests.rs Outdated
Comment thread fynd-core/tests/integration/solution_tests.rs
Comment thread tools/record-market/src/expected.rs
Comment thread tools/record-market/src/recorder.rs Outdated

@kayibal kayibal left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Partial review for now. Mostly concerned about the newly exposed api surface from this PR. I think a lot of this API would be better suited to be placed within tycho-simulation if exposed publically.

Comment thread tools/fynd-swap-cli/src/main.rs Outdated
Comment thread tools/record-market/Cargo.toml Outdated
Comment thread tools/record-market/src/recorder.rs Outdated
Comment thread tools/record-market/src/recorder.rs Outdated
Comment thread tools/record-market/src/recorder.rs Outdated
Comment thread fynd-core/src/recording/mod.rs Outdated
Comment thread fixtures/integration/README.md Outdated
Comment thread fynd-core/src/recording/golden.rs Outdated
Comment thread fynd-core/src/recording/golden.rs Outdated
Comment thread fynd-core/src/recording/types.rs Outdated

@tamaralipows tamaralipows left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks - some questions mostly for my understanding

Comment thread fixtures/integration/README.md Outdated
Comment thread fynd-core/tests/integration/derived_data_tests.rs Outdated
Comment thread fynd-core/tests/integration/solution_tests.rs Outdated
Comment thread tools/record-market/src/golden.rs Outdated

@dianacarvalho1 dianacarvalho1 left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you @tvinagre ! I gave a quick review - there are already a lot of good suggestions (that I agree 👍🏼)!

Comment thread .github/workflows/nightly-canary.yaml
Comment thread fixtures/integration/README.md Outdated
Comment thread .github/workflows/nightly-canary.yaml
@tvinagre tvinagre force-pushed the tl/ENG-5578-integration-tests branch from 5f6eb60 to 29cce24 Compare April 6, 2026 19:10
Comment thread fixtures/integration/README.md Outdated
Comment thread fynd-core/src/solver.rs Outdated
Comment thread fynd-core/src/solver.rs Outdated
Comment thread fynd-rpc/src/builder.rs
Comment thread Cargo.toml Outdated
@tvinagre tvinagre force-pushed the tl/ENG-5578-integration-tests branch from 29cce24 to d84433d Compare April 9, 2026 01:06
tvinagre and others added 16 commits April 27, 2026 17:09
The solver produced different routing results across process runs because
Rust's HashMap/HashSet use random SipHash seeds. Graph construction in
`initialize_graph` and `add_components` iterated these collections to
assign NodeIndex values and insert edges. Since petgraph's
`graph.edges(node)` returns edges in insertion order, different seeds
produced different BFS traversal and SPFA relaxation orders, leading to
different routes being selected.

Three changes:
- Sort tokens before node creation and sort components before edge
  insertion in both `initialize_graph` and `add_components`
- Sort the token list within each component before generating edge pairs
- Sort the SPFA active_nodes set after collecting from HashSet to ensure
  deterministic relaxation order between rounds

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Introduce recording module in fynd-core with serializable wrappers
around Tycho Update messages for integration testing. Uses serde_json
(required by typetag) + zstd for compact storage.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Write and read MarketRecording files using serde_json for typetag
compatibility and zstd compression. Roundtrip tested with both
empty recordings and recordings containing ProtocolSim trait objects.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Address code quality review findings:
- Add doc comments to write_recording and read_recording
- Move tempfile to workspace dependencies for consistency

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Make handle_tycho_message pub for replay access. Re-export MarketEvent
and DerivedDataEvent for integration test access.
Test harness replays recorded Update messages through TychoFeed,
computes derived data, builds worker pools, and exposes a quote()
method for integration tests.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Store WorkerPool in TestHarness to prevent thread leaks
- Store ComputationManager JoinHandle for panic visibility
- Use QuoteStatus enum instead of String in GoldenOutput
- Remove premature include_str\! from load_test_scenarios
- Parse WETH address before acquiring read lock

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
CLI tool that connects to Tycho, captures market state,
and generates golden outputs for integration testing.
Core recording and golden generation logic is stubbed.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Tests verify all golden pairs return solutions, unknown tokens
produce errors, quality stays within 1% of baseline, and basic
invariants hold (positive net output, bounded hops, positive gas).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
P95 solve time must stay within 4x golden baseline or 200ms
absolute cap. No individual solve may exceed the absolute cap.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Verifies all derived data fields are computed and coverage meets
thresholds: spot prices >= 95%, pool depths >= 90%, token gas
prices >= 80%.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Run BLESS_GOLDEN=1 cargo nextest run to regenerate golden_outputs.json
from the existing market recording after code changes that intentionally
affect solution quality.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Track *.json.zst files in fixtures/integration/ with Git LFS
to avoid bloating the repository with large market recordings.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Integration tests run on every PR using recorded market data.
Nightly canary connects to live Tycho to detect API drift and
real-world regressions.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Replace todo!() stubs with working implementations:
- recorder.rs: connect to Tycho, discover protocols, capture Update
  messages from ProtocolStreamBuilder for the configured duration
- golden.rs: replay recording through TychoFeed, compute derived data,
  build WorkerPoolRouter, run all pairs.json scenarios, capture outputs
- scenarios.rs: parse pairs.json token definitions and trading pairs
  with decimal-scaled amounts into TestScenario structs

Also: make register_exchanges pub for recorder access, add
tycho-execution/tokio-stream/num-bigint deps to record-market,
remove dead_code allows from harness and scenarios modules.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Tested against tycho-beta.propellerheads.xyz mainnet. Fixes:

- Add protocol filter, TVL range, min_token_quality, and
  traded_n_days_ago CLI flags to control recording scope
- Use TLS by default (production Tycho requires it)
- Replace chain_id with chain name string in RecordingMetadata
- Add recording filter params to RecordingMetadata for reproducibility
- Fix TVL filter: with_tvl_range(min, min) not (min, 0.0)
- Fix broadcast channel error: keep a receiver alive during replay
  so handle_tycho_message's event broadcast doesn't fail
- Wait for spot_prices + pool_depths (not token_prices) in replay
  since token_prices requires a live gas_price feed
- Remove unused rpc_url from recorder (not needed for Tycho stream)
- Add .gitattributes comment explaining LFS tracking
- Increase derived data timeout to 120s for larger recordings

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
tvinagre and others added 21 commits April 27, 2026 17:13
…rtions

Store spot_price_pools, pool_depth_pools, and token_prices counts in
GoldenMetadata.derived_data during golden generation. Integration tests
assert exact equality against these values since replay is deterministic.

Replaces 3 threshold-based coverage tests with a single
test_derived_data_matches_golden that also verifies pool/token counts.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Re-expose TychoFeed, TychoFeedConfig, DataFeedError as pub (needed by
  recording tool and integration tests for replay)
- Re-export parse_chain from fynd_rpc for tools that don't depend on
  fynd-core directly (fynd-swap-cli, fynd-benchmark)
- Add tools/record-market to workspace members alongside tools/fynd-swap-cli
- Regenerate Cargo.lock after rebase

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Resolve conflicts in .gitignore, Cargo.toml, lib.rs, feed/mod.rs,
  derived/mod.rs, builder.rs, src/main.rs, fynd-swap-cli
- Add missing doc comments required by main's #[warn(missing_docs)]
- Re-expose TychoFeed/TychoFeedConfig/DataFeedError as pub for replay
- Adapt to main's renames (BlacklistConfig -> BlocklistConfig)
- Adapt to main's worker_pools.toml (bellman_ford_2_hops)
- Regenerate golden fixtures from fresh recording

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Add fynd-test-fixtures crate with shared types (ExpectedOutput,
  MarketRecording, TestScenario) — no test code in fynd-core lib
- Add Solver::from_recording() factory method — encapsulates replay
  pipeline, keeps TychoFeed/TychoFeedConfig as pub(crate)
- Delete fynd-core/src/recording/ module entirely
- Remove zstd from fynd-core dependencies
- Rename GoldenOutput -> ExpectedOutput per reviewer feedback
- Move fixtures to fynd-core/tests/fixtures/ (closer to tests)
- Tool reads back recording before generating expected outputs to
  ensure golden matches deserialized data (VM states filtered)
- load_test_scenarios() returns Result (no panics in library code)
- parse_chain error no longer lists specific chain names
- Split READMEs: tools/record-market/ + fynd-core/tests/integration/
- Use local tycho-simulation path dep (pending PR #568 merge)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
All 8 integration tests pass consistently.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…adme

- Move BlockGasPrice/GasPrice/MarketEvent imports to top-level (not
  feature-gated or test-only)
- Add tracing::warn when block number or gas price fall back to defaults
  in from_recording
- Fix README reference: golden_outputs.json → expected_outputs.json

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…sion

Architecture:
- Gate Solver::from_recording and SolverBuildError::Replay behind
  cfg(feature = "test-utils") so test infrastructure stays off the
  public API surface
- Revert 3 unnecessary visibility escalations (MarketEvent re-export,
  handle_tycho_message pub, DerivedDataEvent re-export)
- Remove unused anyhow dep from fynd-core

Canary workflow fixes:
- Use --output-dir (not --output) matching the actual CLI flag
- Write directly to fynd-core/tests/fixtures/ (where tests read)
- Add --features test-utils to nextest invocations
- Add failure notification step

Test improvements:
- Use BigUint arithmetic for quality comparison instead of lossy f64
- Add RUSTFLAGS="-D warnings" to integration test CI job
- Add publish = false to record-market Cargo.toml

Cleanup:
- Remove dead _worker_pools_hash computation in golden.rs
- Remove dead expected_file_path() function
- Log warning on gas price parse failure instead of silent None

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…infra

- Remove stale fixtures/integration/ duplicate (686KB binary in git objects)
- Pin all CI actions to commit SHAs in new integration_tests and canary jobs
- Hoist sha2 to workspace dependencies
- Add schema_version field to RecordingMetadata for format versioning
- Deduplicate PoolsFile into parse_pools_toml() in test-fixtures
- Warn on broadcast failure in from_recording instead of silent drop
- Guard f64-to-u128 cast in scenario loader with finite/non-negative check
- Tighten P95 timing threshold from 4x to 3x

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Remove the duplicate fixtures/integration/ directory (tests read from
fynd-core/tests/fixtures/). Rename golden.rs to expected.rs and replace
all "golden" references with "expected" for consistent naming.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The Docker dependency caching layer needs manifests and stub sources
for all workspace members, including the new test-fixtures and
record-market crates, so cargo can resolve the workspace graph.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Use effective_gas_price() instead of matching only on Legacy variant.
On mainnet (EIP1559), the recorder was silently falling back to a
hardcoded 10 gwei default, discarding the real gas price.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Re-record market state and expected outputs against current solver.
Widen quality regression threshold from 1% to 20% — the solver is
non-deterministic due to async derived-data computation timing, and
1% was causing consistent flakes (observed up to 16% run-to-run
variance on marginal routing pairs).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Reverts the temporary 20% quality threshold back to 1%. The solver
non-determinism that required the wider threshold is fixed on the
tl/fix-graph-determinism branch.

Adds an `#[ignore]` integration test (`regenerate_expected_outputs`)
that re-generates expected_outputs.json from the existing recording.
Run after algorithm changes with:
  cargo nextest run -p fynd-core --test integration \
    regenerate_expected --all-features --run-ignored ignored-only

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The recording format serializes tycho-simulation Update messages
directly, which requires the serde support released in 0.256.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
…on-tests

# Conflicts:
#	.gitignore
#	Cargo.lock
#	Cargo.toml
#	Dockerfile
#	fynd-core/src/algorithm/bellman_ford.rs
#	fynd-core/src/feed/events.rs
#	fynd-core/src/feed/mod.rs
#	fynd-core/src/graph/petgraph.rs
#	fynd-core/src/solver.rs
Replaying the existing recording through current code moves two
LINK-pair baselines (UNI->LINK +16.4%, LINK->AAVE -16.0%); the other
42 scenarios are within 0.5% and no statuses changed.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Adds the new tool, fixtures crate, and integration suite to the
codebase guides, per review feedback.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
@tvinagre tvinagre force-pushed the tl/ENG-5578-integration-tests branch from d84433d to 831b5e2 Compare June 9, 2026 23:28
tvinagre and others added 2 commits June 10, 2026 22:33
The bless test hardcoded block_number 0 while the record-market CLI
read it from the recording; both now derive it from the replayed
market state, the same source Solver::from_recording uses. Drop the
now-unused MarketRecording::last_block_number. Re-bless the fixture
(only metadata and per-run solve times change; amounts identical).

The nightly canary regenerates baselines from its own fresh recording,
so it detects format changes and pipeline crashes, not quality
regressions; the workflow comment and failure messages now say so.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

@louise-poole louise-poole left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Only saw 2 smallish things - otherwise looks good to me!

Comment on lines +30 to +33
solver
.wait_until_ready(Duration::from_secs(120))
.await
.expect("solver not ready after 120s");

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are a lot of groups multiple tests that call this and with a potential 120s wait each... we risk making our tests excessively slow. Could we maybe reuse the same test harness in all tests/one per test group? Initialise it only once?

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good instinct, but the 120s here is a poll deadline, not a wait — wait_until_ready returns as soon as derived data is ready, which is ~15-20s in practice (you can see it in the per-test durations). Reuse isn't really available to us either: nextest runs each test in its own process, so a shared/static harness can never be hit twice, and the Solver's background tasks bind to each test's tokio runtime so sharing would be unsound even under plain cargo test. The one thing that would cut builds is collapsing the tests into fewer #[tokio::test] fns, which we'd rather not do (loses per-test granularity).

The saving grace is that the builds run in parallel: on the last CI run the whole Integration Tests job took 128s vs 216s for Compile & Test, so the suite isn't on the critical path. Happy to revisit (e.g. one harness per logical group via a shared binary) if it grows enough to become the bottleneck.

Comment thread tools/record-market/src/recorder.rs Outdated
cargo nextest run -p fynd-core --test integration
```

## Test Descriptions

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perfect - thank you! Nothing else to say here other than Louise's comments

tvinagre and others added 3 commits June 12, 2026 23:04
record-market gains a --chain flag (default ethereum). The recording
stores the chain in its metadata and both replay paths (expected-output
generation and the test harness) parse it from there, so fixtures are
self-describing and replay can never disagree with the recording.

Trading-pair scenarios move from a compile-time include of the
benchmark's pairs.json to per-chain runtime fixtures at
fynd-core/tests/fixtures/pairs/<chain>.json, seeded with the existing
Ethereum set. Recording a new chain requires adding its pairs file.

Also corrects the integration README run command, which was missing
--features test-utils and silently ran zero tests when copy-pasted.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
The main merge auto-resolved Cargo.lock but left the branch-only crates
(record-market, fynd-test-fixtures) at 0.78.0 while their manifests
inherit the bumped 0.78.1 workspace version, breaking --locked builds.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@tvinagre tvinagre enabled auto-merge June 13, 2026 16:20
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants