refactor!: unify TLS stack on rustls + aws-lc-rs#963
Conversation
Three call sites also need their feature lists adjusted: - tycho-simulation: add `eip712` to alloy's feature set. alloy 1.5+ gates `DynSolValue::CustomStruct` on the `eip712` feature, and workspace feature unification under `cargo --all-features` activates that variant transitively. Without forwarding the feature here the `format_dyn_sol_value` match in `evm/traces/decoder.rs` becomes non-exhaustive whenever any other workspace crate enables eip712. - tycho-execution: add `sol-types` to alloy's feature set. The crate imports `alloy::core::sol` and `alloy::sol_types`, but the previous feature list (`providers`, `rpc-types-eth`, `eip712`, `signer-local`, `node-bindings`) didn't pull `sol-types` in standalone builds. The workspace already activates `sol-types` via tycho-ethereum so the lib compiled under `--workspace`, but `cargo check -p tycho-execution --no-default-features --features evm` failed with 56 unresolved imports. Surfaced by the new TLS-feature CI matrix. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Five separate `reqwest` declarations across the workspace (`0.12`, `0.12.7`, `0.12.22`, `0.12`, `0.13.2`) collapse to a single workspace-pinned `reqwest = "0.13"` with `default-features = false` and the feature set that recovers reqwest 0.12's previous default-on behaviour (`json`, `stream`, `charset`, `http2`, `system-proxy`, `query`, `native-tls`). Per-crate decls switch to `workspace = true` plus their own additions (`zstd` for tycho-client, `blocking` for tycho-execution). `foundry-block-explorers` bumps 0.22 -> 0.23 so its transitive reqwest pin no longer holds reqwest 0.12 in the lockfile alongside the new 0.13. The `native-tls` workspace feature is the temporary bridge — it keeps the existing native-tls TLS path active until the dedicated TLS unification commit later in the series swaps it for `rustls-no-provider` and adds the per-crate `tls-aws-lc-rs` / `tls-ring` / `tls-native` feature scaffold. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…http 1.x
Bundled because the two are coupled: tokio-tungstenite 0.29 reuses
http 1.x types from `Request::builder()` so tycho-client's previous
`hyper = "0.14"` (http 0.2) imports stop matching at the
`Request::builder().header(...)` call sites. Splitting into two
commits leaves an intermediate state that does not compile.
- Workspace: `tokio-tungstenite = "0.29"` with
`default-features = false, features = ["connect", "native-tls"]`. The
`native-tls` flag is the temporary bridge until the dedicated TLS
unification commit later in the series swaps it for
`rustls-tls-native-roots`.
- tycho-client/Cargo.toml: drop the `hyper = "0.14.27"` dep; replace
with `http = "1"` so the header constants and `Uri` re-export pull
from the http 1.x crate that reqwest 0.13 + tokio-tungstenite 0.29
already require transitively.
- tycho-client/src/deltas.rs: swap `use hyper::{header, Uri}` to
`use http::{header, Uri}`; adapt the WebSocket message handling to
tokio-tungstenite 0.29 — `Message::Text(String)` carries `Utf8Bytes`
now, `Message::{Binary,Pong}` carry `Bytes`. Convert at the call
sites (`.into()` for serialised commands, `.as_ref()` for the
zstd-decompress path that previously called `as_slice()` on a
`Vec<u8>`).
- tycho-indexer/src/services/ws.rs: same `Utf8Bytes` / `Bytes`
adaptation across roughly twenty actix-web/actix-tungstenite
message-construction sites.
- tycho-simulation/Cargo.toml: drop the explicit `tokio-tungstenite =
"0.28.0"` pin in favour of `workspace = true`; the workspace now
carries the version everyone shares.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The three deps are version-coupled — `opentelemetry-otlp 0.30` requires
tonic 0.13 internally; tonic 0.13 requires prost 0.13; bumping any
one in isolation fails to resolve in cargo. Land them together.
Workspace:
- `console-subscriber 0.2 -> 0.5` (transitive bump pulled in by the
tonic version bump elsewhere in the dep graph; lands here so the
intermediate state still resolves).
tycho-indexer/Cargo.toml:
- `prost 0.11 -> 0.13`, `prost-types 0.11 -> 0.13`. Add a renamed
workspace alias `prost_11 = { package = "prost", version = "0.11" }`
used at the two sites that decode `tycho_substreams::BlockChanges`
(which still implements prost 0.11's `Message` upstream).
- `tonic 0.9 -> 0.13`, `default-features = false`, features
`[transport, router, codegen, prost, gzip, tls-native-roots,
tls-aws-lc]`. `transport` already pulls `channel` and `server` per
tonic's own feature graph; only `router` (axum-based
`Server::add_service`) needs to be listed separately.
- `opentelemetry 0.22 -> 0.30`, `opentelemetry-otlp 0.15 -> 0.30`,
`opentelemetry_sdk 0.22 -> 0.30` (drop `rt-tokio` — the 0.30
`BatchSpanProcessor::builder(...)` we use runs on a dedicated
`std::thread`, so the tokio-runtime feature is unused),
`tracing-opentelemetry 0.23 -> 0.31`,
`actix-web-opentelemetry 0.17 -> 0.22`.
- `tycho-substreams 0.6 -> 0.8` (lockfile-only churn; types still on
prost 0.11 internally, hence the alias above).
Source-level adaptations:
- `pb/*.tonic.rs`: replace `tonic::body::BoxBody` (private in 0.13)
with the public `tonic::body::Body` everywhere the generated stubs
use it. Hand-patched only; the source `.proto` files remain
authoritative.
- `substreams/mock.rs`: rewrite the in-process gRPC mock against tonic
0.13's `tower::Service` signature
(`http::Request<tonic::body::Body>`) and consume the request body
via `http_body_util::BodyExt::collect` instead of the removed
`http_body::Body::poll_data`. Hand the bound listener directly to
`Server::serve_with_incoming(TcpListenerStream::new(...))` to close
the previous bind-drop-rebind race. Adds dev-deps on `tower 0.5`,
`http-body-util 0.1`, `bytes 1`, and the `net` feature on the
workspace `tokio-stream`.
- `substreams/mod.rs`: tonic 0.13's `ClientTlsConfig::new()` no longer
bundles trust roots; call `.with_native_roots()` to keep the current
platform-trust behaviour.
- `ot.rs`: full rewrite for opentelemetry 0.30. The 0.21 `new_pipeline`
/ `install_batch` / `set_error_handler` API is gone; build the
exporter via `SpanExporter::builder().with_tonic()`, attach a
`BatchSpanProcessor`, and register an `SdkTracerProvider`.
`Resource::builder().with_service_name("tycho-indexer").build()` for
a stable Tempo / Jaeger label. `init_tracing` returns a
`TracingHandle` that owns the provider; each `run_*` entry point
binds it to a function-scope `_tracing_handle` and the `Drop` impl
invokes `provider.shutdown()` to flush the buffered batch on exit.
Diagnostics on shutdown go via `eprintln!` rather than
`tracing::warn!` because the OTel layer is the very subsystem being
torn down. `RUST_LOG_OTLP` malformed values now fall back to
`RUST_LOG` instead of panicking.
- `extractor/protocol_extractor.rs` + `testing.rs`: switch the
`prost::Message` import to `prost_11::Message` for the two
`tycho_substreams::BlockChanges` decode sites; explicit `.map_err`
for prost 0.11's `DecodeError` since `ExtractionError`'s `From`
impl only covers prost 0.13.
- `main.rs` (`run_indexer`): wrap `create_tracing_subscriber()` in
`let _enter = main_runtime.enter()` so the OTLP grpc-tonic exporter
builds inside a tokio runtime context and the `BatchSpanProcessor`'s
dedicated thread can drive the captured handle.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
`aws-sdk-s3`'s default features include `rustls`, which forwards to `aws-smithy-runtime/tls-rustls` -> `aws-smithy-http-client/legacy-rustls-ring`. That feature compiles a second TLS stack (rustls 0.21 + ring + hyper 0.14 + tokio-rustls 0.24 + hyper-rustls 0.24) into every binary that pulls `aws-sdk-s3`, alongside the modern rustls 0.23 + aws-lc-rs path that `aws-config` activates by default via `default-https-client`. Disable `aws-sdk-s3` default features and re-enable only the ones we actually need (`sigv4a`, `default-https-client`, `rt-tokio`). Result on the active build graph: rustls 2 -> 1 (only 0.23.x) hyper 2 -> 1 (only 1.x) hyper-rustls 2 -> 1 (only 0.27.x) tokio-rustls 2 -> 1 (only 0.26.x) rustls-webpki 2 -> 1 rustls-pemfile 1 -> 0 `ring` stays in the graph but only via `aws-config` and `aws-sigv4`, where it is the signing-time constant-time crypto for AWS SigV4 — it is no longer pulled by any TLS path. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…der in binaries Up to this point each library implicitly inherited the workspace's `reqwest` / `tokio-tungstenite` TLS choice. This commit gives every TLS-touching crate (`tycho-client`, `tycho-simulation`, `tycho-ethereum`, `tycho-execution`, `tycho-integration-test`, `tycho-indexer`) three explicit TLS feature flags: default = ["tls-aws-lc-rs"] # plug-and-play tls-aws-lc-rs = forward to reqwest/rustls + tonic/tls-aws-lc + rustls/aws-lc-rs tls-ring = forward to reqwest/rustls-no-provider + tonic/tls-ring + rustls/ring tls-native = forward to reqwest/native-tls + tokio-tungstenite/native-tls `tycho-indexer` is binary-only and rustls-only — tonic 0.13 has no native-tls path and the indexer always reaches into rustls (substreams gRPC + AWS SDK + alloy RPC), so no `tls-native` flag. Workspace: - `reqwest` features swap `native-tls` -> `rustls-no-provider`. The no-provider variant pulls rustls + `rustls-platform-verifier` without a crypto backend, leaving the choice to the binary. - `tokio-tungstenite` features swap `native-tls` -> `rustls-tls-native-roots` (uses `rustls-native-certs`, the same trust source tonic + AWS SDK use). - New `rustls = "0.23"` workspace dep (default-features = false + std/logging/tls12) so per-crate features can wire `aws-lc-rs` / `ring` consistently. Per-crate Cargo.toml: - Each lib forwards its `tls-*` feature to the relevant reqwest/tokio-tungstenite sub-feature so consumers can `default-features = false, features = ["tls-ring"]` (or `tls-native`). - `tycho-indexer` adds `[features]` block that maps `tls-aws-lc-rs` to `tonic/tls-aws-lc` + `rustls/aws-lc-rs` (and `tls-ring` to `tonic/tls-ring` + `rustls/ring`); the previous hard-coded `tonic/tls-aws-lc` feature line is dropped. - `tycho-ethereum` switches alloy's TLS sub-feature from `reqwest-default-tls` to `reqwest-rustls-tls`. - `tycho-indexer/Cargo.toml` cleanups: `tokio-stream` regular dep inherits via `workspace = true` (was redundantly version-pinned). Binary entry points (tycho-indexer, tycho-client, tycho-integration-test, tycho-client-py) install the rustls crypto provider before any TLS handshake. The function is cfg-gated three ways: `tls-aws-lc-rs` -> aws-lc-rs, `tls-ring` -> ring, `tls-native` (or no TLS feature) -> no-op for tycho-client (native-tls path doesn't touch rustls); for tycho-indexer / tycho-integration-test the no-feature arm is a `compile_error!` because both binaries always exercise the rustls path. Idempotent: `CryptoProvider::get_default().is_none()` guard covers the tycho-client-py case where the Python host may register a provider first. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
After the workspace converged on rustls 0.23 + aws-lc-rs as the default TLS path, every TLS-touching crate also exposes `tls-ring` and `tls-native` so consumers can swap backends. Add a `check-tls-features` job that exercises all 16 (crate, tls-feature) combinations on every PR that touches Rust paths, so a future change can't silently break a non-default backend. `check-no-default-features` excludes `tycho-indexer` and `tycho-integration-test` because both binaries emit `compile_error!` when neither rustls provider feature is selected — running `cargo check --workspace --no-default-features` against them is the intended ergonomics, not a CI gate. Mirrors the existing skip pair for branch-protection compatibility. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
There was a problem hiding this comment.
Claude Code Review
This repository is configured for manual code reviews. Comment @claude review to trigger a review and subscribe this PR to future pushes, or @claude review once for a one-time review.
Tip: disable this comment in your organization's Code Review settings.
…itrary_precision)
CI's `Unit tests (no external deps)` job (under `--workspace
--all-features`) failed on this PR with:
test dto::test::test_parse_websocket_message ... FAILED
parsing failed: Error("data did not match any variant of untagged
enum WebSocketMessage", line: 0, column: 0)
`foundry-block-explorers 0.23.0` declares
`serde_json = { features = ["arbitrary_precision"] }` as a hard
dependency (no opt-out). Cargo unifies serde_json features
workspace-wide, so once any crate pulls 0.23 transitively, the
`arbitrary_precision` feature flag activates everywhere — and that
flag is documented as breaking serde's untagged-enum deserialization
(see the comment block already in `tycho-client/src/deltas.rs:510`
that was put there for exactly this reason). The
`tycho_common::dto::WebSocketMessage` round-trip test in
tycho-common's unit suite fails as a result.
Roll back to 0.22.0. The trade-off is that the lockfile keeps an
inactive `reqwest 0.12` entry pulled by foundry-block-explorers 0.22's
transitive deps, but it is not in the active build graph
(`cargo tree -e normal -i reqwest` returns only 0.13.x), so the
practical impact is zero — just one extra entry in `Cargo.lock`.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…CI exclude
Previous pattern coupled two pieces of friction:
1. Each binary's `main.rs` had a `compile_error!` arm that fired when
neither `tls-aws-lc-rs` nor `tls-ring` was enabled. Without it,
the workspace-wide `cargo check --workspace --no-default-features`
would build a binary that panics on its first TLS handshake.
2. To accommodate the compile_error, ci-rust.yaml's
`check-no-default-features` job had to exclude
`--exclude tycho-indexer --exclude tycho-integration-test`.
Both go away when the [[bin]] entries declare
`required-features = ["tls-aws-lc-rs"]`. Cargo skips the bin under
`--no-default-features` instead of compile-failing, so the
workspace-wide check stays clean and the workflow does not need to
maintain an exclude list. Default builds and explicit
`--features tls-ring` builds are unaffected (the requirement is
satisfied either way).
- `crates/tycho-indexer/Cargo.toml`: add `required-features` to the
existing `[[bin]]` entry.
- `crates/tycho-integration-test/Cargo.toml`: add an explicit
`[[bin]]` entry (cargo previously inferred it from `src/main.rs`)
with the same `required-features`.
- `crates/tycho-indexer/src/main.rs` /
`crates/tycho-integration-test/src/main.rs`: drop the
`compile_error!` arms.
- `.github/workflows/ci-rust.yaml`: restore
`cargo check --workspace --no-default-features --locked` to its
single-line form.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
There was a problem hiding this comment.
Hmm... these are autogenerated files. Were they updated manually or pulled from StreamingFast?
There was a problem hiding this comment.
They were vibecoded, I dunno
| use mockall::automock; | ||
| use prost::Message; | ||
| // tycho-substreams 0.8 still implements prost 0.11's `Message` trait; | ||
| // `prost_11` is the workspace alias for that exact version. |
| "extractor" => self.name.clone(), | ||
| ) | ||
| .set(dci.cache_size() as f64); | ||
| dci.emit_cache_metrics(&self.chain.to_string(), &self.name); |
There was a problem hiding this comment.
Why remove the emit metrics line?
There was a problem hiding this comment.
I assume it was because the cache size set is removed.
| // Bind to find an available port, then release so tonic can rebind. | ||
| let listener = std::net::TcpListener::bind("127.0.0.1:0").unwrap(); | ||
| let addr = listener.local_addr().unwrap(); | ||
| drop(listener); |
There was a problem hiding this comment.
You sure this drop isn't necessary anymore?
|
|
||
| // Give the server a moment to bind | ||
| tokio::time::sleep(std::time::Duration::from_millis(50)).await; | ||
|
|
| // OpenTelemetry 0.30 has no equivalent of the 0.21 `global::set_error_handler` | ||
| // callback: BatchSpanProcessor.ExportError / .Emit.ProcessorShutdown and friends | ||
| // emit straight through `tracing::error!`. The internal-log macros set | ||
| // `target = env!("CARGO_PKG_NAME")`, which Cargo expands to the crate's | ||
| // `package.name` with hyphens converted to underscores — i.e. |
There was a problem hiding this comment.
I don't think this version migration info is necessary to stay permanently in the code. Could just be a commit comment imo
There was a problem hiding this comment.
Yeah I left it very verbose for the review, we can clean it up later
| # Drop default `rustls` feature (= aws-smithy-runtime/tls-rustls = legacy | ||
| # rustls 0.21 + ring + hyper 0.14). Keep sigv4a / default-https-client / | ||
| # rt-tokio so the SDK still ships an HTTPS client (rustls 0.23 + aws-lc-rs | ||
| # via the default-https-client feature path). |
There was a problem hiding this comment.
Also here - are all these comments necessary? At least we should remove the parts alluding to the change itself (like 'drop' and 'keep') and only have comments speaking to anything non-obvious about the parts that remain.
There was a problem hiding this comment.
They are to give context for the PR and for me. They can ofc be removed afterwards.
| if rustls::crypto::CryptoProvider::get_default().is_none() { | ||
| rustls::crypto::aws_lc_rs::default_provider() | ||
| .install_default() | ||
| .expect("install aws-lc-rs default crypto provider"); | ||
| } | ||
|
|
There was a problem hiding this comment.
Is there a reason the python client defaults to this option only?
| #[cfg(feature = "tls-aws-lc-rs")] | ||
| fn install_default_crypto_provider() { | ||
| if rustls::crypto::CryptoProvider::get_default().is_none() { | ||
| rustls::crypto::aws_lc_rs::default_provider() | ||
| .install_default() | ||
| .expect("install aws-lc-rs default crypto provider"); | ||
| } | ||
| } | ||
|
|
||
| #[cfg(all(feature = "tls-ring", not(feature = "tls-aws-lc-rs")))] | ||
| fn install_default_crypto_provider() { | ||
| if rustls::crypto::CryptoProvider::get_default().is_none() { | ||
| rustls::crypto::ring::default_provider() | ||
| .install_default() | ||
| .expect("install ring default crypto provider"); | ||
| } | ||
| } | ||
|
|
||
| #[cfg(not(any(feature = "tls-aws-lc-rs", feature = "tls-ring")))] | ||
| fn install_default_crypto_provider() {} |
There was a problem hiding this comment.
Looks like priority is given to certain feature flags... could this be documented somewhere?
There was a problem hiding this comment.
What do you mean priority?
Feature flags is the standard way to switch between aws_lc_rs and ring
| fn install_default_crypto_provider() { | ||
| if rustls::crypto::CryptoProvider::get_default().is_none() { | ||
| rustls::crypto::aws_lc_rs::default_provider() | ||
| .install_default() | ||
| .expect("install aws-lc-rs default crypto provider"); | ||
| } | ||
| } |
There was a problem hiding this comment.
I wonder if there is a neat place to centralise this. The same fn is duplicated in multiple places...
There was a problem hiding this comment.
This must be redeclared for every binary. It is set in each main.rs individually. Creating dependencies between those is not recommended, code duplication is not large
Summary
Converges the tycho monorepo on a single TLS stack — rustls 0.23 + aws-lc-rs (the rustls 0.23+ default, marked replacement for the unmaintained-since-Feb-2025 ring crate) — and a single
reqwest 0.13major. Replaces the mixed stack we have today (native-tls/OpenSSL for HTTP/WS, rustls 0.21 + ring for tonic 0.9 gRPC, rustls 0.23 + aws-lc-rs sitting unused via the AWS SDK) plus five separatereqwestdeclarations across the workspace.Why
libssl-dev, no system-OpenSSL CVE surface.tls-aws-lc-rsdefault,tls-ring,tls-native) without forking. Mirrors what reqwest / hyper-rustls / tonic 0.13 do upstream.External impact (why the !)
tls-aws-lc-rs(default) /tls-ring/tls-nativefeatures. Consumers usingdefault-features = falsemust opt into one explicitly.reqwest 0.12 → 0.13andalloy 1.1 → 1.8are visible in downstream lockfiles.rustls-platform-verifier(reqwest path) andrustls-native-certs(tokio-tungstenite + tonic). Functionally equivalent for public CA endpoints; relevant only to consumers with custom keystore policies.Per-commit story (bottom-to-top)
chore(deps): bump alloy 1.1 -> 1.8— also adds alloyeip712to tycho-simulation (DynSolValue::CustomStruct gating) andsol-typesto tycho-execution (pre-existing missing feature).chore(deps): bump reqwest 0.12 -> 0.13 + foundry-block-explorers 0.23— collapses 5 reqwest decls to one workspace pin. Keeps native-tls feature as bridge.refactor(client): tokio-tungstenite 0.20 -> 0.29 + drop hyper, adopt http 1.x— coupled because tungstenite 0.29 reuses http 1.x types fromRequest::builder(). tycho-client switcheshyper = 0.14tohttp = 1. Utf8Bytes/Bytes adaptations in deltas.rs + ws.rs.refactor(indexer): adapt to tonic 0.13 + opentelemetry 0.30 + prost 0.13— coupled triad (otlp 0.30 needs tonic 0.13 needs prost 0.13). Hand-patchedpb/*.tonic.rs, mock.rs rewrite, ot.rs rewrite (TracingHandlefor shutdown flush,service.name), prost_11 alias for tycho-substreams 0.8 types.chore(deps): drop AWS SDK legacy rustls/ring path—aws-sdk-s3 default-features = false. Eliminates rustls 0.21 + hyper 0.14 + tokio-rustls 0.24 + hyper-rustls 0.24 + rustls-pemfile from the active graph.feat(tls): per-crate TLS backend feature flags + install crypto provider in binaries— lib feature scaffold + bin install + compile_error guards on indexer/integration-test (no usable build without rustls provider).ci: build matrix across all TLS feature combinations— 16 (crate × feature) combos in ci-rust.yaml; excludes binary crates fromcheck-no-default-features.Open items / accepted residuals
prost_11lets us coexist. Once streamingfastsubstreams 0.7.6(already on prost 0.13) propagates through tycho-protocol-sdk into a newtycho-substreamsrelease, we drop the alias.run_indexervialet _enter = main_runtime.enter()because that path is sync (not#[tokio::main]); other entry points are#[tokio::main]so it's free.Test plan
cargo +nightly fmt --all --checkcargo +nightly clippy --workspace --all-features --all-targets -- -D warningscargo doc --workspace --no-deps --all-features --lockedcargo check --workspace --no-default-features(with binary excludes)cargo nextest rununit tests: 146/147 (1 pre-existing parallel-isolation flake on origin/main)🤖 Generated with Claude Code