Kubernetes Probes #142

Merged
merged 47 commits into from
Apr 29, 2025
8e7a51b
wip
0xForerunner Mar 14, 2025
39ef56e
wip
0xForerunner Mar 14, 2025
e49f054
wip
0xForerunner Mar 15, 2025
f77ff56
wip
0xForerunner Mar 18, 2025
6249337
clean things up
0xForerunner Mar 18, 2025
0641965
fix for cloned service
0xForerunner Mar 18, 2025
79bab4a
cleanup process_response
0xForerunner Mar 18, 2025
8ceabac
eyre bail
0xForerunner Mar 18, 2025
bf55d6e
remove unnecessary deps
0xForerunner Mar 18, 2025
b9b242e
Add kubernetes probe layer
0xForerunner Mar 19, 2025
121bb2b
implement health/ready check logic
0xForerunner Mar 19, 2025
2e1ce26
modify ready logic
0xForerunner Mar 19, 2025
04678f7
fix comment/feature
0xForerunner Mar 19, 2025
7a3e0a0
delete old file
0xForerunner Mar 19, 2025
0a1d9eb
working
0xForerunner Mar 21, 2025
adbf5f3
Update src/client/http.rs
0xForerunner Mar 25, 2025
636f7e9
Merge branch 'main' into forerunner/proxy
0xForerunner Mar 25, 2025
f5e8f69
parse response cod
0xForerunner Mar 25, 2025
7497301
clippy fix
0xForerunner Mar 25, 2025
fd321ac
Merge branch 'main' into forerunner/probes
0xForerunner Mar 26, 2025
de15894
Merge branch 'forerunner/proxy' into forerunner/probes
0xForerunner Mar 26, 2025
de3d03e
Merge branch 'main' into forerunner/probes
0xForerunner Mar 28, 2025
ddd6d9b
Probe docs
0xForerunner Mar 28, 2025
9b60991
Switch to returning health status only from /healthz using http statu…
0xForerunner Mar 28, 2025
74d723e
Update docs to describe health status codes
0xForerunner Mar 28, 2025
2732c38
remove stray comments
0xForerunner Mar 28, 2025
13a600b
cleanup, add tests
0xForerunner Mar 28, 2025
98adf3a
remove stray comment
0xForerunner Mar 28, 2025
fdbc717
merge main
0xOsiris Apr 8, 2025
8ba95f2
chore: rm mocks
0xOsiris Apr 8, 2025
2a5c613
chore: fmt
0xOsiris Apr 8, 2025
3b06bd9
fix: default to healthy status
0xOsiris Apr 22, 2025
fbfdb20
merge main
0xOsiris Apr 22, 2025
6d8b40a
chore: fix dockerignore
0xOsiris Apr 23, 2025
a6a94ef
feat: add background process to query block height as a health check …
0xOsiris Apr 24, 2025
bb07cdf
fix: signatures
0xOsiris Apr 24, 2025
c6855a4
chore: update comments
0xOsiris Apr 24, 2025
eae3273
chore: clippy
0xOsiris Apr 24, 2025
200b407
test: add tests
0xOsiris Apr 24, 2025
c4d0c76
fix: stress tests
0xOsiris Apr 24, 2025
f15e7c4
fix: change health check to check unsafe head progression on builder
0xOsiris Apr 25, 2025
1cd09aa
chore: update doc comments
0xOsiris Apr 25, 2025
0db4e36
fix: loop
0xOsiris Apr 25, 2025
7cc6c08
chore: update comments
0xOsiris Apr 26, 2025
7d5a863
Merge pull request #13 from flashbots/osiris/background-health-check
0xOsiris Apr 26, 2025
505ad2a
merge main
0xOsiris Apr 26, 2025
6d4afc0
Merge branch 'main' into forerunner/probes
0xOsiris Apr 26, 2025
173 changes: 169 additions & 4 deletions Cargo.lock

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

9 changes: 7 additions & 2 deletions Cargo.toml
@@ -6,6 +6,7 @@ edition = "2024"
[dependencies]
op-alloy-rpc-types-engine = "0.12.0"
alloy-rpc-types-engine = "0.13.0"
alloy-rpc-types-eth = "0.13.0"
alloy-primitives = { version = "0.8.10", features = ["rand"] }
tokio = { version = "1", features = ["full"] }
tracing = "0.1.4"
@@ -18,7 +19,10 @@ moka = { version = "0.12.10", features = ["sync"] }
http = "1.1.0"
dotenv = "0.15.0"
tower = "0.4.13"
-tower-http = { version = "0.5.2", features = ["decompression-full"] }
+tower-http = { version = "0.5.2", features = [
+    "decompression-full",
+    "sensitive-headers",
+] }
http-body-util = "0.1.2"
hyper = { version = "1.4.1", features = ["full"] }
hyper-util = { version = "0.1", features = ["full"] }
@@ -48,7 +52,7 @@ rand = "0.9.0"
time = { version = "0.3.36", features = ["macros", "formatting", "parsing"] }
op-alloy-consensus = "0.12.0"
alloy-eips = { version = "0.13.0", features = ["serde"] }
alloy-rpc-types-eth = "0.13.0"
alloy-consensus = {version = "0.13.0", features = ["serde"] }
anyhow = "1.0"
testcontainers = { version = "0.23.3" }
assert_cmd = "2.0.10"
@@ -57,6 +61,7 @@ tokio-util = { version = "0.7.13" }
bytes = "1.2"
reth-rpc-layer = { git = "https://github.com/paradigmxyz/reth.git", rev = "v1.3.7" }
ctor = "0.4.1"
reqwest = "0.12.15"

[[bin]]
name = "rollup-boost"
14 changes: 14 additions & 0 deletions docs/running-rollup-boost.md
@@ -28,6 +28,20 @@ While this does not ensure high availability for the builder, the chain will hav

![rollup-boost-op-conductor](../assets/rollup-boost-op-conductor.png)

### Health Checks

`rollup-boost` supports the standard set of Kubernetes probes:

- `/healthz` returns one of several status codes to communicate `rollup-boost` health:
  - `200 OK`: the builder is producing blocks.
  - `206 Partial Content`: the L2 is producing blocks, but the builder is not.
  - `503 Service Unavailable`: neither the L2 nor the builder is producing blocks.

  `op-conductor` should eventually be able to use this signal to switch to a different sequencer in an HA sequencer setup. In a future upgrade to `op-conductor`, a sequencer leader with a healthy (`200 OK`) EL (`rollup-boost` in our case) could be selected preferentially over one with an unhealthy (`206` or `503`) EL. If no ELs are healthy, we can fall back to an EL that responds with `206 Partial Content`.

- `/readyz` is used by Kubernetes to determine whether the service is ready to accept traffic. It should always respond with `200 OK`.

- `/livez` determines whether `rollup-boost` is live (running and not deadlocked) and responding to requests. If `rollup-boost` fails to respond, Kubernetes can use this as a signal to restart the pod. It should always respond with `200 OK`.
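
The status mapping and the fallback preference described above can be sketched in Rust. This is a minimal illustration under stated assumptions, not the code from this PR: the `Health` enum and the `from_signals`, `probe_status`, and `pick_el` functions are hypothetical names, and the two boolean inputs stand in for whatever block-production checks the service actually performs.

```rust
/// Hypothetical health states mirroring the three `/healthz` outcomes.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Health {
    /// The builder is producing blocks.
    Healthy,
    /// The L2 is producing blocks, but the builder is not.
    PartialContent,
    /// Neither the L2 nor the builder is producing blocks.
    ServiceUnavailable,
}

impl Health {
    /// Derive a health state from two assumed liveness signals.
    fn from_signals(l2_producing: bool, builder_producing: bool) -> Self {
        match (builder_producing, l2_producing) {
            (true, _) => Health::Healthy,
            (false, true) => Health::PartialContent,
            (false, false) => Health::ServiceUnavailable,
        }
    }

    /// HTTP status code reported by `/healthz` for this state.
    fn status_code(self) -> u16 {
        match self {
            Health::Healthy => 200,
            Health::PartialContent => 206,
            Health::ServiceUnavailable => 503,
        }
    }
}

/// Status code for a probe path, or `None` if the path is not a probe.
fn probe_status(path: &str, health: Health) -> Option<u16> {
    match path {
        "/healthz" => Some(health.status_code()),
        // Readiness and liveness always answer 200 while the process responds.
        "/readyz" | "/livez" => Some(200),
        _ => None,
    }
}

/// Index of the preferred EL given each candidate's `/healthz` status:
/// the first 200 wins, otherwise the first 206, otherwise nothing.
fn pick_el(statuses: &[u16]) -> Option<usize> {
    statuses
        .iter()
        .position(|&s| s == 200)
        .or_else(|| statuses.iter().position(|&s| s == 206))
}

fn main() {
    // L2 is producing blocks, builder is not.
    let health = Health::from_signals(true, false);
    println!("/healthz -> {:?}", probe_status("/healthz", health));
    println!("preferred EL index: {:?}", pick_el(&[503, 206, 200]));
}
```

Keeping `/readyz` and `/livez` independent of the health state matches the text above: only `/healthz` carries the block-production signal, so a degraded builder does not by itself cause Kubernetes to restart the pod or withhold traffic.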

## Observability

To check if the rollup-boost server is running, you can check the health endpoint:
1 change: 0 additions & 1 deletion scripts/ci/kurtosis-params.yaml
@@ -14,7 +14,6 @@ optimism_package:
fjord_time_offset: 0
granite_time_offset: 0
isthmus_time_offset: 5
fund_dev_accounts: true
mev_params:
rollup_boost_image: "flashbots/rollup-boost:develop"
additional_services:
4 changes: 2 additions & 2 deletions scripts/ci/stress.sh
@@ -32,8 +32,8 @@ run() {
# the transactions will be included in the canonical blocks and finalized.

# Figure out first the builder's JSON-RPC URL
-ROLLUP_BOOST_SOCKET=$(kurtosis port print op-rollup-boost op-rollup-boost-1-op-kurtosis rpc)
-OP_RETH_BUILDER_SOCKET=$(kurtosis port print op-rollup-boost op-el-builder-1-op-reth-op-node-op-kurtosis rpc)
+ROLLUP_BOOST_SOCKET=$(kurtosis port print op-rollup-boost op-rollup-boost-2151908-1-op-kurtosis rpc)
+OP_RETH_BUILDER_SOCKET=$(kurtosis port print op-rollup-boost op-el-builder-2151908-1-op-reth-op-node-op-kurtosis rpc)

# Private key with prefunded balance
PREFUNDED_PRIV_KEY=0x59c6995e998f97a5a0044966f0945389dc9e86dae88c7a8412f4603b6b78690d