Kubernetes Probes #142
base: main
Can you add docs on how this works with op-conductor? From the description, it seems like each sequencer in the HA setup has one builder. How would op-conductor behave if every builder were down and unhealthy?
Yeah, this seems to be the typical setup. A one-to-many builder-sequencer relationship is undefined behaviour according to OP, so you need a 1:1 sequencer-builder pairing.
Currently op-conductor doesn't have any logic for this. Once we have the appropriate probes here, we can try to get op-conductor support.
If all 3 are unhealthy then we're in a pretty bad state. I suppose a random instance would be chosen to be the primary sequencer in that case.
Co-authored-by: shana <[email protected]>
@avalonche @0xOsiris I've added some docs. I'm second-guessing my use of the …
What do you think?
We have a different design for rollup boost integration into sequencer HA, which doesn't require this change.
The full TDD is here, feel free to check it out.
I have a longer response in the op conductor discord channel (tagged you already). The above TDD is only for rollup boost integration. It's not for builder HA
LGTM. The only thing is that I would add to the docs that /healthz returns 206 and 503 after the builder / L2 fails to produce a block only once, and that the endpoint will still return 200 if the builder is up but the local L2 is not.
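One reading of the status codes described in this review comment can be sketched as follows. This is a minimal illustration, not the code in this PR: the `Health` enum, its variant names, and `from_last_attempt` are hypothetical, and the rule that a working builder yields 200 regardless of the local L2 is taken from the comment above rather than from the implementation.

```rust
/// Hypothetical health states behind the /healthz endpoint.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Health {
    /// The builder produced the last block.
    Healthy,
    /// The builder failed, but the local L2 produced the block.
    PartialContent,
    /// Neither the builder nor the local L2 produced the block.
    ServiceUnavailable,
}

impl Health {
    /// Derive the state from the most recent block production attempt.
    /// Assumption: a single failure is enough to change the state.
    pub fn from_last_attempt(builder_ok: bool, l2_ok: bool) -> Self {
        match (builder_ok, l2_ok) {
            // Per the review comment, the L2 state is not consulted
            // when the builder succeeded.
            (true, _) => Health::Healthy,
            (false, true) => Health::PartialContent,
            (false, false) => Health::ServiceUnavailable,
        }
    }

    /// HTTP status code reported by the probe for this state.
    pub fn status_code(self) -> u16 {
        match self {
            Health::Healthy => 200,
            Health::PartialContent => 206,
            Health::ServiceUnavailable => 503,
        }
    }
}
```

Under this reading, a caller such as op-conductor cannot distinguish "everything is fine" from "builder up, local L2 down" by status code alone, which is the gap the comment asks the docs to call out.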
FYI the op-conductor change ethereum-optimism/optimism#15316 depends on this PR now.
This should probably be debounced locally if this is expected to happen frequently.
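A local debounce of the kind suggested here could look roughly like the sketch below. It assumes the consumer polls the probe and only wants to act after several consecutive unhealthy readings; the `Debouncer` type and its `threshold` parameter are hypothetical and do not come from this PR or from op-conductor.

```rust
/// Hypothetical debouncer: reports unhealthy only after `threshold`
/// consecutive unhealthy observations.
pub struct Debouncer {
    threshold: u32,
    consecutive_failures: u32,
}

impl Debouncer {
    pub fn new(threshold: u32) -> Self {
        Self { threshold, consecutive_failures: 0 }
    }

    /// Record one raw probe result and return the debounced health.
    pub fn observe(&mut self, healthy: bool) -> bool {
        if healthy {
            // Any healthy reading resets the failure streak.
            self.consecutive_failures = 0;
        } else {
            self.consecutive_failures = self.consecutive_failures.saturating_add(1);
        }
        self.consecutive_failures < self.threshold
    }
}
```

With a threshold of 3, two isolated failed blocks would not flip the reported state, while a third consecutive failure would.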
I have been integration testing this with our conductor rollup-boost monitoring PR. It seems the /healthz response is sticky, which I don't think will work well for conductor, for example:
Similarly, you can kill r-builder on a non-active sequencer and conductor will report it as healthy. Is there any way we can move the health probe to the background so we get async rbuilder health updates?
Yeah, this makes sense. This is because the health probe is only updated during a …
…for non-sequencing el's
Osiris/background health check
@zhwrd @teddyknox Thanks for the callout on the sticky health status. I've added an additional background health check to the rollup-boost server that continuously monitors unsafe head progression of the builder. It should work functionally the same as the health check op-conductor performs to ensure the unsafe head is progressing within …
This should resolve the issue of the health status not being updated on non-sequencing ELs. In the sequencing case we now have 2 health checks running in parallel.
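The idea of a background check on unsafe head progression can be sketched as below. This is an illustration under stated assumptions, not the implementation added in this PR: `HeadMonitor`, `tick`, and `max_staleness_secs` are hypothetical names, and the staleness limit stands in for whatever bound the real check uses. A background task would call `tick` on a fixed interval with the builder's latest unsafe head timestamp, so the shared flag stays current even when no blocks are being requested.

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Arc;

/// Hypothetical monitor that marks the builder unhealthy when its
/// unsafe head stops advancing.
pub struct HeadMonitor {
    healthy: Arc<AtomicBool>,
    max_staleness_secs: u64,
}

impl HeadMonitor {
    pub fn new(max_staleness_secs: u64) -> Self {
        Self {
            healthy: Arc::new(AtomicBool::new(true)),
            max_staleness_secs,
        }
    }

    /// Shared flag that a health endpoint could read.
    pub fn handle(&self) -> Arc<AtomicBool> {
        Arc::clone(&self.healthy)
    }

    /// One iteration of the background loop: compare the timestamp of
    /// the latest unsafe head against the current time.
    pub fn tick(&self, head_timestamp_secs: u64, now_secs: u64) {
        // saturating_sub guards against a head timestamp slightly ahead
        // of the local clock.
        let age = now_secs.saturating_sub(head_timestamp_secs);
        self.healthy
            .store(age <= self.max_staleness_secs, Ordering::Relaxed);
    }
}
```

Because the flag is updated independently of block requests, a builder killed on a non-active sequencer would be reported unhealthy once its head falls behind the limit.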
This PR is dependent on #141
You can view the diff from that PR here
Health, readiness, liveness checks layer added.
rollup-boost
.