helm, info, status: Check and report reachability for scan and sequencer#6073
helm, info, status: Check and report reachability for scan and sequencer#6073giner wants to merge 4 commits into
Conversation
b376a39 to
b4302e1
Compare
96d24ab to
69a89d9
Compare
Possible status values: - Scan: 0 (reachable and not lagging), 1 (lagging), 2 (unreachable). - Sequencer: 0 (reachable and not lagging), 1 (lagging), 2 (unreachable), 3 (unreachable and lagging). Signed-off-by: Stanislav German-Evtushenko <ginermail@gmail.com>
Signed-off-by: Stanislav German-Evtushenko <ginermail@gmail.com>
69a89d9 to
01a3ced
Compare
|
/cluster_test |
|
Deploy cluster test triggered for Commit 01a3ced14e736c8d92fc1b7b9fb3d072cb36585b in , please contact a Contributor to approve it in CircleCI: https://app.circleci.com/pipelines/github/DACH-NY/canton-network-internal/71127 |
|
/cluster_test |
|
Deploy cluster test triggered for Commit 01a3ced14e736c8d92fc1b7b9fb3d072cb36585b in , please contact a Contributor to approve it in CircleCI: https://app.circleci.com/pipelines/github/DACH-NY/canton-network-internal/null |
|
/cluster_test |
|
Deploy cluster test triggered for Commit 01a3ced14e736c8d92fc1b7b9fb3d072cb36585b in , please contact a Contributor to approve it in CircleCI: https://app.circleci.com/pipelines/github/DACH-NY/canton-network-internal/71265 |
|
martinflorian-da
left a comment
There was a problem hiding this comment.
I agree with Itai that this is hard to review. The downside of getting something wrong here is limited though IMO, so happy to rush this a bit to make the release cut tomorrow, and we can always fix things later on...
@giner I'll fix your merge conflict and set automerge now, so we get a little bit of more testing before the release cut.
| GRPCURL_DIST=$(mktemp) | ||
| GRPCURL_TMPDIR=$(mktemp -d) | ||
|
|
||
| echo "Downloading grpcurl..." >&2 |
There was a problem hiding this comment.
Can it happen that we do this multiple times in parallel if we run grpcurl in parallel? Can't we move that installation to some init step? I get that we don't want to change to image just for that... but that last-moment install seems a bit fishy.
There was a problem hiding this comment.
(happy for that to be a follow-up improvement, if you think it makes sense)
There was a problem hiding this comment.
You are right, while it doesn't break anything here we do download in parallel. thank you for catching, I'll fix that.
…_check_connectivity [static] Signed-off-by: Martin Florian <martin.florian@digitalasset.com>
There was a problem hiding this comment.
I actually need to retract my approval following some input from a watchful colleague ™️. Downloading remote binaries in a bash script is fairly sketchy from a supply chain security standpoint; at the very least I'd like us to review this with less haste.
How robust is that sha check against future code changes to this script, for example?
Perhaps the clean way is to use a different image that has grpcurl baked in?
Let me try to refactor it a bit on the next iteration (remove some verbose bash 3 compatibility stuff, split into smaller easier to read, self described pieces). If it's still hard to read after that I'll look into rewriting it Python. |
I couldn't find a good image containing all: |
Make sure only one grpcurl downloading job runs at a time. Signed-off-by: Stanislav German-Evtushenko <ginermail@gmail.com>
|
There are two somewhat orthogonal issues for me:
I suspect in most SVs setups the info pod has direct access to things like participant admin APIs so it is highly security critical so we definitely should err on the side of being overly cautious. If we can build an image containing https://github.com/grpc-ecosystem/grpc-health-probe to get the same info, that seems like a reasonable option. I don't really know any other alternative that is well maintained and doesn't get flagged. We could write some custom python program or similar but at that point, I'd also start to somewhat question whether the complexity is worth the ROI. |
|
Thank you @moritzkiefer-da. Would it work if for now I just replace grpcurl with grpc-health-probe but keep downloading at runtime (we already do this for prom2json in the same script) and within the next few iteration I would refactor the script for more readability and add a docker image? |
|
Given that I don't see that we're in a particular rush to land this change I'd rather do it the other way around and switch to a base image containing the tools first and make the script more readable instead of adding more complexity and potentially risky downloads first. |
|
Alright, converting draft for now |
|
@giner To be clear: It seems reasonable that we add a About this PR: You could also shrink it to only scan monitoring, and follow up with the sequencer things in a follow up 🤷 |
|
@martinflorian-da scan has been already monitored, in this PR I was only splitting the status into two distinct states. It's quite minor from monitoring point of view while having sequencer monitored is more substantial. As for the base image, yes, it has |
Possible status values: