Implement safeguards to check plan against the archive #24

Merged
merged 2 commits into from
Jan 29, 2025


cronokirby
Collaborator

This helps catch mistakes made when creating a plan by checking that, for each step of the plan, the first and last blocks it needs are present in the archive, and that any genesis it needs is also present. In particular, this catches off-by-one mistakes around upgrades, since in that case the required genesis won't exist.
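
A minimal sketch of the kind of check described above, not the code in this PR. `Archive`, `Step`, `check_step`, and `check_plan` are hypothetical names, the archive is modeled as in-memory sets rather than a database, and geneses are assumed to be keyed by their initial height.

```rust
use std::collections::BTreeSet;

/// Hypothetical in-memory stand-in for the archive: the block heights it
/// holds, and the geneses it holds (keyed by initial height, an assumption).
pub struct Archive {
    pub blocks: BTreeSet<u64>,
    pub geneses: BTreeSet<u64>,
}

/// Hypothetical plan step: an inclusive block range, plus the initial height
/// of the genesis the step needs, if any.
pub struct Step {
    pub first_block: u64,
    pub last_block: u64,
    pub genesis: Option<u64>,
}

/// Checks one step, returning a human-readable reason on failure.
pub fn check_step(step: &Step, archive: &Archive) -> Result<(), String> {
    // Only the boundary blocks are checked, as in the description above.
    for height in [step.first_block, step.last_block] {
        if !archive.blocks.contains(&height) {
            return Err(format!("missing block at height {}", height));
        }
    }
    if let Some(initial_height) = step.genesis {
        if !archive.geneses.contains(&initial_height) {
            return Err(format!(
                "missing genesis with initial height {}",
                initial_height
            ));
        }
    }
    Ok(())
}

/// Checks every step in order, stopping at the first failure.
pub fn check_plan(steps: &[Step], archive: &Archive) -> Result<(), String> {
    steps.iter().try_for_each(|step| check_step(step, archive))
}
```

With geneses at initial heights 1 and 51, a step that mistakenly asks for a genesis at height 50 fails with a reason naming the missing genesis, even though the blocks themselves are present.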

@conorsch conorsch self-requested a review January 25, 2025 00:31
&self,
start: u64,
archive: &Archive,
) -> anyhow::Result<anyhow::Result<()>> {
Member


We can probably simplify the signature and the rest of the call stack here, right? It sounds like the inner error is a boolean.

Contributor


I did a double-take on the sig too. Even if this is necessary, would appreciate more comments to explain.

Collaborator Author


I initially wrote this change with a boolean, but then decided against it, because it's useful to understand and print why a plan failed a check against an archive, and not just that it failed.

As for the nested results: the outer layer can signal spurious failure (I/O ops failing, whatever the sqlite lib might throw at us), while the inner layer signals permanent failure. If the inner error is set, the plan will never succeed against that archive without manual intervention. By having two layers, we could add retries around the outer layer in a later change. I think distinguishing between "errors you can do something about" and "errors you can't hope to resolve" is good.
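
A rough sketch of how the two layers could support retries, under stated assumptions: it uses `std::io::Error` for the outer layer instead of `anyhow`, and `CheckFailure`, `CheckResult`, and `check_with_retries` are hypothetical names that do not appear in this PR.

```rust
use std::io;

/// Permanent failure: the plan cannot succeed against this archive without
/// manual intervention. (Hypothetical type, for illustration only.)
#[derive(Debug, PartialEq)]
pub struct CheckFailure(pub String);

/// Outer layer: possibly transient errors (I/O, database).
/// Inner layer: the verdict of the check itself.
pub type CheckResult = Result<Result<(), CheckFailure>, io::Error>;

/// Runs `check` at least once, retrying only when the outer layer fails.
/// An inner `Err` is a final verdict and is returned without retrying.
pub fn check_with_retries<F>(mut check: F, max_attempts: u32) -> CheckResult
where
    F: FnMut() -> CheckResult,
{
    let mut attempt = 1;
    loop {
        match check() {
            // Outer error with attempts remaining: try again.
            Err(_) if attempt < max_attempts => attempt += 1,
            // Success, permanent failure, or the last outer error.
            other => return other,
        }
    }
}
```

The caller can then print the inner reason and stop, while treating an outer error that survives all attempts as an infrastructure problem.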

@conorsch
Contributor

I haven't actually functionally tested this yet, as I was focused on #25. @cronokirby I see we need a rebase here: can you do so and then ping me back? I've been running the reindexer a lot lately, so I'm happy to give this another go as a sanity check.

Contributor

@conorsch conorsch left a comment


needs rebase

This adds some basic checks that the archive has critical blocks required
by the plan, and that the geneses required by the plan are also present.
@cronokirby cronokirby force-pushed the cronokirby/archive-safeguards branch from 21cb413 to e4da51d Compare January 28, 2025 21:32
@cronokirby cronokirby requested a review from conorsch January 28, 2025 21:32
@conorsch
Contributor

Used this to run another archive and subsequent regen on the penumbra-testnet-phobos-2 chain data, and it's working well. I did not encounter the missing-genesis-file problem reported in #25 (comment).

@conorsch conorsch merged commit 4bcbcc0 into main Jan 29, 2025
5 checks passed
3 participants