Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

chain: wait for bitcoind rpc to warm up #677

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

sangaman
Copy link

This modifies the Start method to handle the RPC_IN_WARMUP error code of -28 by retrying the rpc call to determine the current network every second until it either succeeds or fails with a different code. The current behavior fails and terminates the connection upon receiving this error code. This change allows for connecting to a recently started bitcoind node and starting the client while bitcoind is still warming up.

Related issues: lightningnetwork/lnd#1533 & https://github.com/ExchangeUnion/xud-docker/issues/195

I first tried to approach this issue with changes to the lnd codebase, but the relevant code is what's being modified here. Attempting to call the Start() method from lnd a second time, for instance, doesn't work because of the up front check that determines whether Start() has already been called and immediately returns if so.

I've tested this locally by pointing btcwallet in go.mod to my local repository and unlocking lnd moments after starting bitcoind. Instead of shutting down, lnd will keep trying bitcoind until it gets a non-error response and then continue with its initialization process. This would save some grief when bringing a node online and starting bitcoind & lnd at roughly the same time.

} else if err.(*btcjson.RPCError).Code == -28 {
// Error code -28 indicates the bitcoind rpc is warming up.
// We will wait and recheck periodically until rpc is ready.
time.Sleep(time.Second * 1)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this looks okay, the only thing I'm worried about is bitcoind hanging, always returning this code. In that case it looks like we'll deadlock here.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The alternative would be to crash, which is what happens now. This approach looks good to me, maybe just add a log like Waiting for bitcoind RPC ready signal that's only logged once if we ever run into this error code.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There could perhaps be some upper limit of retries before returning an error. A couple of minutes perhaps? But yeah the alternative is crashing, and if ever bitcoind is stuck in this rpc warming up status, something is seriously wrong and requires intervention. I can't say I've ever seen bitcoind get stuck in this point, even if there's something wrong like db corruption it'll eventually crash in my experience.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@sangaman in your experience, what has been the longest bitcoind has been in this warmup state? Waiting something like 5 mins sounds good to me.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In my experience, even on a low-powered ARM machine this warm-up stage takes about a minute or two by default, rarely much more than than that. However, it's also possible to change the config to increase the amount of checks that are done before starting the bitcoind server, theoretically this could take hours if you had it do very deep/thorough checks.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In that case, I'm fine with blocking indefinitely as long as a log statement exists.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed :)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, isn't bitcoind in that warmup state for the entire initial blockchain sync (which can take days, or more)?

I think it should wait indefinitely. However the sleep could maybe be increased slowly to a max of 10 minutes or so.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No I don't think so, the warm up state is just while it's verifying the db and blocks on disk. Once it starts accepting and validating new blocks, it's past the warm up state and will respond to rpc requests normally.

@Roasbeef
Copy link
Member

Roasbeef commented Nov 4, 2020

What's needed to unblock this?

@halseth
Copy link
Contributor

halseth commented Nov 4, 2020

Add a log, then I think it's good.

ping @sangaman

@sangaman
Copy link
Author

sangaman commented Dec 1, 2020

Just saw this ping, I'll add a log statement and review the feedback. Sorry this fell off my radar.

@sangaman
Copy link
Author

sangaman commented Dec 2, 2020

I updated this PR to use the newly available btcjson.ErrRPCInWarmup variable (which equals -28) instead of writing in -28 and I added logic to log Waiting for bitcoind RPC to finish warming up... the first time we see that code and Bitcoind finished warming up and RPC is ready after the code goes away. Ping @halseth @wpaulino

Copy link
Contributor

@halseth halseth left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM 🌡️

if err == nil {
// No error means bitcoind is responsive and we can proceed.
break
} else if err.(*btcjson.RPCError).Code == btcjson.ErrRPCInWarmup {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should do a soft type assertion here to prevent a panic if err is not of the type *btcjson.RPCError:

} else if rpcErr, ok := err.(*btcjson.RPCError); ok && rpcErr.Code == btcjson.ErrRPCInWarmup {

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, makes sense. Done.

@wpaulino
Copy link
Contributor

wpaulino commented Jan 6, 2021

cc @Roasbeef

@sangaman
Copy link
Author

sangaman commented Jun 8, 2021

Should I continue with this PR? I do think it's a useful feature. It looks like I'll need to resolve some conflicts with pruned node logic.

@wpaulino
Copy link
Contributor

wpaulino commented Jun 8, 2021

@sangaman yes! It's already gotten approvals, @Roasbeef is just the only one with merge access. I can take another look once rebased.

@sangaman
Copy link
Author

sangaman commented Jun 9, 2021

@wpaulino @Roasbeef I rebased the changes, the code didn't really need to change I just had to move it to keep up with the updated structure.

Comment on lines 150 to 152
var (
net wire.BitcoinNet
)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: this can be collapsed into a single line var net wire.BitcoinNet.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, a leftover from when this also declared err which didn't exactly work any more since it's being used above.

if err == nil {
// No error means bitcoind is responsive and we can proceed.
break
} else if rpcErr, ok := err.(*btcjson.RPCError); ok && rpcErr.Code == btcjson.ErrRPCInWarmup {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: line exceeds 80 character limit.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I split this into two lines, for instance by moving everything after the semicolon to its own line, then make lint doesn't pass and tells me to run gofmt, but if I do that it moves this else if statement back to one line. So I'm not sure what to do, it feels like I'm fighting against the style rules defined by the project.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So the else if can just become an if since the clause above breaks out. We can then split the line at &&.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for that bit of hand holding, that works out.

This modifies the `Start` method to handle the `RPC_IN_WARMUP` error
code of `-28` by retrying the rpc call to determine the current network
every second until it either succeeds or fails with a different code.
The current behavior fails and terminates the connection upon receiving
this error code. This change allows for connecting to a recently started
bitcoind node and starting the client while bitcoind is still warming
up.

Related issues: lightningnetwork/lnd#1533 &
https://github.com/ExchangeUnion/xud-docker/issues/195
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants