
Detect and handle corrupt or missing blocks or indexes #537

@jmjatlanta


Situation:

When a node encounters a problem such as a crash or a hardware/software failure, its block files or index files can become corrupted or be left with incomplete data.

Weapons:

Corruption and missing data can be detected. Other nodes can provide the information the corrupted node lacks.

Objective:

Detect the problem and repair it before allowing the node to report that it is fully synchronized.

Tactics:

  1. For corrupted block files, the chain must be downloaded from the network starting from the point at which the corruption was detected.
  2. For missing block files, the chain must be downloaded and the node re-indexed.
  3. For corrupted or missing indexes, the problem must be accurately detected and the user must be prompted to restart the node with the -reindex option.
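The three tactics amount to a mapping from the kind of damage detected to a recovery action. The sketch below is hypothetical: the enum names and ChooseRecovery() are illustrative and are not existing functions in the codebase. The mapping follows the list above. Only the -reindex option comes from the issue.

```cpp
#include <cassert>
#include <string>

// Hypothetical categories matching the three tactics listed above.
enum class DamageKind {
    None,
    CorruptBlockFile,       // a block could not be deserialized
    MissingBlockFile,       // a block file is absent
    CorruptOrMissingIndex   // index entries are damaged or absent
};

// Hypothetical recovery actions, one per tactic.
enum class RecoveryAction {
    ContinueStartup,
    RedownloadFromCorruptionPoint,  // tactic 1
    RedownloadAndReindex,           // tactic 2
    PromptRestartWithReindex        // tactic 3
};

// Map a detected problem to a recovery tactic. The node should not report
// itself as fully synchronized until the chosen action has completed.
RecoveryAction ChooseRecovery(DamageKind kind) {
    switch (kind) {
        case DamageKind::CorruptBlockFile:
            return RecoveryAction::RedownloadFromCorruptionPoint;
        case DamageKind::MissingBlockFile:
            return RecoveryAction::RedownloadAndReindex;
        case DamageKind::CorruptOrMissingIndex:
            return RecoveryAction::PromptRestartWithReindex;
        case DamageKind::None:
            break;
    }
    return RecoveryAction::ContinueStartup;
}

// Message for the operator in the index case; empty otherwise.
std::string OperatorMessage(RecoveryAction action) {
    if (action == RecoveryAction::PromptRestartWithReindex)
        return "Block index is corrupted or incomplete; please restart with -reindex.";
    return "";
}
```

Keeping the decision in one place makes it easier to ensure that no detected problem silently falls through to a normal startup.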

Where we are now:

Some situations of block or index corruption are detected, others are not.

For example, if the node crashes after writing a block to disk but before writing the index file, transactions will be missing when the node restarts. The node does restart, but its data is inaccurate: looking up the block returns nullptr, and looking up a transaction that was in that block reports that the transaction does not exist.

Additional Information:

The function LoadIndexDB() performs some checks to verify the integrity of the block files. This may be a good place to add further checks that verify the blocks and the index are synchronized.
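A synchronization check could compare the set of blocks found in the block files with the set recorded in the index, in both directions. This is a minimal sketch that assumes both sets have already been collected as hashes. The SyncReport type and CompareBlocksAndIndex() are hypothetical and do not correspond to existing LoadIndexDB() internals. Scanning every block file on each startup may also be too slow in practice.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <vector>

// Simplified stand-in for a real block hash type (assumption).
using BlockHash = std::string;

// Hypothetical result of comparing block files against the index.
struct SyncReport {
    // e.g. the node crashed after writing the block but before writing the index
    std::vector<BlockHash> onDiskButNotIndexed;
    // the index refers to a block that is not in any block file
    std::vector<BlockHash> indexedButNotOnDisk;

    bool Consistent() const {
        return onDiskButNotIndexed.empty() && indexedButNotOnDisk.empty();
    }
};

// Report every block present on one side but not the other.
SyncReport CompareBlocksAndIndex(const std::set<BlockHash>& blocksOnDisk,
                                 const std::set<BlockHash>& indexedBlocks) {
    SyncReport report;
    for (const auto& hash : blocksOnDisk)
        if (indexedBlocks.count(hash) == 0)
            report.onDiskButNotIndexed.push_back(hash);
    for (const auto& hash : indexedBlocks)
        if (blocksOnDisk.count(hash) == 0)
            report.indexedButNotOnDisk.push_back(hash);
    return report;
}
```

A non-empty onDiskButNotIndexed list corresponds to the crash scenario described above. A non-empty indexedButNotOnDisk list corresponds to an incomplete block file.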

Note: The goal is to assist node operators when hardware or software issues corrupt the data persisted on disk. Detecting malicious modifications intended to distort the data is out of scope here.

Note: A block whose previous block does not exist is not necessarily an indication of corruption. It is a valid (temporary) situation that must be planned for.
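Any integrity check therefore needs to treat a missing parent as a pending state rather than as damage. The sketch below is hypothetical. ClassifyParent() is not an existing function. Representing the genesis block by an empty previous hash is a simplification: real chains use an all-zero hash.

```cpp
#include <cassert>
#include <set>
#include <string>

// Simplified stand-in for a real block hash type (assumption).
using BlockHash = std::string;

enum class ParentStatus { Present, Genesis, AwaitingParent };

// A block whose parent is not in the index is classified as waiting for
// its parent (to be requested from peers), not as corruption.
ParentStatus ClassifyParent(const BlockHash& prevHash,
                            const std::set<BlockHash>& indexedBlocks) {
    if (prevHash.empty())
        return ParentStatus::Genesis;         // no parent expected (simplified)
    if (indexedBlocks.count(prevHash) != 0)
        return ParentStatus::Present;
    return ParentStatus::AwaitingParent;      // valid, temporary state
}
```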

In my testing:

  1. Having an incomplete index file (an entire entry about a block does not exist) is not detected, and the node starts with incomplete data.
  2. Having a corrupted index file (data truncated off the end) is not detected, and the node starts with a shortened chain. I have not yet tested whether it re-syncs correctly.
  3. Having an incomplete block file (an entire block is missing from the block file but present in the index) is not detected, though this is probably a very rare occurrence. We could test for it, but we may not want to focus heavily on detecting or fixing it.
  4. Having a corrupted block file (a block cannot be deserialized) is detected. I have not yet tested what options a node operator has beyond a full re-sync.
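Case 2 can be reproduced by truncating a file before starting the node. The helper below is a hypothetical test utility built on the C++17 std::filesystem::file_size and std::filesystem::resize_file. The choice of which index or block file to truncate, and how many bytes to remove, is left to the tester. Run it only against a copy of a data directory, never against a live node.

```cpp
#include <cassert>
#include <cstdint>
#include <filesystem>
#include <fstream>
#include <system_error>

namespace fs = std::filesystem;

// Chop `bytes` off the end of `file` to simulate truncation.
// Returns false if the size cannot be read, the file is shorter than
// `bytes`, or the resize fails.
bool TruncateTail(const fs::path& file, std::uintmax_t bytes) {
    std::error_code ec;
    const std::uintmax_t size = fs::file_size(file, ec);
    if (ec || bytes > size)
        return false;
    fs::resize_file(file, size - bytes, ec);
    return !ec;
}
```

Truncating by different amounts (a partial entry versus several whole entries) may help distinguish between cases 1 and 2 when verifying the findings above.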

ToDo:

  • Verify that the findings above hold for different combinations of corrupted and missing data.
  • Run tests on a multi-node chain to determine current abilities for recovering from corruption.

See also:

Bitcoin issue 19274
