What are the requirements for the ValueID? #1063

adizere · 2025-05-30T14:19:42Z

Context

In the core-types crate there is the type definition of ValueId:

https://github.com/informalsystems/malachite/blob/4c2a9560ceb75868af23de388991a1eafc42ec9b/code/crates/core-types/src/lib.rs#L33

The Id type is an associated type on the Value trait:

https://github.com/informalsystems/malachite/blob/4c2a9560ceb75868af23de388991a1eafc42ec9b/code/crates/core-types/src/value.rs#L77-L87

Any application built with Malachite defines its own concrete Value and Id types. In practice, the value would be a block of transactions and a corresponding header. The identifier would then be a concise representation of it, such as a digest of the value. It is important that the identifier has a smaller footprint than the value itself: identifiers are present in critical messages (Prevote, Precommit) of the consensus protocol, and these messages should be kept small.

Alternatively, the value itself could be a concise representation of the block and header, as is done, e.g., in the Starknet sequencer test app. The advantage of this approach is that consensus executes on more lightweight data payloads, which can be more efficient. In this case, the identifier and the value coincide.
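
The two approaches can be sketched as follows. This is a minimal, self-contained illustration and not the Malachite API: the `Value` trait below is a simplified stand-in for the one linked above, and `Block`, `BlockId`, `CompactValue`, and `toy_digest` are hypothetical names. `toy_digest` exists only so the sketch runs with the standard library; it is not collision resistant, and a real application would use a cryptographic hash instead.

```rust
use std::fmt::Debug;

/// Simplified stand-in for the trait described above (NOT the Malachite definition).
trait Value: Clone + Debug + Eq {
    type Id: Clone + Debug + Eq;
    fn id(&self) -> Self::Id;
}

/// Placeholder 64-bit digest (FNV-1a), used only to avoid dependencies.
/// Assumption for illustration: replace with SHA-256, SHA-3 or BLAKE3 in practice.
fn toy_digest(bytes: &[u8]) -> u64 {
    let mut h: u64 = 0xcbf29ce484222325;
    for b in bytes {
        h ^= *b as u64;
        h = h.wrapping_mul(0x100000001b3);
    }
    h
}

/// Approach 1: the value is the full block; the id is a short digest of it.
#[derive(Clone, Debug, PartialEq, Eq)]
struct Block {
    header: Vec<u8>,
    transactions: Vec<Vec<u8>>,
}

#[derive(Clone, Debug, PartialEq, Eq)]
struct BlockId(u64);

impl Value for Block {
    type Id = BlockId;

    fn id(&self) -> BlockId {
        // Length-prefix every field so that field boundaries are unambiguous.
        let mut bytes = Vec::new();
        bytes.extend_from_slice(&(self.header.len() as u64).to_le_bytes());
        bytes.extend_from_slice(&self.header);
        for tx in &self.transactions {
            bytes.extend_from_slice(&(tx.len() as u64).to_le_bytes());
            bytes.extend_from_slice(tx);
        }
        BlockId(toy_digest(&bytes))
    }
}

/// Approach 2: the value is already compact, so value and id coincide.
#[derive(Clone, Debug, PartialEq, Eq)]
struct CompactValue(u64);

impl Value for CompactValue {
    type Id = CompactValue;

    fn id(&self) -> CompactValue {
        self.clone()
    }
}
```

In the first approach a vote carries only the fixed-size `BlockId`, however large the block is; in the second, the vote carries the value itself, which is acceptable only because the value is already small.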

To learn more about consensus by value or by id, see also ADR 003.

Question

What are the requirements on the value Id? A user asked specifically:

I presume it (the Id concrete type implementation) does not have to be a cryptographically secure hash? I assume it is a cheap checksum essentially to coordinate voting?

Answer

Briefly, the Id should provide a concise and unique representation of a Value. It should be (fairly) resistant to collisions, small in size, and cheap to verify. As hinted above, the rationale is that the consensus protocol (i.e., the votes, such as Prevotes and Precommits, that each validator casts to reach agreement at each Height) does not operate on the full Value. That would carry a potentially significant performance penalty. Instead, consensus operates on identifiers of values. This is much cheaper, because votes consist of tiny messages instead of potentially large payloads (e.g., the entire block of transactions at a given height).

Other context that can be of help is in issue 365: https://github.com/informalsystems/malachite/issues/365

What can happen if there are collisions?

Put differently, what happens if two validators see a Prevote message for an identifier 'x' and each of those validators has a different view of which value is associated with that identifier? There could be:

validator A sees block 'B1', which has the SHA256 checksum 'x'
validator B sees block 'B2', which has the SHA256 checksum 'x', where 'B1' != 'B2'

What happens then? The two validators reach agreement on the same identifier 'x'. But when each of them derives the next state of the application (concretely, by applying the state transition function to input 'B1' and 'B2' respectively), their application states diverge: validator A obtains state 'S1', while validator B obtains a different local application state 'S2'.
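
The scenario above can be reproduced with a toy example. This is a sketch under stated assumptions: `weak_id` is a deliberately weak identifier (a byte sum) chosen so that a collision is trivial to construct, and `apply` is a hypothetical, order-sensitive state transition function; neither comes from Malachite.

```rust
/// Deliberately weak identifier: the sum of all bytes (hypothetical, for illustration).
/// Reordering the bytes of a block leaves this identifier unchanged.
fn weak_id(block: &[u8]) -> u32 {
    block.iter().map(|b| *b as u32).sum()
}

/// Toy state transition function: folds the block's bytes into the state in order,
/// so the resulting state depends on the exact block contents.
fn apply(state: u64, block: &[u8]) -> u64 {
    block
        .iter()
        .fold(state, |s, b| s.wrapping_mul(31).wrapping_add(*b as u64))
}

/// For two validators starting from the same state, returns
/// (do the identifiers match?, do the resulting states match?).
fn compare(b1: &[u8], b2: &[u8]) -> (bool, bool) {
    let ids_match = weak_id(b1) == weak_id(b2);
    let states_match = apply(0, b1) == apply(0, b2);
    (ids_match, states_match)
}
```

For example, `compare(b"ab", b"ba")` returns `(true, false)`: both validators would vote for the same identifier, yet they end up in different states.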

How practical is this problem?

In practice, it is unlikely to occur. Still, it is advisable to use state-of-the-art collision-resistant digest functions for computing the Id.

It's worth noting that this issue is not specific to Malachite, nor to the Tendermint consensus protocol. For instance, Bitcoin block hashes and mining rely on such digest functions. Using digests as concise identifiers is standard practice in (secure) networking protocols.

Specific recommendations

We recommend that teams use widely used and well-studied cryptographic hash functions for computing the Id, such as SHA256, SHA3/Keccak, or BLAKE3. Unless you know very well what you're doing, MD5, SHA1, or other functions (e.g., 64-bit ones) should never be used for constructing the value Id.
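
To give a rough sense of why 64-bit identifiers are discouraged, the birthday bound can be evaluated numerically. The sketch below is an approximation under the assumption that identifiers behave like uniformly random bit strings; `collision_probability` is a hypothetical helper name. It only models accidental collisions, and says nothing about deliberate attacks, for which MD5 and SHA1 have known practical collision techniques.

```rust
/// Approximate probability that at least two of `n` uniformly random
/// `bits`-bit identifiers collide (birthday bound):
///     p ≈ 1 - exp(-n(n-1) / 2^(bits+1))
fn collision_probability(n: f64, bits: i32) -> f64 {
    let exponent = -(n * (n - 1.0)) / 2f64.powi(bits + 1);
    // exp_m1(x) = e^x - 1 keeps precision when the probability is tiny.
    -exponent.exp_m1()
}
```

With a 64-bit identifier, `collision_probability(4294967296.0, 64)` (about 2^32 values) is already roughly 0.39, whereas for a 256-bit identifier the result stays vanishingly small for any realistic number of values.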

Other mitigations

Doing agreement and consensus by value, that is, making the Id coincide with the full payload Value, would not (in general) be a good mitigation, for the reasons given above (i.e., the performance penalty).
A practical mitigation is for validators to reach agreement not only on inputs (i.e., block 'B1' or 'B2') but also on outputs. Namely, in each block, the validators agree both on the hash 'x' of the current input (a block such as 'B1' or 'B2') and on the application (Merkle tree) root hash that resulted from applying the state transition function at the prior height.
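
A minimal sketch of the "agree on outputs" idea, with hypothetical types (`Header`, `Validator`) that are not part of Malachite: each proposal carries the application root obtained after executing the previous height, and a validator only accepts proposals whose claimed root matches its own. The `u64` root is a stand-in; an actual Merkle root would typically be a 32-byte hash.

```rust
/// Hypothetical block header. Besides the height, it carries the application
/// state root that resulted from executing the previous height.
#[derive(Clone, Debug, PartialEq, Eq)]
struct Header {
    height: u64,
    prev_app_root: u64, // stand-in for a Merkle root (assumption)
}

/// A validator's local view of the application state.
struct Validator {
    last_app_root: u64,
}

impl Validator {
    /// Accept (i.e., be willing to vote for) a proposal only if the root it
    /// claims for the previous height matches the locally computed root.
    fn accepts(&self, proposal: &Header) -> bool {
        proposal.prev_app_root == self.last_app_root
    }
}
```

Under this scheme a collision at height h is not prevented, but the resulting divergence surfaces at height h+1: a validator whose state differs from the proposer's will not accept the proposal, instead of silently continuing on a different state.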

Other mitigations or recommendations probably exist; please share them below.
