Any application that builds with Malachite defines its own concrete Value and Id types. In practice, the value would be a block of transactions and a corresponding header. The identifier would then be a concise representation of the value, such as a digest. It is important that the identifier has a smaller footprint than the value itself: identifiers are carried in critical consensus messages (Prevote, Precommit), and these messages should be kept small.
Alternatively, the value itself could be a concise representation of the block and header, as is done, for example, in the Starknet sequencer test app. The advantage of this approach is that consensus runs on more lightweight payloads, which can be more efficient. In this case, the identifier and the value coincide.
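The two options above can be sketched as follows. This is a minimal illustration, not Malachite's actual API: the `Value` trait here is a simplified, hypothetical stand-in for the real trait in the core-types crate, and `Block`, `CompactValue`, and the `digest` field are names made up for this example. The digest is assumed to be computed elsewhere with a cryptographic hash function.

```rust
use std::fmt::Debug;

/// Simplified, hypothetical stand-in for a `Value` trait with an
/// associated `Id` type (NOT the real Malachite definition).
pub trait Value: Clone + Debug + Eq {
    /// Compact identifier that votes carry instead of the full value.
    type Id: Clone + Debug + Eq + Ord;

    fn id(&self) -> Self::Id;
}

/// Option 1: the value is a full block and the Id is a 32-byte digest.
#[derive(Clone, Debug, PartialEq, Eq)]
pub struct Block {
    /// Digest of header and transactions. Assumption: computed elsewhere
    /// with a cryptographic hash function such as SHA256.
    pub digest: [u8; 32],
    pub transactions: Vec<Vec<u8>>,
}

impl Value for Block {
    type Id = [u8; 32];

    fn id(&self) -> Self::Id {
        self.digest
    }
}

/// Option 2: the value is already a compact representation,
/// so the identifier and the value coincide.
#[derive(Clone, Debug, PartialEq, Eq, PartialOrd, Ord)]
pub struct CompactValue(pub [u8; 32]);

impl Value for CompactValue {
    type Id = CompactValue;

    fn id(&self) -> Self::Id {
        self.clone()
    }
}
```

In the first case a vote references 32 bytes regardless of how many transactions the block holds; in the second, the vote effectively carries the value itself.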
To learn more about consensus by value or by id, see also ADR 003.
Question
What are the requirements on the value Id? A user asked specifically:
I presume it (the Id concrete type implementation) does not have to be a cryptographically secure hash? I assume it is a cheap checksum essentially to coordinate voting?
Answer
Briefly, the Id should provide a concise and unique representation of a Value. It should be (fairly) collision-resistant, small in size, and cheap to verify. As hinted above, the rationale is that the consensus protocol (i.e., the votes, such as Prevotes and Precommits, that each validator casts to reach agreement at each Height) does not operate on the full Value, which would carry a potentially significant performance penalty. Instead, consensus operates on identifiers of values. This is much cheaper, because votes are tiny messages instead of potentially large payloads (e.g., the entire block of transactions at a given height).
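To give a rough sense of the size difference, here is a small sketch comparing a vote that carries a 32-byte identifier with one that carries the full value. The `VoteById` and `VoteByValue` structs are hypothetical and much simpler than real vote messages (no signature, validator address, or vote type); only the payload fields are counted.

```rust
use std::mem::size_of_val;

/// Hypothetical vote that references the value by a 32-byte identifier.
pub struct VoteById {
    pub height: u64,
    pub round: u32,
    pub value_id: [u8; 32],
}

/// Hypothetical vote that embeds the full value.
pub struct VoteByValue {
    pub height: u64,
    pub round: u32,
    pub value: Vec<u8>,
}

impl VoteById {
    /// Bytes of payload: height + round + fixed-size identifier.
    pub fn payload_bytes(&self) -> usize {
        size_of_val(&self.height) + size_of_val(&self.round) + self.value_id.len()
    }
}

impl VoteByValue {
    /// Bytes of payload: height + round + the whole value.
    pub fn payload_bytes(&self) -> usize {
        size_of_val(&self.height) + size_of_val(&self.round) + self.value.len()
    }
}
```

With a 1 MiB block, the by-id payload stays at 44 bytes while the by-value payload is 1,048,588 bytes, and every validator sends such votes in every round.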
What can happen if there are collisions?
Put differently, what happens if two validators see a Prevote message for an identifier 'x' and each of them has a different view of which value is associated with that identifier? For example:
validator A sees block 'B1', which has the SHA256 checksum 'x'
validator B sees block 'B2', which has the SHA256 checksum 'x', where 'B1' != 'B2'
What happens then? In such a case, the two validators reach agreement on the same identifier, but when each executes the state transition function on its own input ('B1' and 'B2' respectively) to derive the next application state, the resulting states diverge. Validator A could end up with state 'S1' while validator B ends up with local application state 'S2'.
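The scenario can be reproduced in miniature. The sketch below uses a deliberately weak 8-bit checksum as the identifier, purely so that a collision is easy to construct by hand; `weak_id` and `apply` are hypothetical toy functions and have nothing to do with how a real application should compute its Id or its state.

```rust
/// Deliberately weak 8-bit checksum (sum of bytes modulo 256).
/// Used ONLY to make a collision trivial to demonstrate.
pub fn weak_id(block: &[u8]) -> u8 {
    block.iter().fold(0u8, |acc, b| acc.wrapping_add(*b))
}

/// Toy state transition function: an order-sensitive fold over the
/// block's bytes, standing in for "execute the block".
pub fn apply(state: u64, block: &[u8]) -> u64 {
    block
        .iter()
        .fold(state, |s, b| s.wrapping_mul(31).wrapping_add(u64::from(*b)))
}

/// Returns (ids_match, states_match) for two blocks applied to the
/// same starting state.
pub fn compare(b1: &[u8], b2: &[u8], start: u64) -> (bool, bool) {
    (
        weak_id(b1) == weak_id(b2),
        apply(start, b1) == apply(start, b2),
    )
}
```

For `B1 = [1, 2, 3]` and `B2 = [3, 2, 1]`, `compare` reports that the identifiers match while the resulting states do not: both validators would vote for the same 'x' and still end up with different application states.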
How practical is this problem?
In practice, it is unlikely to occur. Still, it is advised to use state-of-the-art collision-resistant digest functions for computing the Id.
It's worth noting that this issue is not specific to Malachite, nor to the Tendermint consensus protocol. For instance, Bitcoin block hashes and mining rely on such digest functions. Using digests as concise identifiers is standard practice in (secure) networking protocols.
Specific recommendations
We recommend that teams use widely used and well-studied cryptographic hash functions for computing the Id, such as SHA256, SHA3/Keccak, or BLAKE3. Unless you know very well what you're doing, never use MD5, SHA1, or other weak or short functions (e.g., 64-bit ones) for constructing the value Id.
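To see why short digests are ruled out, the birthday bound gives a quick estimate. For n uniformly random b-bit digests, the probability of at least one collision is approximately 1 - exp(-n^2 / 2^(b+1)). The sketch below evaluates this approximation; it models an ideal hash function and says nothing about structural weaknesses such as those known for MD5 and SHA1. The function name is made up for this example.

```rust
/// Approximate probability of at least one collision among `n` uniformly
/// random digests of `bits` bits, using the birthday approximation
/// p ~= 1 - exp(-n^2 / 2^(bits + 1)).
pub fn collision_probability(n: f64, bits: i32) -> f64 {
    let x = n * n / 2.0_f64.powi(bits + 1);
    // 1 - exp(-x), written with exp_m1 to stay accurate when x is tiny.
    -(-x).exp_m1()
}
```

With n = 2^32 (about 4.3 billion) candidate values, a 64-bit digest gives a collision probability of roughly 0.39, while a 256-bit digest gives a probability on the order of 1e-58. Since computing 2^32 hashes is cheap on commodity hardware, an adversary searching for colliding blocks offline is a realistic concern for 64-bit identifiers.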
Other mitigations
In general, doing agreement and consensus by value, that is, making the Id coincide with the full payload Value, would not be a good mitigation, for the reasons above (i.e., the performance penalty).
To mitigate the problem, one practical solution is for validators to reach agreement not only on inputs (i.e., block 'B1' or 'B2') but also on outputs. Namely, in each block, the validators agree both on the hash 'x' of the current input (a block such as 'B1' or 'B2') and on the application (Merkle tree) root hash that resulted from applying the state transition function at the prior height.
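A minimal sketch of this idea follows, assuming a hypothetical block header that carries both the value identifier and the application root hash produced at the prior height. `Header`, `Check`, and `check_proposal` are illustrative names, not part of Malachite.

```rust
/// Hypothetical block header carrying both the input identifier and
/// the output (application root hash) of the previous height.
#[derive(Clone, Debug, PartialEq, Eq)]
pub struct Header {
    pub height: u64,
    /// Identifier 'x' of this block's payload.
    pub value_id: [u8; 32],
    /// Application (Merkle tree) root hash after executing the block
    /// at `height - 1`, as claimed by the proposer.
    pub prev_app_hash: [u8; 32],
}

#[derive(Debug, PartialEq, Eq)]
pub enum Check {
    Ok,
    /// The proposer's view of the prior state differs from ours.
    StateDivergence,
}

/// Compare the root hash claimed in the header with the root hash this
/// validator computed locally for the prior height.
pub fn check_proposal(header: &Header, local_app_hash: &[u8; 32]) -> Check {
    if &header.prev_app_hash == local_app_hash {
        Check::Ok
    } else {
        Check::StateDivergence
    }
}
```

A validator that executed 'B2' while the proposer executed 'B1' would see a mismatch at the next height, so the divergence is detected after one block instead of going unnoticed.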
Other mitigations or recommendations probably exist; please share below.
Context
In the core-types crate there is the type definition of ValueId: https://github.com/informalsystems/malachite/blob/4c2a9560ceb75868af23de388991a1eafc42ec9b/code/crates/core-types/src/lib.rs#L33
The Id type is an associated type on the Value trait: https://github.com/informalsystems/malachite/blob/4c2a9560ceb75868af23de388991a1eafc42ec9b/code/crates/core-types/src/value.rs#L77-L87
Other context that may help is in issue 365: https://github.com/informalsystems/malachite/issues/365