diff --git a/src/SUMMARY.md b/src/SUMMARY.md
index 5165f20..2bf404e 100644
--- a/src/SUMMARY.md
+++ b/src/SUMMARY.md
@@ -27,4 +27,9 @@
   - [Swaps](./dex/swaps.md)
   - [SDK](./dex/sdk.md)
+- [Light Clients](./light-clients/intro.md)
+
+  - [Ethereum](./light-clients/ethereum.md)
+  - [Inclusion proofs](./light-clients/inclusion-proofs.md)
+
 
 - [Resources](./resources.md)
diff --git a/src/light-clients/ethereum.md b/src/light-clients/ethereum.md
new file mode 100644
index 0000000..96d09a2
--- /dev/null
+++ b/src/light-clients/ethereum.md
@@ -0,0 +1,137 @@
+# Ethereum Light Client
+
+This chapter explores the architecture and operation of the Ethereum light client, along with the cryptographic foundations that ensure the validity of its proofs. We'll begin with a high-level overview of the protocol before diving into the technical details of each component.
+
+## Protocol Overview
+
+A light client exposes several public functions that can be invoked by external parties. Each function call modifies the client's internal state `S`, storing critical data that informs subsequent operations. The light client lifecycle can be divided into three distinct phases:
+
+1. **Initialization**: Occurs exactly once when the client is created
+1. **Updating**: Called repeatedly to verify each new block
+1. **Freezing**: Invoked once if malicious behavior is detected
+
+Initialization typically happens during contract deployment for smart contract-based light clients. During this phase, we provide the client with its initial trusted state, which can be either:
+
+- The genesis block of the chain
+- A recent, well-established checkpoint block
+
+This initial block is known as the "trusted height" and serves as the foundation for all subsequent verifications. Since this block is assumed to be correct without cryptographic verification within the light client itself, its selection is critical.
In production environments, a governance proposal or similar consensus mechanism often validates this block before it's passed to the light client.
+
+Once initialized, the light client can begin verifying new blocks. The update function accepts a block header and associated cryptographic proofs, then:
+
+1. Verifies the header's cryptographic integrity
+1. Validates the consensus signatures from the sync committee
+1. Updates the client's internal state to reflect the new "latest verified block"
+
+Updates can happen in sequence (verifying each block) or can skip intermediate blocks using more complex proof mechanisms. The efficiency of this process is what makes light clients practical for cross-chain communication.
+
+If the light client detects conflicting information or invalid proofs that suggest an attack attempt, it can enter a "frozen" state. This is a safety mechanism that prevents the client from processing potentially fraudulent updates. Recovery from a frozen state typically requires governance intervention.
+
+Since initialization is rather trivial, we will not dive deeper into it.
+
+### Updating
+
+Since Ethereum is finalized by the Beacon Chain, our Ethereum light client accepts beacon block data as update input.
A [beacon block](https://eth2book.info/capella/part3/containers/blocks/#beacon-blocks) roughly has this structure:
+
+```python
+class BeaconBlockBody(Container):
+    randao_reveal: BLSSignature
+    eth1_data: Eth1Data
+    graffiti: Bytes32
+    proposer_slashings: List[ProposerSlashing, MAX_PROPOSER_SLASHINGS]
+    attester_slashings: List[AttesterSlashing, MAX_ATTESTER_SLASHINGS]
+    attestations: List[Attestation, MAX_ATTESTATIONS]
+    deposits: List[Deposit, MAX_DEPOSITS]
+    voluntary_exits: List[SignedVoluntaryExit, MAX_VOLUNTARY_EXITS]
+    sync_aggregate: SyncAggregate
+    execution_payload: ExecutionPayload
+    bls_to_execution_changes: List[SignedBLSToExecutionChange, MAX_BLS_TO_EXECUTION_CHANGES]
+```
+
+We are specifically interested in `sync_aggregate`, which is a structure describing the votes of the sync committee:
+
+```python
+class SyncAggregate(Container):
+    sync_committee_bits: Bitvector[SYNC_COMMITTEE_SIZE]
+    sync_committee_signature: BLSSignature
+```
+
+The `sync_committee_bits` indicate which members voted (not all need to vote), and the `sync_committee_signature` is the aggregated BLS signature of the members referenced in the bit vector.
+
+BLS signatures (Boneh-Lynn-Shacham) are a type of cryptographic signature scheme that allows multiple signatures to be aggregated into a single signature. This makes them space and compute efficient (you can aggregate hundreds of signatures into one). Just as we aggregate signatures, we can aggregate public keys as well, such that the aggregate public key can verify the aggregated signature.
+
+For our SyncAggregate, computing the aggregate pubkey is simple:
+
+```python
+def _aggregate_pubkeys(committee, bits):
+    pubkeys = []
+    for i, bit in enumerate(bits):
+        if bit:
+            pubkeys.append(committee[i])
+    return bls.Aggregate(pubkeys)
+```
+
+At scale, we can aggregate thousands (if not hundreds of thousands) of signatures and public keys, while only verifying their aggregates.
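The linearity that makes aggregation work can be sketched with a toy model. The snippet below is an illustration only, not real BLS: it replaces the elliptic-curve groups with integers modulo a prime and the pairing with plain multiplication, so it is completely insecure. All names (`P`, `G`, `hash_to_int`, `public_key`, `sign`, `aggregate`, `verify`) are hypothetical and exist only for this sketch.

```python
import hashlib

# Toy stand-in for a BLS group: integers modulo a prime under addition.
# NOT secure (discrete logs are trivial here); it only shows the algebra.
P = 2**61 - 1  # a Mersenne prime, chosen arbitrarily for this sketch
G = 7          # hypothetical "generator" of the toy group


def hash_to_int(message: bytes) -> int:
    """Map a message to a group element (toy hash-to-curve)."""
    return int.from_bytes(hashlib.sha256(message).digest(), "big") % P


def public_key(secret: int) -> int:
    """pk = sk * G"""
    return (secret * G) % P


def sign(secret: int, message: bytes) -> int:
    """sig = sk * H(m)"""
    return (secret * hash_to_int(message)) % P


def aggregate(values) -> int:
    """Aggregation is just the group operation applied to all elements."""
    return sum(values) % P


def verify(agg_pubkey: int, message: bytes, agg_signature: int) -> bool:
    # Mimics the pairing check e(sig, G) == e(H(m), pk),
    # with the "pairing" e(a, b) = a * b mod P.
    return (agg_signature * G) % P == (hash_to_int(message) * agg_pubkey) % P


secrets = [11, 23, 42, 1337]
message = b"beacon block root"

agg_pk = aggregate(public_key(s) for s in secrets)
agg_sig = aggregate(sign(s, message) for s in secrets)

print(verify(agg_pk, message, agg_sig))
print(verify(agg_pk, b"some other block", agg_sig))
```

Because both keys and signatures are linear in the secret key, one check on the aggregates covers every signer at once. A real implementation would use a pairing-friendly curve library instead of these integer stand-ins.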
+
+To our light client, as long as at least two thirds of the sync committee members have attested to the block, it is considered final.
+
+```python
+class LightClient():
+    def update(self, block: BeaconBlockBody):
+
+        # Count how many committee members signed
+        signature_count = sum(block.sync_aggregate.sync_committee_bits)
+
+        # Need at least 2/3 committee participation for finality
+        if signature_count * 3 < SYNC_COMMITTEE_SIZE * 2:
+            raise ValueError("Insufficient signatures from sync committee")
+
+        # Construct aggregate public key from the current committee and bit vector
+        aggregate_pubkey = _aggregate_pubkeys(
+            self.current_sync_committee,
+            block.sync_aggregate.sync_committee_bits
+        )
+```
+
+We now have the `aggregate_pubkey` for the committee, and have verified that enough members signed. Notice that to obtain the sync committee public keys, we used `self.current_sync_committee`. This is set during initialization, and later updated in our `update` function.
+
+Next we have to construct the digest (what has been signed) before we verify the aggregated signature. If we didn't compute the digest ourselves, but obtained it from the block, then the caller could fraudulently pass a correct digest, but have other values in the block altered.
+
+```python
+        signing_root = self._compute_signing_root(block)
+
+        # Verify the aggregated signature against the aggregated public key
+        if not bls.Verify(
+            aggregate_pubkey,
+            signing_root,
+            block.sync_aggregate.sync_committee_signature
+        ):
+            raise ValueError("Invalid sync committee signature")
+```
+
+Since the signature and block are both valid, we can now trust the contents of the passed beacon block. Next the light client will store data from the block:
+
+```python
+        self.latest_block_root = self._compute_block_root(block)
+        self.latest_slot = block.slot
+```
+
+Finally, we have to update the sync committee.
The committee rotates every sync committee period (256 epochs), so if this block is at a period boundary, we have to update these values. Luckily Ethereum makes this easy for us, and provides what the next sync committee will be:
+
+```python
+        if block.slot % (SLOTS_PER_EPOCH * EPOCHS_PER_SYNC_COMMITTEE_PERIOD) == 0:
+            self.current_sync_committee = self.next_sync_committee
+            self.next_sync_committee = block.next_sync_committee
+```
+
+`SLOTS_PER_EPOCH` and `EPOCHS_PER_SYNC_COMMITTEE_PERIOD` can be hardcoded, or stored in the light client state. Each epoch is 32 slots (approximately 6.4 minutes), so a full sync committee period lasts about 27.3 hours.
+
+With this relatively simple protocol, we now have a (Python) smart contract that can track Ethereum's blocks.
+
+### Optimizations
+
+In actuality, the beacon block is still too large for a light client. The actual light client uses the [`LightClientHeader`](https://github.com/unionlabs/union/blob/cfe862e6dacf5474925110504891fa4120e747f6/lib/beacon-api-types/src/deneb/light_client_header.rs#L16C12-L16C29) data structure, which consists of a beacon header and execution header.
+
+The beacon header is used to prove the consensus and transition the internal state, as well as immediately prove that the execution header is valid. The block height in the execution header is then used for further client operations, such as transaction timeouts. Using the execution height instead of the beacon height for timeouts has advantages for users and developers, ensuring they do not even need to be aware of the Beacon Chain's existence.
+
+Another significant optimization relates to signature aggregation. Since the majority of the sync committee always signs, we instead aggregate the public keys of the non-signers, and subtract that from the aggregated total. Effectively, if on average 90% of members sign, we submit the 10% that did not sign.
This results in an approximate 80% computational reduction (by avoiding the need to aggregate 90% of the public keys individually), as well as reducing the size of the client update transaction.
diff --git a/src/light-clients/inclusion-proofs.md b/src/light-clients/inclusion-proofs.md
new file mode 100644
index 0000000..d8fe9b1
--- /dev/null
+++ b/src/light-clients/inclusion-proofs.md
@@ -0,0 +1,211 @@
+# Inclusion Proofs
+
+Now that we understand how to verify blocks using a light client, we will show how, based on these verified blocks, we can verify state proofs and, by extension, messages and transfers. We will explore how to efficiently prove that a piece of data is included in a larger dataset without needing to reveal or access the entire dataset.
+
+## Merkle Trees
+
+A Merkle tree (or hash tree) is a data structure that allows for efficient and secure verification of content in a large body of data. Named after Ralph Merkle, who patented it in 1979, these trees are fundamental building blocks in distributed systems and cryptographic applications.
+
+### Structure of a Merkle Tree
+
+A Merkle tree is a binary tree where:
+
+- Each leaf node contains the hash of a data block
+- Each non-leaf node contains the hash of the concatenation of its child nodes
+
+Let's visualize a simple Merkle tree with 4 data blocks:
+
+```mermaid
+graph TD
+    Root["Root Hash: H(H1 + H2)"] --> H1["H1: H(H1-1 + H1-2)"]
+    Root --> H2["H2: H(H2-1 + H2-2)"]
+    H1 --> H1-1["H1-1: Hash(Data 1)"]
+    H1 --> H1-2["H1-2: Hash(Data 2)"]
+    H2 --> H2-1["H2-1: Hash(Data 3)"]
+    H2 --> H2-2["H2-2: Hash(Data 4)"]
+```
+
+In this diagram:
+
+1. We start with four data blocks: Data 1, Data 2, Data 3, and Data 4
+1. We compute the hash of each data block to create the leaf nodes (H1-1, H1-2, H2-1, H2-2)
+1. We then pair the leaf nodes and hash their concatenated values to form the next level (H1, H2)
+1.
Finally, we concatenate and hash the results to get the root hash
+
+The root hash uniquely represents the entire dataset. If any piece of data in the tree changes, the root hash will also change. Root hashes are present in block headers for most blockchains. Ethereum for example has the `state_root` field in each header. With the state root, we can construct a proof for any data stored on Ethereum, so whenever we write value V to storage in a Solidity smart contract at block H, we can construct a proof to show that from H onwards, that slot contains V. This proof will be valid until we update or delete the value.
+
+```python
+def prove(state_root, proof, V) -> bool: ...
+```
+
+For Merkle trees specifically, we construct Merkle inclusion proofs. Constructing the proof is relatively compute intensive and requires access to the full state at the block in question, so for historical blocks only archive nodes are capable of doing so.
+
+## Inclusion Proofs
+
+An inclusion proof (also called a Merkle proof) is a way to verify that a specific data block is part of a Merkle tree without having to reveal the entire tree.
+
+An inclusion proof consists of:
+
+1. The data block to be verified
+1. A "proof path" - a list of hashes that, combined with the data block's hash in the right order, will reproduce the root hash
+
+Let's visualize how a Merkle proof works for Data 2 in our example:
+
+In this visualization, we're proving that Data 2 is included in the tree. The pink node is the data we're proving (Data 2), and the blue nodes represent the proof path hashes we need.
+
+The proof for Data 2 consists of:
+
+- The data itself: Data 2
+- The proof path: \[H1-1, H2\]
+
+```mermaid
+graph TD
+    Root["Root Hash: H(H1 + H2)"] --> H1["H1: H(H1-1 + H1-2)"]
+    Root --> H2["H2: H(H2-1 + H2-2)"]
+    H1 --> H1-1["H1-1: Hash(Data 1)"]
+    H1 --> H1-2["H1-2: Hash(Data 2)"]
+    H2 --> H2-1["H2-1: Hash(Data 3)"]
+    H2 --> H2-2["H2-2: Hash(Data 4)"]
+```
+
+To verify that Data 2 is indeed part of the Merkle tree with root hash R, a verifier would:
+
+1. Compute H1-2 = Hash(Data 2)
+1. Compute H1 = Hash(H1-1 + H1-2) using the provided H1-1
+1. Compute Root = Hash(H1 + H2) using the provided H2
+1. Compare the computed Root with the known Root hash
+
+If they match, it proves Data 2 is part of the tree. Effectively we recompute the state root lazily.
+
+For a Merkle tree with n leaves, the proof size and verification time are both O(log n), making it highly efficient even for very large datasets.
+
+For example, in a tree with 1 million leaf nodes:
+
+- A full tree would require storing about 2 million hashes (1 million leaves plus the internal nodes)
+- A Merkle proof requires only about 20 hashes (log₂ 1,000,000 ≈ 20)
+
+## Message Verification using Inclusion Proofs
+
+Now that we have a clearly defined model of how to get blocks from chain `source` on chain `destination` using a [light client](./ethereum.md), and how to prove state from `source` on `destination` using `source`'s state root, we will show a simple model of how to securely perform cross-chain messaging using a state proof.
+
+First, we need to write our message to a known storage location on the source chain.
This is typically done through a smart contract:
+
+```solidity
+// On source chain
+contract MessageSender {
+    // Maps message IDs to actual messages
+    mapping(uint256 => bytes) public messages;
+
+    // Event emitted when a new message is stored
+    event MessageStored(uint256 indexed messageId, bytes message);
+
+    function sendMessage(bytes memory message) public returns (uint256) {
+        uint256 messageId = uint256(keccak256(message));
+        messages[messageId] = message;
+        emit MessageStored(messageId, message);
+        return messageId;
+    }
+}
+```
+
+When `sendMessage` is called, the message is stored in the contract's state at a specific storage slot that can be deterministically calculated from the messageId.
+
+Next, we need to update the light client on the destination chain to reflect the latest state of the source chain:
+
+```solidity
+// On destination chain
+contract LightClient {
+    // Latest verified block header from source chain
+    BlockHeader public latestHeader;
+
+    function updateBlockHeader(BlockHeader memory newHeader, Proof memory proof) public {
+        // Verify the proof that newHeader is a valid successor to latestHeader
+        require(verifyProof(latestHeader, newHeader, proof), "Invalid block proof");
+
+        // Update the latest header
+        latestHeader = newHeader;
+    }
+
+    function getStateRoot() public view returns (bytes32) {
+        return latestHeader.stateRoot;
+    }
+}
+```
+
+The light client maintains a record of the latest verified block header, which includes the state root of the source chain. Regular updates to this light client ensure that the destination chain has access to recent state roots.
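Before turning to the receiver contract, it may help to see what a stored root lets a verifier check. The following is a minimal Python sketch of the Data 2 proof from the four-leaf example earlier in this chapter. It assumes a plain binary tree with SHA-256 and a power-of-two number of leaves, whereas Ethereum actually uses Merkle-Patricia tries with Keccak-256. The helper names (`h`, `merkle_root`, `verify_inclusion`) are hypothetical.

```python
import hashlib


def h(data: bytes) -> bytes:
    """Hash function used for both leaves and inner nodes (SHA-256 here)."""
    return hashlib.sha256(data).digest()


def merkle_root(leaves):
    """Build the root of a binary Merkle tree; assumes len(leaves) is a power of two."""
    level = [h(leaf) for leaf in leaves]
    while len(level) > 1:
        level = [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]


def verify_inclusion(root: bytes, leaf: bytes, proof) -> bool:
    """Recompute the root from a leaf and its proof path.

    `proof` is a list of (sibling_hash, sibling_is_left) pairs,
    ordered from the leaf level up to the root.
    """
    node = h(leaf)
    for sibling, sibling_is_left in proof:
        node = h(sibling + node) if sibling_is_left else h(node + sibling)
    return node == root


leaves = [b"Data 1", b"Data 2", b"Data 3", b"Data 4"]
root = merkle_root(leaves)

# Proof path for Data 2: [H1-1, H2]
h1_1 = h(b"Data 1")                       # left sibling of H1-2
h2 = h(h(b"Data 3") + h(b"Data 4"))       # right sibling of H1
proof_for_data_2 = [(h1_1, True), (h2, False)]

print(verify_inclusion(root, b"Data 2", proof_for_data_2))
print(verify_inclusion(root, b"Data X", proof_for_data_2))
```

The verifier only ever touches the leaf, two sibling hashes, and the root, which is the same shape of check the destination chain performs against `getStateRoot()`.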
+
+Finally, we can prove the existence of the message on the destination chain using a Merkle inclusion proof against the state root:
+
+```solidity
+// On destination chain
+contract MessageReceiver {
+    LightClient public lightClient;
+    address public sourceSenderContract;
+
+    constructor(address _lightClient, address _sourceSender) {
+        lightClient = LightClient(_lightClient);
+        sourceSenderContract = _sourceSender;
+    }
+
+    function verifyAndProcessMessage(
+        uint256 messageId,
+        bytes memory message,
+        bytes32[] memory proofNodes,
+        uint256 proofPath
+    ) public {
+        // Get the latest state root from the light client
+        bytes32 stateRoot = lightClient.getStateRoot();
+
+        // Calculate the storage slot for this message in the source contract
+        bytes32 storageSlot = keccak256(abi.encode(messageId, uint256(0))); // Slot for messages[messageId] (`messages` is the first state variable, so its base slot is 0)
+
+        // Verify the inclusion proof against the state root
+        require(
+            verifyStorageProof(
+                stateRoot,
+                sourceSenderContract,
+                storageSlot,
+                message,
+                proofNodes,
+                proofPath
+            ),
+            "Invalid state proof"
+        );
+
+        // Message is verified, now process it
+        processMessage(messageId, message);
+    }
+
+    function processMessage(uint256 messageId, bytes memory message) internal {
+        // Application-specific message handling
+        // ...
+    }
+
+    function verifyStorageProof(
+        bytes32 stateRoot,
+        address contractAddress,
+        bytes32 slot,
+        bytes memory expectedValue,
+        bytes32[] memory proofNodes,
+        uint256 proofPath
+    ) internal pure returns (bool) {
+        // This function verifies a Merkle-Patricia trie proof
+        // It proves that at the given storage slot in the specified contract,
+        // the value matches expectedValue in the state with stateRoot
+
+        // Implementation details omitted for brevity
+        // This would use the proofNodes and proofPath to reconstruct the path
+        // from the leaf (storage value) to the state root
+
+        return true; // Placeholder: a real implementation must reject invalid proofs
+    }
+}
+```
+
+This mechanism ensures that messages can only be processed on the destination chain if they were genuinely recorded on the source chain, without requiring trust in any intermediaries. The security of the system relies on:
+
+1. The integrity of the light client, which only accepts valid block headers
+1. The cryptographic properties of Merkle trees, which make it computationally infeasible to forge inclusion proofs
+1. The immutability of blockchain state, which ensures the message cannot be altered once written
+
+By combining light client verification with state inclusion proofs, we establish a trustless bridge for cross-chain communication that maintains the security properties of both blockchains.
diff --git a/src/light-clients/intro.md b/src/light-clients/intro.md
new file mode 100644
index 0000000..538b190
--- /dev/null
+++ b/src/light-clients/intro.md
@@ -0,0 +1,66 @@
+# Light Clients
+
+Trust-minimized interoperability protocols use light clients to secure message passing between blockchains. Light clients can be implemented as smart contracts, Cosmos SDK modules, or components within wallets. Their fundamental purpose is to verify the canonicity of new blocks (confirming that a block is a valid addition to a blockchain) without requiring the full block data.
+
+### Block Structure Fundamentals
+
+A blockchain block typically consists of two main sections:
+
+**Header**: Contains metadata about the block, including:
+
+- Block producer information
+- Block height and timestamp
+- Previous block hash
+- State root (a cryptographic summary of the blockchain's state)
+- Transaction root (a Merkle root of all transactions in the block)
+- Other consensus-specific data
+
+**Body**: Contains the complete list of transactions included in the block.
+
+The header has a fixed size (typically a few hundred bytes), while the body's size varies dramatically based on the number and complexity of transactions. This size difference is crucial for understanding light client efficiency.
+
+The key distinction between light clients and full nodes lies in their data requirements:
+
+- **Light clients** only process block headers, which enables efficient verification with minimal data (kilobytes instead of megabytes or gigabytes)
+- **Full nodes** process both headers and bodies, requiring significantly more computational resources and storage
+
+This efficiency makes light clients ideal for cross-chain communication, mobile applications, and resource-constrained environments.
+
+Light clients achieve security through cryptographic verification rather than data replication. They:
+
+1. Track validator sets from the source blockchain
+1. Verify consensus signatures on new block headers
+1. Validate state transitions through cryptographic proofs
+1. Maintain only the minimal state required for validation
+
+This approach ensures that even with minimal data, light clients can detect invalid or malicious blocks.
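As a rough illustration of "minimal state", here is a hedged Python sketch of a client that keeps only the latest trusted header and checks that each new header links to it. The `Header` fields, the `digest` encoding, and the `signatures_ok` flag (standing in for real consensus signature verification) are all assumptions made for this sketch and do not describe any specific chain.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class Header:
    """Hypothetical, heavily simplified block header."""
    height: int
    parent_hash: bytes
    state_root: bytes

    def digest(self) -> bytes:
        # Toy encoding: real chains define their own header serialization.
        payload = self.height.to_bytes(8, "big") + self.parent_hash + self.state_root
        return hashlib.sha256(payload).digest()


class HeaderOnlyClient:
    """Stores just one header: the latest one it has accepted."""

    def __init__(self, trusted: Header):
        self.latest = trusted

    def update(self, header: Header, signatures_ok: bool) -> None:
        # `signatures_ok` stands in for verifying validator signatures.
        if not signatures_ok:
            raise ValueError("consensus signatures rejected")
        if header.height != self.latest.height + 1:
            raise ValueError("header is not the direct successor")
        if header.parent_hash != self.latest.digest():
            raise ValueError("header does not extend the trusted chain")
        self.latest = header


genesis = Header(height=0, parent_hash=b"\x00" * 32, state_root=b"\x11" * 32)
client = HeaderOnlyClient(genesis)
client.update(Header(1, genesis.digest(), b"\x22" * 32), signatures_ok=True)
print(client.latest.height)
```

However long the chain gets, the client's storage stays constant: one header, from which the current state root can be read.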
+
+Light clients form the backbone of trustless bridge infrastructure:
+
+- Smart contract-based light clients on Ethereum can verify Cosmos chain blocks
+- Cosmos modules can verify Ethereum blocks using embedded light clients
+- Cross-rollup communication can leverage light client technology for L2-to-L2 messaging
+
+When implemented as bridge components, light clients enable secure cross-chain asset transfers and message passing without requiring trusted third parties.
+
+### Wallets and User Interfaces
+
+Modern wallet implementations increasingly incorporate light client technology:
+
+- Mobile wallets can verify transactions without syncing the entire chain
+- Browser extensions can validate state without backend reliance
+- Hardware wallets can verify complex operations with limited resources
+
+This improves both security and user experience by reducing dependency on remote (RPC) servers.
+
+### Ethereum Light Client Deep Dive
+
+Ethereum's light client protocol is particularly significant for Union's architecture. It uses a combination of:
+
+1. **Consensus verification**: Validating signatures from the beacon chain's validator set
+1. **Sync committees**: Tracking rotating sets of validators for efficient verification
+1. **Merkle proofs**: Verifying transaction inclusion and state values without downloading the full state
+
+Ethereum light clients can securely validate blocks with just a few kilobytes of data, compared to the hundreds of megabytes required for full validation. This efficiency makes them ideal for cross-chain applications.
+
+In subsequent sections, we'll examine how Union leverages these light client principles to secure cross-chain communication and explore implementation details of the Ethereum light client that secures a significant portion of Union's traffic.
diff --git a/src/resources.md b/src/resources.md
index 5a5e065..5cba0db 100644
--- a/src/resources.md
+++ b/src/resources.md
@@ -3,3 +3,4 @@ Make sure to DYOR:
 - [docs.union.build](https://docs.union.build)
 - [github.com/unionlabs/union](https://github.com/unionlabs/union)
 - [IBC](https://github.com/cosmos/ibc)
+- [eth2book](https://eth2book.info)