unionlabs · KaiserKarel · Mar 24, 2025 · Mar 20, 2025 · Mar 20, 2025 · Mar 21, 2025
diff --git a/src/SUMMARY.md b/src/SUMMARY.md
@@ -27,4 +27,9 @@
   - [Swaps](./dex/swaps.md)
   - [SDK](./dex/sdk.md)
 
+- [Light Clients](./light-clients/intro.md)
+
+  - [Ethereum](./light-clients/ethereum.md)
+  - [Inclusion proofs](./light-clients/inclusion-proofs.md)
+
 - [Resources](./resources.md)
diff --git a/src/light-clients/ethereum.md b/src/light-clients/ethereum.md
@@ -0,0 +1,137 @@
+# Ethereum Light Client
+
+This chapter explores the architecture and operation of the Ethereum light client, along with the cryptographic foundations that ensure the validity of its proofs. We'll begin with a high-level overview of the protocol before diving into the technical details of each component.
+
+## Protocol Overview
+
+A light client exposes several public functions that can be invoked by external parties. Each function call modifies the client's internal state `S`, storing critical data that informs subsequent operations. The light client lifecycle can be divided into three distinct phases:
+
+1. **Initialization**: Occurs exactly once when the client is created
+1. **Updating**: Called repeatedly to verify each new block
+1. **Freezing**: Invoked once if malicious behavior is detected
+
+Initialization typically happens during contract deployment for smart contract-based light clients. During this phase, we provide the client with its initial trusted state, which can be either:
+
+- The genesis block of the chain
+- A recent, well-established checkpoint block
+
+This initial block is known as the "trusted height" and serves as the foundation for all subsequent verifications. Since this block is assumed to be correct without cryptographic verification within the light client itself, its selection is critical. In production environments, a governance proposal or similar consensus mechanism often validates this block before it's passed to the light client.
+
+Once initialized, the light client can begin verifying new blocks. The update function accepts a block header and associated cryptographic proofs, then:
+
+1. Verifies the header's cryptographic integrity
+1. Validates the consensus signatures from the sync committee
+1. Updates the client's internal state to reflect the new "latest verified block"
+
+Updates can happen in sequence (verifying each block) or can skip intermediate blocks using more complex proof mechanisms. The efficiency of this process is what makes light clients practical for cross-chain communication.
+
+If the light client detects conflicting information or invalid proofs that suggest an attack attempt, it can enter a "frozen" state. This is a safety mechanism that prevents the client from processing potentially fraudulent updates. Recovery from a frozen state typically requires governance intervention.
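+
+The three phases can be pictured as a small state machine. The sketch below is purely illustrative: `ToyLightClient`, its fields, and the `proof_ok` flag are hypothetical names, and real proof verification is replaced by a boolean.
+
+```python
+from dataclasses import dataclass
+
+
+@dataclass
+class ToyLightClient:
+    """Hypothetical lifecycle model; real proof verification is elided."""
+    trusted_height: int   # set exactly once, at initialization
+    frozen: bool = False
+
+    def update(self, height: int, proof_ok: bool) -> None:
+        if self.frozen:
+            raise RuntimeError("client is frozen")
+        if not proof_ok:
+            raise ValueError("invalid proof")
+        if height <= self.trusted_height:
+            raise ValueError("height must increase")
+        self.trusted_height = height
+
+    def freeze(self) -> None:
+        # Called once misbehaviour is detected; blocks all further updates.
+        self.frozen = True
+
+
+client = ToyLightClient(trusted_height=100)  # initialization
+client.update(101, proof_ok=True)            # updating
+client.freeze()                              # freezing
+```
+
+After `freeze()`, every further call to `update` raises, which mirrors the safety mechanism described above.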
+
+Since initialization is rather trivial, we will not dive deeper into it.
+
+### Updating
+
+Since Ethereum is finalized by the Beacon Chain, our Ethereum light client accepts beacon block data as update input. The body of a [beacon block](https://eth2book.info/capella/part3/containers/blocks/#beacon-blocks) roughly has this structure:
+
+```python
+class BeaconBlockBody(Container):
+    randao_reveal: BLSSignature
+    eth1_data: Eth1Data
+    graffiti: Bytes32
+    proposer_slashings: List[ProposerSlashing, MAX_PROPOSER_SLASHINGS]
+    attester_slashings: List[AttesterSlashing, MAX_ATTESTER_SLASHINGS]
+    attestations: List[Attestation, MAX_ATTESTATIONS]
+    deposits: List[Deposit, MAX_DEPOSITS]
+    voluntary_exits: List[SignedVoluntaryExit, MAX_VOLUNTARY_EXITS]
+    sync_aggregate: SyncAggregate
+    execution_payload: ExecutionPayload
+    bls_to_execution_changes: List[SignedBLSToExecutionChange, MAX_BLS_TO_EXECUTION_CHANGES]
+```
+
+We are specifically interested in `sync_aggregate`, which is a structure describing the votes of the sync committee:
+
+```python
+class SyncAggregate(Container):
+    sync_committee_bits: Bitvector[SYNC_COMMITTEE_SIZE]
+    sync_committee_signature: BLSSignature
+```
+
+The `sync_committee_bits` indicate which members voted (not all need to vote), and the `sync_committee_signature` is the aggregate BLS signature produced by the members referenced in the bit vector.
+
+BLS signatures (Boneh-Lynn-Shacham) are a type of cryptographic signature scheme that allows multiple signatures to be aggregated into a single signature. This makes them space and compute efficient (you can aggregate hundreds of signatures into one). Just as we aggregate signatures, we can aggregate public keys as well, such that the aggregate public key can verify the aggregated signature.
+
+For our SyncAggregate, computing the aggregate pubkey is simple:
+
+```python
+def _aggregate_pubkeys(committee, bits):
+    pubkeys = []
+    for i, bit in enumerate(bits):
+        if bit:
+            pubkeys.append(committee[i])
+    return bls.Aggregate(pubkeys)
+```
+
+At scale, we can aggregate thousands (if not hundreds of thousands) of signatures and public keys, while only verifying their aggregates.
+
+To our light client, as long as at least two thirds of the sync committee members have attested to the block, it is considered final.
+
+```python
+class LightClient():
+    def update(self, block: BeaconBlockBody):
+        sync_aggregate = block.sync_aggregate
+
+        # Count how many committee members signed
+        signature_count = sum(sync_aggregate.sync_committee_bits)
+
+        # Need 2/3+ committee participation for finality
+        # (integer comparison avoids rounding the threshold down)
+        if signature_count * 3 < SYNC_COMMITTEE_SIZE * 2:
+            raise ValueError("Insufficient signatures from sync committee")
+
+        # Construct aggregate public key from the current committee and bit vector
+        aggregate_pubkey = _aggregate_pubkeys(
+            self.current_sync_committee,
+            block.sync_aggregate.sync_committee_bits
+        )
+```
+
+We now have the `aggregate_pubkey` for the committee and have verified that enough members signed. Notice that to obtain the sync committee public keys, we used `self.current_sync_committee`. This is set during initialization, and later updated in our `update` function.
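+
+A subtlety of the two-thirds threshold is integer rounding. The sketch below compares `count * 3` against `size * 2` so that no division is needed; `has_supermajority` is a hypothetical helper name, and 512 is the mainnet value of `SYNC_COMMITTEE_SIZE`.
+
+```python
+SYNC_COMMITTEE_SIZE = 512  # mainnet value
+
+
+def has_supermajority(bits: list[bool]) -> bool:
+    """True when at least 2/3 of the committee signed (integer arithmetic only)."""
+    return sum(bits) * 3 >= len(bits) * 2
+
+
+bits_341 = [True] * 341 + [False] * (SYNC_COMMITTEE_SIZE - 341)
+bits_342 = [True] * 342 + [False] * (SYNC_COMMITTEE_SIZE - 342)
+print(has_supermajority(bits_341))  # False: 341 * 3 = 1023 < 1024
+print(has_supermajority(bits_342))  # True:  342 * 3 = 1026 >= 1024
+```
+
+With 512 members, 342 signatures are the minimum that reaches two thirds; a floored threshold of `1024 // 3 = 341` would accept one signature too few.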
+
+Next we have to construct the digest (what has been signed) before we verify the aggregated signature. If we didn't compute the digest ourselves, but obtained it from the block, then the caller could fraudulently pass a correct digest, but have other values in the block altered.
+
+```python
+    signing_root = self._compute_signing_root(block)
+
+    # Verify the aggregated signature against the aggregated public key
+    if not bls.Verify(
+        aggregate_pubkey,
+        signing_root,
+        block.sync_aggregate.sync_committee_signature
+    ):
+        raise ValueError("Invalid sync committee signature")
+```
+
+Since the signature and block are both valid, we can now trust the contents of the passed beacon block. Next the light client will store data from the block:
+
+```python
+    self.latest_block_root = self._compute_block_root(block)
+    self.latest_slot = block.slot
+```
+
+Finally, we have to update the sync committee. The committee rotates every sync committee period (256 epochs), and thus if this is at the boundary, we have to update these values. Luckily Ethereum makes this easy for us, and provides what the next sync committee will be:
+
+```python
+    if block.slot % (SLOTS_PER_EPOCH * EPOCHS_PER_SYNC_COMMITTEE_PERIOD) == 0:
+        self.current_sync_committee = self.next_sync_committee
+        self.next_sync_committee = block.next_sync_committee
+```
+
+`SLOTS_PER_EPOCH` and `EPOCHS_PER_SYNC_COMMITTEE_PERIOD` can be hardcoded, or stored in the light client state. Each epoch is 32 slots (approximately 6.4 minutes), so a full sync committee period lasts about 27.3 hours.
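+
+The arithmetic behind these numbers can be checked directly. This is a minimal sketch using the mainnet constants; `sync_committee_period` and `is_period_boundary` are hypothetical helper names.
+
+```python
+SECONDS_PER_SLOT = 12
+SLOTS_PER_EPOCH = 32
+EPOCHS_PER_SYNC_COMMITTEE_PERIOD = 256
+
+SLOTS_PER_PERIOD = SLOTS_PER_EPOCH * EPOCHS_PER_SYNC_COMMITTEE_PERIOD  # 8192
+
+
+def sync_committee_period(slot: int) -> int:
+    """Index of the sync committee period that contains the given slot."""
+    return slot // SLOTS_PER_PERIOD
+
+
+def is_period_boundary(slot: int) -> bool:
+    """True for the first slot of a sync committee period."""
+    return slot % SLOTS_PER_PERIOD == 0
+
+
+epoch_minutes = SLOTS_PER_EPOCH * SECONDS_PER_SLOT / 60    # 6.4
+period_hours = SLOTS_PER_PERIOD * SECONDS_PER_SLOT / 3600  # about 27.3
+```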
+
+With this relatively simple protocol, we now have a (Python) smart contract that can track Ethereum's blocks.
+
+### Optimizations
+
+In actuality, the beacon block is still too large for a light client. The actual light client uses the [`LightClientHeader`](https://github.com/unionlabs/union/blob/cfe862e6dacf5474925110504891fa4120e747f6/lib/beacon-api-types/src/deneb/light_client_header.rs#L16C12-L16C29) data structure, which consists of a beacon header and execution header.
+
+The beacon header is used to prove the consensus and transition the internal state, as well as immediately prove that the execution header is valid. The block height in the execution header is then used for further client operations, such as transaction timeouts. Using the execution height instead of the beacon height for timeouts has advantages for users and developers, ensuring they do not even need to be aware of the Beacon Chain's existence.
+
+Another significant optimization relates to public key aggregation. Since the large majority of the sync committee usually signs, we instead aggregate the public keys of the non-signers and subtract that from the aggregate key of the full committee. Effectively, if on average 90% of members sign, we only submit and process the 10% that did not. This cuts the number of public key additions from about 90% of the committee to about 10% (plus one subtraction), and it also reduces the size of the client update transaction.
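+
+The subtraction trick works in any additive group. The sketch below uses integers modulo a prime as a toy stand-in for BLS public keys (real clients add and subtract elliptic curve points); `P`, `aggregate`, and the random committee are illustrative assumptions, not part of the actual client.
+
+```python
+import random
+
+# Toy stand-in for the public key group: integers under addition modulo a prime.
+P = 2**61 - 1
+
+
+def aggregate(keys):
+    """Sum of all keys in the toy group."""
+    total = 0
+    for k in keys:
+        total = (total + k) % P
+    return total
+
+
+rng = random.Random(0)
+committee = [rng.randrange(1, P) for _ in range(512)]
+bits = [rng.random() < 0.9 for _ in committee]  # roughly 90% participation
+
+# Naive: add every signer's key.
+naive = aggregate(k for k, b in zip(committee, bits) if b)
+
+# Optimized: start from the full committee aggregate and subtract the non-signers.
+full_aggregate = aggregate(committee)  # can be computed once per committee
+non_signers = aggregate(k for k, b in zip(committee, bits) if not b)
+optimized = (full_aggregate - non_signers) % P
+
+assert naive == optimized
+```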
diff --git a/src/light-clients/inclusion-proofs.md b/src/light-clients/inclusion-proofs.md
@@ -0,0 +1,211 @@
+# Inclusion Proofs
+
+Now that we understand how to verify blocks using a light client, we will show how, based on these verified blocks, we can verify state proofs and, by extension, messages and transfers. We will explore how to efficiently prove that a piece of data is included in a larger dataset without needing to reveal or access the entire dataset.
+
+## Merkle Trees
+
+A Merkle tree (or hash tree) is a data structure that allows for efficient and secure verification of content in a large body of data. Named after Ralph Merkle, who patented it in 1979, these trees are fundamental building blocks in distributed systems and cryptographic applications.
+
+### Structure of a Merkle Tree
+
+A Merkle tree is a binary tree where:
+
+- Each leaf node contains the hash of a data block
+- Each non-leaf node contains the hash of the concatenation of its child nodes
+
+Let's visualize a simple Merkle tree with 4 data blocks:
+
+```mermaid
+graph TD
+    Root["Root Hash: H(H1 + H2)"] --> H1["H1: H(H1-1 + H1-2)"]
+    Root --> H2["H2: H(H2-1 + H2-2)"]
+    H1 --> H1-1["H1-1: Hash(Data 1)"]
+    H1 --> H1-2["H1-2: Hash(Data 2)"]
+    H2 --> H2-1["H2-1: Hash(Data 3)"]
+    H2 --> H2-2["H2-2: Hash(Data 4)"]
+```
+
+In this diagram:
+
+1. We start with four data blocks: Data 1, Data 2, Data 3, and Data 4
+1. We compute the hash of each data block to create the leaf nodes (H1-1, H1-2, H2-1, H2-2)
+1. We then pair the leaf nodes and hash their concatenated values to form the next level (H1, H2)
+1. Finally, we concatenate and hash the results to get the root hash
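+
+The construction steps above can be sketched in a few lines. This sketch assumes SHA-256 as the hash function and a power-of-two number of blocks; `h` and `merkle_root` are hypothetical helper names.
+
+```python
+import hashlib
+
+
+def h(data: bytes) -> bytes:
+    return hashlib.sha256(data).digest()
+
+
+def merkle_root(blocks: list[bytes]) -> bytes:
+    """Root of a binary Merkle tree; assumes a power-of-two number of blocks."""
+    level = [h(b) for b in blocks]  # leaf nodes
+    while len(level) > 1:
+        # Pair neighbouring nodes and hash their concatenation.
+        level = [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
+    return level[0]
+
+
+blocks = [b"Data 1", b"Data 2", b"Data 3", b"Data 4"]
+root = merkle_root(blocks)
+
+# Changing any block changes the root.
+tampered = merkle_root([b"Data 1", b"Data X", b"Data 3", b"Data 4"])
+assert root != tampered
+```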
+
+The root hash uniquely represents the entire dataset. If any piece of data in the tree changes, the root hash will also change. Root hashes are present in block headers for most blockchains. Ethereum, for example, has the `state_root` field in each header. With the state root, we can construct a proof for any data stored on Ethereum. So whenever we write value V to storage in a Solidity smart contract at block H, we can construct a proof to show that, from H onwards, that slot contains V. This proof will be valid until we update or delete the value.
+
+```python
+def prove(state_root, proof, V) -> bool: ...
+```
+
+For Merkle trees specifically, we construct Merkle inclusion proofs. Constructing the proof is relatively compute intensive and requires access to the full state at the block in question, so for older blocks only archive nodes are capable of doing so.
+
+## Inclusion Proofs
+
+An inclusion proof (also called a Merkle proof) is a way to verify that a specific data block is part of a Merkle tree without having to reveal the entire tree.
+
+An inclusion proof consists of:
+
+1. The data block to be verified
+1. A "proof path" - a list of hashes that, combined with the data block's hash in the right order, will reproduce the root hash
+
+Let's look at how a Merkle proof works for Data 2 in our example.
+
+In the diagram below, we're proving that Data 2 is included in the tree. The leaf H1-2 is the hash of the data we're proving (Data 2), and the sibling nodes H1-1 and H2 are the proof path hashes we need.
+
+The proof for Data 2 consists of:
+
+- The data itself: Data 2
+- The proof path: \[H1-1, H2\]
+
+```mermaid
+graph TD
+    Root["Root Hash: H(H1 + H2)"] --> H1["H1: H(H1-1 + H1-2)"]
+    Root --> H2["H2: H(H2-1 + H2-2)"]
+    H1 --> H1-1["H1-1: Hash(Data 1)"]
+    H1 --> H1-2["H1-2: Hash(Data 2)"]
+    H2 --> H2-1["H2-1: Hash(Data 3)"]
+    H2 --> H2-2["H2-2: Hash(Data 4)"]
+```
+
+To verify that Data 2 is indeed part of the Merkle tree with root hash R, a verifier would:
+
+1. Compute H1-2 = Hash(Data 2)
+1. Compute H1 = Hash(H1-1 + H1-2) using the provided H1-1
+1. Compute Root = Hash(H1 + H2) using the provided H2
+1. Compare the computed Root with the known Root hash
+
+If they match, it proves Data 2 is part of the tree. Effectively we recompute the state root lazily.
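+
+The verification steps translate almost line by line into code. A minimal sketch, assuming SHA-256 and a proof encoded as `(sibling_hash, sibling_is_left)` pairs; that encoding and the helper names are illustrative choices, not a fixed format.
+
+```python
+import hashlib
+
+
+def h(data: bytes) -> bytes:
+    return hashlib.sha256(data).digest()
+
+
+def verify_inclusion(root, data, proof):
+    """proof is a list of (sibling_hash, sibling_is_left) pairs, ordered leaf to root."""
+    node = h(data)
+    for sibling, sibling_is_left in proof:
+        node = h(sibling + node) if sibling_is_left else h(node + sibling)
+    return node == root
+
+
+# Build the four-leaf example tree by hand.
+h11, h12, h21, h22 = (h(b"Data %d" % i) for i in range(1, 5))
+h1, h2 = h(h11 + h12), h(h21 + h22)
+root = h(h1 + h2)
+
+# H1-1 sits to the left of our leaf, H2 to the right of H1.
+proof_for_data_2 = [(h11, True), (h2, False)]
+print(verify_inclusion(root, b"Data 2", proof_for_data_2))  # True
+print(verify_inclusion(root, b"Data 9", proof_for_data_2))  # False
+```
+
+The verifier only ever touches three hashes here, regardless of what the rest of the tree contains.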
+
+For a Merkle tree with n leaves, the proof size and verification time are both O(log n), making it highly efficient even for very large datasets.
+
+For example, in a tree with 1 million leaf nodes:
+
+- A full tree would require storing about 2 million hashes (1 million leaves plus the internal nodes)
+- A Merkle proof requires only about 20 hashes (log₂ 1,000,000 ≈ 20)
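+
+The logarithmic growth is easy to check numerically. In this sketch, `proof_length` is a hypothetical helper that assumes a balanced binary tree.
+
+```python
+def proof_length(num_leaves: int) -> int:
+    """Number of sibling hashes in a proof for a balanced binary tree."""
+    # Equals ceil(log2(num_leaves)) for num_leaves >= 1, without floating point.
+    return (num_leaves - 1).bit_length()
+
+
+for n in (4, 1_000_000, 1_000_000_000):
+    print(n, proof_length(n))
+# 4 2
+# 1000000 20
+# 1000000000 30
+```
+
+Growing the dataset a thousandfold, from a million to a billion leaves, adds only ten hashes to the proof.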
+
+## Message Verification using Inclusion Proofs
+
+Now that we have a clearly defined model of how to get blocks from chain `source` onto chain `destination` using a [light client](./ethereum.md), and how to prove state from `source` on `destination` using `source`'s state root, we will show a simple model of how to securely perform cross-chain messaging using a state proof.
+
+First, we need to write our message to a known storage location on the source chain. This is typically done through a smart contract:
+
+```solidity
+// On source chain
+contract MessageSender {
+    // Maps message IDs to actual messages
+    mapping(uint256 => bytes) public messages;
+
+    // Event emitted when a new message is stored
+    event MessageStored(uint256 indexed messageId, bytes message);
+
+    function sendMessage(bytes memory message) public returns (uint256) {
+        uint256 messageId = uint256(keccak256(message));
+        messages[messageId] = message;
+        emit MessageStored(messageId, message);
+        return messageId;
+    }
+}
+```
+
+When `sendMessage` is called, the message is stored in the contract's state at a specific storage slot that can be deterministically calculated from the messageId.
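+
+For a Solidity mapping, the slot of `mapping[key]` is the keccak256 hash of the key and the mapping's base slot, each padded to 32 bytes. The sketch below only builds that 64-byte hash input; `mapping_slot_preimage` is a hypothetical helper, and the base slot 0 assumes `messages` is the first state variable of the contract.
+
+```python
+def mapping_slot_preimage(key: int, base_slot: int) -> bytes:
+    """64-byte input whose keccak256 hash is the storage slot of mapping[key].
+
+    Mirrors abi.encode(key, base_slot) for two uint256 values: each value is
+    left-padded to 32 bytes, big-endian.
+    """
+    return key.to_bytes(32, "big") + base_slot.to_bytes(32, "big")
+
+
+preimage = mapping_slot_preimage(key=7, base_slot=0)
+print(len(preimage))  # 64
+
+# slot = keccak256(preimage). Note that hashlib.sha3_256 is NOT keccak256,
+# so a Keccak implementation from a third-party library is needed for this step.
+```
+
+For dynamically sized values such as `bytes`, this slot holds the length (and the data itself only when it is short), so a production design would more commonly store and prove a fixed-size commitment such as the message hash.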
+
+Next, we need to update the light client on the destination chain to reflect the latest state of the source chain:
+
+```solidity
+// On destination chain
+contract LightClient {
+    // Latest verified block header from source chain
+    BlockHeader public latestHeader;
+
+    function updateBlockHeader(BlockHeader memory newHeader, Proof memory proof) public {
+        // Verify the proof that newHeader is a valid successor to latestHeader
+        require(verifyProof(latestHeader, newHeader, proof), "Invalid block proof");
+
+        // Update the latest header
+        latestHeader = newHeader;
+    }
+
+    function getStateRoot() public view returns (bytes32) {
+        return latestHeader.stateRoot;
+    }
+}
+```
+
+The light client maintains a record of the latest verified block header, which includes the state root of the source chain. Regular updates to this light client ensure that the destination chain has access to recent state roots.
+
+Finally, we can prove the existence of the message on the destination chain using a Merkle inclusion proof against the state root:
+
+```solidity
+// On destination chain
+contract MessageReceiver {
+    LightClient public lightClient;
+    address public sourceSenderContract;
+
+    constructor(address _lightClient, address _sourceSender) {
+        lightClient = LightClient(_lightClient);
+        sourceSenderContract = _sourceSender;
+    }
+
+    function verifyAndProcessMessage(
+        uint256 messageId,
+        bytes memory message,
+        bytes32[] memory proofNodes,
+        uint256 proofPath
+    ) public {
+        // Get the latest state root from the light client
+        bytes32 stateRoot = lightClient.getStateRoot();
+
+        // Calculate the storage slot for this message in the source contract
+        bytes32 storageSlot = keccak256(abi.encode(messageId, uint256(0))); // Slot for messages[messageId]; `messages` is the first state variable (slot 0)
+
+        // Verify the inclusion proof against the state root
+        require(
+            verifyStorageProof(
+                stateRoot,
+                sourceSenderContract,
+                storageSlot,
+                message,
+                proofNodes,
+                proofPath
+            ),
+            "Invalid state proof"
+        );
+
+        // Message is verified, now process it
+        processMessage(messageId, message);
+    }
+
+    function processMessage(uint256 messageId, bytes memory message) internal {
+        // Application-specific message handling
+        // ...
+    }
+
+    function verifyStorageProof(
+        bytes32 stateRoot,
+        address contractAddress,
+        bytes32 slot,
+        bytes memory expectedValue,
+        bytes32[] memory proofNodes,
+        uint256 proofPath
+    ) internal pure returns (bool) {
+        // This function verifies a Merkle-Patricia trie proof
+        // It proves that at the given storage slot in the specified contract,
+        // the value matches expectedValue in the state with stateRoot
+
+        // Implementation details omitted for brevity
+        // This would use the proofNodes and proofPath to reconstruct the path
+        // from the leaf (storage value) to the state root
+
+        return true; // Placeholder only: a real implementation must verify the proof and never return true unconditionally
+    }
+}
+```
+
+This mechanism ensures that messages can only be processed on the destination chain if they were genuinely recorded on the source chain, without requiring trust in any intermediaries. The security of the system relies on:
+
+1. The integrity of the light client, which only accepts valid block headers
+1. The cryptographic properties of Merkle trees, which make it computationally infeasible to forge inclusion proofs
+1. The immutability of blockchain state, which ensures the message cannot be altered once written
+
+By combining light client verification with state inclusion proofs, we establish a trustless bridge for cross-chain communication that maintains the security properties of both blockchains.