sparse blobpool EIP, here we go! #1
Conversation
- Add new "Local block builders" section defining three blob inclusion policies: conservative, optimistic, and proactive
- Refine sampler role: change MUST to SHOULD for waiting on provider announcements; explicitly state samplers can request from providers
- Rewrite supernode behavior to emphasize load balancing and reconstruction priority
- Update provider role to require announcing full availability
- Add "Attack scenarios and threat model" section with placeholders for selective withholding and eclipse attacks
- Add new "The need for sampling noise" subsection explaining defense against predictable custody patterns
- Add "No normative peer scoring" subsection explaining rationale for avoiding brittle scoring systems
- Clarify that hard fork is not strictly required but recommended
- Note gradual rollout assessment is ongoing
- Rename "Detecting misbehaviour" to "Ensuring fairness" with softer language on peer disconnection policies
- Improve doc formatting throughout
- Update Engine API parameter descriptions for clarity
> We introduce a new devp2p protocol version, `eth/71`, extending the existing `eth/70` protocol (TODO: spec for eth/70 missing).
> **Modify `NewPooledTransactionHashes` (`0x08`) message.**
> Add a new field `cell_mask` of type `B_16` (`uint128`). This field MUST be interpreted as a bitarray of length `CELLS_PER_EXT_BLOB`, carrying `1` in the indices of columns the announcer has available for **all type 3 txs announced within the message**. This field MUST be set to `nil` when no transactions of such type are announced. (TODO: this approach is not expressive enough if the node wants to offer randomly sampled columns, nor if it wants to announce transactions with full and partial availability in a single message).
Sounds a bit strange to first say it's B_16 and then say its length is CELLS_PER_EXT_BLOB. Maybe better the opposite: define it by length CELLS_PER_EXT_BLOB, and say it is currently represented as B_16.
We should also define the encoding, to be on the safe side (which bit is column i?)
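To make the open question concrete, here is one possible (non-normative) convention, assuming column `i` maps to bit `i` of the `uint128` and the mask is serialized big-endian as `B_16` (`CELLS_PER_EXT_BLOB = 128` per EIP-7594):

```python
CELLS_PER_EXT_BLOB = 128  # from EIP-7594; matches the 128 bits of B_16

def encode_cell_mask(columns):
    """Set bit i for each available column i; serialize big-endian as 16 bytes."""
    mask = 0
    for i in columns:
        assert 0 <= i < CELLS_PER_EXT_BLOB
        mask |= 1 << i
    return mask.to_bytes(16, "big")

def decode_cell_mask(data):
    """Return the set of column indices whose bit is set."""
    mask = int.from_bytes(data, "big")
    return {i for i in range(CELLS_PER_EXT_BLOB) if mask & (1 << i)}
```

Whatever convention is chosen, writing it down removes the "which bit is column i?" ambiguity across implementations.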
Regarding the granularity, options:
- 1 bitmask per message, as above: most concise, seemingly least expressive. But if we fall back to one message per type 3 transaction, we get the same expressiveness as the next option. I seem to remember that in my testing, type 3 announcements were already single messages in most cases (TBC).
- 1 bitmask per transaction: overhead is clear: hash is B_32, now we add a B_16. 50% extra, but still bearable. The question is what we gain, given the fallback option above.
- 1 bitmask per blob: this would be the "ultimate" granularity, but I don't think we need it now.
I had expanded on this in the Rationale section, and ended up concluding that what we have right now isn't elegant, but it's fine from a practical perspective. Take a look and if you agree, I can remove the TODO here. We can also expand further on the various options / tradeoffs, but I don't think it's super necessary.
> - new schema (`eth/71`): `[types: B, [size_0: P, size_1: P, ...], [hash_0: B_32, hash_1: B_32, ...], cell_mask: B_16]`
> **Modify `GetPooledTransactions` (`0x09`) / `PooledTransactions` (`0x10`) behaviour.**
> Responses now elide blob payloads for type 3 transactions requested via this RPC. This is performed by setting an RLP `nil` literal in the list position corresponding to the transaction's blob data. Cell proofs and commitments are unaffected and continue to be sent.
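As an illustration of the eliding mechanism, a minimal RLP encoder (a sketch; the wrapper field layout below is illustrative, not taken from the spec) shows how the `nil` literal occupies the blob-data list position as the empty byte string, which RLP encodes as `0x80`:

```python
def rlp_encode(item):
    """Minimal RLP encoder for byte strings and lists (sketch, not a full impl)."""
    if isinstance(item, bytes):
        if len(item) == 1 and item[0] < 0x80:
            return item  # single byte below 0x80 encodes as itself
        return _len_prefix(len(item), 0x80) + item
    if isinstance(item, list):
        body = b"".join(rlp_encode(x) for x in item)
        return _len_prefix(len(body), 0xC0) + body
    raise TypeError(item)

def _len_prefix(length, offset):
    if length < 56:
        return bytes([offset + length])
    l_bytes = length.to_bytes((length.bit_length() + 7) // 8, "big")
    return bytes([offset + 55 + len(l_bytes)]) + l_bytes

NIL = b""  # the RLP nil literal: an empty byte string, encoded as 0x80

# Hypothetical type 3 wrapper with the blob payloads elided:
wrapped = [b"\x03tx-payload", NIL, b"commitments...", b"proofs..."]
```

Cell proofs and commitments keep their positions; only the blob-data slot collapses to a single `0x80` byte.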
"request" instead of "RPC"
Hm, happy to change it, but my intention was to refer to the interaction as a whole, more than to a concrete leg of it. Does "RPC" sound foreign to devp2p experts?
There is no RPC mentioned in the devp2p spec till now. These protocol messages happen to follow a Request-Response pattern like you need for RPC too, but it's not RPC.
> **New message type `Cells` (`0x13`).** Used to respond to a `GetCells` request.
> TODO: check if this is how it is being implemented now
> - `eth/71`: `[[hash_0: B_32, hash_1: B_32, ...], cells: [[cell_0_0: B_2048, cell_0_1: B_2048, ...], [cell_1_0: B_2048, cell_1_1: B_2048, ...]], cell_mask: B_16]`
Since both cell size and bitmask size depend on the number of columns, maybe better to have type definitions somewhere, defining one as B_2048 and the other as B_16.
Yeah, I'm just borrowing the notation I've seen in devp2p docs. I don't recall type aliases being used and preferred to avoid bikeshedding over this :D
✨ ethereum/EIPs#10650 [EIP]
> - Extend the `PooledTransactions` message schema to support cell-level returns
> - Introduce eth/70 protocol version
> **New transaction hash.**
> Upon receiving a `NewPooledTransactionHashes` announcement containing a previously unknown type 3 transaction hash, the node makes a probabilistic decision about fetching the blob payload: it decides to fetch the full blob payload with probability p = 0.15 (provider role), or simply sample otherwise (sampler role). This decision MAY be remembered for some period chosen by the implementer, in which case stateless heuristics are RECOMMENDED (e.g. calculating a hash by appending some time-bound value mixed in with stable transaction properties, and applying p to it).
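A sketch of how such a stateless heuristic could look. The hash function, node secret, and epoch construction are placeholder choices, not mandated by the text; the point is that the decision is deterministic within an epoch, so re-announcements of the same tx re-derive the same role instead of re-rolling the dice:

```python
import hashlib

P_PROVIDER = 0.15  # probability p from the spec

def is_provider(tx_hash: bytes, node_secret: bytes, epoch: int) -> bool:
    """Stateless, deterministic role decision for one tx within one epoch."""
    h = hashlib.sha256(node_secret + tx_hash + epoch.to_bytes(8, "big")).digest()
    # interpret the first 8 bytes as a uniform value in [0, 1)
    roll = int.from_bytes(h[:8], "big") / 2**64
    return roll < P_PROVIDER
```

The `epoch` argument is the time-bound value mentioned in the text (e.g. `unix_time // 3600`), and the per-node secret keeps the provider set unpredictable to peers.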
Since implementations receive NewPooledTransactionHashes for the same tx from several peers, and we don't want them to waste bandwidth by downloading from every single one, they do have to be stateful already. Here we have another reason: re-rolling the dice would basically multiply the probability of becoming a provider. So I think MAY is not enough; SHOULD or even MUST would be better.
> ### Execution Client Behavior
> **Provider role.**
> If the node is a provider, and the announcing peer signaled full availability, the node MUST request the signed transaction and full blob data from that peer via `GetPooledTransactions` and `GetCells` with an all-ones `cell_mask` bitmap. Upon successful retrieval and validation (as per [EIP-7594](./eip-7594.md)), the node MUST in turn announce the transaction hash to its peers via `NewPooledTransactionHashes`, also with an all-ones `cell_mask`.
Does it make sense that we encode all ones, which happens in 15% of the cases, as a B_16? We should somehow shorten this.
Yeah, I think it would be trivial to use a union type (reified as a simple byte string in RLP): 0xFF (1 byte) to denote full, or B_16 to denote a bitmap.
We're just shaving off 15 bytes though, not sure if the type complexity is worth it. That's why I didn't push on it, but happy to go with what EL impls prefer!
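For reference, the union type under discussion could be sketched like this (wire shape assumed; in RLP both variants are plain byte strings, distinguished by length):

```python
FULL = b"\xff"  # 1-byte sentinel for "all columns available"

def encode_availability(mask: bytes) -> bytes:
    """Collapse an all-ones 16-byte bitmap to the 1-byte sentinel."""
    return FULL if mask == b"\xff" * 16 else mask

def decode_availability(data: bytes) -> bytes:
    """Expand the sentinel back to an all-ones bitmap; pass bitmaps through."""
    if data == FULL:
        return b"\xff" * 16
    if len(data) == 16:
        return data
    raise ValueError("cell_mask must be 1 byte (0xFF) or 16 bytes")
```

As noted above, this only shaves 15 bytes off 15% of announcements, so the complexity tradeoff is debatable.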
> **New transaction hash.** Upon receiving a `NewPooledTransactionHashes` announcement containing a previously unknown type 3 transaction hash, the node makes a probabilistic decision about fetching the blob payload: it decides to fetch the full blob payload with probability p = 0.15 (provider role), or simply sample them otherwise (sampler role).
> **Sampler role.**
> If the node is a sampler, the node MUST only request the transaction payload via `GetPooledTransactions`. It SHOULD wait to observe at least 2 distinct provider announcements, and to successfully receive and validate the signed transaction, before acting further. It then MUST request custody-aligned cells from peers that announced overlapping availability, including providers. When fetching from a provider, the node MUST request `C_extra` random columns in addition to its custody set (see "Sampling noise" in Rationale). The node MUST request no more than `C_req` columns per request, at all times.
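A sketch of the sampler's request planning under these rules; the constants and function names are illustrative, not from the spec:

```python
import random

CELLS_PER_EXT_BLOB = 128
C_REQ = 8     # assumed cap on columns per request
C_EXTRA = 2   # assumed number of random noise columns

def plan_requests(custody: set, peer_mask: set, peer_is_provider: bool):
    """Return batches of column indices to request from one peer."""
    wanted = custody & peer_mask  # custody-aligned cells the peer announced
    if peer_is_provider:
        # sampling noise: C_EXTRA random columns outside the request set
        pool = [c for c in range(CELLS_PER_EXT_BLOB) if c not in wanted]
        wanted |= set(random.sample(pool, C_EXTRA))
    cols = sorted(wanted)
    # never exceed C_REQ columns per request
    return [cols[i:i + C_REQ] for i in range(0, len(cols), C_REQ)]
```

The noise columns make the node's true custody set harder to infer from its request pattern, per the "Sampling noise" rationale.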
Here, for sampler nodes, we could add vhash-based logic to make RBF cheap.
It is a bit more complicated for providers, where it would mean we need to do the GetPooledTransactions first and the GetCells only after receiving it, increasing diffusion from 1 RTT to 2 RTT, even if it is not RBF. We can work around this, however: if we add an RBF flag to NewPooledTransactionHashes, the receiver would know whether it can use fast fetch (GetPooledTransactions + GetCells sent together) or slow fetch (GetCells sent only after the GetPooledTransactions response is received).
> **Sampler role.** If the node is a sampler, it MUST request the cells corresponding to its custody columns from peers that announced overlapping availability, but only after it observes at least 2 distinct provider announcements. The node MUST only request a maximum of `C_req` columns per request. When fetching from a provider, the node MUST request `C_extra` random columns in addition to its custody set.
> **Supernode behaviour.**
> Supernodes (nodes intending to fetch every blob payload in full) MUST load balance requests across samplers and providers. Furthermore, supernodes SHOULD prioritize reconstructing blobs and proofs from 64 columns. Supernodes SHOULD maintain a larger peerset in order to satisfy their increased blob fetching needs without over-stressing a small set of neighbours and violating fairness.
I'm not convinced forcing load balancing between samplers and providers is a good idea. Samplers are slow, so this would slow down supernodes as well.
> **Supernodes.** Supernodes will naturally want to fetch full blob payloads for every transaction. They MUST NOT act like permanent providers, to avoid getting banned due to their fetch rate exceeding expectations. Instead, they MUST respect p = 0.15 and MUST maximally load balance fetching columns from multiple peers. They can remain connected with a larger peerset to satisfy their increased sampling needs.
> **Continued sampling**
> Tenured transactions MAY be subject to resampling in order to test for liveness and confirm confidence of continued network-wide availability.
"in other to" -> "in order to"
We should also explain the rationale, and also what we mean by tenured: it might happen that a successfully sampled transaction is not getting sufficient diffusion in the network, or is getting evicted by other nodes. To avoid filling a node's transaction pool with such transactions, it is advisable for nodes to keep track of announcements received after sampling. A node MAY also resample a transaction in order to test for liveness and confirm confidence of continued network-wide availability.
> 2. The Execution client MUST return an Ok response if the request is well-formed. All subsequent sampling requests MUST adopt the new custody set. Queued sampling requests MAY be patched to reflect the new custody set.
> 3. For type 3 transactions pending in the blobpool:
>    1. If the custody set has expanded, the Execution client MUST issue new sampling requests for the delta. It SHOULD broadcast an updated `NewPooledTransactionHashes` announcement with the new available set.
>    2. If the custody set has contracted, the Execution client MAY prune dropped cells from local storage, but only AFTER it has broadcast an updated `NewPooledTransactionHashes` announcement with the reduced available set. This is to prevent peers from perceiving an availability fault if they happen to request those previously announced cells.
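The expansion/contraction handling could be sketched as follows (function and action names are illustrative, not part of the spec):

```python
def custody_update(old: set, new: set):
    """Compute the ordered actions for a custody-set change."""
    added = sorted(new - old)      # columns to start sampling
    removed = sorted(old - new)    # columns that may be pruned later
    actions = []
    if added:
        actions.append(("sample_delta", added))
    if added or removed:
        # announce the updated available set to peers
        actions.append(("announce", sorted(new)))
    if removed:
        # prune only after announcing, and after a grace period
        # for in-flight requests against the old announcement
        actions.append(("prune_after_grace", removed))
    return actions
```

Note that pruning is deliberately ordered last, matching the AFTER requirement in step 3.2.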
Maybe we can be more explicit to avoid too simple implementations:
AFTER broadcasting an updated NewPooledTransactionHashes announcement with the reduced available set and waiting for any in-flight requests to complete.
(+ commit log)