refactor: chunk -> share (celestiaorg#1455)

rootulp · web-flow · commit 227809af973e · 2024-03-05T16:22:56.000-05:00
diff --git a/developers/blobstream-offchain.md b/developers/blobstream-offchain.md
@@ -54,7 +54,7 @@ that data's inclusion via Blobstream if needed. Read more in the
 [namespace specifications](https://celestiaorg.github.io/celestia-app/specs/namespace.html),
 and you can think of this like a chain ID. Learn more
 [information about `shares`](https://celestiaorg.github.io/celestia-app/specs/shares.html),
-which are small chunks of the encoded Celestia block. We use the same encoding
+which are small pieces of the encoded Celestia block. We use the same encoding
 here so that the commitments to the rollup block match those committed to by
 validators in the Celestia data root.
 
diff --git a/learn/how-celestia-works/data-availability-layer.md b/learn/how-celestia-works/data-availability-layer.md
@@ -24,7 +24,7 @@ commitments (_i.e._, Merkle roots) of the block data (_i.e._, the list of trans
 
 To make DAS possible, Celestia uses a 2-dimensional Reed-Solomon
 encoding scheme to encode the block data: every block data is split
-into $k \times k$ chunks, arranged in a $k \times k$ matrix, and extended with parity
+into $k \times k$ shares, arranged in a $k \times k$ matrix, and extended with parity
 data into a $2k \times 2k$ extended matrix by applying multiple
 times Reed-Solomon encoding.
 
@@ -35,19 +35,19 @@ as the block data commitment in the block header.
 ![2D Reed-Soloman (RS) Encoding](/img/learn/reed-solomon-encoding.png)
 
 To verify that the data is available, Celestia light nodes are sampling
-the $2k \times 2k$ data chunks.
+the $2k \times 2k$ data shares.
 
 Every light node randomly chooses a set of unique coordinates in the
-extended matrix and queries full nodes for the data chunks and the
+extended matrix and queries full nodes for the data shares and the
 corresponding Merkle proofs at those coordinates. If light nodes
 receive a valid response for each sampling query, then there is a
 [high probability guarantee](https://github.com/celestiaorg/celestia-node/issues/805#issuecomment-1150081075)
 that the whole block's data is available.
 
-Additionally, every received data chunk with a correct Merkle proof
+Additionally, every received data share with a correct Merkle proof
 is gossiped to the network. As a result, as long as the Celestia light
-nodes are sampling together enough data chunks (_i.e._, at least
-$k \times k$ unique chunks),
+nodes are sampling together enough data shares (_i.e._, at least
+$k \times k$ unique shares),
 the full block can be recovered by honest full nodes.
 
 For more details on DAS, take a look at the [original paper](https://arxiv.org/abs/1809.09044).
@@ -75,9 +75,9 @@ DA layer.
 The requirement of downloading the $4k$ intermediate Merkle roots is a
 consequence of using a 2-dimensional Reed-Solomon encoding scheme. Alternatively,
 DAS could be designed with a standard (_i.e._, 1-dimensional) Reed-Solomon encoding,
-where the original data is split into $k$ chunks and extended with $k$ additional
-chunks of parity data. Since the block data commitment is the Merkle root of the
-$2k$ resulting data chunks, light nodes no longer need to download $O(n)$ bytes to
+where the original data is split into $k$ shares and extended with $k$ additional
+shares of parity data. Since the block data commitment is the Merkle root of the
+$2k$ resulting data shares, light nodes no longer need to download $O(n)$ bytes to
 validate block headers.
 
 The downside of the standard Reed-Solomon encoding is dealing with malicious
@@ -86,7 +86,7 @@ block producers that generate the extended data incorrectly.
 This is possible as **Celestia does not require a majority of the consensus
 (_i.e._, block producers) to be honest to guarantee data availability.**
 Thus, if the extended data is invalid, the original data might not be
-recoverable, even if the light nodes are sampling sufficient unique chunks
+recoverable, even if the light nodes are sampling sufficient unique shares
 (_i.e._, at least $k$ for a standard encoding and $k \times k$ for a
 2-dimensional encoding).
 
@@ -112,20 +112,20 @@ To this end, Celestia is using Namespaced Merkle trees (NMTs).
 An NMT is a Merkle tree with the leafs ordered by the namespace identifiers
 and the hash function modified so that every node in the tree includes the
 range of namespaces of all its descendants. The following figure shows an
-example of an NMT with height three (_i.e._, eight data chunks). The data is
+example of an NMT with height three (_i.e._, eight data shares). The data is
 partitioned into three namespaces.
 
 ![Namespaced Merkle Tree](/img/learn/nmt.png)
 
 When an application requests the data for namespace 2, the DA layer must
-provide the data chunks `D3`, `D4`, `D5`, and `D6` and the nodes `N2`, `N8`
+provide the data shares `D3`, `D4`, `D5`, and `D6` and the nodes `N2`, `N8`
 and `N7` as proof (note that the application already has the root `N14` from
 the block header).
 
 As a result, the application is able to check that the provided data is part
 of the block data. Furthermore, the application can verify that all the data
 for namespace 2 was provided. If the DA layer provides for example only the
-data chunks `D4` and `D5`, it must also provide nodes `N12` and `N11` as proofs.
+data shares `D4` and `D5`, it must also provide nodes `N12` and `N11` as proofs.
 However, the application can identify that the data is incomplete by checking
 the namespace range of the two nodes, _i.e._, both `N12` and `N11` have descendants
 part of namespace 2.
diff --git a/learn/how-celestia-works/overview.md b/learn/how-celestia-works/overview.md
@@ -18,7 +18,7 @@ similar to [reducing consensus to atomic broadcast](https://en.wikipedia.org/wik
 The latter provides an efficient solution to the
 [data availability problem](https://coinmarketcap.com/alexandria/article/what-is-data-availability)
 by only requiring resource-limited light nodes to sample a
-small number of random chunks from each block to verify data availability.
+small number of random shares from each block to verify data availability.
 
 Interestingly, more light nodes that participate in sampling
 increases the amount of data that the network can safely handle,
diff --git a/learn/how-celestia-works/transaction-lifecycle.md b/learn/how-celestia-works/transaction-lifecycle.md
@@ -61,7 +61,7 @@ that serves DAS requests.
 Light nodes connect to a celestia-node in the DA network, listen to
 extended block headers (i.e., the block headers together with the
 relevant DA metadata, such as the $4k$ intermediate Merkle roots), and
-perform DAS on the received headers (i.e., ask for random data chunks).
+perform DAS on the received headers (i.e., ask for random data shares).
 
 Note that although it is recommended, performing DAS is optional -- light
 nodes could just trust that the data corresponding to the commitments in
@@ -70,11 +70,11 @@ In addition, light nodes can also submit transactions to the celestia-app,
 i.e., `PayForBlobs` transactions.
 
 While performing DAS for a block header, every light node queries Celestia
-Nodes for a number of random data chunks from the extended matrix and the
+Nodes for a number of random data shares from the extended matrix and the
 corresponding Merkle proofs. If all the queries are successful, then the
 light node accepts the block header as valid (from a DA perspective).
 
-If at least one of the queries fails (i.e., either the data chunk is not
+If at least one of the queries fails (i.e., either the data share is not
 received or the Merkle proof is invalid), then the light node rejects the
 block header and tries again later. The retrial is necessary to deal with
 false negatives, i.e., block headers being rejected although the block
@@ -87,6 +87,6 @@ then at least one honest full node will eventually have the entire block data)
 is probabilistically guaranteed (for more details, take a look at the
 [original paper](https://arxiv.org/abs/1809.09044)).
 
-By fine tuning Celestia's parameters (e.g., the number of data chunks sampled
+By fine tuning Celestia's parameters (e.g., the number of data shares sampled
 by each light node) the likelihood of false positives can be sufficiently
 reduced such that block producers have no incentive to withhold the block data.