|
| 1 | +# Codelab |
| 2 | + |
| 3 | +This codelab will walk you through all the steps required to build a Tiled tree. |
| 4 | + |
| 5 | +The Tiled tree will be stored on disk using the layout described in the [layout |
| 6 | +directory](api/layout/README.md). Its checkpoint uses the [checkpoint format](https://github.com/transparency-dev/formats/blob/main/log/README.md#checkpoint-format). |
| 7 | + |
| 8 | +## Preliminary setup |
| 9 | + |
| 10 | +The command-line tools in this repository can generate tile-based logs from leaf |
| 11 | +data stored on your file system. Each file will correspond to a single leaf in |
| 12 | +the tree. |
| 13 | + |
| 14 | +Before we start, let's define a few environment variables: |
| 15 | + |
| 16 | +```bash |
| 17 | +export DATA_DIR="/tmp/files" # where we'll store input data for the tree |
| 18 | +export LOG_DIR="/tmp/mylog" # where the tree will be stored |
| 19 | +export LOG_ORIGIN="My Log" # the origin of the log used by the Checkpoint format |
| 20 | +``` |
| 21 | + |
| 22 | +Checkpoints are signed, and we need a public/private key pair for this. |
| 23 | + |
| 24 | +Use the `generate_keys` command with `--key_name` set to a name for the signing |
| 25 | +entity. The keys can be written to files or printed to stdout: |
| 26 | +`--out_pub` sets the path and filename for the public key, |
| 27 | +`--out_priv` sets the path and filename for the private key, |
| 28 | +and `--print` writes the private key, then the public key, to stdout over 2 lines. |
| 29 | + |
| 30 | +```bash |
| 31 | +go run ./cmd/generate_keys --key_name=astra --out_pub=key.pub --out_priv=key |
| 32 | +``` |
| 33 | + |
| 34 | +### Creating a new log |
| 35 | + |
| 36 | +To create a new log state directory, use the `integrate` command with the `--initialise` |
| 37 | +flag, either passing the key files as flags or setting the corresponding environment variables: |
| 38 | + |
| 39 | +```bash |
| 40 | +go run ./cmd/integrate --initialise --storage_dir="${LOG_DIR}" --logtostderr --public_key=key.pub --private_key=key --origin="${LOG_ORIGIN}" |
| 41 | +``` |
| 42 | + |
| 43 | +After running this command, the log state directory looks like this: |
| 44 | + |
| 45 | +``` |
| 46 | +$ tree /tmp/mylog/ |
| 47 | +/tmp/mylog/ |
| 48 | +├── checkpoint |
| 49 | +├── leaves |
| 50 | +│ └── pending |
| 51 | +├── seq |
| 52 | +└── tile |
| 53 | +
|
| 54 | +5 directories, 1 file |
| 55 | +``` |
| 56 | + |
| 57 | +See the [layout](api/layout/README.md) documentation for an explanation of what each directory is for. |
| 58 | + |
| 59 | +Let's look at the checkpoint content: |
| 60 | + |
| 61 | +``` |
| 62 | +$ cat /tmp/mylog/checkpoint |
| 63 | +My Log |
| 64 | +0 |
| 65 | +47DEQpj8HBSa+/TImW+5JCeuQeRkm5NMpJWZG3hSuFU= |
| 66 | +
|
| 67 | +— astra PlUh/n54e2dSIKi6kHjea5emrGnmC7lJVDgnIfWGIJmgFqp22k0UlnUk97L2ViqrFm986NwV+wJYGnrtRPJTBV0GrA0= |
| 68 | +``` |
| 69 | + |
| 70 | +- `My Log` is the origin from above. |
| 71 | +- `0` is the number of leaves in the tree, which is currently 0. |
| 72 | +- `47DEQpj8HBSa+/TImW+5JCeuQeRkm5NMpJWZG3hSuFU=` is the [hash of an empty slice of bytes](https://go.dev/play/p/imi_2TM6DyI), since the log is empty. |
| 73 | +- The last line is a signature over this data, using the astra private key we generated above. |
| 74 | + |
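The three body lines described above can be checked programmatically. The following is a minimal Go sketch, not part of this repository's tooling: `emptyRootHash` and `parseCheckpointBody` are hypothetical helper names, and the signature line is ignored rather than verified.

```go
package main

import (
	"crypto/sha256"
	"encoding/base64"
	"fmt"
	"strconv"
	"strings"
)

// emptyRootHash returns the base64-encoded SHA-256 hash of an empty byte
// slice, which is the root hash shown for an empty log.
func emptyRootHash() string {
	h := sha256.Sum256(nil)
	return base64.StdEncoding.EncodeToString(h[:])
}

// parseCheckpointBody (hypothetical helper) reads the first three lines of a
// checkpoint: origin, tree size, and base64 root hash. Any signature lines
// that follow are ignored and NOT verified.
func parseCheckpointBody(cp string) (string, uint64, string, error) {
	lines := strings.Split(cp, "\n")
	if len(lines) < 3 {
		return "", 0, "", fmt.Errorf("checkpoint has %d lines, want at least 3", len(lines))
	}
	size, err := strconv.ParseUint(lines[1], 10, 64)
	if err != nil {
		return "", 0, "", fmt.Errorf("invalid tree size %q: %v", lines[1], err)
	}
	return lines[0], size, lines[2], nil
}

func main() {
	cp := "My Log\n0\n47DEQpj8HBSa+/TImW+5JCeuQeRkm5NMpJWZG3hSuFU=\n"
	origin, size, root, err := parseCheckpointBody(cp)
	if err != nil {
		panic(err)
	}
	// Prints: My Log 0 true
	fmt.Println(origin, size, root == emptyRootHash())
}
```

A real client would also verify the signature line against the public key before trusting the size and root hash.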
| 75 | + |
| 76 | +### Creating log content |
| 77 | +Now let's add some leaves to the log. |
| 78 | + |
| 79 | +First, we generate the input data with: |
| 80 | +```bash |
| 81 | +$ mkdir $DATA_DIR |
| 82 | +$ for i in $(seq 0 3); do x=$(printf "%03d" $i); echo "leaf_data_$x" > /tmp/files/leaf_$x; done; |
| 83 | +``` |
| 84 | + |
| 85 | +To add the contents of some files to a log, use the `sequence` command with the |
| 86 | +`--entries` flag set to a filename glob of files to add and either passing the public key |
| 87 | +file or with the environment variable set: |
| 88 | + |
| 89 | +```bash |
| 90 | +$ go run ./cmd/sequence --storage_dir="${LOG_DIR}" --entries '/tmp/files/*' --public_key=key.pub --origin="${LOG_ORIGIN}" |
| 91 | +I1221 13:16:23.940255 923589 main.go:131] 0: /tmp/files/leaf_000 |
| 92 | +I1221 13:16:23.940806 923589 main.go:131] 1: /tmp/files/leaf_001 |
| 93 | +I1221 13:16:23.941218 923589 main.go:131] 2: /tmp/files/leaf_002 |
| 94 | +I1221 13:16:23.941673 923589 main.go:131] 3: /tmp/files/leaf_003 |
| 95 | +``` |
| 96 | + |
| 97 | +The `sequence` command stores data in the log directory using convenient |
| 98 | +formats. The `leaves` directory maps each leaf hash to its leaf index. |
| 99 | +Let's take the leaf at index `0`, which contains `leaf_data_000`. |
| 100 | +This tree uses RFC6962's default hasher, where `leaf_hash = sha256(0x00 + leaf_data)`. |
| 101 | +`8592d6f366d9d1297f44034d649b68afcee74050aa7a55c769130b2f07ecc65d`, the path for |
| 102 | +the leaf at index 0 with forward slashes removed, is the [hexadecimal representation |
| 103 | +of this hash](https://go.dev/play/p/POnCQ7IXayk). |
| 104 | + |
| 105 | +``` |
| 106 | +$ grep -RH '^' /tmp/mylog/ |
| 107 | +/tmp/mylog/checkpoint:My Log |
| 108 | +/tmp/mylog/checkpoint:0 |
| 109 | +/tmp/mylog/checkpoint:47DEQpj8HBSa+/TImW+5JCeuQeRkm5NMpJWZG3hSuFU= |
| 110 | +/tmp/mylog/checkpoint: |
| 111 | +/tmp/mylog/checkpoint:— astra h5lA3N6MJnmnD1dPLqxeoWbbPAc0XPKuqomvSZPiVNLkdJmPDvF+7BkMIr4KBynVgo/ipGbNijHxdbvTZ4zKVXbyLwU= |
| 112 | +/tmp/mylog/leaves/6c/b0/b1/a3c33114cec1d940b9a6c48b55fb2c73f6efcfd53aeef2644681c9b70a:2 |
| 113 | +/tmp/mylog/leaves/b8/71/4f/045c7d5d0201b06004e6939d944a981605c5fcfa5d3353a3084303d4ad:1 |
| 114 | +/tmp/mylog/leaves/85/92/d6/f366d9d1297f44034d649b68afcee74050aa7a55c769130b2f07ecc65d:0 |
| 115 | +/tmp/mylog/leaves/e0/7c/75/881e1ec1bcad5e45c5cc3d8e2c83cda817a48324514309267ee32ef115:3 |
| 116 | +/tmp/mylog/seq/00/00/00/00/02:leaf_data_002 |
| 117 | +/tmp/mylog/seq/00/00/00/00/00:leaf_data_000 |
| 118 | +/tmp/mylog/seq/00/00/00/00/01:leaf_data_001 |
| 119 | +/tmp/mylog/seq/00/00/00/00/03:leaf_data_003 |
| 120 | +``` |
| 121 | + |
| 122 | +Note that at this point, no internal node of the tree has been computed, and the |
| 123 | +checkpoint has not been updated either. Leaves have only been assigned a position |
| 124 | +in the log. |
| 125 | + |
| 126 | +Attempting to re-sequence the same file contents will result in the `sequence` |
| 127 | +tool telling you that you're trying to add duplicate entries, along with their |
| 128 | +originally assigned sequence numbers: |
| 129 | + |
| 130 | +```bash |
| 131 | +$ go run ./cmd/sequence --storage_dir="${LOG_DIR}" --entries '/tmp/files/*' --public_key=key.pub --origin="${LOG_ORIGIN}" |
| 132 | +I1221 13:18:59.735244 924268 main.go:131] 0: /tmp/files/leaf_000 (dupe) |
| 133 | +I1221 13:18:59.735362 924268 main.go:131] 1: /tmp/files/leaf_001 (dupe) |
| 134 | +I1221 13:18:59.735406 924268 main.go:131] 2: /tmp/files/leaf_002 (dupe) |
| 135 | +I1221 13:18:59.735447 924268 main.go:131] 3: /tmp/files/leaf_003 (dupe) |
| 136 | +``` |
| 137 | + |
| 138 | +### Integrating sequenced entries |
| 139 | + |
| 140 | +We still need to update the rest of the tree structure to integrate these new entries. |
| 141 | +We use the `integrate` tool for that, again either passing key files or with the |
| 142 | +environment variables set: |
| 143 | + |
| 144 | +```bash |
| 145 | +$ go run ./cmd/integrate --storage_dir="${LOG_DIR}" --public_key=key.pub --private_key=key --origin="${LOG_ORIGIN}" |
| 146 | +I1221 13:19:20.190193 924589 integrate.go:94] Loaded state with roothash |
| 147 | +I1221 13:19:20.190432 924589 integrate.go:132] New log state: size 0x4 hash: 0c2e71ac054d92d58b0efd3013d0df235245331f0c0e828bab62a8fe62460c7f |
| 148 | +``` |
| 149 | + |
| 150 | +This output says that the integration was successful, and we now have a new log |
| 151 | +tree state which contains `0x4` entries, and has the printed log root hash. |
| 152 | + |
| 153 | +Let's look at the contents of the log directory: |
| 154 | + |
| 155 | +```bash |
| 156 | +$ grep -RH '^' /tmp/mylog/ |
| 157 | +/tmp/mylog/tile/00/0000/00/00/00.04:32 |
| 158 | +/tmp/mylog/tile/00/0000/00/00/00.04:4 |
| 159 | +/tmp/mylog/tile/00/0000/00/00/00.04:hZLW82bZ0Sl/RANNZJtor87nQFCqelXHaRMLLwfsxl0= |
| 160 | +/tmp/mylog/tile/00/0000/00/00/00.04:McF1R3nScwEJFHQpESACDl9SOdg9uTRLVZaDHzLckI0= |
| 161 | +/tmp/mylog/tile/00/0000/00/00/00.04:uHFPBFx9XQIBsGAE5pOdlEqYFgXF/PpdM1OjCEMD1K0= |
| 162 | +/tmp/mylog/tile/00/0000/00/00/00.04:DC5xrAVNktWLDv0wE9DfI1JFMx8MDoKLq2Ko/mJGDH8= |
| 163 | +/tmp/mylog/tile/00/0000/00/00/00.04:bLCxo8MxFM7B2UC5psSLVfssc/bvz9U67vJkRoHJtwo= |
| 164 | +/tmp/mylog/tile/00/0000/00/00/00.04:jNfnGF6uHUDupKFIaPW/QjZnPkINVKkVYc7cBakvPy4= |
| 165 | +/tmp/mylog/tile/00/0000/00/00/00.04:4Hx1iB4ewbytXkXFzD2OLIPNqBekgyRRQwkmfuMu8RU= |
| 166 | +/tmp/mylog/checkpoint:My Log |
| 167 | +/tmp/mylog/checkpoint:4 |
| 168 | +/tmp/mylog/checkpoint:DC5xrAVNktWLDv0wE9DfI1JFMx8MDoKLq2Ko/mJGDH8= |
| 169 | +/tmp/mylog/checkpoint: |
| 170 | +/tmp/mylog/checkpoint:— astra h5lA3GOB547TCfoNMEXxENGJVWmpG6Ynk8C6Oaef5gaFotSVLX9isWdvjnhBek94Is9yVPzIvjQTADF/dk2MhHXiCAY= |
| 171 | +/tmp/mylog/leaves/6c/b0/b1/a3c33114cec1d940b9a6c48b55fb2c73f6efcfd53aeef2644681c9b70a:2 |
| 172 | +/tmp/mylog/leaves/b8/71/4f/045c7d5d0201b06004e6939d944a981605c5fcfa5d3353a3084303d4ad:1 |
| 173 | +/tmp/mylog/leaves/85/92/d6/f366d9d1297f44034d649b68afcee74050aa7a55c769130b2f07ecc65d:0 |
| 174 | +/tmp/mylog/leaves/e0/7c/75/881e1ec1bcad5e45c5cc3d8e2c83cda817a48324514309267ee32ef115:3 |
| 175 | +/tmp/mylog/seq/00/00/00/00/02:leaf_data_002 |
| 176 | +/tmp/mylog/seq/00/00/00/00/00:leaf_data_000 |
| 177 | +/tmp/mylog/seq/00/00/00/00/01:leaf_data_001 |
| 178 | +/tmp/mylog/seq/00/00/00/00/03:leaf_data_003 |
| 179 | +``` |
| 180 | + |
| 181 | +The tile directory has been populated with a file, and the checkpoint has been updated. |
| 182 | +The `leaves/` and `seq/` directories have not changed. |
| 183 | + |
| 184 | +Each tile can store a maximum of 256 leaf hashes. Since we only have 4 for now, they |
| 185 | +fit in a single file. Since it's the first tile of the tree, [its path is 00/0000/00/00/00](api/layout#tile). |
| 186 | +Until the tile is filled with 256 leaves, the tile is "partial". |
| 187 | +That's what the `00.04` notation means: tile `00/0000/00/00/00.04` is the partial |
| 188 | +`00/0000/00/00/00` tile with 4 leaf hashes. |
| 189 | + |
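The naming rule for partial tiles can be sketched in Go as follows. `tileFileName` is a hypothetical helper, and the two-digit hexadecimal suffix is an assumption: this codelab only shows `04` and `05`, which read the same in decimal and hex.

```go
package main

import "fmt"

// tileWidth is the maximum number of leaf hashes in a tile, as stated above.
const tileWidth = 256

// tileFileName (hypothetical helper) returns the file name of a tile given
// the last byte of its index and the number of leaf hashes it holds.
// A full tile has no suffix; a partial tile gets ".NN".
// Assumption: NN is the leaf count as two hex digits.
func tileFileName(index uint8, leaves int) (string, error) {
	if leaves < 1 || leaves > tileWidth {
		return "", fmt.Errorf("leaf count %d out of range [1, %d]", leaves, tileWidth)
	}
	name := fmt.Sprintf("%02x", index)
	if leaves == tileWidth {
		return name, nil
	}
	return fmt.Sprintf("%s.%02x", name, leaves), nil
}

func main() {
	for _, n := range []int{4, 5, 256} {
		name, err := tileFileName(0, n)
		if err != nil {
			panic(err)
		}
		fmt.Println(n, name) // 4 00.04, then 5 00.05, then 256 00
	}
}
```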
| 190 | +Let's look at each line in the file: |
| 191 | + - `32` is the number of bytes used for each hash |
| 192 | + - `4` is the number of leaf hashes in this tile |
| 193 | + - a series of hashes follows: the leaf hashes of the tile, and the internal nodes |
| 194 | + that cover them |
| 195 | + |
| 196 | +``` |
| 197 | + b |
| 198 | + / \ |
| 199 | + / \ |
| 200 | + / \ |
| 201 | + a c |
| 202 | + / \ / \ |
| 203 | + h0 h1 h2 h3 |
| 204 | + | | | | |
| 205 | + 0 1 2 3 |
| 206 | +
|
| 207 | +``` |
| 208 | + |
| 209 | +We can spot the [leaves and internal node hashes](https://go.dev/play/p/6guNHqpr388) in the infix tree-traversal order. |
| 210 | + |
| 211 | +```bash |
| 212 | +$ cat /tmp/mylog/tile/00/0000/00/00/00.04 |
| 213 | +32 |
| 214 | +4 |
| 215 | +hZLW82bZ0Sl/RANNZJtor87nQFCqelXHaRMLLwfsxl0= <-- h0 = sha256(0x0 + leaf_data_0) |
| 216 | +McF1R3nScwEJFHQpESACDl9SOdg9uTRLVZaDHzLckI0= <-- a = sha256(0x1 + h0 + h1) |
| 217 | +uHFPBFx9XQIBsGAE5pOdlEqYFgXF/PpdM1OjCEMD1K0= <-- h1 = sha256(0x0 + leaf_data_1) |
| 218 | +DC5xrAVNktWLDv0wE9DfI1JFMx8MDoKLq2Ko/mJGDH8= <-- b = sha256(0x1 + a + c) |
| 219 | +bLCxo8MxFM7B2UC5psSLVfssc/bvz9U67vJkRoHJtwo= <-- h2 = sha256(0x0 + leaf_data_2) |
| 220 | +jNfnGF6uHUDupKFIaPW/QjZnPkINVKkVYc7cBakvPy4= <-- c = sha256(0x1 + h2 + h3) |
| 221 | +4Hx1iB4ewbytXkXFzD2OLIPNqBekgyRRQwkmfuMu8RU= <-- h3 = sha256(0x0 + leaf_data_3) |
| 222 | +``` |
| 223 | + |
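The annotated tile above can be reproduced with a short Go sketch. It assumes the RFC 6962 hashing rules used throughout this codelab (prefix `0x00` for leaves, `0x01` for internal nodes) and only handles a leaf count that is a power of two; `inOrder` is a hypothetical helper, not the tool's implementation.

```go
package main

import (
	"crypto/sha256"
	"encoding/base64"
	"fmt"
)

// leafHash is sha256(0x00 + data), as in RFC 6962.
func leafHash(data []byte) []byte {
	h := sha256.Sum256(append([]byte{0x00}, data...))
	return h[:]
}

// nodeHash is sha256(0x01 + left + right), as in RFC 6962.
func nodeHash(left, right []byte) []byte {
	buf := append([]byte{0x01}, left...)
	buf = append(buf, right...)
	h := sha256.Sum256(buf)
	return h[:]
}

// inOrder (hypothetical helper) returns every hash of the perfect tree built
// over the given leaf hashes, in in-order traversal, together with its root.
// len(leaves) must be a power of two.
func inOrder(leaves [][]byte) (nodes [][]byte, root []byte) {
	if len(leaves) == 1 {
		return [][]byte{leaves[0]}, leaves[0]
	}
	mid := len(leaves) / 2
	left, leftRoot := inOrder(leaves[:mid])
	right, rightRoot := inOrder(leaves[mid:])
	root = nodeHash(leftRoot, rightRoot)
	nodes = append(nodes, left...)
	nodes = append(nodes, root)
	nodes = append(nodes, right...)
	return nodes, root
}

func main() {
	var leaves [][]byte
	for i := 0; i < 4; i++ {
		// echo appends a newline to each leaf file.
		leaves = append(leaves, leafHash([]byte(fmt.Sprintf("leaf_data_%03d\n", i))))
	}
	nodes, _ := inOrder(leaves)
	// Order of the printed lines: h0, a, h1, b, h2, c, h3.
	for _, n := range nodes {
		fmt.Println(base64.StdEncoding.EncodeToString(n))
	}
}
```

The fourth printed line is the root `b`, which is also the root hash recorded in the checkpoint for a 4-leaf tree.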
| 224 | +### Adding one more leaf |
| 225 | +```bash |
| 226 | +$ echo "leaf_data_004" > /tmp/files/leaf_004 |
| 227 | + |
| 228 | +$ go run ./cmd/sequence --storage_dir="${LOG_DIR}" --entries '/tmp/files/leaf_004' --public_key=key.pub --origin="${LOG_ORIGIN}" |
| 229 | +I1221 13:23:43.956356 926120 main.go:131] 4: /tmp/files/leaf_004 |
| 230 | + |
| 231 | +$ go run ./cmd/integrate --storage_dir="${LOG_DIR}" --public_key=key.pub --private_key=key --origin="${LOG_ORIGIN}" |
| 232 | +I1221 13:24:11.168864 926446 integrate.go:94] Loaded state with roothash 0c2e71ac054d92d58b0efd3013d0df235245331f0c0e828bab62a8fe62460c7f |
| 233 | +I1221 13:24:11.169036 926446 integrate.go:132] New log state: size 0x5 hash: 1b26238e581181883c3f51827c58fe9c9e8a4d39383cbbabaabe0662b3c11496 |
| 234 | +``` |
| 235 | + |
| 236 | +This adds matching files in `seq` and `leaves`, and updates the checkpoint, as expected. |
| 237 | +A new tile is available under `00/0000/00/00/00.05`: |
| 238 | + |
| 239 | +```bash |
| 240 | +$ tree /tmp/mylog/tile |
| 241 | +└── 00 |
| 242 | + └── 0000 |
| 243 | + └── 00 |
| 244 | + └── 00 |
| 245 | + ├── 00.04 |
| 246 | + └── 00.05 |
| 247 | + |
| 248 | +5 directories, 2 files |
| 249 | +``` |
| 250 | + |
| 251 | +Notice that the old tile, `00.04`, has not been deleted. |
| 252 | + |
| 253 | +Here's the diff between the two tiles: |
| 254 | + |
| 255 | +```bash |
| 256 | +$ diff /tmp/mylog/tile/00/0000/00/00/00.04 /tmp/mylog/tile/00/0000/00/00/00.05 |
| 257 | +2c2 |
| 258 | +< 4 |
| 259 | +--- |
| 260 | +> 5 |
| 261 | +9a10,11 |
| 262 | +> |
| 263 | +> 6KUzDe4gX/0rZTZCgfgBtaIGOBkOQz4duxjTT+NeM5w= |
| 264 | +``` |
| 265 | + |
| 266 | +The number of leaves `4` has been updated to `5`, and a new leaf node hash has appeared. |
| 267 | +Note that even though the tree has changed shape to include this new leaf, no internal |
| 268 | +node was added to the tile. That's because tiles only store non-ephemeral nodes, and in this |
| 269 | +case, all the new internal nodes are ephemeral: they will change when new leaves are added to |
| 270 | +the tree. |
| 271 | + |
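One way to reason about which lines of a partial tile are filled in is sketched below in Go. This is an interpretation inferred from the listings in this codelab, not the tool's actual code: it assumes nodes are laid out in in-order, that an internal node is written only once every leaf beneath it exists, and that an ephemeral node leaves an empty line. `tileLines` is a hypothetical helper, and it does not model the stripping of the top node of a full tile.

```go
package main

import (
	"fmt"
	"math/bits"
)

// tileLines (hypothetical helper) describes the node lines of a tile that
// holds n leaf hashes. Odd 1-based positions are leaves; even positions are
// internal nodes whose height is the number of trailing zero bits of the
// position. An internal node is labelled only if all the leaves it covers
// are present; otherwise the line is left empty (ephemeral node).
func tileLines(n int) []string {
	var lines []string
	for pos := 1; pos <= 2*n-1; pos++ {
		if pos%2 == 1 {
			lines = append(lines, fmt.Sprintf("h%d", pos/2))
			continue
		}
		height := bits.TrailingZeros(uint(pos))
		span := 1 << height        // number of leaves under this node
		first := (pos - span) / 2  // index of the leftmost leaf covered
		last := first + span - 1   // index of the rightmost leaf covered
		if last < n {
			lines = append(lines, fmt.Sprintf("node(h%d..h%d)", first, last))
		} else {
			lines = append(lines, "") // ephemeral: not stored yet
		}
	}
	return lines
}

func main() {
	for i, l := range tileLines(5) {
		fmt.Printf("%d: %q\n", i, l)
	}
}
```

With 5 leaves the sketch yields 9 lines, the eighth of which is empty, which matches the two lines added in the diff above: an empty line followed by the new leaf hash.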
| 272 | +### Filling up the tile |
| 273 | +Let's fill up the tile with 256 entries: |
| 274 | + |
| 275 | +```bash |
| 276 | +$ for i in $(seq 5 255); do x=$(printf "%03d" $i); echo "leaf_data_$x" > /tmp/files/leaf_$x; done; |
| 277 | + |
| 278 | +$ go run ./cmd/sequence --storage_dir="${LOG_DIR}" --entries '/tmp/files/*' --public_key=key.pub --origin="${LOG_ORIGIN}" |
| 279 | +I1221 13:26:19.752225 927458 main.go:131] 0: /tmp/files/leaf_000 (dupe) |
| 280 | +I1221 13:26:19.752350 927458 main.go:131] 1: /tmp/files/leaf_001 (dupe) |
| 281 | +I1221 13:26:19.752398 927458 main.go:131] 2: /tmp/files/leaf_002 (dupe) |
| 282 | +I1221 13:26:19.752442 927458 main.go:131] 3: /tmp/files/leaf_003 (dupe) |
| 283 | +I1221 13:26:19.752499 927458 main.go:131] 4: /tmp/files/leaf_004 (dupe) |
| 284 | +I1221 13:26:19.752859 927458 main.go:131] 5: /tmp/files/leaf_005 |
| 285 | +I1221 13:26:19.753301 927458 main.go:131] 6: /tmp/files/leaf_006 |
| 286 | +... |
| 287 | + |
| 288 | +$ go run ./cmd/integrate --storage_dir="${LOG_DIR}" --public_key=key.pub --private_key=key --origin="${LOG_ORIGIN}" |
| 289 | +I1221 13:26:22.243568 927696 integrate.go:94] Loaded state with roothash 1b26238e581181883c3f51827c58fe9c9e8a4d39383cbbabaabe0662b3c11496 |
| 290 | +I1221 13:26:22.250694 927696 integrate.go:132] New log state: size 0x100 hash: dc0d01251026e7138412adf1009ef9ed0fc55e2b9a954438b5762deb8e8519c5 |
| 291 | +``` |
| 292 | + |
| 293 | +You can check that the `seq` and `leaves` have been updated with new entries, and so has the checkpoint. |
| 294 | + |
| 295 | +The `tile` directory now looks like this: |
| 296 | + |
| 297 | +```bash |
| 298 | +$ tree /tmp/mylog/tile |
| 299 | +/tmp/mylog/tile |
| 300 | +├── 00 |
| 301 | +│ └── 0000 |
| 302 | +│ └── 00 |
| 303 | +│ └── 00 |
| 304 | +│ ├── 00 |
| 305 | +│ ├── 00.04 -> /tmp/mylog/tile/00/0000/00/00/00 |
| 306 | +│ └── 00.05 -> /tmp/mylog/tile/00/0000/00/00/00 |
| 307 | +└── 01 |
| 308 | + └── 0000 |
| 309 | + └── 00 |
| 310 | + └── 00 |
| 311 | + └── 00.01 |
| 312 | + |
| 313 | +9 directories, 4 files |
| 314 | +``` |
| 315 | + |
| 316 | +Since the `00/0000/00/00/00` tile is now full, its partial versions have been replaced |
| 317 | +by symlinks that point to the full tile. |
| 318 | + |
| 319 | +A new tile has also appeared one stratum above: `01/0000/00/00/00.01`. It contains a single |
| 320 | +node, which is the current root node of the tree. To avoid storing duplicate hashes, the |
| 321 | +top-level node of the `00/0000/00/00/00` tile has been stripped, and you'll find an |
| 322 | +empty line in this file: |
| 323 | +``` |
| 324 | +$ cat /tmp/mylog/tile/00/0000/00/00/00 |
| 325 | +... |
| 326 | +ZkeKg5PJFHO3e+TRuTVf4QL7tk9C9NCBkR82ipcsUxw= |
| 327 | +iTG/pTVoZUjBJTfXcdNv2oJjxLQRKUqMOC6zVZoBznk= |
| 328 | +R0G/vzOBrC0IdaP092TEzFn4ksrZB77kIlcAK11J7aw= |
| 329 | +
|
| 330 | +SIeXDZcyctFVLLjX3BqTs4SirwpzCezE6yZRq9OIKHw= |
| 331 | +O876VfSKWrJ5MOQrmnO0jVgqs+vonzE/iC1t681gnAA= |
| 332 | +YDrvejyQgwwCB0u+vwiVml4eRbc5CSaJ0rWsieOtRb4= |
| 333 | +... |
| 334 | +``` |