Commit d5ca4d8

Tutorial walking through tree growth
1 parent 149d17d commit d5ca4d8

1 file changed

+334
-0
lines changed

codelab.md

Lines changed: 334 additions & 0 deletions
@@ -0,0 +1,334 @@
1+
# Codelab
2+
3+
This codelab will walk you through all the steps required to build a Tiled tree.
4+
5+
The Tiled tree will be stored on disk using the layout described in the [layout
6+
directory](api/layout/README.md). Its checkpoint uses the [checkpoint format](https://github.com/transparency-dev/formats/blob/main/log/README.md#checkpoint-format).
7+
8+
## Preliminary setup
9+
10+
The command-line tools in this repository can generate tile-based logs from leaf
11+
data stored on your file system. Each file will correspond to a single leaf in
12+
the tree.
13+
14+
Before we start, let's define a few environment variables:
15+
16+
```bash
17+
export DATA_DIR="/tmp/files" # where we'll store input data for the tree
18+
export LOG_DIR="/tmp/mylog" # where the tree will be stored
19+
export LOG_ORIGIN="My Log" # the origin of the log used by the Checkpoint format
20+
```
21+
22+
Checkpoints are signed, and we need a public/private key pair for this.
23+
24+
Use the `generate_keys` command with `--key_name`, a name
25+
for the signing entity. You can output the public and private keys to files using
26+
`--out_pub` path and filename for the public key,
27+
`--out_priv` path and filename for the private key
28+
and to stdout (private key first, then public key, over 2 lines) using `--print`.
29+
30+
```bash
31+
go run ./cmd/generate_keys --key_name=astra --out_pub=key.pub --out_priv=key
32+
```
33+
34+
### Creating a new log
35+
36+
To create a new log state directory, use the `integrate` command with the `--initialise`
37+
flag, either passing the key files or setting the corresponding environment variables:
38+
39+
```bash
40+
go run ./cmd/integrate --initialise --storage_dir="${LOG_DIR}" --logtostderr --public_key=key.pub --private_key=key --origin="${LOG_ORIGIN}"
41+
```
42+
43+
After running this command, the log state directory looks like this:
44+
45+
```
46+
$ tree /tmp/mylog/
47+
/tmp/mylog/
48+
├── checkpoint
49+
├── leaves
50+
│   └── pending
51+
├── seq
52+
└── tile
53+
54+
5 directories, 1 file
55+
```
56+
57+
See the [layout](api/layout/README.md) documentation for an explanation of what each directory is for.
58+
59+
Let's look at the checkpoint content:
60+
61+
```
62+
$ cat /tmp/mylog/checkpoint
63+
My Log
64+
0
65+
47DEQpj8HBSa+/TImW+5JCeuQeRkm5NMpJWZG3hSuFU=
66+
67+
— astra PlUh/n54e2dSIKi6kHjea5emrGnmC7lJVDgnIfWGIJmgFqp22k0UlnUk97L2ViqrFm986NwV+wJYGnrtRPJTBV0GrA0=
68+
```
69+
70+
- `My Log` is the origin from above.
71+
- `0` is the number of leaves currently in the tree.
72+
- `47DEQpj8HBSa+/TImW+5JCeuQeRkm5NMpJWZG3hSuFU=` is the base64-encoded [SHA-256 hash of an empty slice of bytes](https://go.dev/play/p/imi_2TM6DyI), since the log is empty.
73+
- The last line is a signature over this data, using the astra private key we've generated above
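
The first three checkpoint lines can be checked by hand. The following is a minimal Python sketch, not part of this repository's Go tooling; the function names `empty_root_b64` and `parse_checkpoint_body` are made up for illustration. It recomputes the empty-tree root hash and splits the checkpoint body into its fields. Signature verification is left out.

```python
import base64
import hashlib


def empty_root_b64() -> str:
    """Base64 of SHA-256 over zero bytes: the root hash of an empty log."""
    return base64.b64encode(hashlib.sha256(b"").digest()).decode("ascii")


def parse_checkpoint_body(text: str) -> tuple[str, int, bytes]:
    """Split the first three lines into (origin, tree size, raw root hash)."""
    origin, size, root_b64 = text.split("\n")[:3]
    return origin, int(size), base64.b64decode(root_b64)


# The body of the checkpoint shown above, without the signature line.
body = "My Log\n0\n47DEQpj8HBSa+/TImW+5JCeuQeRkm5NMpJWZG3hSuFU=\n"
origin, size, root = parse_checkpoint_body(body)
print(origin, size, root.hex())
```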
74+
75+
76+
### Creating log content
77+
Now let's add some leaves to the log.
78+
79+
First, we generate the input data with:
80+
```bash
81+
$ mkdir $DATA_DIR
82+
$ for i in $(seq 0 3); do x=$(printf "%03d" $i); echo "leaf_data_$x" > /tmp/files/leaf_$x; done;
83+
```
84+
85+
To add the contents of some files to a log, use the `sequence` command with the
86+
`--entries` flag set to a filename glob of files to add and either passing the public key
87+
file or with the environment variable set:
88+
89+
```bash
90+
$ go run ./cmd/sequence --storage_dir="${LOG_DIR}" --entries '/tmp/files/*' --public_key=key.pub --origin="${LOG_ORIGIN}"
91+
I1221 13:16:23.940255 923589 main.go:131] 0: /tmp/files/leaf_000
92+
I1221 13:16:23.940806 923589 main.go:131] 1: /tmp/files/leaf_001
93+
I1221 13:16:23.941218 923589 main.go:131] 2: /tmp/files/leaf_002
94+
I1221 13:16:23.941673 923589 main.go:131] 3: /tmp/files/leaf_003
95+
```
96+
97+
The `sequence` command stores data in the log directory using convenient
98+
formats. The `leaves` directory contains the leaf index of each leaf hash.
99+
Let's take the leaf at index `0`, which happens to contain `leaf_data_000`.
100+
This tree uses RFC6962's default hasher, where `leaf_hash = sha256(0x00 + leaf_data)`.
101+
`8592d6f366d9d1297f44034d649b68afcee74050aa7a55c769130b2f07ecc65d`, the path for
102+
the leaf at index 0 with forward slashes removed, is the [hexadecimal representation
103+
of this hash](https://go.dev/play/p/POnCQ7IXayk).
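
As a rough illustration, here is a Python sketch of the leaf hash and of how a hash maps to a path under `leaves/`. The directory split (three one-byte directories, then the remaining hex digits as the file name) is inferred from the listing in this codelab rather than taken from the tool's source, and the function names are hypothetical.

```python
import hashlib


def rfc6962_leaf_hash(data: bytes) -> bytes:
    """RFC 6962 leaf hash: SHA-256 over a 0x00 prefix byte plus the leaf data."""
    return hashlib.sha256(b"\x00" + data).digest()


def leaf_index_path(leaf_hash: bytes) -> str:
    """Relative path of the file that records the index for this leaf hash.

    Assumption: the first three bytes become directories, as seen in the
    directory listing; the rest of the hex string is the file name.
    """
    h = leaf_hash.hex()
    return "/".join(["leaves", h[0:2], h[2:4], h[4:6], h[6:]])


# Path for the hash quoted in the text above.
quoted = bytes.fromhex(
    "8592d6f366d9d1297f44034d649b68afcee74050aa7a55c769130b2f07ecc65d"
)
print(leaf_index_path(quoted))
```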
104+
105+
```
106+
$ grep -RH '^' /tmp/mylog/
107+
/tmp/mylog/checkpoint:My Log
108+
/tmp/mylog/checkpoint:0
109+
/tmp/mylog/checkpoint:47DEQpj8HBSa+/TImW+5JCeuQeRkm5NMpJWZG3hSuFU=
110+
/tmp/mylog/checkpoint:
111+
/tmp/mylog/checkpoint:— astra h5lA3N6MJnmnD1dPLqxeoWbbPAc0XPKuqomvSZPiVNLkdJmPDvF+7BkMIr4KBynVgo/ipGbNijHxdbvTZ4zKVXbyLwU=
112+
/tmp/mylog/leaves/6c/b0/b1/a3c33114cec1d940b9a6c48b55fb2c73f6efcfd53aeef2644681c9b70a:2
113+
/tmp/mylog/leaves/b8/71/4f/045c7d5d0201b06004e6939d944a981605c5fcfa5d3353a3084303d4ad:1
114+
/tmp/mylog/leaves/85/92/d6/f366d9d1297f44034d649b68afcee74050aa7a55c769130b2f07ecc65d:0
115+
/tmp/mylog/leaves/e0/7c/75/881e1ec1bcad5e45c5cc3d8e2c83cda817a48324514309267ee32ef115:3
116+
/tmp/mylog/seq/00/00/00/00/02:leaf_data_002
117+
/tmp/mylog/seq/00/00/00/00/00:leaf_data_000
118+
/tmp/mylog/seq/00/00/00/00/01:leaf_data_001
119+
/tmp/mylog/seq/00/00/00/00/03:leaf_data_003
120+
```
121+
122+
Note that at this point, no internal node of the tree has been computed, and neither
123+
has the checkpoint been updated. Leaves have only been assigned a position
124+
in the log.
125+
126+
Attempting to re-sequence the same file contents will result in the `sequence`
127+
tool telling you that you're trying to add duplicate entries, along with their
128+
originally assigned sequence numbers:
129+
130+
```bash
131+
$ go run ./cmd/sequence --storage_dir="${LOG_DIR}" --entries '/tmp/files/*' --public_key=key.pub --origin="${LOG_ORIGIN}"
132+
I1221 13:18:59.735244 924268 main.go:131] 0: /tmp/files/leaf_000 (dupe)
133+
I1221 13:18:59.735362 924268 main.go:131] 1: /tmp/files/leaf_001 (dupe)
134+
I1221 13:18:59.735406 924268 main.go:131] 2: /tmp/files/leaf_002 (dupe)
135+
I1221 13:18:59.735447 924268 main.go:131] 3: /tmp/files/leaf_003 (dupe)
136+
```
137+
138+
### Integrating sequenced entries
139+
140+
We still need to update the rest of the tree structure to integrate these new entries.
141+
We use the `integrate` tool for that, again either passing key files or with the
142+
environment variables set:
143+
144+
```bash
145+
$ go run ./cmd/integrate --storage_dir="${LOG_DIR}" --public_key=key.pub --private_key=key --origin="${LOG_ORIGIN}"
146+
I1221 13:19:20.190193 924589 integrate.go:94] Loaded state with roothash
147+
I1221 13:19:20.190432 924589 integrate.go:132] New log state: size 0x4 hash: 0c2e71ac054d92d58b0efd3013d0df235245331f0c0e828bab62a8fe62460c7f
148+
```
149+
150+
This output says that the integration was successful, and we now have a new log
151+
tree state which contains `0x4` entries, and has the printed log root hash.
152+
153+
Let's look at the contents of the tree directory:
154+
155+
```bash
156+
$ grep -RH '^' /tmp/mylog/
157+
/tmp/mylog/tile/00/0000/00/00/00.04:32
158+
/tmp/mylog/tile/00/0000/00/00/00.04:4
159+
/tmp/mylog/tile/00/0000/00/00/00.04:hZLW82bZ0Sl/RANNZJtor87nQFCqelXHaRMLLwfsxl0=
160+
/tmp/mylog/tile/00/0000/00/00/00.04:McF1R3nScwEJFHQpESACDl9SOdg9uTRLVZaDHzLckI0=
161+
/tmp/mylog/tile/00/0000/00/00/00.04:uHFPBFx9XQIBsGAE5pOdlEqYFgXF/PpdM1OjCEMD1K0=
162+
/tmp/mylog/tile/00/0000/00/00/00.04:DC5xrAVNktWLDv0wE9DfI1JFMx8MDoKLq2Ko/mJGDH8=
163+
/tmp/mylog/tile/00/0000/00/00/00.04:bLCxo8MxFM7B2UC5psSLVfssc/bvz9U67vJkRoHJtwo=
164+
/tmp/mylog/tile/00/0000/00/00/00.04:jNfnGF6uHUDupKFIaPW/QjZnPkINVKkVYc7cBakvPy4=
165+
/tmp/mylog/tile/00/0000/00/00/00.04:4Hx1iB4ewbytXkXFzD2OLIPNqBekgyRRQwkmfuMu8RU=
166+
/tmp/mylog/checkpoint:My Log
167+
/tmp/mylog/checkpoint:4
168+
/tmp/mylog/checkpoint:DC5xrAVNktWLDv0wE9DfI1JFMx8MDoKLq2Ko/mJGDH8=
169+
/tmp/mylog/checkpoint:
170+
/tmp/mylog/checkpoint:— astra h5lA3GOB547TCfoNMEXxENGJVWmpG6Ynk8C6Oaef5gaFotSVLX9isWdvjnhBek94Is9yVPzIvjQTADF/dk2MhHXiCAY=
171+
/tmp/mylog/leaves/6c/b0/b1/a3c33114cec1d940b9a6c48b55fb2c73f6efcfd53aeef2644681c9b70a:2
172+
/tmp/mylog/leaves/b8/71/4f/045c7d5d0201b06004e6939d944a981605c5fcfa5d3353a3084303d4ad:1
173+
/tmp/mylog/leaves/85/92/d6/f366d9d1297f44034d649b68afcee74050aa7a55c769130b2f07ecc65d:0
174+
/tmp/mylog/leaves/e0/7c/75/881e1ec1bcad5e45c5cc3d8e2c83cda817a48324514309267ee32ef115:3
175+
/tmp/mylog/seq/00/00/00/00/02:leaf_data_002
176+
/tmp/mylog/seq/00/00/00/00/00:leaf_data_000
177+
/tmp/mylog/seq/00/00/00/00/01:leaf_data_001
178+
/tmp/mylog/seq/00/00/00/00/03:leaf_data_003
179+
```
180+
181+
The tile directory has been populated with a file, and the checkpoint has been updated.
182+
The `leaves/` and `seq/` directories have not changed.
183+
184+
Each tile can store a maximum of 256 leaf hashes. Since we only have 4 for now, they
185+
fit in a single file. Since it's the first tile of the tree, [its path is 00/0000/00/00/00](api/layout#tile).
186+
Until the tile is filled with 256 leaves, the tile is "partial",
187+
that's what the `00.04` notation means: tile `00/0000/00/00/00.04` is the partial
188+
`00/0000/00/00/00` tile with 4 leaf hashes.
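
The naming scheme can be sketched in a few lines of Python. This is an illustration based only on the paths that appear in this codelab: the helper name `tile_path` is hypothetical, and the two-digit hexadecimal suffix is an assumption (the examples here, `.04`, `.05` and `.01`, read the same in decimal).

```python
def tile_path(level: int, index: int, size: int, full: int = 256) -> str:
    """Relative path of a tile under tile/, with a size suffix when partial.

    Assumptions: two hex digits for the level, ten hex digits for the index
    split 4/2/2/2, and a two-digit hex suffix for partial tiles.
    """
    idx = f"{index:010x}"
    path = f"{level:02x}/{idx[0:4]}/{idx[4:6]}/{idx[6:8]}/{idx[8:10]}"
    if size < full:
        path += f".{size:02x}"
    return path


print(tile_path(0, 0, 4))    # partial tile holding 4 leaf hashes
print(tile_path(0, 0, 256))  # full tile, no suffix
```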
189+
190+
Let's look at each line in the file:
191+
- `32` is the number of bytes used for each hash
192+
- `4` is the number of leaf hashes in this tile
193+
- series of hashes representing the leaf hashes of the tile, and the compact range they
194+
cover
195+
196+
```
197+
        b
198+
      /   \
199+
     /     \
200+
    /       \
201+
   a         c
202+
  / \       / \
203+
 h0  h1    h2  h3
204+
 |   |     |   |
205+
 0   1     2   3
206+
207+
```
208+
209+
We can spot the [leaves and internal node hashes](https://go.dev/play/p/6guNHqpr388) in the infix tree-traversal order.
210+
211+
```bash
212+
$ cat /tmp/mylog/tile/00/0000/00/00/00.04
213+
32
214+
4
215+
hZLW82bZ0Sl/RANNZJtor87nQFCqelXHaRMLLwfsxl0= <-- h0 = sha256(0x0 + leaf_data_0)
216+
McF1R3nScwEJFHQpESACDl9SOdg9uTRLVZaDHzLckI0= <-- a = sha256(0x1 + h0 + h1)
217+
uHFPBFx9XQIBsGAE5pOdlEqYFgXF/PpdM1OjCEMD1K0= <-- h1 = sha256(0x0 + leaf_data_1)
218+
DC5xrAVNktWLDv0wE9DfI1JFMx8MDoKLq2Ko/mJGDH8= <-- b = sha256(0x1 + a + c)
219+
bLCxo8MxFM7B2UC5psSLVfssc/bvz9U67vJkRoHJtwo= <-- h2 = sha256(0x0 + leaf_data_2)
220+
jNfnGF6uHUDupKFIaPW/QjZnPkINVKkVYc7cBakvPy4= <-- c = sha256(0x1 + h2 + h3)
221+
4Hx1iB4ewbytXkXFzD2OLIPNqBekgyRRQwkmfuMu8RU= <-- h3 = sha256(0x0 + leaf_data_3)
222+
```
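
The annotations above can be turned into a small Python sketch that builds the same in-order sequence for a power-of-two number of leaves. It assumes each leaf file holds its text followed by a newline (which is what `echo` writes); the helper names are illustrative and this is not the repository's implementation.

```python
import hashlib


def leaf_hash(data: bytes) -> bytes:
    """RFC 6962 leaf hash: sha256(0x00 + data)."""
    return hashlib.sha256(b"\x00" + data).digest()


def node_hash(left: bytes, right: bytes) -> bytes:
    """RFC 6962 internal node hash: sha256(0x01 + left + right)."""
    return hashlib.sha256(b"\x01" + left + right).digest()


def inorder_nodes(hashes: list[bytes]) -> tuple[list[bytes], bytes]:
    """Return (nodes in in-order traversal, root) for 2^k leaf hashes."""
    if len(hashes) == 1:
        return [hashes[0]], hashes[0]
    mid = len(hashes) // 2
    left_seq, left_root = inorder_nodes(hashes[:mid])
    right_seq, right_root = inorder_nodes(hashes[mid:])
    root = node_hash(left_root, right_root)
    return left_seq + [root] + right_seq, root


# Assumed leaf contents: "leaf_data_000\n" through "leaf_data_003\n".
leaves = [leaf_hash(f"leaf_data_{i:03d}\n".encode()) for i in range(4)]
seq, root = inorder_nodes(leaves)
for h in seq:
    print(h.hex())  # order: h0, a, h1, b, h2, c, h3
```

With four leaves the root `b` lands at position 3 of the seven entries, which is the slot it occupies in the tile file.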
223+
224+
### Adding one more leaf
225+
```bash
226+
$ echo "leaf_data_004" > /tmp/files/leaf_004
227+
228+
$ go run ./cmd/sequence --storage_dir="${LOG_DIR}" --entries '/tmp/files/leaf_004' --public_key=key.pub --origin="${LOG_ORIGIN}"
229+
I1221 13:23:43.956356 926120 main.go:131] 4: /tmp/files/leaf_004
230+
231+
$ go run ./cmd/integrate --storage_dir="${LOG_DIR}" --public_key=key.pub --private_key=key --origin="${LOG_ORIGIN}"
232+
I1221 13:24:11.168864 926446 integrate.go:94] Loaded state with roothash 0c2e71ac054d92d58b0efd3013d0df235245331f0c0e828bab62a8fe62460c7f
233+
I1221 13:24:11.169036 926446 integrate.go:132] New log state: size 0x5 hash: 1b26238e581181883c3f51827c58fe9c9e8a4d39383cbbabaabe0662b3c11496
234+
```
235+
236+
This adds matching files in `seq`, `leaves`, and updates the checkpoint, as expected.
237+
A new tile is available under `00/0000/00/00/00.05`:
238+
239+
```bash
240+
$ tree /tmp/mylog/tile
241+
└── 00
241+
    └── 0000
242+
        └── 00
243+
            └── 00
244+
                ├── 00.04
245+
                └── 00.05
247+
248+
5 directories, 2 files
249+
```
250+
251+
Notice that the old tile, `00.04`, has not been deleted.
252+
253+
Here's the diff between the two tiles:
254+
255+
```bash
256+
$ diff /tmp/mylog/tile/00/0000/00/00/00.04 /tmp/mylog/tile/00/0000/00/00/00.05
257+
2c2
258+
< 4
259+
---
260+
> 5
261+
9a10,11
262+
>
263+
> 6KUzDe4gX/0rZTZCgfgBtaIGOBkOQz4duxjTT+NeM5w=
264+
```
265+
266+
The number of leaves `4` has been updated to `5`, and a new leaf node hash has appeared.
267+
Note that even though the tree has changed shape to include this new leaf, no internal
268+
node was added to the tile. That's because tiles only store non-ephemeral nodes, and in this
269+
case, all the new internal nodes are ephemeral: they will change when new leaves are added to
270+
the tree.
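
One way to reason about which slots of a tile are filled is sketched below in Python. The rule is inferred from the diff above (a blank line followed by the new leaf hash), not from the tool's source, and the function names are hypothetical. Slots are numbered in in-order position: even slots hold leaves, odd slots hold internal nodes, and an internal node is written only once every leaf beneath it exists.

```python
def slot_height(k: int) -> int:
    """Height of the node at in-order slot k: its number of trailing 1 bits."""
    h = 0
    while k & 1:
        h += 1
        k >>= 1
    return h


def slot_is_stored(k: int, n: int) -> bool:
    """True if slot k is final (non-ephemeral) in a tile holding n leaves."""
    h = slot_height(k)
    last_leaf = (k + (1 << h) - 1) // 2  # rightmost leaf under this node
    return last_leaf < n


def tile_slots(n: int) -> list[bool]:
    """Stored/blank flag for each of the 2n-1 in-order slots."""
    return [slot_is_stored(k, n) for k in range(2 * n - 1)]


print(tile_slots(4))  # all seven slots are filled
print(tile_slots(5))  # slot 7 stays blank, slot 8 is the new leaf
```

This sketch does not model full tiles, where the top node is handled differently.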
271+
272+
### Filling up the tile
273+
Let's fill up the tile with 256 entries:
274+
275+
```bash
276+
$ for i in $(seq 5 255); do x=$(printf "%03d" $i); echo "leaf_data_$x" > /tmp/files/leaf_$x; done;
277+
278+
$ go run ./cmd/sequence --storage_dir="${LOG_DIR}" --entries '/tmp/files/*' --public_key=key.pub --origin="${LOG_ORIGIN}"
279+
I1221 13:26:19.752225 927458 main.go:131] 0: /tmp/files/leaf_000 (dupe)
280+
I1221 13:26:19.752350 927458 main.go:131] 1: /tmp/files/leaf_001 (dupe)
281+
I1221 13:26:19.752398 927458 main.go:131] 2: /tmp/files/leaf_002 (dupe)
282+
I1221 13:26:19.752442 927458 main.go:131] 3: /tmp/files/leaf_003 (dupe)
283+
I1221 13:26:19.752499 927458 main.go:131] 4: /tmp/files/leaf_004 (dupe)
284+
I1221 13:26:19.752859 927458 main.go:131] 5: /tmp/files/leaf_005
285+
I1221 13:26:19.753301 927458 main.go:131] 6: /tmp/files/leaf_006
286+
...
287+
288+
$ go run ./cmd/integrate --storage_dir="${LOG_DIR}" --public_key=key.pub --private_key=key --origin="${LOG_ORIGIN}"
289+
I1221 13:26:22.243568 927696 integrate.go:94] Loaded state with roothash 1b26238e581181883c3f51827c58fe9c9e8a4d39383cbbabaabe0662b3c11496
290+
I1221 13:26:22.250694 927696 integrate.go:132] New log state: size 0x100 hash: dc0d01251026e7138412adf1009ef9ed0fc55e2b9a954438b5762deb8e8519c5
291+
```
292+
293+
You can check that the `seq` and `leaves` directories have been updated with new entries, and so has the checkpoint.
294+
295+
The `tile` directory now looks like this:
296+
297+
```bash
298+
$ tree /tmp/mylog/tile
299+
/tmp/mylog/tile
300+
├── 00
301+
│   └── 0000
301+
│       └── 00
302+
│           └── 00
303+
│               ├── 00
304+
│               ├── 00.04 -> /tmp/mylog/tile/00/0000/00/00/00
305+
│               └── 00.05 -> /tmp/mylog/tile/00/0000/00/00/00
306+
└── 01
307+
    └── 0000
308+
        └── 00
309+
            └── 00
310+
                └── 00.01
312+
313+
9 directories, 4 files
314+
```
315+
316+
Since the `00/0000/00/00/00` tile is now full, its partial versions have been deleted, and their names now
317+
point to the full tile.
318+
319+
A new tile has also appeared, one stratum above: `01/0000/00/00/00.01`. It contains a single
320+
node, which is the current root node of the tree. To avoid storing duplicate hashes, this
321+
top level node of the `00/0000/00/00/00` tile has been stripped, and you'll find an
322+
empty line in this file:
323+
```
324+
$ cat /tmp/mylog/tile/00/0000/00/00/00
325+
...
326+
ZkeKg5PJFHO3e+TRuTVf4QL7tk9C9NCBkR82ipcsUxw=
327+
iTG/pTVoZUjBJTfXcdNv2oJjxLQRKUqMOC6zVZoBznk=
328+
R0G/vzOBrC0IdaP092TEzFn4ksrZB77kIlcAK11J7aw=
329+
330+
SIeXDZcyctFVLLjX3BqTs4SirwpzCezE6yZRq9OIKHw=
331+
O876VfSKWrJ5MOQrmnO0jVgqs+vonzE/iC1t681gnAA=
332+
YDrvejyQgwwCB0u+vwiVml4eRbc5CSaJ0rWsieOtRb4=
333+
...
334+
```
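
To see how tiles stack up across strata as the log grows, here is a short Python sketch. It is an approximation inferred from the directory listings in this codelab (256 entries per tile, one entry in the stratum above for each full tile below), and `tiles_at_size` is a hypothetical helper, not a function from the repository.

```python
def tiles_at_size(size: int, width: int = 256) -> list[tuple[int, int, int]]:
    """For each stratum, return (level, full tiles, entries in the partial tile).

    Assumption: every full tile contributes one entry to the stratum above.
    """
    out = []
    level = 0
    n = size
    while n > 0:
        full, partial = divmod(n, width)
        out.append((level, full, partial))
        n = full  # one entry per full tile moves up a stratum
        level += 1
    return out


print(tiles_at_size(5))    # one partial tile with 5 leaves
print(tiles_at_size(256))  # one full tile, plus a one-entry tile above it
```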
