
Commit e70a12f

Authored Nov 20, 2024
Merge pull request #282 from authzed/bulk-import-documentation
Bulk import documentation
2 parents 5eb4e76 + 46889a7 commit e70a12f

3 files changed: +127 -2 lines changed

pages/_meta.json (-1)

@@ -7,7 +7,6 @@
    "title": "SpiceDB Documentation",
    "type": "page"
  },
-
  "authzed": {
    "title": "AuthZed Product Documentation",
    "type": "page"

pages/spicedb/ops/_meta.json (+2 -1)

@@ -1,5 +1,6 @@
 {
   "observability": "Observability Tooling",
   "deploying-spicedb-operator": "Deploying the SpiceDB Operator",
-  "deploying-spicedb-on-eks": "Deploying SpiceDB on AWS EKS"
+  "deploying-spicedb-on-eks": "Deploying SpiceDB on AWS EKS",
+  "bulk-operations": "Bulk Importing Relationships"
 }

pages/spicedb/ops/bulk-operations.mdx (+125)

@@ -0,0 +1,125 @@
import { Tabs } from 'nextra/components'

# Bulk Importing Relationships

## Overview

When setting up a SpiceDB cluster for the first time, there's often a data ingest step required to write the initial set of relationships.
This can be done with [`WriteRelationships`](https://buf.build/authzed/api/docs/main:authzed.api.v1#authzed.api.v1.PermissionsService.WriteRelationships) running in a loop, but that approach can only create 1,000 relationships (by default) per call, and each transaction creates a new revision, which adds overhead.

For faster ingest, we provide an [`ImportBulkRelationships`](https://buf.build/authzed/api/docs/main:authzed.api.v1#authzed.api.v1.PermissionsService.ImportBulkRelationships) call, which takes advantage of client-side gRPC streaming to accelerate the process and removes the cap on the number of relationships that can be written at once.
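
For comparison, here is a minimal sketch of the `WriteRelationships`-in-a-loop approach described above, using the [authzed-py](https://github.com/authzed/authzed-py) client. It assumes `client` is an initialized authzed client and `all_relationships_to_write` is a list of `Relationship` objects, as in the example further below.

```python
from itertools import batched

from authzed.api.v1 import RelationshipUpdate, WriteRelationshipsRequest

# SpiceDB's default limit is 1,000 updates per WriteRelationships call.
WRITE_BATCH_SIZE = 1_000

for batch in batched(all_relationships_to_write, WRITE_BATCH_SIZE):
    # Each call commits its own revision, which is what makes this slower
    # than a bulk import.
    client.WriteRelationships(
        WriteRelationshipsRequest(
            updates=[
                RelationshipUpdate(
                    operation=RelationshipUpdate.Operation.OPERATION_CREATE,
                    relationship=rel,
                )
                for rel in batch
            ]
        )
    )
```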

## Batching

There are two batch sizes to consider: the number of relationships in each chunk written to the stream, and the overall number of relationships sent over the lifetime of the request.
Breaking the request into chunks is a network optimization that makes it faster to push relationships from the client to the cluster.

The overall number of relationships in a request should reflect how many rows your datastore can comfortably write in a single transaction.
Note that you probably **don't** want to push all of your relationships through in a single request, as doing so could time out in your datastore.

## Example

We'll use the [authzed-dotnet](https://github.com/authzed/authzed-dotnet) client for this example.
Other client libraries have different syntax and structures around streaming and iteration,
but this should demonstrate the two levels of chunking that happen in the process.

<Tabs items={["Dotnet", "Python"]}>
<Tabs.Tab>
```csharp
// `permissionsService` and `allRelationshipsToWrite` are assumed to be initialized elsewhere.
var TOTAL_RELATIONSHIPS_TO_WRITE = 1000;
var RELATIONSHIPS_PER_TRANSACTION = 100;
var RELATIONSHIPS_PER_REQUEST_CHUNK = 10;

// Start by breaking the full list into a sequence of chunks where each chunk fits easily
// into a datastore transaction.
var transactionChunks = allRelationshipsToWrite.Chunk(RELATIONSHIPS_PER_TRANSACTION);

foreach (var relationshipsForRequest in transactionChunks) {
    // For each of those transaction chunks, break it down further into chunks that
    // optimize for network throughput.
    var requestChunks = relationshipsForRequest.Chunk(RELATIONSHIPS_PER_REQUEST_CHUNK);
    // Open up a client stream to the server for this transaction chunk.
    using var importCall = permissionsService.ImportBulkRelationships();
    foreach (var requestChunk in requestChunks) {
        // For each network chunk, write to the client stream.
        // NOTE: this makes the calls sequentially rather than concurrently; this could be
        // optimized further by using tasks.
        await importCall.RequestStream.WriteAsync(new ImportBulkRelationshipsRequest{
            Relationships = { requestChunk }
        });
    }
    // When we're done with the transaction chunk, complete the call and process the response.
    await importCall.RequestStream.CompleteAsync();
    var importResponse = await importCall;
    Console.WriteLine("request successful");
    Console.WriteLine(importResponse.NumLoaded);
    // Repeat!
}
```
</Tabs.Tab>
<Tabs.Tab>
```python
from itertools import batched

from authzed.api.v1 import ImportBulkRelationshipsRequest

# `client` (an authzed client) and `all_relationships_to_write` are assumed
# to be set up already.
TOTAL_RELATIONSHIPS_TO_WRITE = 1_000

RELATIONSHIPS_PER_TRANSACTION = 100
RELATIONSHIPS_PER_REQUEST_CHUNK = 10

# NOTE: batched takes a larger iterator and makes an iterator of smaller chunks out of it.
# We iterate over chunks of size RELATIONSHIPS_PER_TRANSACTION, and then we break each request into
# chunks of size RELATIONSHIPS_PER_REQUEST_CHUNK.
transaction_chunks = batched(
    all_relationships_to_write, RELATIONSHIPS_PER_TRANSACTION
)
for relationships_for_request in transaction_chunks:
    request_chunks = batched(relationships_for_request, RELATIONSHIPS_PER_REQUEST_CHUNK)
    response = client.ImportBulkRelationships(
        (
            ImportBulkRelationshipsRequest(relationships=relationships_chunk)
            for relationships_chunk in request_chunks
        )
    )
    print("request successful")
    print(response.num_loaded)
```
</Tabs.Tab>
</Tabs>

The code for this example is [available here](https://github.com/authzed/authzed-dotnet/blob/main/examples/bulk-import/BulkImport/Program.cs).
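
Both snippets assume the relationships to import (`allRelationshipsToWrite` / `all_relationships_to_write`) have already been built. As a rough sketch with the Python client, and using a hypothetical `document`/`user` schema, constructing that list might look like:

```python
from authzed.api.v1 import ObjectReference, Relationship, SubjectReference

# Hypothetical schema: documents with a `viewer` relation to users.
all_relationships_to_write = [
    Relationship(
        resource=ObjectReference(object_type="document", object_id=f"doc-{i}"),
        relation="viewer",
        subject=SubjectReference(
            object=ObjectReference(object_type="user", object_id=f"user-{i % 10}")
        ),
    )
    for i in range(TOTAL_RELATIONSHIPS_TO_WRITE)
]
```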

## Retrying and Resuming

`ImportBulkRelationships`'s semantics only allow the creation of relationships.
If a relationship being imported already exists in the database, the call errors.
This can be frustrating when populating an instance, because the process may fail with a retryable error, such as one caused by transient network conditions.
The [authzed-go](https://github.com/authzed/authzed-go) client offers a [`RetryableClient`](https://github.com/authzed/authzed-go/blob/main/v1/retryable_client.go) with retry logic built into its `ImportBulkRelationships` implementation.

It is used internally by [zed](https://github.com/authzed/zed) and is exposed by the `authzed-go` library.
It works by either skipping over the offending batch when the `Skip` strategy is used, or falling back to `WriteRelationships` with touch semantics when the `Touch` strategy is used.
Similar logic can be implemented using the other client libraries.
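
As an illustration of that fallback in another client, here is a minimal sketch with the Python client that re-writes a failed transaction chunk using `WriteRelationships` with touch semantics. It is not the authzed-go implementation, and it assumes the duplicate-relationship failure surfaces as an `ALREADY_EXISTS` gRPC status; the function and parameter names are hypothetical.

```python
from itertools import batched

import grpc
from authzed.api.v1 import (
    ImportBulkRelationshipsRequest,
    RelationshipUpdate,
    WriteRelationshipsRequest,
)

def import_chunk_with_touch_fallback(client, relationships, request_chunk_size=10):
    """Import one transaction chunk, falling back to TOUCH writes on conflict."""
    try:
        client.ImportBulkRelationships(
            ImportBulkRelationshipsRequest(relationships=chunk)
            for chunk in batched(relationships, request_chunk_size)
        )
    except grpc.RpcError as err:
        if err.code() != grpc.StatusCode.ALREADY_EXISTS:
            raise
        # Part of this chunk was imported previously; re-writing the whole
        # chunk with TOUCH semantics makes the operation idempotent.
        client.WriteRelationships(
            WriteRelationshipsRequest(
                updates=[
                    RelationshipUpdate(
                        operation=RelationshipUpdate.Operation.OPERATION_TOUCH,
                        relationship=rel,
                    )
                    for rel in relationships
                ]
            )
        )
```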

## Why does it work this way?

SpiceDB's `ImportBulkRelationships` service uses [gRPC client streaming] as a network optimization.
It **does not** commit relationships to your datastore as it receives them; instead, it opens a database transaction at the start of the call and commits that transaction when the client ends the stream.

We take this approach because there isn't a good way to handle server-side errors in a commit-as-you-go design: if each chunk sent over the network were committed as it arrived, the semantics of server-side errors would be ambiguous.
For example, you might receive an error that closes the stream, but that doesn't necessarily mean the last chunk you sent is where the error happened.
The error source could be included as error context, but error handling and resumption would still be difficult and cumbersome.

A [gRPC bidirectional streaming](https://grpc.io/docs/what-is-grpc/core-concepts/#bidirectional-streaming-rpc) approach could help address this by ACKing each chunk individually, but it would also require a good amount of client-side bookkeeping to ensure that every chunk written by the client has been acknowledged by the server.
Requiring multiple client-streaming requests instead means that you can use your language's normal error-handling flows and know exactly what has been written to the server.

[gRPC client streaming]: https://grpc.io/docs/what-is-grpc/core-concepts/#client-streaming-rpc
