<thing>
signifies a placeholder value, where the text in brackets is a concrete value of the type described by the text and only known at runtime. e.g.<hash>
is a base16-encoded hash appearing something like7e13a3f34f6494e539ffd32c1bb35f18
<ts>
is short for timestamp, a string representation of a point in time in Unix Time (number of seconds since the Unix Epoch aka since midnight 1970 jan 1)<mono-ts>
is short for monotonic timestamp, i.e. timestamps that only ever increase in size and which are guaranteed to not overlap with any other monotonic timestamp. This can be used to yield unique database view keys that can also be ranged over to sort by time. We will be using themonotonic-timestamp
nodejs module. Monotonic timestamps may look like the following:1676993016405 1676993016406 < concurrent calls cause a suffix to ensure uniqueness 1676993016406.001 < concurrent example 1 1676993016406.002 < concurrent example 2 1676993016406.003 < concurrent example 3
<hash>
is the binary-represented 32-byte blake2b hashes that cable operates on<hash.16>
is the base16-encoded 32-byte blake2b hashes that cable operates on<pubkey>
is the binary-represented public portion of a ed25519 keypair<pubkey>
is the base16-encoded public portion of a ed25519 keypair<channel>
is the utf-8 encoded string that represents a cable channel (think: chat channel, because that is what it is).
- accreted state: a view that exists primarily to simplify application-level concerns (e.g. make it easy to find the latest topic). Does not help answering cable requests. accrete because it changes over time - but maybe not the best name! materialized is probably more standard?
- view names starting with
!
: this is not necessary for the indexed views to be properly queryable using lexicographic sort, but makes it easier to see which strings in this documents are views; it's a convention used in this document, if you will
When you are using a key-value database like leveldb, there's a particular quirk you have to
work around when it comes to structuring data in a table-like manner. That quirk is dealing
with lexicographic sort order. This is a fancy way of saying that keys will be compared in a
left-to-right, character-by-character manner. The key B
is "higher" than the key A
.
If you are developing on linux, the command ascii
is very helpful for seeing the different
ascii values of different characters.
For instance, !
is the smallest printable character,
with a decimal value of 33. Whereas ~
is the largest printable character in the ascii table,
with a decimal value of 126. These two are commonly used to delineate queries, using greater
than !
and a less than ~
operations.
This all means that you have to be somewhat mindful and careful when laying out the key space, which names and pieces of data in a particular key, of your tables.
These materialized views comprise the database portion of cable-core, and will be serviced by
instantiations of leveldb e.g. the nodejs wrapping module level
, which works in browsers and
in nodejs.
When a cable response streams in as a result of a previously issued request, the received data will be persisted and tied to its hash in the data store. The data is indexed in different kinds of views according to what type of post it is.
The following sections describe the different indexes and stores planned for cable-core
. Each
section explains what the view is used for and what the storage schema looks like.
A data store where each key is a hash, and each value is the data that generates that hash. This is how we store all the posts we receive in data responses, or which the local user creates by making posts themselves.
<hash> -> <blob>
This view is implemented in data-store.js
.
A reverse lookup table, mapping hashes to which views and under what keys in those views the hashes have been referenced. This is how we update other views when a post has been deleted.
This matters because if a post is deleted, then the hash pointing to it is now irrelevant to index, because we no longer have that data (and never will again).
!<hash.16>!<mono-ts> => "<viewname><separator><viewkey>"
This view is implemented in reverse-hash-map.js
.
The posts view tracks two different kinds of data. One part of the view, the messages view, keeps track of hashes
of post/text
and `post/delete. The other, the deleted view, keeps track of which deletes have happened in order
to never ask for or persist those posts. The key schema, as denoted below, is different for the
two subviews.
The posts view maps channel name+time to a hash, where the hash should resolve to either a
post/text
or a post/delete
when queried for in the data store.
!chat!<mono-ts>!post!<channel> -> <hash>
This view is implemented in messages.js
.
Note: mono-ts here should be related to the indexed post's claimed timestamp; do some kind of pass / logic that to index uniquely using the claimed timestamp
Note: Do secondary topological sort after retrieving posts?
Note: post/delete
may target any kind of post, including non-post/text
types
Deletions. This view allows us to persist deletions and can be used to prevent resyncing deleted posts.
!chat!deleted!<hash.16> -> 1
This view is implemented in deleted.js
.
Note: This tracks the hash of the deleted post i.e. NOT the hash of the post/delete
itself.
Channel state maps channel name of a post, alongside extra details part of the key, to a channel state-related hash. This view keeps track of:
- all
post/join
orpost/leave
made by<pubkey>
!state!member!<channel>!<mono-ts>!<pubkey> -> <hash>
- all
post/info
for nick published by<pubkey>
!state!nick!<channel>!<mono-ts>!<pubkey> -> <hash>
- all
post/topic
for channel, published by anyone
!state!topic!<channel>!<mono-ts> -> <hash>
Querying this view should suffice to answer any incoming Channel State Request
.
This view is implemented in channel-state.js
.
Note: this view does handle the case of answering a historic (give all history you have) channel state request.
The author view indexes all posts by the public key that authored them. The view key
also includes what type of post it was; e.g. a post/text
or a post/join
. Which type of post
it was is encoded by using the corresponding post type ID as defined in the cable spec. Entries
are indexed by mapping the public key and the post type to the corresponding post hash.
You can query for any given post_type
authored by any given public key.
!author!<pubkey>!<post_type-id>!<mono-ts> -> <hash>
The existence of this view facilitates deleting all posts authored by pubkey, which would be useful if a pubkey has been blocked and you no longer want to host the content they authored.
This view is implemented in author.js
.
The channel membership view keeps track of which channels have which users currently joined to it.
!channel!member!<channel>!<pubkey> -> 1 or 0
- Set value to 1 for
post/join
issued by<pubkey>
to<channel>
. - Set value to 0 for
post/leave
issued by<pubkey>
to<channel>
.
This view is implemented in channel-membership.js
.
Note: Each accreted view needs to be potentially re-indexed if a deleted message had the same type as the view. This can be done by querying the author view to see if there was any post of a related type prior to the deleted post.
Note: The stable sorted set of names, derived from the keys recorded in this view, are what we would respond with to answer a channel list request.
The channel topic view keeps track of the topic of each channel the local user knows about.
!channel!topic!<channel> -> <topic>
This view is implemented in topics.js
.
Note: Each accreted view needs to be potentially re-indexed if a deleted message had the same type as the view. This can be done by querying the author view to see if there was any post of a related type prior to the deleted post.
This view specifically deals with mapping out the different types of
post/info
contents that may have been authored by users.
As of writing (2023-03-16) the cable spec only has one specified post/info
key: name
. Other
types of user-related information may be added in the future: a user description, a user image,
a user's post expiry preference, etc.
!user!info!name!<pubkey>!<mono-ts> => latest post/info setting nickname property
!user!info!<property>!<pubkey>!<mono-ts> in general
!user!latest!info!<property>!<pubkey> in general
This view is implemented in user-info.js
.
Some notes now follow on how we could keep track of whether respones we get back for requests made are sensible or not. The notes of this section are more exploratory than explicative. Jump to section Verifying index sufficiency, below, if this is not your jam.
Tracking the path of a request and its sibling, the response, is done with request ids
(req_id
).
A req_id
always belongs to a request or to the response caused by a request. Thus a req_id
has a source (the message type that "spawns" the request) and an expectation of what to get
back (the message type that correctly answers the request e.g. a post response
for a post request
, a hash response
for a channel time range request
, or a channel list response
for a channel list request
). We can also prepare for the cable specification's circuits
behaviour by keeping track of an extra id to more efficiently route messages, instead of
floodfilling to all connected peers.
When we deal with a request, we can associate the generated req_id
with:
- source (the message type of the request), a varint
- expects (the expected message type of the response), a varint
- origin (are we the end destination for the returning response, or are we passing on information between other nodes), a boolean
- circuitid (which connected node to send the response to), a varint; might be premature and pending on how it is specified
Also worth noting is that a response will have its req_id
field set to the same value as the
request its responding to. That is: response[req_id] = request[req_id]
.
We could go one step further, and associate the two request-response pairs that will be required to fulfill the entire life cycle. That is:
Let's say we make a channel time range request. The intent of that is to eventually receive
posts of types post/text
and post/delete
. To achieve this, the channel time range
request is sent. The reply that comes back is a hash response (list of hashes). To get the
actual messages containing posts (and not a list of post hashes), we send out a new request, this
time a post request. Back comes a post response, containing some or all of the requested
hashes. That is, we have two roundtrips:
Roundtrip | Outgoing (Request) | Incoming (Response) |
---|---|---|
1 | Channel Time Range Request | Hash Response |
2 | Post Request | Post Response |
In roundtrip 1, the request and response are to have the same req_id
. However, the request
and response in 2 have a different req_id
than in 1 but which is shared within roundtrip 2's
request and response. i.e. there is potentially a a gap missing between associating the actions
in 1 and the actions in 2, even though 2 happens as a direct result of 1.
To facilitate forwarding, we could have a map to keep track of where a request came from, in order to send the response that same way. In case we get the same request from multiple sources, the actual intended destination is uncertain; we should maintain a list of destinations.
originMap: req_id -> [peerid]
When we receive a response we look up the req_id
, and ask the following questions:
- Is this intended for us? If
origin = true
then, yes. Otherwise we should forward it onwards: look in theoriginMap
for where to try forwarding (and remember that entries in the map might have gone offline since being added during the current session). - Does the response adhere to the expected message type?
- If not: throw it out? log?
- Are the contents of the response what was requested?
- If not: throw it out? log?
- for example: sending a bunch of
post/topic
as a response to a channel time range (post/text + post/delete) is not the expected behaviour
- for example: sending a bunch of
If we throw out unexpected responses, we risk running foul of the robustness principle:
"be conservative in what you do, be liberal in what you accept from others". It is often reworded as: "be conservative in what you send, be liberal in what you accept". The principle is also known as Postel's law, after Jon Postel, who used the wording in an early specification of TCP.
Perhaps the best recourse is to log the failed expectations, and in the specification describe the expected behaviour such that other implementations try their best at "being conservative in what they send".
The following sections map each cable request to what kind of database index operations will suffice to correctly answer the request. Initially intended as a form of sketching / thinking out loud that helps to identify gaps ahead of time, before the views have been written.
The following brief section outlines what each cable request ultimately expects to receive as a result of making the request.
Post request expects:
- post response
msg_type=1
- data payloads to correspond to the requested hashes
- verify by tracking the requested hashes and cross-referencing with the hash of each post in the post response payload?
Channel time range expects:
- hash response
msg_type=0
- data payloads to be of type:
- post/text
- post/delete
Channel state expects:
- hash response
msg_type=0
- post response payloads to be of type:
- post/topic
- post/join
- post/leave
- post/info
Channel list expects:
- a channel list response
msg_type=7
- payloads to be UTF-8 strings
This section outlines the index queries and operations needed to answer the different cable requests.
A post request wants the data identified by a list of hashes.
Query:
<hash> -> <blob>
Use the returned binary payloads to fashion the post response. Construct a list and fill it with the payloads that were found when querying the view.
Query:
!chat!text!<channel>!<ts> -> <hash>
Use the request's specified start and end time ranges, and the limit option, when querying to get the correct range of data. Answer with the list of hashes in a hash response.
Query:
!state!member!<channel>!<mono-ts>!<pubkey> -> <hash>
!state!nick!<channel>!<mono-ts>!<pubkey> -> <hash>
!state!topic!<channel>!<mono-ts> -> <hash>
using the request's channel name and a sort that gives us the latest records
first. In each of the three queries, use leveldb's limit: 1
option to only
get the latest entry.
The query range should operate regardless of the public key portion of view key, which are only part of the key to provide unique records that point to the entry.
A historic channel state request is a request that sets historic = 1
.
Query:
!state!member!<channel>!<mono-ts>!<pubkey> -> <hash>
!state!nick!<channel>!<mono-ts>!<pubkey> -> <hash>
!state!topic!<channel>!<mono-ts> -> <hash>
Using the request's channel name, and a sort that gives us the latest entries. What we get back are lists of hashes. Smoosh the lists together.
Answer with the list of hashes that are retrieved from executing the query.
Query:
!channel!member!<channel>!<pubkey> -> 1 or 0
Use the retrieved keys to extract a list of channel names, and convert the list into the set of known channel names. Then convert the set back into a list and stable sort it.
A delete request is only valid (will be honored by the local user) if:
- the requester is regarded as a moderator or admin by the local user, or
- the requester is requesting to delete posts they have authored themselves
- i.e. public key of the delete/post payload and the post identified by the requested hash are the same.
And in all cases: the cryptographic signature of the post/delete payload must be correct.
- Honor the delete request by removing the associated payload from:
<hash> -> <blob>
- Persist the post/delete by saving the binary payload in:
<hash> -> <blob>
- Persist the hash of the post/delete in the posts view:
!chat!text!<channel>!<ts> -> <hash>
- Finally, persist the hash identifying the deleted post, to limit future attempts to resync this post, by making an entry in:
!chat!deleted!<hash.16> -> 1
The following sections depict the indexing actions that spring forth when a new post is added to the database, and how to update the database when the underlying post is deleted from the database.
In general: each time a view has a new entry added which maps to a hash, add a new entry to the reverse lookup table:
!<hash.16>!<mono-ts> => "<viewname><separator><viewkey>"
A post/text (post_type=0
) is written, meaning a hash is mapped to a binary
payload and persisted in the database.
The following tables are updated:
<hash> -> <blob>
!chat!text!<channel>!<ts> -> <hash>
!author!<pubkey>!0!<mono-ts> -> <hash> // post/text
To be able to remove these entries later on, for example when a delete request comes in, we need to know which keys are mapped to that hash for each indexed view. We save the following entries in the reverse-hash lookup:
!<hash.16>!<mono-ts> => "chat!text!<separator>chat!text!<channel>!<ts>"
!<hash.16>!<mono-ts> => "author<separator>author!<pubkey>!1!<mono-ts>"
When we delete the corresponding hash, the following operations take place:
Delete from:
<hash> -> <blob>
Get each view key using <hash>
:
!<hash.16>!<mono-ts> => "chat!text!<separator>chat!text!<channel>!<ts>"
!<hash.16>!<mono-ts> => "author<separator>author!<pubkey>!1!<mono-ts>"
Delete entry in view using retrieved key:
!chat!text!<channel>!<ts> -> <hash>
!author!<pubkey>!1!<mono-ts> -> <hash> // post/text
If this delete was from a delete request (post/delete), also persist the delete by saving the hash of the deleted post:
!chat!deleted!<hash.16> -> 1
Deletes may also happen as a result of reducing the amount of messages stored. In that case it did not come from a peer requesting a delete, and so it should not be persisted in the view:
!chat!deleted!<hash.16> -> 1
This one is a bit tricky to think about correctly. When a post/delete
is
created, this necessitates deleting what it points to as well.
First: perist the post as usual; a post/delete (post_type=1
) is written,
meaning a hash is mapped to a binary payload and persisted in the database.
The following tables are updated:
<hash> -> <blob>
!chat!deleted!<hash.16> -> 1
!author!<pubkey>!1!<mono-ts> -> <hash> // post/delete
Now, time to delete the pointed to content:
- Use the hash to delete the payload from:
<hash> -> <blob>
- Look up the hash to be deleted in the reverse-lookup, getting table names and the keys.
- For each view name and key pair: remove the entry identified by the key from the associated view.
- Depending on the type of message that was deleted (e.g. a
post/join
,post/leave
,post/info
), reindex the relevant views in case they now have outdated accreted information. This can typically be done by fetching the latest relevant hash (e.g. using the channel-state view, or the author view to figure out the newestpost/join
orpost/leave
that's in the database), getting its data, and reindexing the view with that data. If the data is already indexed it does no harm, and otherwise the index becomes up to date.
Deleting a previously stored delete request, at least as a result of a newer delete request targeting the older delete request, is currently not considered a valid operation.
A post/info (post_type=2
) is written, meaning a hash is mapped to a binary
payload and persisted in the database.
The following tables are updated:
<hash> -> <blob>
!author!<pubkey>!2!<mono-ts> -> <hash> // post/info
!state!nick!<channel>!<mono-ts>!<pubkey> -> <hash>
!user!info!name!<pubkey>!<mono-ts> => latest post/info setting nickname property ??
To be able to remove these entries later on, for example when a new post/info is written, we need to know which keys are mapped to that hash for each indexed view. We save the following entries in the reverse-hash lookup:
!<hash.16>!<mono-ts> => "author<separator>author!<pubkey>!2!<mono-ts>"
!<hash.16>!<mono-ts> => "state<separator>state!nick!<channel>!<mono-ts>!<pubkey>"
!<hash.16>!<mono-ts> => "user<separator>user!info!name!<pubkey>!<mono-ts>"
When we delete the corresponding hash, the following operations take place:
Delete <hash>
from:
<hash> -> <blob>
Get each view key using :
!<hash.16>!<mono-ts> => "author<separator>author!<pubkey>!2!<mono-ts>"
!<hash.16>!<mono-ts> => "state<separator>state!nick!<channel>!<mono-ts>!<pubkey>"
!<hash.16>!<mono-ts> => "user<separator>user!info!name!<pubkey>!<mono-ts>"
Delete entry in view using retrieved key:
!author!<pubkey>!2!<mono-ts>
!state!nick!<channel>!<mono-ts>!<pubkey> -> <hash>
!user!info!name!<pubkey>!<mono-ts>
A post/topic (post_type=3
) is written, meaning a hash is mapped to a binary
payload and persisted in the database.
The following tables are updated:
<hash> -> <blob>
!channel!topic!<channel> -> <topic>
!state!topic!<channel>!<mono-ts> -> <hash>
!author!<pubkey>!3!<mono-ts> -> <hash> // post/topic
To be able to remove these entries later on, for example when a new post/info is written, we need to know which keys are mapped to that hash for each indexed view. We save the following entries in the reverse-hash lookup:
!<hash.16>!<mono-ts> => "author<separator>author!<pubkey>!3!<mono-ts>"
!<hash.16>!<mono-ts> => "state<separator>state!topic!<channel>!<mono-ts>"
When a new post/topic comes in, we'll need to update indexes.
Index the new message:
!author!<pubkey>!3!<mono-ts> -> <hash> // post/topic
Update the latest channel state to point to the new hash:
!state!topic!<channel>!<mono-ts> -> <hash>
Update the topic index for the channel, setting the new topic:
!channel!topic!<channel> -> <topic>
Add new reverse-lookup entries:
!<hash.16>!<mono-ts> => "author<separator>author!<pubkey>!3!<mono-ts>"
!<hash.16>!<mono-ts> => "state<separator>state!topic!<channel>!<mono-ts>"
When we delete the corresponding hash, the following operations take place:
Delete <hash>
from:
<hash> -> <blob>
Get each view key using :
!<hash.16>!<mono-ts> => "author<separator>author!<pubkey>!3!<mono-ts>"
!<hash.16>!<mono-ts> => "state<separator>state!topic!<channel>!<mono-ts>"
Delete entry in view using retrieved key:
!state!topic!<channel>!<mono-ts> -> <hash>
!author!<pubkey>!3!<mono-ts> -> <hash> // post/topic
Get the most recent topic message for this channel:
!state!topic!<channel>!<mono-ts> -> <hash> // post/topic
<hash> -> <blob>
If there is such a latest message left in the database. If there is, update the accreted topic for that channel accordingly:
!channel!topic!<channel> -> <topic>
A post/join (post_type=4
) is written, meaning a hash is mapped to a binary
payload and persisted in the database.
The following tables are updated:
<hash> -> <blob>
!state!member!<channel>!<mono-ts>!<pubkey> -> <hash>
!channel!member!<channel>!<pubkey> -> 1
!author!<pubkey>!4!<mono-ts> -> <hash> // post/join
!author!<pubkey>!5!<mono-ts> -> <hash> // if post/leave
To be able to remove these entries later on, for example when a new post/info is written, we need to know which keys are mapped to that hash for each indexed view. We save the following entries in the reverse-hash lookup:
!<hash.16>!<mono-ts> => "author<separator>author!<pubkey>!4!<mono-ts>"
!<hash.16>!<mono-ts> => "state<separator>state!member!<channel>!<mono-ts>!<pubkey>"
for post/leave: !<hash.16>!<mono-ts> => "author<separator>author!<pubkey>!5!<mono-ts>"
When a channel membership change happens, i.e. a post/leave for the same channel as the previous post/join, we'll need to update indexes.
Index the new message:
!author!<pubkey>!4!<mono-ts> -> <hash> // if post/join
!author!<pubkey>!5!<mono-ts> -> <hash> // if post/leave
Update the latest channel state to point to the new hash:
!state!member!<channel>!<mono-ts>!<pubkey> -> <hash>
Update the membership view:
!channel!member!<channel>!<pubkey> -> 1 // if post/join
!channel!member!<channel>!<pubkey> -> 0 // if post/leave
Add new reverse-lookup entries:
!<hash.16>!<mono-ts> => "author<separator>author!<pubkey>!4!<mono-ts>"
!<hash.16>!<mono-ts> => "state<separator>state!member!<channel>!<mono-ts>!<pubkey>"
for post/leave: !<hash.16>!<mono-ts> => "author<separator>author!<pubkey>!5!<mono-ts>"
When we delete the corresponding hash, the following operations take place:
Delete <hash>
from:
<hash> -> <blob>
Get each view key using :
!<hash.16>!<mono-ts> => "author<separator>author!<pubkey>!4!<mono-ts>"
!<hash.16>!<mono-ts> => "state<separator>state!member!<channel>!<mono-ts>!<pubkey>"
for post/leave: !<hash.16>!<mono-ts> => "author<separator>author!<pubkey>!5!<mono-ts>"
Using the view key, splice out the channel name.
Delete entry in view using retrieved key:
!channel!member!<channel>!<pubkey>
!author!<pubkey>!4!<mono-ts>
Get all the most recent membership messages for this user:
!author!<pubkey>!4!<mono-ts> -> <hash> // post/join
!author!<pubkey>!5!<mono-ts> -> <hash> // post/leave
<hash> -> <blob>
Sort the list of entries and pick the latest {join, leave} for the target channel, if there is such a latest message left in the database. If there is, update the channel membership for that channel accordingly:
!channel!member!<channel>!<pubkey> -> 1 or 0