# Journal
📆 Last updated Oct 4
Goals:

- Documents can be created and edited offline with zero latency and no need for a server.
- Documents can be shared with remote peers using only a tiny, generic signal/relay server.
- Synchronization of changes across multiple peers, whether online & real-time, or offline & concurrent, "just works".
- Performance is good both for loading documents from storage and synchronizing changes with peers.
- All data is encrypted at rest.
- Once a dataset is loaded, browsing through the data and making changes is smooth and fast, even with very large datasets.
Currently we have a hard limit on dataset size, imposed by the combination of two factors:
- Our current design requires us to load and parse the entire dataset into browser memory at once
- Automerge adds metadata and history to each record, which increases our memory requirements to 30 times the size of the underlying data.
We can currently work comfortably with datasets up to about 50,000 rows (using rows containing 12 fields of various types, about 1 KB of raw data). The most we've been able to load without running out of browser memory is 100,000 rows.
Actions are not transactional. Example: peer A changes field X's type from STRING to NUMBER while peer B concurrently sets X to a string value; once the two sets of changes merge, a string can end up in a field that is now supposed to hold a number. This is just the way things are for distributed, auto-merged documents.
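To make this concrete, here's a minimal sketch of how such a conflict plays out, assuming Automerge's 0.x-era API (`Automerge.from`, `Automerge.change`, `Automerge.merge`, `Automerge.getConflicts`; conflict-reporting details vary between versions):

```ts
import * as Automerge from 'automerge'

// Both peers start from the same document containing a single field x
type Doc = { x: string | number }
const base = Automerge.from<Doc>({ x: 'hello' })
let docA = Automerge.merge(Automerge.init<Doc>(), base)
let docB = Automerge.merge(Automerge.init<Doc>(), base)

// A decides x should be a number; B concurrently writes a string
docA = Automerge.change(docA, doc => { doc.x = 42 })
docB = Automerge.change(docB, doc => { doc.x = 'world' })

// Merging never blocks or rolls anything back: one write wins
// deterministically, and the other is recorded as a conflict, not an error
const merged = Automerge.merge(docA, docB)
console.log(merged.x)                            // either 42 or 'world'
console.log(Automerge.getConflicts(merged, 'x')) // the write that lost
```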
- The UI shouldn't have to load everything into memory
- Slow operations should happen in a worker
- Document history should only be loaded when a document is edited, not for display
- Communication between the UI and the worker should happen through actions and queries (avoid accessing the entire state; see the sketch after this list)
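As a rough sketch of what that action/query protocol might look like (all names here are hypothetical, not Cevitxe's actual API), the UI posts actions and narrow queries to the worker and gets back only the slice of data it asked for:

```ts
// Hypothetical message types between the UI and the worker
type Action =
  | { type: 'ADD_ROW'; documentId: string; row: Record<string, unknown> }
  | { type: 'UPDATE_FIELD'; documentId: string; field: string; value: unknown }

type Query =
  | { type: 'ROW_RANGE'; start: number; count: number; orderBy?: string }
  | { type: 'ROW_COUNT' }

const worker = new Worker('store.worker.js')

// Dispatch an action; the worker applies it to the Automerge state
function dispatch(action: Action) {
  worker.postMessage({ kind: 'action', action })
}

// Run a query; a dedicated MessageChannel carries back just the result
function runQuery<T>(query: Query): Promise<T> {
  return new Promise(resolve => {
    const channel = new MessageChannel()
    channel.port1.onmessage = e => resolve(e.data as T)
    worker.postMessage({ kind: 'query', query }, [channel.port2])
  })
}

// e.g. fetch one screenful of rows, never the whole dataset
runQuery<{ rows: unknown[] }>({ type: 'ROW_RANGE', start: 0, count: 100 })
```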
Currently, we persist a single append-only feed of all changes to all documents. Instead:
- Have one feed per document
- Store the latest snapshot for each document
This way, documents can be loaded instantly (using simple queries and indexes on the snapshot collection), and the feed is loaded only for edits (local or remote).
When a new document is seen from a peer, the snapshot should be retrieved first so it can be displayed before downloading the change history.
When joining a discovery key for the first time, all the snapshots should be retrieved before getting the change history for any of them. This should happen in as few messages as possible to reduce lag.
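A minimal sketch of this layout, using in-memory `Map`s as a stand-in for whatever persistence ends up underneath (e.g. IndexedDB); the names are hypothetical:

```ts
// One snapshot plus one append-only change feed per document
interface Snapshot {
  documentId: string
  data: Record<string, unknown> // plain JSON, no Automerge metadata or history
}

interface ChangeFeed {
  documentId: string
  changes: unknown[] // append-only list of Automerge changes
}

class DocumentStore {
  private snapshots = new Map<string, Snapshot>()
  private feeds = new Map<string, ChangeFeed>()

  // Fast path: render a document without touching its history.
  // When syncing with a peer, these are also what we'd request first.
  getSnapshot(documentId: string) {
    return this.snapshots.get(documentId)
  }

  // Slow path: load history only when the document is actually edited
  getFeed(documentId: string) {
    return this.feeds.get(documentId)
  }

  // Every local or remote edit appends to the feed and refreshes the snapshot
  append(documentId: string, change: unknown, latest: Snapshot) {
    const feed = this.feeds.get(documentId) || { documentId, changes: [] }
    feed.changes.push(change)
    this.feeds.set(documentId, feed)
    this.snapshots.set(documentId, latest)
  }
}
```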
All the Automerge, local storage, and sync components live in the worker, freeing the UI context from all the expensive overhead of the system.
There will be a coarse-grained query API for retrieving a view of a set of rows, aggregate information, etc.
- v1: order + range / count(*)
- v2: filter
- v3: everything else (aggregation, projection, etc.; might be entirely out of scope)
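Sketched as types, the versions might layer like this (hypothetical names, reusing the `Query` shape above):

```ts
// v1: ordering + range, plus count(*)
type QueryV1 =
  | { type: 'ROW_RANGE'; start: number; count: number; orderBy?: string }
  | { type: 'ROW_COUNT' }

// v2 layers filtering on top of v1
type Filter = { field: string; op: '=' | '<' | '>' | 'contains'; value: unknown }
type QueryV2 = QueryV1 & { filters?: Filter[] }
```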
The actions are still the ones we've defined earlier in the project.
Data retrieval for the grid will be changed to use a server-side model, as defined by AgGrid, where the worker takes the place of the server.
The UI should only retrieve what's needed to be displayed on the screen at any given time.
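A sketch of how the worker could stand in for the server: this assumes ag-Grid's server-side datasource interface as documented around this time (`getRows` with `successCallback`/`failCallback`; details vary across ag-Grid versions) and reuses the hypothetical `runQuery` bridge from the earlier sketch:

```ts
import { IServerSideDatasource, IServerSideGetRowsParams } from 'ag-grid-community'

// Hypothetical bridge to the worker (see the protocol sketch above)
declare function runQuery<T>(query: unknown): Promise<T>

// The grid asks for a block of rows; we translate that into a range query
const datasource: IServerSideDatasource = {
  getRows(params: IServerSideGetRowsParams) {
    const { startRow, endRow, sortModel } = params.request
    runQuery<{ rows: any[]; totalCount: number }>({
      type: 'ROW_RANGE',
      start: startRow,
      count: endRow - startRow,
      orderBy: sortModel.length ? sortModel[0].colId : undefined, // v1: order + range only
    })
      .then(({ rows, totalCount }) => params.successCallback(rows, totalCount))
      .catch(() => params.failCallback())
  },
}
```

With this in place, the grid only ever holds the blocks of rows currently (or recently) on screen, which is exactly the behavior described above.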
The plan:

- Per-document storage
- Do storage ourselves instead of relying on hypercore
- Merge DocSet and StorageFeed into single object
- Run store in worker
- Query API + paged datasets in AgGrid
## September 23 (Herb, Diego)
- Run all state management (updates, selections, network sync, storage, etc) in a separate web worker process
- Store a snapshot of current state, so that we can give control to the user without waiting for the document's entire history to be loaded
- Implement lazy instantiation of individual documents, so that we're not constrained by browser memory
## September 16 (Herb, Diego, 75%)
- Web worker proof of concept: grid sample data generation runs in background
- Redesigned internal storage of collections so that an index is no longer required; in theory, this means that we no longer have to load the entire dataset into memory at once
- Refactoring, documentation, other internal polish
We're now able to load a 50K-row dataset into memory with decent performance. The next step is to get to a point where we don't need to load everything into memory, but can instantiate individual documents lazily just when they need to be displayed or altered.
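For reference, the proof of concept boils down to something like this (file and message names made up for illustration):

```ts
// generate.worker.ts: runs off the main thread, so the grid stays responsive
// (compiled with TypeScript's 'webworker' lib)
self.onmessage = (e: MessageEvent) => {
  const { count } = e.data
  const rows = []
  for (let i = 0; i < count; i++) {
    rows.push({ id: i, name: `Row ${i}` })
    // report progress periodically so the UI can show a progress bar
    if (i % 1000 === 0) self.postMessage({ progress: i / count })
  }
  self.postMessage({ done: true, rows })
}

// ui.ts: kick off generation and stay interactive while it runs
const worker = new Worker('generate.worker.js')
worker.onmessage = e => {
  if (e.data.done) console.log(`generated ${e.data.rows.length} rows`)
  else console.log(`progress: ${Math.round(e.data.progress * 100)}%`)
}
worker.postMessage({ count: 50000 })
```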
## August 26 (Herb, Brent, 20%)
- Electron and web worker research
## August 19 (Herb, Brent, 90%)
- Finish collections API
- Get collections working in the grid
- Benchmark performance improvements
See also: http://github.com/DevResults/cevitxe/projects/1
We finished the work on multi-document application state. Grids now load from storage and sync with peers nearly twice as fast as before. (For raw data, see Cevitxe performance, week 9 vs 11).
![](img/performance-week-9-vs-11.png)
(Populating state with random data is actually slower now, but this is an artificial task that's not representative of real-world use, so we're OK with that.)
This is an improvement, but there's still work to be done.
There are two metrics that we're focused on:

- Time to interactivity: The time from when a user initiates a task (e.g. syncing with a peer or loading from storage) until they can see at least some of the data and the UI is responsive. (This is the time measured above.) Right now, the user is blocked until all documents are fully loaded into memory; for 20K rows, that's currently about 12 seconds during which the UI is frozen and unresponsive.

  Next week we'll be looking into running much of Cevitxe's functionality in a separate background worker process, so that the UI isn't blocked and the user doesn't have to wait for a document to be fully loaded into memory before being able to work with it.

  Google Sheets, for example, does this well: when loading a 100K-row dataset, within one or two seconds the user can see the first screenful of data, select cells, and make edits. It takes another minute or so for the document to be fully loaded, as indicated by a "Working" alert and a progress bar.

- Maximum in-memory dataset: Right now, the browser (with a maximum heap of 2 GB) runs out of memory trying to load 30K rows (9 MB) of data.

  - Part of this is due to known inefficiencies in Automerge's memory usage. An overhaul of Automerge's internal data structures is currently in progress, specifically to resolve this problem. It remains to be seen how much of an improvement this provides.
  - There are almost certainly ways that we can optimize memory usage in our own code.
Regardless of how efficiently Cevitxe and Automerge use memory, there will always be a hard limit, imposed by the browser, on how much data can be loaded into memory. An Electron app will have more leeway to access available system resources, so that would be the recommendation for users with larger datasets.
## August 12 (Herb, Brent)
We continued the work of changing Cevitxe to use a `DocSet` instead of a single `Doc` for the application state. We found it a bit tedious to manage the Cevitxe store state as a `DocSet` and the Redux store state as a plain object, so we decided to try making the Redux store use the same `DocSet`. This conversion eventually proved to be unfruitful: our adapted Redux reducer could either (a) alter the Redux state directly, which allows peer-to-peer functionality but violates the Redux rule that the state should be immutable and prevents the middleware from being able to detect state changes; or (b) modify a clone of the Redux state (what Redux expects), which causes a new `DocSet` to be created that breaks peer-to-peer communication. We're now working out a plan to move forward with `DocSet` support while keeping the Redux state as a plain object.
While the `DocSet` Redux state experiment didn't work out, we did have other wins:

- `Feed` supports reading/writing changes to multiple documents in a `DocSet`
- Created some helper methods for working with `DocSet`s
- Automerge only supports objects as the basis for a document, so anything that will be stored needs a container object. While perhaps not initially intuitive, this is how all document storage systems work: you can't have just a number, string, or array as a "record" in Cevitxe; it has to be an object with a property for that value. (A quick illustration follows this list.)
- Grid: Show some UI while loading from local storage (PR 20)
- Grid: Show progress while generating data (PR 22)
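To illustrate the container-object point (`Automerge.from` is the real API of the time; the rest is just an example):

```ts
import * as Automerge from 'automerge'

// The root of an Automerge document must be an object (a map)...
const doc = Automerge.from({ value: 42 })

// ...so bare values need a container:
// Automerge.from(42)         // ✗ a number can't be a document on its own
// Automerge.from([1, 2, 3])  // ✗ neither can a bare array
const list = Automerge.from({ items: [1, 2, 3] }) // ✓ wrapped in an object
```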
## August 5 (Herb, Diego, 75%)

- Started work to support collections of Automerge docs
- Created some stress tests for Automerge to measure performance with larger datasets.
- Paired with Automerge author Martin Kleppmann to fix a bug (see 5ca49f) that was causing inserts to fail for datasets above a certain size
- Added integration tests to the grid using Cypress.
## July 29 (Herb, Diego, 45%)
- Smooth out grid UI (multiple selections, deleting rows, enter/tab behavior, etc.)
- Moved the documentId persistence to the URL (instead of local storage); toolbar just uses garden-variety hyperlinks to switch to a new documentId.
- Set up deployment of the grid example and the signal server to Heroku and tested it over the public internet. It works! 😮
  - https://cevitxe-example-grid.herokuapp.com
  - https://cevitxe-signal-server.herokuapp.com
## July 22 (Herb, Diego, 60%)
- Show-and-tell
- More test coverage for connections etc.
## July 15 (Herb, Diego, 60%)
- Use friendly document IDs (e.g. `zealous-meerkat`, `divine-obsidian`, etc.)
- Add document ID selection to toolbar
- Got rid of `signal-hub` and `webrtc-swarm`. Took `discovery-cloud-server` and `discovery-cloud-client` as a starting point and built our own `signal-server` and `signal-client`. So all communication now takes place via web socket connections piped through the signal server.
- The swarm idea is still good, and eventually we'll want to steal that functionality from one of the DAT project implementations and add it to `signal-server`; that way, if a large number of people want to connect to a single document, you don't have to create C(N, 2) two-way connections; each peer can just connect to a few "nearby" peers and relay any changes across the swarm.
## July 8 (Herb, 20%)
- Write readme for Cevitxe, targeted at a developer using the framework, explaining the thinking behind it and how to use it
- Create shared toolbar component
## July 1 (Herb, Brent, 60%)
- Got createStore/joinStore/connections under test
- Added chat sample to examples
- Created Cevitxe class, merged in createStore/joinStore
| `yarn start:grid` | `yarn start:todo` | `yarn start:chat` |
| --- | --- | --- |
| A simple table editor | An implementation of TodoMVC | A chat channel |
## June 24 (Herb, Brent, 100%)
- Got persistence working again using Hypercore
- Added original grid demo as an example to Cevitxe
- Hypercore is still useful for managing encrypted persistence of the append-only feed
- Automerge.Connection is the right choice for synchronizing state between two peers
- The Cevitxe library is close to being ready to release to the public
- DX around initial state scenario needs to be improved
- We've modified our selectors to deal with state being completely empty, which is never a concern in plain Redux, since you'll always at least have your default state
- Would be good to have a more natural API that only provided state to the app once it was ready; could be something like `PersistGate` from `redux-persist`, or something that exposed loading/error/data like Apollo Client's `Query` component
- Straighten out crypto with Hypercore (currently stubbed out)
- Needs a proper README
- Would be nice to pull in chat example from hello-hypercore as a super-simple third example
- Close things up properly (probably requires making mocks more realistic? or not mocking for those tests?)
- We're currently mocking both webrtc-swarm and signalhub, and we probably only need to be mocking signalhub since it's the only thing that makes network calls. But those two mocks are coupled because the mock signalhub isn't exposing peers the same way as the real one.
- Would prefer to have a networking stack that isn't reliant on webrtc, something along the lines of discovery-cloud-server + discovery-cloud-client, but with webrtc as the default & falling back on piped websockets as an alternative
- Revisit the way discovery keys are managed & where they come from - they don't need to be Hypercore crypto keys, just need to be unique strings
## June 17 (Herb, Brent, 100%)
- Started work on adapting Cevitxe to use Automerge.Connection
- Refactored Automerge.Connection & fixed a bug that was keeping it from working in our scenario
- Added UI to TodoMVC example for exposing keys, creating & joining lists
- Converted project to monorepo using Yarn Workspaces to improve iteration time
- Set up tests & mocking framework to improve iteration time (rather than testing each time in the browser)
## June 10 (Herb, Brent, Diego, 100%)
- Built chat example using hypercore
- Built grid example using ag-grid
- Started Cevitxe using automerge, hypercore for storage & replication, exposing Redux store
- Wired TodoMVC to Cevitxe
- Wired grid example to Cevitxe
- Exposing a Redux store seems like a good way of making this useful to the React ecosystem
- ag-grid has a good API, seems like a good choice for the grid piece
- Because Automerge stores the whole change history, it needs storage space equal to several times the size of the actual dataset
- On Diego's machine, the datagrid example could handle 10K rows with good performance, but started to break down at 100K
- Hypercore works, but it has no way of negotiating the diff between two peers' states on connection so that it only communicates what's necessary. That's a feature of Automerge.Connection, so we need to figure out a way to use that.