Dataset Concepts
note: this is more of a design doc that tracks the evolution of the system
New to URSYS in 2024 is the Dataset architecture for managing a set of related data collections under one address.
A Dataset consists of several DataBins, each of which has a unique binID name and a binType. A client application can request a particular Dataset by specifying its dataURI and authentication credentials. After the dataset request succeeds, the client can then perform CRUD, Search, and Query operations on any given DataBin by specifying its binID and operation.
The API for Datasets is exposed through the modules sna-dataclient.ts and sna-dataserver.mts. The server's instance of the loaded dataset is kept synchronized with each client's copy. From the client's perspective, data mutations use a write-followed-by-notify pattern that is accessible through an URSYS EventMachine pub/sub interface. Data reads, by comparison, are served from the cached dataset, which is guaranteed to be "up to date" at the time of the request.
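The write-followed-by-notify pattern and cached reads described above can be sketched roughly as follows. This is a minimal illustration, not the URSYS EventMachine or the dataclient itself: the class name, method names, and item shape are all hypothetical, and the network round trip to the server is left out.

```typescript
// Hypothetical shapes for illustration only; not the URSYS definitions.
type DataItem = { _id: string; [field: string]: unknown };
type ChangeHandler = (binID: string, changed: DataItem[]) => void;

class CachedDatasetSketch {
  private bins: Record<string, Record<string, DataItem>> = {};
  private handlers: ChangeHandler[] = [];

  // Register a callback that fires after every write.
  subscribe(handler: ChangeHandler): void {
    this.handlers.push(handler);
  }

  // Write first, then notify every subscriber of what changed.
  write(binID: string, items: DataItem[]): void {
    const bin = this.bins[binID] || (this.bins[binID] = {});
    items.forEach(item => {
      bin[item._id] = { ...item };
    });
    this.handlers.forEach(handler => handler(binID, items));
  }

  // Reads never leave the client; they return copies from the cache.
  read(binID: string): DataItem[] {
    const bin = this.bins[binID] || {};
    return Object.keys(bin).map(id => ({ ...bin[id] }));
  }
}

const cacheSketch = new CachedDatasetSketch();
const seenChanges: string[] = [];
cacheSketch.subscribe((binID, changed) => {
  changed.forEach(item => seenChanges.push(`${binID}:${item._id}`));
});
cacheSketch.write('comments', [{ _id: 'c1', text: 'hello' }]);
```

Subscribers learn about a change only after it has been stored, so a handler that immediately calls read() sees the new value.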
source: sna-dataserver.mts
The dataserver is a SNA Component that receives URSYS SYNC messages for "whole dataset" and "data CRUD+Query" operations from multiple dataclients on the web. The dataserver is the single source of truth for data; client-based data operations are synchronized to and from the dataserver in normal sync mode.
- The PreConfig object must contain runtime_dir, which is used to resolve the "file bucket" address where all runtime data is stored.
- The dataclient must execute the SELECT DATASET command to load the data from permanent storage before any of the API methods work.
- SYNC:SRV_DSET handles "whole dataset" operations declared in DatasetOp on a provided dataURI, which is sent from a dataclient's remote adapter. At the time of this writing, the operations are LOAD, UNLOAD, PERSIST, GET_MANIFEST, and GET_DATA.
- SYNC:SRV_DATA handles "databin CRUDQ" operations declared in DataSyncOp on an optional provided dataURI; data operations default to the "selected dataset". Current operations are CLEAR, GET, ADD, UPDATE, WRITE, DELETE, REPLACE, FIND, and QUERY.
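To make the two operation families concrete, here is a rough sketch of how a handler might validate an incoming data operation and fall back to the selected dataset when no dataURI is supplied. Only the operation names come from the lists above; the type names, the request shape, and routeDataRequest() are assumptions made for illustration and are not the dataserver's actual code.

```typescript
// Operation names are copied from the lists above; all other names are hypothetical.
const DATASET_OPS = ['LOAD', 'UNLOAD', 'PERSIST', 'GET_MANIFEST', 'GET_DATA'] as const;
const DATA_OPS = [
  'CLEAR', 'GET', 'ADD', 'UPDATE', 'WRITE', 'DELETE', 'REPLACE', 'FIND', 'QUERY'
] as const;
type DatasetOpName = (typeof DATASET_OPS)[number];
type DataOpName = (typeof DATA_OPS)[number];

// Assumed request shape: dataURI is optional for data operations.
type DataRequest = { op: string; binID: string; dataURI?: string };
type RoutedRequest =
  | { op: DataOpName; binID: string; dataURI: string }
  | { error: string };

function isDatasetOp(op: string): op is DatasetOpName {
  return (DATASET_OPS as readonly string[]).indexOf(op) >= 0;
}

function isDataOp(op: string): op is DataOpName {
  return (DATA_OPS as readonly string[]).indexOf(op) >= 0;
}

// Validate the op, then resolve the dataURI against the selected dataset.
function routeDataRequest(req: DataRequest, selectedURI?: string): RoutedRequest {
  if (!isDataOp(req.op)) return { error: `unknown data op '${req.op}'` };
  const dataURI = req.dataURI || selectedURI;
  if (!dataURI) return { error: 'no dataURI provided and no dataset selected' };
  return { op: req.op, binID: req.binID, dataURI };
}
```

Returning an error object instead of throwing keeps the sketch close to a message-reply style, where the failure travels back to the caller as data.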
In addition to the message-based API, DataSet provides direct API methods. These are currently unused, but are provided for future server-side dataset access.
- LoadDataset()
- CloseDataset()
- PersistDataset()
- OpenBin()
- CloseBin()
The SYNC:SRV_DSET and SYNC:SRV_DATA protocols are designed to be independent of the filesystem, but dataserver is a default implementation that uses a default dataobject adapter to serialize/deserialize data to the filesystem.
source: sna-dataclient.ts
The dataclient is a SNA Component that sends URSYS SYNC messages to select the dataset to use and to perform "data CRUD+QUERY" operations on the various databins in that dataset. It also receives update messages when the dataserver updates.
- The PreConfig object must have a dataset property containing uri (providing a dataURI that is sent to the server) and mode (default 'sync' to enable two-way synchronization).
The dataclient otherwise initializes itself through its built-in PreHook declarations to call Configure() and Activate() during the application startup cycle, making use of the PreConfig object parameters.
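As a rough illustration of the PreConfig requirement above, the sketch below models only the dataset.uri and dataset.mode fields and applies the 'sync' default. The type names and the resolveDatasetConfig() helper are hypothetical, and the URI string is a placeholder rather than a real dataURI format.

```typescript
// Field names follow the description above; types and helper are assumptions.
type DatasetConfigSketch = { uri: string; mode?: string };
type PreConfigSketch = { dataset?: DatasetConfigSketch };

// Check the required field and fill in the documented default mode.
function resolveDatasetConfig(cfg: PreConfigSketch): { uri: string; mode: string } {
  const dataset = cfg.dataset;
  if (!dataset || !dataset.uri) {
    throw new Error('PreConfig requires a dataset.uri property');
  }
  return { uri: dataset.uri, mode: dataset.mode || 'sync' };
}

// 'placeholder-dataset-uri' stands in for whatever dataURI the project uses.
const resolvedConfig = resolveDatasetConfig({
  dataset: { uri: 'placeholder-dataset-uri' }
});
```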
The SYNC:SRV_DSET and SYNC:SRV_DATA protocols for data operations will rely on an access token that is derived from the URSYS authentication token. Currently, though, this support is only stubbed-in and ignored by dataserver. The idea is that once a web app using dataclient is logged-in, the SELECT DATASET operation will negotiate the handshake.
There is no message API that is exposed for users, as the direct API is suitable. Behind the scenes, however, the dataclient sends SYNC:SRV_DSET and SYNC:SRV_DATA messages to talk to the dataserver and receives SYNC:CLI_DATA messages to synchronize its dataset instance.
The following methods assume that the selected dataset was set through PreConfig as dataURI:
- Get(binID, ...)
- Add(binID, ...)
- Update(binID, ...)
- Write(binID, ...)
- Delete(binID, ...)
- DeleteIDs(binID, ...)
- Replace(binID, ...)
- Clear(binID, ...)
- Find(binID, ...)
- Query(binID, ...)
- Subscribe(evt, callback)
- Unsubscribe(evt, callback)
In the default case syncMode=='sync', these calls are routed to the dataserver, but changes are not applied locally until the server sends a SYNC:CLI_DATA message. As all web apps using the dataclient module implement this message, this ensures that everyone receives the same change.
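The server-confirmed flow in 'sync' mode can be sketched with a small in-process simulation. Everything here is a stand-in: ServerSketch and ClientSketch are hypothetical classes, handleAdd() plays the role of a SYNC:SRV_DATA handler, and onCliData() plays the role of a SYNC:CLI_DATA handler. Real traffic would go over URSYS messaging rather than direct method calls.

```typescript
// Hypothetical item and message shapes, for illustration only.
type SyncItem = { _id: string; [field: string]: unknown };
type CliDataMessage = { binID: string; op: 'ADD'; items: SyncItem[] };

class ServerSketch {
  private master: Record<string, SyncItem[]> = {};
  private clients: ClientSketch[] = [];

  connect(client: ClientSketch): void {
    this.clients.push(client);
  }

  // Apply the change to the master copy, then broadcast it to every client.
  handleAdd(binID: string, items: SyncItem[]): void {
    this.master[binID] = (this.master[binID] || []).concat(items);
    const msg: CliDataMessage = { binID, op: 'ADD', items };
    this.clients.forEach(client => client.onCliData(msg));
  }
}

class ClientSketch {
  private mirror: Record<string, SyncItem[]> = {};

  constructor(private server: ServerSketch) {
    server.connect(this);
  }

  // The call is only forwarded; the local mirror is not touched here.
  add(binID: string, items: SyncItem[]): void {
    this.server.handleAdd(binID, items);
  }

  // The mirror changes only when the server's broadcast arrives.
  onCliData(msg: CliDataMessage): void {
    this.mirror[msg.binID] = (this.mirror[msg.binID] || []).concat(msg.items);
  }

  get(binID: string): SyncItem[] {
    return (this.mirror[binID] || []).slice();
  }
}

const serverSketch = new ServerSketch();
const clientA = new ClientSketch(serverSketch);
const clientB = new ClientSketch(serverSketch);
clientA.add('comments', [{ _id: 'c1', text: 'hello' }]);
```

Because the originating client applies the change through the same broadcast path as everyone else, both mirrors end up with identical contents.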
In the future it will be possible to access read-only datasets by specifying syncMode=='sync-ro', but in the meantime it's possible to use DatasetAdapter.getDataObj(dataURI) to request the dataset object associated with the dataURI, subject to (TBD) access control.
The DataClient also specifies two implemented methods:
- async DS_RemoteFind(dataURI, binID, matchCrit?) to find matching items in the specified dataURI/binID
- async DS_RemoteQuery(dataURI, binID, query?) to query matching items and return them in a RecordSet.
The latter is intended to be used for accessing read-only shared data across the dataset server, subject to (TBD) access controls.
The SNA dataset client/server is a default implementation built on top of a protocol that assumes the following:
- a dataURI that uniquely identifies a set of named dataBins (collections), composed of several platform-independent serialized and persisted stored contents.
- a set of common CRUD+QUERY operations based on DataObjects with an _id field.
- a message protocol that maps to the operations for managing datasets (DatasetOp) and the collections within (DataOp)
- a write through cache implementation that synchronizes the master dataset on the server with multiple client dataset mirrors that are updated behind-the-scenes
- a notification system through which clients can be aware of changes that were made behind-the-scenes
- an authentication and access token system to determine access
- the Dataset class holds the collection
- the abstract-class-databin class is the ancestor providing the common set of CRUD+QUERY operations across multiple collection types
- the Dataset designated by a dataURI has an associated manifest object that maps dataBins by name to their platform-dependent storage locations
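The idea of a common ancestor that fixes the operation set for items with an _id field might look roughly like this. It is a sketch under assumptions: DataBinSketch and ListBinSketch are invented names, only a subset of the operations is shown, and the real abstract-class-databin signatures are not reproduced here.

```typescript
// Hypothetical item shape: every stored object carries an _id.
type BinItem = { _id: string; [field: string]: unknown };

// The ancestor declares the shared operations; subclasses choose the storage.
abstract class DataBinSketch {
  constructor(readonly binID: string, readonly binType: string) {}
  abstract add(items: BinItem[]): BinItem[];
  abstract get(ids: string[]): BinItem[];
  abstract update(items: BinItem[]): BinItem[];
  abstract deleteIDs(ids: string[]): BinItem[];
}

// One possible collection type: a flat list of items.
class ListBinSketch extends DataBinSketch {
  private items: BinItem[] = [];

  constructor(binID: string) {
    super(binID, 'list-sketch');
  }

  private findIndexByID(id: string): number {
    for (let i = 0; i < this.items.length; i++) {
      if (this.items[i]._id === id) return i;
    }
    return -1;
  }

  // Reject the whole batch if any _id is already stored.
  add(items: BinItem[]): BinItem[] {
    items.forEach(item => {
      if (this.findIndexByID(item._id) >= 0) throw new Error(`duplicate _id '${item._id}'`);
    });
    items.forEach(item => this.items.push({ ...item }));
    return items.map(item => ({ ...item }));
  }

  get(ids: string[]): BinItem[] {
    return this.items.filter(item => ids.indexOf(item._id) >= 0).map(item => ({ ...item }));
  }

  // Merge the provided fields into the existing item.
  update(items: BinItem[]): BinItem[] {
    return items.map(item => {
      const i = this.findIndexByID(item._id);
      if (i < 0) throw new Error(`unknown _id '${item._id}'`);
      this.items[i] = { ...this.items[i], ...item };
      return { ...this.items[i] };
    });
  }

  // Remove the matching items and return what was removed.
  deleteIDs(ids: string[]): BinItem[] {
    const removed = this.items.filter(item => ids.indexOf(item._id) >= 0);
    this.items = this.items.filter(item => ids.indexOf(item._id) < 0);
    return removed;
  }
}
```

Code written against DataBinSketch does not need to know which collection type it is talking to, which is the point of keeping the operations on the ancestor.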
While the default protocol is URSYS messaging, the actual implementation of the above protocol is up to the developer. In our case, sna-dataserver.mts and sna-dataclient.ts are our default implementation of the dataset concept, and we've approached it as follows:
- the dataset and data operations are conducted over URSYS with the SYNC:SRV_DSET, SYNC:SRV_DATA, and SYNC:CLI_DATA messages
- the dataURI is mapped to a directory on the server's filesystem
- the dataset collections are serialized as JSON files in the dataURI's associated directory
These functions have been isolated into adapters:
- sna-dataclient makes use of an object using the interface IDS_DatasetAdapter, which provides selectDatabase(), getDataObj(), and syncData() methods. The default implementation uses URSYS messaging and stores accessToken, sending SYNC:SRV_DSET and SYNC:SRV_DATA and receiving SYNC:CLI_DATA.
- sna-dataserver implements separate URSYS message handlers for SYNC:SRV_DSET and SYNC:SRV_DATA, sending SYNC:CLI_DATA whenever the master Dataset instance is updated.
- sna-dataserver implements the interface IDS_DataObjectAdapter, which is the bridge between the filesystem and data objects that represent the persisted data of a dataset. The abstract base class abstract-dataobj.adapter.ts is extended by the default implementation sna-dataobj-adapter.mts, providing methods getDatasetInfo(), readDatasetObj(), readDataBinObj(), writeDatasetObj(), and writeDataBinObj().
Note
The server-side message handler could be considered an "adapter", but as the entire sna-dataserver.mts is a default implementation this handler isn't broken-out into its own thing like ProtocolAdapter. The DataObjectAdapter, however, could be replaced in this dataserver implementation to use a different storage backend.
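As a rough idea of what swapping the storage backend could involve, the sketch below defines an adapter interface using the five method names listed above and backs it with in-memory JSON text instead of files. The method names come from this document, but every signature, return type, and the DataObjSketch shape are guesses made for illustration; the real IDS_DataObjectAdapter may differ.

```typescript
// Assumed dataset object shape: bin name mapped to an array of items.
type DataObjSketch = { [binName: string]: unknown[] };

// Method names mirror the list above; the signatures are assumptions.
interface DataObjectAdapterSketch {
  getDatasetInfo(dataURI: string): { dataURI: string; binNames: string[] } | undefined;
  readDatasetObj(dataURI: string): DataObjSketch | undefined;
  readDataBinObj(dataURI: string, binName: string): unknown[] | undefined;
  writeDatasetObj(dataURI: string, dataObj: DataObjSketch): void;
  writeDataBinObj(dataURI: string, binName: string, items: unknown[]): void;
}

// Keeps serialized JSON text per dataURI, standing in for files on disk.
class MemoryAdapterSketch implements DataObjectAdapterSketch {
  private store: Record<string, string> = {};

  getDatasetInfo(dataURI: string) {
    const dataObj = this.readDatasetObj(dataURI);
    return dataObj ? { dataURI, binNames: Object.keys(dataObj) } : undefined;
  }

  readDatasetObj(dataURI: string): DataObjSketch | undefined {
    const text = this.store[dataURI];
    return text === undefined ? undefined : (JSON.parse(text) as DataObjSketch);
  }

  readDataBinObj(dataURI: string, binName: string): unknown[] | undefined {
    const dataObj = this.readDatasetObj(dataURI);
    return dataObj ? dataObj[binName] : undefined;
  }

  writeDatasetObj(dataURI: string, dataObj: DataObjSketch): void {
    this.store[dataURI] = JSON.stringify(dataObj);
  }

  // Read-modify-write of a single bin inside the stored dataset object.
  writeDataBinObj(dataURI: string, binName: string, items: unknown[]): void {
    const dataObj = this.readDatasetObj(dataURI) || {};
    dataObj[binName] = items;
    this.writeDatasetObj(dataURI, dataObj);
  }
}
```

Storing serialized text rather than live objects means every read returns a fresh copy, which is close to how a file-backed adapter would behave.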
- uses SNA to build a project directory
- SNA.Build() scans for .mts files for server
- SNA.Build() scans for .ts files for web client
- loads SNA server module
- invokes SNA.Start()
- registers sna-dataserver SNA_MOD
- implements SNA_Start() export as Start()
- exports SNA_MOD PreHook for 'EXPRESS_READY'
- handles 'SYNC:SRV_DATA' and 'SYNC:SRV_DSET'
- receives dataURI from configured project info
- can load or generate manifest from dataURI
- can initialize dataset from manifest
- can manage multiple datasets loaded
- can load dataset object from disk
- can serialize dataset object to disk
- implements DATASET and DATA operations through protocol
- imports SNA client module
- prompts for login creds
- successfully authenticates websocket
- receives dataURI, authToken from appserver
- saves session dataURI, authToken, userID
- registers dc-comments as an SNA_MOD
- invokes SNA.Start() to kick stuff off
- registers sna-dataclient as an SNA_MOD
- hook DOM_READY for SNA_NetConnect()
- hook NET_CONNECT for hot module reloading
- implements SNA_Start() export as Start()
- exports SNA_MOD PreHook for 'NET_DATASET'
- can get session dataURI
- can submit session authToken to request dataset
- can determine syncmode from response
- can initialize dataset instance from getDataset()
- handles SYNC:CLI_DATA protocol conditionally, depending on syncmode
- provides means to select dataset by dataURI
- provides CRUDQ interface for dataset and its databins
- provides eventmachine notify pubsub
- exports SNA_MOD PreHook for 'LOAD_DATA'
- exports SNA_MOD PreConfig to receive GlobalConfig
- select dataURI and open 'comments' ItemList
- provide methods for performing comment data manipulation
- provide eventmachine notify pubsub