Data Snapshot
A snapshot persists the current in-memory live store data to disk. It allows fast recovery of dimension tables (as an alternative to replaying every event) and enables merging and purging of redo logs.
Based on table-level configuration, each time the scheduler ticks it checks whether the number of mutations on the live store has exceeded a threshold or a predefined time interval has passed. If either condition is satisfied for a dimension table, a snapshot is created for that table.
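The trigger check described above can be sketched as follows. This is only an illustration: `SnapshotConfig`, its field names, and `should_snapshot` are hypothetical and are not taken from the actual code base.

```python
from dataclasses import dataclass


@dataclass
class SnapshotConfig:
    # Hypothetical per-table settings; the real configuration keys may differ.
    mutation_threshold: int   # snapshot once mutations exceed this count
    interval_seconds: float   # snapshot once this much time has passed


def should_snapshot(config, mutations_since_last, seconds_since_last):
    """Return True if either trigger condition holds for a dimension table."""
    over_threshold = mutations_since_last > config.mutation_threshold
    interval_passed = seconds_since_last >= config.interval_seconds
    return over_threshold or interval_passed
```

A scheduler would call a check like this once per tick for each dimension table and create a snapshot only for the tables where it returns `True`.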
The snapshot manager records the current live store status (redo file, batch offset, number of mutations, and last read record), then starts persisting live shards to disk, after which it updates the live store and the metastore with the latest status.
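The ordering of these steps (capture status, persist shards, then publish the status) can be sketched as below. All names here (`LiveStoreStatus`, `take_snapshot`, `persist_shard`, and the dictionary standing in for the metastore) are assumptions made for illustration, and the field types are guesses.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class LiveStoreStatus:
    # The four values the text says are recorded; int types are assumed.
    redo_file: int
    batch_offset: int
    num_mutations: int
    last_read_record: int


def take_snapshot(table, status, live_shards, persist_shard, metastore):
    """Capture status, persist every live shard, then publish the status."""
    # 1. Record the live store status as of the start of the snapshot.
    captured = replace(status)
    # 2. Persist each live shard to disk via the supplied callback.
    for shard_id, shard_data in live_shards.items():
        persist_shard(table, shard_id, shard_data)
    # 3. Only after persisting succeeds, publish the status to the metastore.
    metastore[table] = captured
    # The caller can also store the returned status back on the live store.
    return captured
```

Publishing the status last means the metastore never points at a snapshot that was only partially written, which is one plausible reason for the ordering described above.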
When a table is bootstrapped, the recovery process checks the metastore for the latest snapshot info and uses the latest available snapshot to quickly rebuild the table.
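A minimal sketch of that recovery decision follows. The metastore is modeled as a plain dictionary, and `recovery_plan` with its return values is hypothetical; falling back to a full replay when no snapshot exists is an assumption based on the description of snapshots as an alternative to replaying every event.

```python
def recovery_plan(metastore, table):
    """Decide how to rebuild a table at bootstrap time."""
    snapshot_info = metastore.get(table)
    if snapshot_info is None:
        # Assumed fallback: no snapshot recorded, so replay every event.
        return ("replay_all_events", None)
    # Use the latest recorded snapshot to rebuild the table quickly.
    return ("load_snapshot", snapshot_info)
```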