-
Notifications
You must be signed in to change notification settings - Fork 7
raw tables support (WIP) #198
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Draft
benitav
wants to merge
14
commits into
main
Choose a base branch
from
raw-tables
base: main
Could not load branches
Branch not found: {{ refName }}
Loading
Could not load tags
Nothing to show
Loading
Are you sure you want to change the base?
Some commits from the old base branch may be removed from the timeline,
and old review comments may become outdated.
Draft
Changes from all commits
Commits
Show all changes
14 commits
Select commit
Hold shift + click to select a range
a873c62
init
benitav 7bf7e2d
Code snippets
simolus3 7c901fc
Fix JS typo
simolus3 28749b0
More typos
simolus3 655af55
Make it clearer that powersync_crud is virtual
simolus3 ca4fa23
polish
benitav 9c726b9
polish
benitav 43a643d
mention raw tables in client architecture
benitav cae0593
more fitting icon
benitav 6660882
Consolidate message
benitav b4556bc
additional polish
benitav 125a75b
Mention release versions for JS
simolus3 0c0127b
More notes on migrations
simolus3 43d4b88
Add minimum versions for Dart and Kotlin
simolus3 File filter
Filter by extension
Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
There are no files selected for viewing
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,290 @@ | ||
--- | ||
title: "Raw SQLite Tables to bypass JSON View Limitations" | ||
description: "Use raw tables for native SQLite functionality and improved performance." | ||
sidebarTitle: "Raw SQLite Tables" | ||
--- | ||
|
||
<Warning> | ||
Raw tables are an experimental feature. We're actively seeking feedback on: | ||
|
||
- API design and developer experience | ||
- Additional features or optimizations needed | ||
|
||
Join our [Discord community](https://discord.gg/powersync) to share your experience and get help. | ||
</Warning> | ||
|
||
By default, PowerSync uses a [JSON-based view system](/architecture/client-architecture#schema) where data is stored schemalessly in JSON format and then presented through SQLite views based on the client-side schema. Raw tables allow you to define native SQLite tables in the client-side schema, bypassing this. | ||
|
||
This eliminates overhead associated with extracting values from the JSON data and provides access to advanced SQLite features like foreign key constraints and custom indexes. | ||
|
||
<Note> | ||
**Availability** | ||
|
||
Raw tables were introduced in the following versions of our client SDKs: | ||
- __JavaScript__ (Node: `0.8.0`, React-Native: `1.23.0`, Web: `1.24.0`) | ||
- __Dart__: Version 1.15.0 of `package:powersync`. | ||
- __Kotlin__: Version 1.3.0 | ||
|
||
Also note that raw tables are only supported by the new [Rust-based sync client](https://releases.powersync.com/announcements/improved-sync-performance-in-our-client-sdks), which is currently opt-in. | ||
</Note> | ||
|
||
## When to Use Raw Tables | ||
|
||
Consider raw tables when you need: | ||
|
||
- **Advanced SQLite features** like `FOREIGN KEY` and `ON DELETE CASCADE` constraints | ||
- **Indexes** - PowerSync's default schema has basic support for indexes on columns, while raw tables give you complete control to create indexes on expressions, use `GENERATED` columns, etc | ||
- **Improved performance** for complex queries (e.g., `SELECT SUM(value) FROM transactions`) - raw tables more efficiently get these values directly from the SQLite column, instead of extracting the value from the JSON object on every row | ||
- **Reduced storage overhead** - eliminate JSON object overhead for each row in `ps_data__<table>.data` column | ||
- **To manually create tables** - Sometimes you need full control over table creation, for example when implementing custom triggers | ||
|
||
## How Raw Tables Work | ||
|
||
### Current JSON-Based System | ||
|
||
Currently the sync system involves two general steps: | ||
|
||
1. Download sync bucket operations from the PowerSync Service | ||
2. Once the client has a complete checkpoint and no pending local changes in the upload queue, sync the local database with the bucket operations | ||
|
||
The bucket operations use JSON to store the individual operation data. The local database uses tables with a simple schemaless `ps_data__<table_name>` structure containing only an `id` (TEXT) and `data` (JSON) column. | ||
|
||
PowerSync automatically creates views on that table that extract JSON fields to resemble standard tables reflecting your schema. | ||
|
||
### Raw Tables Approach | ||
|
||
When opting in to raw tables, you are responsible for creating the tables before using them - PowerSync will no longer create them automatically. | ||
|
||
Because PowerSync takes no control over raw tables, you need to manually: | ||
|
||
1. Tell PowerSync how to map the [schemaless protocol](/architecture/powersync-protocol#protocol) to your raw tables when syncing data. | ||
2. Configure custom triggers to forward local writes to PowerSync. | ||
|
||
For the purpose of this example, consider a simple table like this: | ||
|
||
```sql | ||
CREATE TABLE todo_lists ( | ||
id TEXT NOT NULL PRIMARY KEY, | ||
created_by TEXT NOT NULL, | ||
title TEXT NOT NULL, | ||
content TEXT | ||
) STRICT; | ||
``` | ||
|
||
#### Syncing into raw tables | ||
|
||
To sync into the raw `todo_lists` table instead of `ps_data__`, PowerSync needs the SQL statements extracting | ||
columns from the untyped JSON protocol used during syncing. | ||
This involves specifying two SQL statements: | ||
|
||
1. A `put` SQL statement for upserts, responsible for creating a `todo_list` row or updating it based on its `id` and data columns. | ||
2. A `delete` SQL statement responsible for deletions. | ||
|
||
The PowerSync client as part of our SDKs will automatically run these statements in response to sync lines being sent from the PowerSync Service. | ||
|
||
To reference the ID or extract values, prepared statements with parameters are used. `delete` statements can reference the id of the affected row, while `put` statements can also reference individual column values. | ||
Declaring these statements and parameters happens as part of the schema passed to PowerSync databases: | ||
|
||
<Tabs> | ||
|
||
<Tab title="JavaScript"> | ||
Raw tables are not included in the regular `Schema()` object. Instead, add them afterwards using `withRawTables`. | ||
For each raw table, specify the `put` and `delete` statement. The values of parameters are described as a JSON | ||
array either containing: | ||
|
||
- the string `Id` to reference the id of the affected row. | ||
- the object `{ Column: name }` to reference the value of the column `name`. | ||
|
||
```JavaScript | ||
const mySchema = new Schema({ | ||
// Define your PowerSync-managed schema here | ||
// ... | ||
}); | ||
mySchema.withRawTables({ | ||
// The name here doesn't have to match the name of the table in SQL. Instead, it's used to match | ||
// the table name from the backend database as sent by the PowerSync service. | ||
todo_lists: { | ||
put: { | ||
sql: 'INSERT OR REPLACE INTO todo_lists (id, created_by, title, content) VALUES (?, ?, ?, ?)', | ||
params: ['Id', { Column: 'created_by' }, { Column: 'title' }, { Column: 'content' }] | ||
}, | ||
delete: { | ||
sql: 'DELETE FROM lists WHERE id = ?', | ||
params: ['Id'] | ||
} | ||
} | ||
}); | ||
``` | ||
|
||
We will simplify this API after understanding the use-cases for raw tables better. | ||
</Tab> | ||
|
||
<Tab title="Dart"> | ||
Raw tables are not part of the regular tables list and can be defined with the optional `rawTables` parameter. | ||
|
||
```dart | ||
final schema = Schema(const [], rawTables: const [ | ||
RawTable( | ||
// The name here doesn't have to match the name of the table in SQL. Instead, it's used to match | ||
// the table name from the backend database as sent by the PowerSync service. | ||
name: 'todo_lists', | ||
put: PendingStatement( | ||
sql: 'INSERT OR REPLACE INTO todo_lists (id, created_by, title, content) VALUES (?, ?, ?, ?)', | ||
params: [ | ||
PendingStatementValue.id(), | ||
PendingStatementValue.column('created_by'), | ||
PendingStatementValue.column('title'), | ||
PendingStatementValue.column('content'), | ||
], | ||
), | ||
delete: PendingStatement( | ||
sql: 'DELETE FROM todo_lists WHERE id = ?', | ||
params: [ | ||
PendingStatementValue.id(), | ||
], | ||
), | ||
), | ||
]); | ||
``` | ||
</Tab> | ||
|
||
<Tab title="Kotlin"> | ||
To define a raw table, include it in the list of tables passed to the `Schema`: | ||
|
||
```Kotlin | ||
val schema = Schema(listOf( | ||
RawTable( | ||
// The name here doesn't have to match the name of the table in SQL. Instead, it's used to match | ||
// the table name from the backend database as sent by the PowerSync service. | ||
name = "todo_lists", | ||
put = PendingStatement( | ||
"INSERT OR REPLACE INTO todo_lists (id, created_by, title, content) VALUES (?, ?, ?, ?)", | ||
listOf( | ||
PendingStatementParameter.Id, | ||
PendingStatementParameter.Column("created_by"), | ||
PendingStatementParameter.Column("title"), | ||
PendingStatementParameter.Column("content") | ||
) | ||
), | ||
delete = PendingStatement( | ||
"DELETE FROM todo_lists WHERE id = ?", listOf(PendingStatementParameter.Id) | ||
) | ||
) | ||
)) | ||
``` | ||
</Tab> | ||
|
||
<Tab title="Swift"> | ||
Unfortunately, raw tables are not available in the Swift SDK yet. | ||
</Tab> | ||
|
||
<Tab title=".NET"> | ||
Unfortunately, raw tables are not available in the .NET SDK yet. | ||
</Tab> | ||
|
||
</Tabs> | ||
|
||
After adding raw tables to the schema, you're also responsible for creating them by executing the | ||
corresponding `CREATE TABLE` statement before `connect()`-ing the database. | ||
|
||
#### Collecting local writes on raw tables | ||
|
||
PowerSync uses an internal SQLite table to collect local writes. For PowerSync-managed views, a trigger for | ||
insertions, updates and deletions automatically forwards local mutations into this table. | ||
When using raw tables, defining those triggers is your responsibility. | ||
|
||
The [PowerSync SQLite extension](https://github.com/powersync-ja/powersync-sqlite-core) creates an insert-only virtual table named `powersync_crud` with these columns: | ||
|
||
```SQL | ||
CREATE VIRTUAL TABLE powersync_crud( | ||
-- The type of operation: 'PUT' or 'DELETE' | ||
op TEXT, | ||
-- The id of the affected row | ||
id TEXT, | ||
type TEXT, | ||
-- optional (not set on deletes): The column values for the row | ||
data TEXT, | ||
-- optional: Previous column values to include in a CRUD entry | ||
old_values TEXT, | ||
-- optional: Metadata for the write to include in a CRUD entry | ||
metadata TEXT, | ||
); | ||
``` | ||
|
||
The virtual table associates local mutations with the current transaction and ensures writes made during the sync | ||
process (applying server-side changes) don't count as local writes. | ||
This means that triggers can be defined on raw tables like so: | ||
|
||
```SQL | ||
CREATE TRIGGER todo_lists_insert | ||
AFTER INSERT ON todo_lists | ||
FOR EACH ROW | ||
BEGIN | ||
INSERT INTO powersync_crud (op, id, type, data) VALUES ('PUT', NEW.id, 'todo_lists', json_object( | ||
'created_by', NEW.created_by, | ||
'title', NEW.title, | ||
'content', NEW.content | ||
)); | ||
END; | ||
|
||
CREATE TRIGGER todo_lists_update | ||
AFTER INSERT ON todo_lists | ||
FOR EACH ROW | ||
BEGIN | ||
INSERT INTO powersync_crud (op, id, type, data) VALUES ('PUT', NEW.id, 'todo_lists', json_object( | ||
'created_by', NEW.created_by, | ||
'title', NEW.title, | ||
'content', NEW.content | ||
)); | ||
END; | ||
|
||
CREATE TRIGGER todo_lists_delete | ||
AFTER DELETE ON todo_lists | ||
FOR EACH ROW | ||
BEGIN | ||
INSERT INTO powersync_crud (op, id, type) VALUES ('DELETE', OLD.id, 'todo_lists'); | ||
END; | ||
``` | ||
## Migrations | ||
|
||
In PowerSync's [JSON-based view system](/architecture/client-architecture#schema) the client-side schema is applied to the schemaless data, meaning no migrations are required. Raw tables however are excluded from this, so it is the developers responsibility to manage migrations for these tables. | ||
|
||
### Adding raw tables as a new table | ||
|
||
When you're adding new tables to your sync rules, clients will start to sync data on those tables - even if the tables aren't mentioned | ||
in the client's schema yet. | ||
So at the time you're introducing a new raw table to your app, it's possible that PowerSync has already synced some data for that | ||
table, which would be stored in `ps_untyped`. When adding regular tables, PowerSync will automatically extract rows from `ps_untyped`. | ||
With raw tables, that step is your responsibility. To copy data, run these statements in a transaction after creating the table: | ||
|
||
``` | ||
INSERT INTO my_table (id, my_column, ...) | ||
SELECT id, data ->> 'my_column' FROM ps_untyped WHERE type = 'my_table'; | ||
DELETE FROM ps_untyped WHERE type = 'my_table'; | ||
``` | ||
|
||
This does not apply if you've been using the raw table from the beginning (and never called `connect()` without them) - you only | ||
need this for raw tables you already had locally. | ||
|
||
Another workaround is to clear PowerSync data when changing raw tables and opt for a full resync. | ||
|
||
### Migrating to raw tables | ||
|
||
To migrate from PowerSync-managed tables to raw tables, first: | ||
|
||
1. Open the database with the new schema mentioning raw tables. PowerSync will copy data from tables previously managed by PowerSync into | ||
`ps_untyped`. | ||
2. Create raw tables. | ||
3. Run the `INSERT FROM SELECT` statement to insert `ps_untyped` data into your raw tables. | ||
|
||
### Migrations on raw tables | ||
|
||
When adding new columns to raw tables, there currently isn't a way to re-sync that table to add those columns from the server - we are | ||
investigating possible workarounds and encourage users to try out if they need this. | ||
|
||
To ensure the column values are accurate, you'd have to delete all data after a migration and wait for the next complete sync. | ||
|
||
## Deleting data and raw tables | ||
|
||
APIs that clear an entire PowerSync database, like e.g. `disconnectAndClear()`, don't affect raw tables. | ||
This should be kept in mind when you're using those methods - data from raw tables needs to be deleted explicitly. |
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.