Skip to content

raw tables support (WIP) #198

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Draft
wants to merge 14 commits into
base: main
Choose a base branch
from
6 changes: 5 additions & 1 deletion architecture/client-architecture.mdx
Original file line number Diff line number Diff line change
Expand Up @@ -64,4 +64,8 @@ The client SDK maintains the following tables:

Most rows will be present in at least two tables — the `ps_data__<table>` table, and in `ps_oplog`. It may be present multiple times in `ps_oplog`, if it was synced via multiple buckets.

The copy in `ps_oplog` may be newer than the one in `ps_data__<table>`. Only when a full checkpoint has been downloaded, will the data be copied over to the individual tables. If multiple rows with the same table and id has been synced, only one will be preserved (the one with the highest `op_id`).
The copy in `ps_oplog` may be newer than the one in `ps_data__<table>`. Only when a full checkpoint has been downloaded, will the data be copied over to the individual tables. If multiple rows with the same table and id has been synced, only one will be preserved (the one with the highest `op_id`).

<Note>
If you run into limitations with the above JSON-based SQLite view system, check out [this experimental feature](/usage/use-case-examples/raw-tables) which allows you to define and manage raw SQLite tables to work around some limitations. We are actively seeking feedback about this functionality.
</Note>
1 change: 1 addition & 0 deletions docs.json
Original file line number Diff line number Diff line change
Expand Up @@ -187,6 +187,7 @@
"usage/use-case-examples/offline-only-usage",
"usage/use-case-examples/postgis",
"usage/use-case-examples/prioritized-sync",
"usage/use-case-examples/raw-tables",
"usage/use-case-examples/custom-write-checkpoints"
]
},
Expand Down
3 changes: 1 addition & 2 deletions usage/use-case-examples.mdx
Original file line number Diff line number Diff line change
@@ -1,8 +1,6 @@
---
title: "Use Case Examples"
description: "Learn how to use PowerSync in common use cases"
mode: wide
sidebarTitle: Overview
---

The following examples are available to help you get started with specific use cases for PowerSync:
Expand All @@ -19,6 +17,7 @@ The following examples are available to help you get started with specific use c
<Card title="Local-only Usage" icon="laptop" href="/usage/use-case-examples/offline-only-usage" horizontal/>
<Card title="PostGIS" icon="map" href="/usage/use-case-examples/postgis" horizontal/>
<Card title="Prioritized Sync" icon="star" href="/usage/use-case-examples/prioritized-sync" horizontal/>
<Card title="Raw SQLite Tables" icon="table" href="/usage/use-case-examples/raw-tables" horizontal/>
</CardGroup>

## Additional Resources
Expand Down
290 changes: 290 additions & 0 deletions usage/use-case-examples/raw-tables.mdx
Original file line number Diff line number Diff line change
@@ -0,0 +1,290 @@
---
title: "Raw SQLite Tables to bypass JSON View Limitations"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
title: "Raw SQLite Tables to bypass JSON View Limitations"
title: "Raw SQLite Tables to Bypass JSON View Limitations"

description: "Use raw tables for native SQLite functionality and improved performance."
sidebarTitle: "Raw SQLite Tables"
---

<Warning>
Raw tables are an experimental feature. We're actively seeking feedback on:

- API design and developer experience
- Additional features or optimizations needed

Join our [Discord community](https://discord.gg/powersync) to share your experience and get help.
</Warning>

By default, PowerSync uses a [JSON-based view system](/architecture/client-architecture#schema) where data is stored schemalessly in JSON format and then presented through SQLite views based on the client-side schema. Raw tables allow you to define native SQLite tables in the client-side schema, bypassing this.

Check warning on line 16 in usage/use-case-examples/raw-tables.mdx

View check run for this annotation

Mintlify / Mintlify Validation (powersync) - vale-spellcheck

usage/use-case-examples/raw-tables.mdx#L16

Did you really mean 'schemalessly'?

This eliminates overhead associated with extracting values from the JSON data and provides access to advanced SQLite features like foreign key constraints and custom indexes.

<Note>
**Availability**

Raw tables were introduced in the following versions of our client SDKs:
- __JavaScript__ (Node: `0.8.0`, React-Native: `1.23.0`, Web: `1.24.0`)
- __Dart__: Version 1.15.0 of `package:powersync`.
- __Kotlin__: Version 1.3.0

Also note that raw tables are only supported by the new [Rust-based sync client](https://releases.powersync.com/announcements/improved-sync-performance-in-our-client-sdks), which is currently opt-in.
</Note>

## When to Use Raw Tables

Consider raw tables when you need:

- **Advanced SQLite features** like `FOREIGN KEY` and `ON DELETE CASCADE` constraints
- **Indexes** - PowerSync's default schema has basic support for indexes on columns, while raw tables give you complete control to create indexes on expressions, use `GENERATED` columns, etc
- **Improved performance** for complex queries (e.g., `SELECT SUM(value) FROM transactions`) - raw tables more efficiently get these values directly from the SQLite column, instead of extracting the value from the JSON object on every row
- **Reduced storage overhead** - eliminate JSON object overhead for each row in `ps_data__<table>.data` column
- **To manually create tables** - Sometimes you need full control over table creation, for example when implementing custom triggers

## How Raw Tables Work

### Current JSON-Based System

Currently the sync system involves two general steps:

1. Download sync bucket operations from the PowerSync Service
2. Once the client has a complete checkpoint and no pending local changes in the upload queue, sync the local database with the bucket operations

The bucket operations use JSON to store the individual operation data. The local database uses tables with a simple schemaless `ps_data__<table_name>` structure containing only an `id` (TEXT) and `data` (JSON) column.

PowerSync automatically creates views on that table that extract JSON fields to resemble standard tables reflecting your schema.

### Raw Tables Approach

When opting in to raw tables, you are responsible for creating the tables before using them - PowerSync will no longer create them automatically.

Because PowerSync takes no control over raw tables, you need to manually:

1. Tell PowerSync how to map the [schemaless protocol](/architecture/powersync-protocol#protocol) to your raw tables when syncing data.
2. Configure custom triggers to forward local writes to PowerSync.

For the purpose of this example, consider a simple table like this:

```sql
CREATE TABLE todo_lists (
id TEXT NOT NULL PRIMARY KEY,
created_by TEXT NOT NULL,
title TEXT NOT NULL,
content TEXT
) STRICT;
```

#### Syncing into raw tables

To sync into the raw `todo_lists` table instead of `ps_data__`, PowerSync needs the SQL statements extracting
columns from the untyped JSON protocol used during syncing.

Check warning on line 77 in usage/use-case-examples/raw-tables.mdx

View check run for this annotation

Mintlify / Mintlify Validation (powersync) - vale-spellcheck

usage/use-case-examples/raw-tables.mdx#L77

Did you really mean 'untyped'?
This involves specifying two SQL statements:

1. A `put` SQL statement for upserts, responsible for creating a `todo_list` row or updating it based on its `id` and data columns.

Check warning on line 80 in usage/use-case-examples/raw-tables.mdx

View check run for this annotation

Mintlify / Mintlify Validation (powersync) - vale-spellcheck

usage/use-case-examples/raw-tables.mdx#L80

Did you really mean 'upserts'?
2. A `delete` SQL statement responsible for deletions.

The PowerSync client as part of our SDKs will automatically run these statements in response to sync lines being sent from the PowerSync Service.

To reference the ID or extract values, prepared statements with parameters are used. `delete` statements can reference the id of the affected row, while `put` statements can also reference individual column values.
Declaring these statements and parameters happens as part of the schema passed to PowerSync databases:

<Tabs>

<Tab title="JavaScript">
Raw tables are not included in the regular `Schema()` object. Instead, add them afterwards using `withRawTables`.
For each raw table, specify the `put` and `delete` statement. The values of parameters are described as a JSON
array either containing:

- the string `Id` to reference the id of the affected row.
- the object `{ Column: name }` to reference the value of the column `name`.

```JavaScript
const mySchema = new Schema({
// Define your PowerSync-managed schema here
// ...
});
mySchema.withRawTables({
// The name here doesn't have to match the name of the table in SQL. Instead, it's used to match
// the table name from the backend database as sent by the PowerSync service.
todo_lists: {
put: {
sql: 'INSERT OR REPLACE INTO todo_lists (id, created_by, title, content) VALUES (?, ?, ?, ?)',
params: ['Id', { Column: 'created_by' }, { Column: 'title' }, { Column: 'content' }]
},
delete: {
sql: 'DELETE FROM lists WHERE id = ?',
params: ['Id']
}
}
});
```

We will simplify this API after understanding the use-cases for raw tables better.
</Tab>

<Tab title="Dart">
Raw tables are not part of the regular tables list and can be defined with the optional `rawTables` parameter.

```dart
final schema = Schema(const [], rawTables: const [
RawTable(
// The name here doesn't have to match the name of the table in SQL. Instead, it's used to match
// the table name from the backend database as sent by the PowerSync service.
name: 'todo_lists',
put: PendingStatement(
sql: 'INSERT OR REPLACE INTO todo_lists (id, created_by, title, content) VALUES (?, ?, ?, ?)',
params: [
PendingStatementValue.id(),
PendingStatementValue.column('created_by'),
PendingStatementValue.column('title'),
PendingStatementValue.column('content'),
],
),
delete: PendingStatement(
sql: 'DELETE FROM todo_lists WHERE id = ?',
params: [
PendingStatementValue.id(),
],
),
),
]);
```
</Tab>

<Tab title="Kotlin">
To define a raw table, include it in the list of tables passed to the `Schema`:

```Kotlin
val schema = Schema(listOf(
RawTable(
// The name here doesn't have to match the name of the table in SQL. Instead, it's used to match
// the table name from the backend database as sent by the PowerSync service.
name = "todo_lists",
put = PendingStatement(
"INSERT OR REPLACE INTO todo_lists (id, created_by, title, content) VALUES (?, ?, ?, ?)",
listOf(
PendingStatementParameter.Id,
PendingStatementParameter.Column("created_by"),
PendingStatementParameter.Column("title"),
PendingStatementParameter.Column("content")
)
),
delete = PendingStatement(
"DELETE FROM todo_lists WHERE id = ?", listOf(PendingStatementParameter.Id)
)
)
))
```
</Tab>

<Tab title="Swift">
Unfortunately, raw tables are not available in the Swift SDK yet.
</Tab>

<Tab title=".NET">
Unfortunately, raw tables are not available in the .NET SDK yet.
</Tab>

</Tabs>

After adding raw tables to the schema, you're also responsible for creating them by executing the
corresponding `CREATE TABLE` statement before `connect()`-ing the database.

#### Collecting local writes on raw tables

PowerSync uses an internal SQLite table to collect local writes. For PowerSync-managed views, a trigger for
insertions, updates and deletions automatically forwards local mutations into this table.
When using raw tables, defining those triggers is your responsibility.

The [PowerSync SQLite extension](https://github.com/powersync-ja/powersync-sqlite-core) creates an insert-only virtual table named `powersync_crud` with these columns:

```SQL
CREATE VIRTUAL TABLE powersync_crud(
-- The type of operation: 'PUT' or 'DELETE'
op TEXT,
-- The id of the affected row
id TEXT,
type TEXT,
-- optional (not set on deletes): The column values for the row
data TEXT,
-- optional: Previous column values to include in a CRUD entry
old_values TEXT,
-- optional: Metadata for the write to include in a CRUD entry
metadata TEXT,
);
```

The virtual table associates local mutations with the current transaction and ensures writes made during the sync
process (applying server-side changes) don't count as local writes.
This means that triggers can be defined on raw tables like so:

```SQL
CREATE TRIGGER todo_lists_insert
AFTER INSERT ON todo_lists
FOR EACH ROW
BEGIN
INSERT INTO powersync_crud (op, id, type, data) VALUES ('PUT', NEW.id, 'todo_lists', json_object(
'created_by', NEW.created_by,
'title', NEW.title,
'content', NEW.content
));
END;

CREATE TRIGGER todo_lists_update
AFTER INSERT ON todo_lists
FOR EACH ROW
BEGIN
INSERT INTO powersync_crud (op, id, type, data) VALUES ('PUT', NEW.id, 'todo_lists', json_object(
'created_by', NEW.created_by,
'title', NEW.title,
'content', NEW.content
));
END;

CREATE TRIGGER todo_lists_delete
AFTER DELETE ON todo_lists
FOR EACH ROW
BEGIN
INSERT INTO powersync_crud (op, id, type) VALUES ('DELETE', OLD.id, 'todo_lists');
END;
```
## Migrations

In PowerSync's [JSON-based view system](/architecture/client-architecture#schema) the client-side schema is applied to the schemaless data, meaning no migrations are required. Raw tables however are excluded from this, so it is the developers responsibility to manage migrations for these tables.

### Adding raw tables as a new table

When you're adding new tables to your sync rules, clients will start to sync data on those tables - even if the tables aren't mentioned
in the client's schema yet.
So at the time you're introducing a new raw table to your app, it's possible that PowerSync has already synced some data for that
table, which would be stored in `ps_untyped`. When adding regular tables, PowerSync will automatically extract rows from `ps_untyped`.
With raw tables, that step is your responsibility. To copy data, run these statements in a transaction after creating the table:

```
INSERT INTO my_table (id, my_column, ...)
SELECT id, data ->> 'my_column' FROM ps_untyped WHERE type = 'my_table';
DELETE FROM ps_untyped WHERE type = 'my_table';
```

This does not apply if you've been using the raw table from the beginning (and never called `connect()` without them) - you only
need this for raw tables you already had locally.

Another workaround is to clear PowerSync data when changing raw tables and opt for a full resync.

Check warning on line 269 in usage/use-case-examples/raw-tables.mdx

View check run for this annotation

Mintlify / Mintlify Validation (powersync) - vale-spellcheck

usage/use-case-examples/raw-tables.mdx#L269

Did you really mean 'resync'?

### Migrating to raw tables

To migrate from PowerSync-managed tables to raw tables, first:

1. Open the database with the new schema mentioning raw tables. PowerSync will copy data from tables previously managed by PowerSync into
`ps_untyped`.
2. Create raw tables.
3. Run the `INSERT FROM SELECT` statement to insert `ps_untyped` data into your raw tables.

### Migrations on raw tables

When adding new columns to raw tables, there currently isn't a way to re-sync that table to add those columns from the server - we are
investigating possible workarounds and encourage users to try out if they need this.

To ensure the column values are accurate, you'd have to delete all data after a migration and wait for the next complete sync.

## Deleting data and raw tables

APIs that clear an entire PowerSync database, like e.g. `disconnectAndClear()`, don't affect raw tables.
This should be kept in mind when you're using those methods - data from raw tables needs to be deleted explicitly.