Commit 3e37243

add datashare support to template repo

1 parent d5eddfe commit 3e37243

6 files changed, 365 additions & 0 deletions


README.md

Lines changed: 8 additions & 0 deletions
@@ -25,6 +25,7 @@ When you're ready to enable automated dbt runs on PRs, pushes to main, or a schedule
 - **[Getting Started](docs/getting-started.md)** - Initial setup for new developers
 - **[Development Workflow](docs/development-workflow.md)** - How to develop models
 - **[dbt Best Practices](docs/dbt-best-practices.md)** - Patterns and configurations
+- **[Dune Datashares](docs/dune-datashares.md)** - Sync tables to external warehouses
 - **[Testing](docs/testing.md)** - Test requirements
 - **[CI/CD](docs/cicd.md)** - GitHub Actions workflows
 - **[Troubleshooting](docs/troubleshooting.md)** - Common issues
@@ -149,9 +150,16 @@ select * from dune.dune__tmp_.dbt_template_view_model
 | Incremental (Merge) | `dbt_template_merge_incremental_model.sql` | Efficient updates via merge |
 | Incremental (Delete+Insert) | `dbt_template_delete_insert_incremental_model.sql` | Efficient updates via delete+insert |
 | Incremental (Append) | `dbt_template_append_incremental_model.sql` | Append-only with deduplication |
+| Incremental (Datashare) | `dbt_template_datashare_incremental_model.sql` | Merge model with datashare sync |

 All templates are in `models/templates/`.

+## Datashares
+
+This template includes an opt-in datashare post-hook for `table` and `incremental` models. To enable it on a model, set `meta.datashare.enabled: true` and provide the sync window fields in the model config.
+
+See [docs/dune-datashares.md](docs/dune-datashares.md) for the full setup, `run-operation` examples, monitoring queries, and cleanup commands.
+
 ## Table Visibility

 By default, all tables are **private** — only your team can query them. To make a table publicly accessible (visible and queryable by anyone on Dune), set `meta.dune.public: true` in the model config:
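A minimal sketch of such a config (the alias here is a hypothetical name, not a model in this repo):

```sql
{{ config(
    alias = 'my_public_model'  -- hypothetical model name
    , materialized = 'table'
    , meta = {
        "dune": {
            "public": true
        }
    }
) }}

select ...
```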

dbt_project.yml

Lines changed: 2 additions & 0 deletions
@@ -40,3 +40,5 @@ models:
        transaction: true
      - sql: "{{ vacuum_table(this, model.config.materialized) }}"
        transaction: true
+     - sql: "{{ datashare_trigger_sync() }}"
+       transaction: true

docs/dune-datashares.md

Lines changed: 173 additions & 0 deletions
@@ -0,0 +1,173 @@
# Dune Datashares

Datashares sync Dune tables to external data warehouses such as Snowflake and BigQuery so downstream consumers can query the data outside Dune.

## Prerequisites

Datashare is an enterprise feature that requires setup before any SQL statements will work:

1. Contract and feature enablement with Dune.
2. Target warehouse configuration in the Dune backoffice.
3. A Dune API key with Data Transformations access.

If datashare is not enabled for your team, the SQL statements below will fail with an authorization error.

## What This Template Includes

This template ships with datashare support already wired in:

- `macros/dune_dbt_overrides/datashare_table_sync_post_hook.sql`
- a global post-hook in `dbt_project.yml` that calls `datashare_trigger_sync()`
- an opt-in example model at `models/templates/dbt_template_datashare_incremental_model.sql`

Models without `meta.datashare` are unchanged; the hook skips them.

The built-in post-hook only executes on the `prod` target, so local `dev` runs and CI temp schemas do not create datashare syncs by default.

## Supported Models

Datashare sync is applied only to `table` and `incremental` models; views are skipped.

## Enable Datashare On A Model

Add `meta.datashare` to a `table` or `incremental` model:

```sql
{% set time_start = "current_date - interval '1' day" if is_incremental() else "current_date - interval '2' day" %}
{% set time_end = "current_date + interval '1' day" %}

{{ config(
    alias = 'my_datashared_model'
    , materialized = 'incremental'
    , incremental_strategy = 'merge'
    , unique_key = ['block_number', 'block_date']
    , meta = {
        "datashare": {
            "enabled": true,
            "time_column": "block_date",
            "time_start": time_start,
            "time_end": time_end
        }
    }
) }}

select ...
```

The included example model in this repo follows this pattern.

## Configuration Reference

All datashare config lives under `meta.datashare` in the model `config()` block.

| Property | Required | Type | Description |
| --- | --- | --- | --- |
| `enabled` | Yes | `boolean` | Must be `true` to trigger sync. |
| `time_column` | Yes | `string` | Column used to define the sync window. |
| `time_start` | Yes | `string` | SQL expression for the start of the sync window. |
| `time_end` | No | `string` | SQL expression for the end of the sync window. Defaults to `now()`. |
| `unique_key_columns` | No | `list[string]` | Row identity columns. Falls back to the model `unique_key` if omitted. |

`time_start` and `time_end` are SQL expressions, not literal timestamps. The macro wraps them in `CAST(... AS VARCHAR)` before calling the table procedure.
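To illustrate the wrapping, here is a small Python mirror of the macro's optional-time handling (`wrap_time_expr` is a hypothetical name; in the macro this is `_datashare_optional_time_sql`):

```python
def wrap_time_expr(expr):
    """Mirror of the macro's optional-time handling: NULL when unset,
    otherwise the SQL expression wrapped in a VARCHAR cast."""
    if expr is None:
        return "NULL"
    return f"CAST({expr} AS VARCHAR)"
```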
Keep the sync window aligned with the `time_column` granularity. For example, if `time_column` is a `date`, use date-based expressions like `current_date - interval '1' day`, not hour-based timestamp windows.

## Full Refresh Behavior

The macro determines `full_refresh` automatically:

| Context | `full_refresh` |
| --- | --- |
| Incremental post-hook on a normal incremental run | `false` |
| Incremental post-hook on first run or `--full-refresh` | `true` |
| Table materialization post-hook | `true` |
| `run-operation` | `false` unless overridden |
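The `run-operation` row of this table can be sketched in Python for intuition (a mirror of the rule, not code from the repo; the post-hook path instead derives `full_refresh` from the incremental context):

```python
def resolve_full_refresh(materialized, full_refresh=False):
    # Table materializations always sync as a full refresh;
    # other materializations full-refresh only when explicitly requested.
    return materialized == "table" or full_refresh is True
```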
## Generated SQL

The post-hook generates this Trino statement:

```sql
ALTER TABLE dune.<schema>.<table> EXECUTE datashare(
    time_column => '<column_name>',
    unique_key_columns => ARRAY['col1', 'col2'],
    time_start => CAST(<sql_expression> AS VARCHAR),
    time_end => CAST(<sql_expression> AS VARCHAR),
    full_refresh => true|false
)
```

## Manual Syncs

Use `run-operation` when you want to trigger a sync outside `dbt run`.

Preview the generated SQL only:

```bash
uv run dbt run-operation datashare_trigger_sync_operation --args '
model_selector: dbt_template_datashare_incremental_model
dry_run: true
'
```

Execute a sync:

```bash
uv run dbt run-operation datashare_trigger_sync_operation --args '
model_selector: dbt_template_datashare_incremental_model
time_start: "current_date - interval '\''7'\'' day"
time_end: "current_date + interval '\''1'\'' day"
'
```

Force a full refresh sync:

```bash
uv run dbt run-operation datashare_trigger_sync_operation --args '
model_selector: dbt_template_datashare_incremental_model
full_refresh: true
'
```

`model_selector` accepts the model name, alias, fully qualified name, or dbt `unique_id`.

## Monitoring

Check the datashare system tables after a run:

```sql
SELECT *
FROM dune.datashare.table_syncs
WHERE source_schema = '<your_schema>';

SELECT *
FROM dune.datashare.table_sync_runs
WHERE source_schema = '<your_schema>'
ORDER BY created_at DESC;
```

`table_syncs` shows the registered share and its latest status; `table_sync_runs` shows individual sync attempts, including the time window and whether the run was a full refresh.

## Cleanup

Remove a table from datashare with:

```sql
ALTER TABLE dune.<schema>.<table> EXECUTE delete_datashare
```

## Example Workflow

1. Configure a model with `meta.datashare`.
2. Run it with `uv run dbt run --select my_model --target prod`.
3. Confirm the datashare registration in `dune.datashare.table_syncs`.
4. Inspect run history in `dune.datashare.table_sync_runs`.

## Further Reading

- [Supported SQL Operations](https://docs.dune.com/api-reference/connectors/sql-operations)
- [dbt connector overview](https://docs.dune.com/api-reference/connectors/dbt/overview)

macros/dune_dbt_overrides/datashare_table_sync_post_hook.sql
Lines changed: 134 additions & 0 deletions
@@ -0,0 +1,134 @@
{% macro _datashare_sql_string(value) %}
    {{ return("'" ~ (value | string | replace("'", "''")) ~ "'") }}
{%- endmacro -%}

{% macro _datashare_unique_key_columns_sql(unique_key_columns) %}
    {%- if unique_key_columns is string -%}
        {%- set unique_key_columns = [unique_key_columns] -%}
    {%- elif unique_key_columns is not iterable or unique_key_columns is mapping -%}
        {{ return("CAST(ARRAY[] AS ARRAY(VARCHAR))") }}
    {%- endif -%}
    {%- set quoted = [] -%}
    {%- for col in unique_key_columns -%}
        {%- do quoted.append(_datashare_sql_string(col)) -%}
    {%- endfor -%}
    {{ return("CAST(ARRAY[] AS ARRAY(VARCHAR))" if quoted | length == 0 else "ARRAY[" ~ quoted | join(', ') ~ "]") }}
{%- endmacro -%}

{% macro _datashare_optional_time_sql(value) %}
    {{ return('NULL' if value is none else 'CAST(' ~ value ~ ' AS VARCHAR)') }}
{%- endmacro -%}

{#
    Datashare sync macro - generates ALTER TABLE ... EXECUTE datashare() SQL.
    Config reference and usage: docs/dune-datashares.md
#}
{% macro _datashare_table_sync_sql(
    schema_name
    , table_name
    , meta
    , materialized
    , unique_key=None
    , time_start=None
    , time_end=None
    , full_refresh=False
    , catalog_name=target.database
) %}
    {%- set model_ref = schema_name ~ '.' ~ table_name -%}
    {%- if meta is not mapping or meta.get('datashare') is none or meta.get('datashare') is not mapping -%}
        {{ log('Skipping datashare sync for ' ~ model_ref ~ ': meta.datashare is not configured.', info=True) }}
        {{ return(none) }}
    {%- endif -%}
    {%- set datashare = meta.get('datashare') -%}
    {%- if datashare.get('enabled') is not sameas true -%}
        {{ log('Skipping datashare sync for ' ~ model_ref ~ ': meta.datashare.enabled is not true.', info=True) }}
        {{ return(none) }}
    {%- endif -%}
    {%- if materialized not in ['incremental', 'table'] -%}
        {{ log('Skipping datashare sync for ' ~ model_ref ~ ': materialization "' ~ materialized ~ '" is not incremental/table.') }}
        {{ return(none) }}
    {%- endif -%}
    {%- set time_column = datashare.get('time_column') -%}
    {%- set resolved_time_start = time_start if time_start is not none else datashare.get('time_start') -%}
    {%- set resolved_time_end = time_end if time_end is not none else datashare.get('time_end', 'now()') -%}

    {%- set sql -%}
        ALTER TABLE {{ catalog_name }}.{{ schema_name }}.{{ table_name }} EXECUTE datashare(
            time_column => {{ _datashare_sql_string(time_column | default('', true)) }},
            unique_key_columns => {{ _datashare_unique_key_columns_sql(datashare.get('unique_key_columns', unique_key)) }},
            time_start => {{ _datashare_optional_time_sql(resolved_time_start) }},
            time_end => {{ _datashare_optional_time_sql(resolved_time_end) }},
            full_refresh => {{ 'true' if full_refresh else 'false' }}
        )
    {%- endset -%}
    {{ log('datashare sync preview for ' ~ model_ref ~ ':\n' ~ sql, info=True) }}
    {{ return(sql) }}
{%- endmacro -%}

{% macro datashare_trigger_sync() %}
    {%- if target.name != 'prod' -%}
        {{ log('Skipping datashare sync for ' ~ this.schema ~ '.' ~ this.identifier ~ ': datashare post-hook only runs on the prod target.', info=True) }}
        {{ return('') }}
    {%- endif -%}
    {{ return(_datashare_table_sync_sql(
        schema_name=this.schema,
        table_name=this.identifier,
        meta=model.config.get('meta', {}),
        materialized=model.config.materialized,
        unique_key=model.config.get('unique_key'),
        full_refresh=(not is_incremental())
    ) or '') }}
{%- endmacro -%}

{% macro _datashare_resolve_model_node(model_selector) %}
    {%- set matches = [] -%}
    {%- for node in graph.nodes.values() -%}
        {%- if node.resource_type == 'model' -%}
            {%- set fqn_name = node.fqn | join('.') -%}
            {%- if node.unique_id == model_selector or node.name == model_selector or node.alias == model_selector or fqn_name == model_selector -%}
                {%- do matches.append(node) -%}
            {%- endif -%}
        {%- endif -%}
    {%- endfor -%}

    {%- if matches | length == 0 -%}
        {{ exceptions.raise_compiler_error("No model found for selector '" ~ model_selector ~ "'. Use model name, alias, fqn, or unique_id.") }}
    {%- endif -%}

    {%- if matches | length > 1 -%}
        {{ exceptions.raise_compiler_error("Model selector '" ~ model_selector ~ "' is ambiguous. Matches: " ~ (matches | map(attribute='unique_id') | join(', '))) }}
    {%- endif -%}

    {{ return(matches[0]) }}
{%- endmacro -%}

{% macro datashare_trigger_sync_operation(model_selector, time_start=None, time_end=None, dry_run=False, full_refresh=False) %}
    {%- set node = _datashare_resolve_model_node(model_selector) -%}
    {%- set node_config = node.config if node.config is mapping else {} -%}
    {%- set materialized = node_config.get('materialized', 'view') -%}
    {%- set table_name = node.alias if node.alias is not none else node.name -%}
    {%- set is_full_refresh = materialized == 'table' or full_refresh is sameas true -%}

    {%- set sql = _datashare_table_sync_sql(
        schema_name=node.schema,
        table_name=table_name,
        meta=node_config.get('meta', {}),
        materialized=materialized,
        unique_key=node_config.get('unique_key'),
        time_start=time_start,
        time_end=time_end,
        full_refresh=is_full_refresh,
        catalog_name=node.database or target.database
    ) -%}

    {%- if sql is none -%}
        {{ exceptions.raise_compiler_error("Cannot sync " ~ node.schema ~ "." ~ table_name ~ ": model must be incremental or table with meta.datashare.enabled = true.") }}
    {%- endif -%}

    {%- set is_dry_run = dry_run is sameas true or (dry_run is string and dry_run | lower in ['true', '1', 'yes', 'y']) -%}
    {%- if not is_dry_run -%}
        {% do run_query(sql) %}
        {{ log('Executed datashare sync for selector ' ~ model_selector, info=True) }}
    {%- endif -%}
    {{ return(sql) }}
{%- endmacro -%}
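The quoting and argument-normalization helpers in this file can be mirrored in plain Python to show the intended behavior (an illustration with hypothetical function names, not part of the repo; in the macros these are `_datashare_sql_string`, `_datashare_unique_key_columns_sql`, and the `dry_run` check in `datashare_trigger_sync_operation`):

```python
def sql_string(value):
    """Single-quote a value as a SQL literal, doubling embedded quotes."""
    return "'" + str(value).replace("'", "''") + "'"

def unique_key_columns_sql(cols):
    """Accept a single column name or a list of names and emit a Trino
    ARRAY literal; emit a typed empty array when nothing usable is given."""
    if isinstance(cols, str):
        cols = [cols]
    elif not isinstance(cols, (list, tuple)):
        return "CAST(ARRAY[] AS ARRAY(VARCHAR))"
    quoted = [sql_string(c) for c in cols]
    if not quoted:
        return "CAST(ARRAY[] AS ARRAY(VARCHAR))"
    return "ARRAY[" + ", ".join(quoted) + "]"

def parse_dry_run(value):
    """Accept booleans or common truthy strings, as passed via --args."""
    if value is True:
        return True
    return isinstance(value, str) and value.lower() in ["true", "1", "yes", "y"]
```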

models/templates/_schema.yml

Lines changed: 15 additions & 0 deletions
@@ -64,3 +64,18 @@ models:
          description: "The date of the block"
        - name: total_tx_per_block
          description: "The total number of transactions per block"
+  - name: dbt_template_datashare_incremental_model
+    description: "A starter dbt incremental model using merge strategy with datashare sync enabled"
+    data_tests:
+      - dbt_utils.unique_combination_of_columns:
+          arguments:
+            combination_of_columns:
+              - block_number
+              - block_date
+    columns:
+      - name: block_number
+        description: "The unique block number in the sync window"
+      - name: block_date
+        description: "The date used for datashare sync windows"
+      - name: total_tx_per_block
+        description: "The total number of transactions per block"

models/templates/dbt_template_datashare_incremental_model.sql
Lines changed: 33 additions & 0 deletions
@@ -0,0 +1,33 @@
{% set time_start = "current_date - interval '1' day" if is_incremental() else "current_date - interval '2' day" %}
{% set time_end = "current_date + interval '1' day" %}

{{ config(
    alias = 'dbt_template_datashare_incremental_model'
    , materialized = 'incremental'
    , incremental_strategy = 'merge'
    , unique_key = ['block_number', 'block_date']
    , incremental_predicates = ["DBT_INTERNAL_DEST.block_date >= current_date - interval '1' day"]
    , meta = {
        "dune": {
            "public": false
        },
        "datashare": {
            "enabled": true,
            "time_column": "block_date",
            "time_start": time_start,
            "time_end": time_end
        }
    }
    , properties = {
        "partitioned_by": "ARRAY['block_date']"
    }
) }}

select
    block_number
    , block_date
    , count(*) as total_tx_per_block
from {{ source('ethereum', 'transactions') }}
where block_date >= {{ time_start }}
    and block_date < {{ time_end }}
group by 1, 2
