The compatibility test suite includes a variety of database tests:

- SLT tests in the `tests/slt/sql` folder (a place for future SQL tests, ported from `duckdb/test/sql`)
- snowplow_web-based SLT tests in the `tests/slt/snowplow_web` folder, see `dbt-snowplow/README.md`
- the snowplow_web dbt workload, described below in this README

An SLT test is a `.slt` file. We use a Python runner to execute the tests against Snowflake or icehut.

To run all tests with the `.slt` extension located recursively under the `tests` folder against icehut or Snowflake:

```
python -m icehut_sqllogictest --config-file ~/.icehut/snowflake.yaml --use-database tests2 --test-dir tests
```
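For background, a `.slt` file is a plain-text sqllogictest script built from `statement` and `query` records. The sketch below is an illustration of that general record format, not the actual `icehut_sqllogictest` implementation:

```python
# Minimal sketch of parsing sqllogictest (.slt) records.
# NOTE: illustrative only -- not the actual icehut_sqllogictest runner.

def parse_slt(text):
    """Split an .slt file into (kind, sql, expected) records."""
    records = []
    for block in text.strip().split("\n\n"):
        lines = block.splitlines()
        kind = lines[0].split()[0]
        if kind == "statement":            # e.g. "statement ok"
            records.append(("statement", "\n".join(lines[1:]), None))
        elif kind == "query":              # e.g. "query I"
            # SQL lines come before the "----" separator, expected results after it.
            sep = lines.index("----") if "----" in lines else len(lines)
            sql = "\n".join(lines[1:sep])
            expected = lines[sep + 1:]
            records.append(("query", sql, expected))
    return records

sample = """statement ok
CREATE TABLE t (a INT)

query I
SELECT 1
----
1"""

for kind, sql, expected in parse_slt(sample):
    print(kind, repr(sql), expected)
```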
Make sure the data is loaded into the specified database, schema, and table (see the `dbt_project.yml` vars). Follow the [tutorial](https://docs.snowplow.io/accelerators/web/upload/upload_1/) to load the data. The tutorial guides you through creating the initial table as a default Snowflake table.

Optionally, load the sample dataset into the `EVENTS_WEB` table by running the SQL script with the Snowflake CLI:

```
# Set your dbname & schemaname just here:
snow sql -c embucket --database tests -D DBNAME=tests --schema snowplow_web -D SCHEMANAME=snowplow_web -f sqls/snowplow_web/create_load_table_events_web.sql
```
To create an Iceberg table from it, we can execute the following SQL:

```sql
CREATE OR REPLACE ICEBERG TABLE EVENTS_ICEBERG
EXTERNAL_VOLUME = 'ICEBERG_EXTERNAL_VOLUME'
CATALOG = 'SNOWFLAKE'
BASE_LOCATION = '<DB NAME>/<SCHEMA name>/events_iceberg/'
AS (
SELECT
APP_ID,
PLATFORM,
CAST(DATE_TRUNC('MICROSECOND', ETL_TSTAMP) AS TIMESTAMP_NTZ(6)) as ETL_TSTAMP,
CAST(DATE_TRUNC('MICROSECOND', COLLECTOR_TSTAMP) AS TIMESTAMP_NTZ(6)) as COLLECTOR_TSTAMP,
CAST(DATE_TRUNC('MICROSECOND', DVCE_CREATED_TSTAMP) AS TIMESTAMP_NTZ(6)) as DVCE_CREATED_TSTAMP,
EVENT,
EVENT_ID,
TXN_ID,
NAME_TRACKER,
V_TRACKER,
V_COLLECTOR,
V_ETL,
USER_ID,
USER_IPADDRESS,
USER_FINGERPRINT,
DOMAIN_USERID,
DOMAIN_SESSIONIDX,
NETWORK_USERID,
GEO_COUNTRY,
GEO_REGION,
GEO_CITY,
GEO_ZIPCODE,
GEO_LATITUDE,
GEO_LONGITUDE,
GEO_REGION_NAME,
IP_ISP,
IP_ORGANIZATION,
IP_DOMAIN,
IP_NETSPEED,
PAGE_URL,
PAGE_TITLE,
PAGE_REFERRER,
PAGE_URLSCHEME,
PAGE_URLHOST,
PAGE_URLPORT,
PAGE_URLPATH,
PAGE_URLQUERY,
PAGE_URLFRAGMENT,
REFR_URLSCHEME,
REFR_URLHOST,
REFR_URLPORT,
REFR_URLPATH,
REFR_URLQUERY,
REFR_URLFRAGMENT,
REFR_MEDIUM,
REFR_SOURCE,
REFR_TERM,
MKT_MEDIUM,
MKT_SOURCE,
MKT_TERM,
MKT_CONTENT,
MKT_CAMPAIGN,
SE_CATEGORY,
SE_ACTION,
SE_LABEL,
SE_PROPERTY,
SE_VALUE,
TR_ORDERID,
TR_AFFILIATION,
TR_TOTAL,
TR_TAX,
TR_SHIPPING,
TR_CITY,
TR_STATE,
TR_COUNTRY,
TI_ORDERID,
TI_SKU,
TI_NAME,
TI_CATEGORY,
TI_PRICE,
TI_QUANTITY,
PP_XOFFSET_MIN,
PP_XOFFSET_MAX,
PP_YOFFSET_MIN,
PP_YOFFSET_MAX,
USERAGENT,
BR_NAME,
BR_FAMILY,
BR_VERSION,
BR_TYPE,
BR_RENDERENGINE,
BR_LANG,
BR_FEATURES_PDF,
BR_FEATURES_FLASH,
BR_FEATURES_JAVA,
BR_FEATURES_DIRECTOR,
BR_FEATURES_QUICKTIME,
BR_FEATURES_REALPLAYER,
BR_FEATURES_WINDOWSMEDIA,
BR_FEATURES_GEARS,
BR_FEATURES_SILVERLIGHT,
BR_COOKIES,
BR_COLORDEPTH,
BR_VIEWWIDTH,
BR_VIEWHEIGHT,
OS_NAME,
OS_FAMILY,
OS_MANUFACTURER,
OS_TIMEZONE,
DVCE_TYPE,
DVCE_ISMOBILE,
DVCE_SCREENWIDTH,
DVCE_SCREENHEIGHT,
DOC_CHARSET,
DOC_WIDTH,
DOC_HEIGHT,
TR_CURRENCY,
TR_TOTAL_BASE,
TR_TAX_BASE,
TR_SHIPPING_BASE,
TI_CURRENCY,
TI_PRICE_BASE,
BASE_CURRENCY,
GEO_TIMEZONE,
MKT_CLICKID,
MKT_NETWORK,
ETL_TAGS,
CAST(DATE_TRUNC('MICROSECOND', DVCE_SENT_TSTAMP) AS TIMESTAMP_NTZ(6)) as DVCE_SENT_TSTAMP,
REFR_DOMAIN_USERID,
CAST(DATE_TRUNC('MICROSECOND', REFR_DVCE_TSTAMP) AS TIMESTAMP_NTZ(6)) as REFR_DVCE_TSTAMP,
DOMAIN_SESSIONID,
CAST(DATE_TRUNC('MICROSECOND', DERIVED_TSTAMP) AS TIMESTAMP_NTZ(6)) as DERIVED_TSTAMP,
EVENT_VENDOR,
EVENT_NAME,
EVENT_FORMAT,
EVENT_VERSION,
EVENT_FINGERPRINT,
CAST(DATE_TRUNC('MICROSECOND', TRUE_TSTAMP) AS TIMESTAMP_NTZ(6)) as TRUE_TSTAMP,
CAST(DATE_TRUNC('MICROSECOND', LOAD_TSTAMP) AS TIMESTAMP_NTZ(6)) as LOAD_TSTAMP,
CONTEXTS_COM_SNOWPLOWANALYTICS_SNOWPLOW_UA_PARSER_CONTEXT_1::STRING as CONTEXTS_COM_SNOWPLOWANALYTICS_SNOWPLOW_UA_PARSER_CONTEXT_1,
CONTEXTS_COM_SNOWPLOWANALYTICS_SNOWPLOW_WEB_PAGE_1::STRING as CONTEXTS_COM_SNOWPLOWANALYTICS_SNOWPLOW_WEB_PAGE_1,
CONTEXTS_COM_IAB_SNOWPLOW_SPIDERS_AND_ROBOTS_1::STRING as CONTEXTS_COM_IAB_SNOWPLOW_SPIDERS_AND_ROBOTS_1,
CONTEXTS_NL_BASJES_YAUAA_CONTEXT_1::STRING as CONTEXTS_NL_BASJES_YAUAA_CONTEXT_1
FROM EVENTS_WEB );
```
This code assumes the initial table with Snowplow events is named `EVENTS_WEB`.

Main differences from the default Snowflake table:

- Timestamps are truncated to `TIMESTAMP_NTZ(6)`.
- `STRING` columns are used instead of `VARIANT` columns.
Because we use `STRING` instead of `VARIANT`, the workload fails out of the box. To make it work you need to make one change: in the file `dbt-snowplow/dbt_packages/snowplow_web/models/base/scratch/snowflake/snowplow_web_base_events_this_run.sql`, change line 33 from

```
a.contexts_com_snowplowanalytics_snowplow_web_page_1[0]:id::varchar
```

to

```
parse_json(a.contexts_com_snowplowanalytics_snowplow_web_page_1)[0]:id::varchar
```

You can patch the model with a `sed` command:

```
sed -i 's@a\.contexts_com_snowplowanalytics_snowplow_web_page_1\[0\]:id::varchar@parse_json\(a\.contexts_com_snowplowanalytics_snowplow_web_page_1)\[0\]:id::varchar@g' ./dbt-snowplow/dbt_packages/snowplow_web/models/base/scratch/snowflake/snowplow_web_base_events_this_run.sql
```
Set the profile-specific environment variables:

```
export SNOWFLAKE_USER=user
export SNOWFLAKE_SECRET=""
export SNOWFLAKE_DB=dbname
```
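These variables are typically consumed from the dbt `profiles.yml` via `env_var()`. The exact profile is project-specific; the sketch below is only an assumed shape (the profile name `snowplow_web`, plus the `account` and `warehouse` values, are placeholders, not taken from this repo):

```yaml
# Hypothetical profiles.yml sketch -- names and placeholder values are assumptions.
snowplow_web:
  target: dev
  outputs:
    dev:
      type: snowflake
      account: "<your-account>"                  # placeholder
      user: "{{ env_var('SNOWFLAKE_USER') }}"
      password: "{{ env_var('SNOWFLAKE_SECRET') }}"
      database: "{{ env_var('SNOWFLAKE_DB') }}"
      warehouse: "<your-warehouse>"              # placeholder
      schema: snowplow_web
```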
Steps to run:

```
cd dbt-snowplow/
virtualenv .venv
. .venv/bin/activate
pip install dbt-core dbt-snowflake
dbt deps
dbt seed
dbt run
```