
Compatibility test suite

The compatibility test suite checks compatibility with Snowflake using SLT tests and the official Snowflake SQL reference. It includes a variety of database tests:

  • SLT tests
  • snowplow_web dbt workload (described below in this README)

SLT tests

An SLT test is a .slt file. We use a Python runner to execute the tests against Snowflake or Icehut. The following command recursively runs all tests with the .slt extension located in the tests folder:

python -m icehut_sqllogictest --config-file ~/.icehut/snowflake.yaml --use-database tests2 --test-dir tests
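For reference, a .slt file interleaves directives with SQL statements. Below is a minimal, hypothetical example in the common sqllogictest style; the exact dialect accepted by the icehut_sqllogictest runner may differ:

# hypothetical example.slt (sketch only; the runner's dialect may differ)
statement ok
CREATE OR REPLACE TABLE slt_demo (id INTEGER, name VARCHAR);

statement ok
INSERT INTO slt_demo VALUES (1, 'a'), (2, 'b');

# 'query I' expects a single integer column; expected results follow '----'
query I
SELECT COUNT(*) FROM slt_demo;
----
2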

dbt workload

Load the sample dataset

Make sure the data is loaded into the database, schema, and table specified in the dbt_project.yml vars. Follow the [tutorial](https://docs.snowplow.io/accelerators/web/upload/upload_1/) to load the data; it guides you through creating the initial table as a standard Snowflake table.
Optionally, load the sample dataset into the EVENTS_WEB table by running the SQL script with the Snowflake CLI:

# Set your database name and schema name here:
snow sql -c embucket --database tests -D DBNAME=tests --schema snowplow_web -D SCHEMANAME=snowplow_web -f sqls/snowplow_web/create_load_table_events_web.sql
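The -D flags define client-side template variables for the Snowflake CLI. As a rough, hypothetical illustration (not the actual contents of create_load_table_events_web.sql), a script can reference them like this:

-- hypothetical fragment: how the DBNAME / SCHEMANAME variables passed
-- via -D could be referenced with Snowflake CLI client-side templating
CREATE SCHEMA IF NOT EXISTS <% DBNAME %>.<% SCHEMANAME %>;
USE SCHEMA <% DBNAME %>.<% SCHEMANAME %>;
-- the real script then creates and loads the EVENTS_WEB table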

Iceberg tables in workload

To create an Iceberg table from the EVENTS_WEB table, execute the following code:

CREATE OR REPLACE ICEBERG TABLE EVENTS_ICEBERG
EXTERNAL_VOLUME = 'ICEBERG_EXTERNAL_VOLUME'
 CATALOG = 'SNOWFLAKE'
 BASE_LOCATION = '<DB NAME>/<SCHEMA name>/events_iceberg/'
 AS (
SELECT
    APP_ID,
    PLATFORM,
    CAST(DATE_TRUNC('MICROSECOND', ETL_TSTAMP) AS TIMESTAMP_NTZ(6)) as ETL_TSTAMP,
    CAST(DATE_TRUNC('MICROSECOND', COLLECTOR_TSTAMP) AS TIMESTAMP_NTZ(6)) as COLLECTOR_TSTAMP,
    CAST(DATE_TRUNC('MICROSECOND', DVCE_CREATED_TSTAMP) AS TIMESTAMP_NTZ(6)) as DVCE_CREATED_TSTAMP,
    EVENT,
    EVENT_ID,
    TXN_ID,
    NAME_TRACKER,
    V_TRACKER,
    V_COLLECTOR,
    V_ETL,
    USER_ID,
    USER_IPADDRESS,
    USER_FINGERPRINT,
    DOMAIN_USERID,
    DOMAIN_SESSIONIDX,
    NETWORK_USERID,
    GEO_COUNTRY,
    GEO_REGION,
    GEO_CITY,
    GEO_ZIPCODE,
    GEO_LATITUDE,
    GEO_LONGITUDE,
    GEO_REGION_NAME,
    IP_ISP,
    IP_ORGANIZATION,
    IP_DOMAIN,
    IP_NETSPEED,
    PAGE_URL,
    PAGE_TITLE,
    PAGE_REFERRER,
    PAGE_URLSCHEME,
    PAGE_URLHOST,
    PAGE_URLPORT,
    PAGE_URLPATH,
    PAGE_URLQUERY,
    PAGE_URLFRAGMENT,
    REFR_URLSCHEME,
    REFR_URLHOST,
    REFR_URLPORT,
    REFR_URLPATH,
    REFR_URLQUERY,
    REFR_URLFRAGMENT,
    REFR_MEDIUM,
    REFR_SOURCE,
    REFR_TERM,
    MKT_MEDIUM,
    MKT_SOURCE,
    MKT_TERM,
    MKT_CONTENT,
    MKT_CAMPAIGN,
    SE_CATEGORY,
    SE_ACTION,
    SE_LABEL,
    SE_PROPERTY,
    SE_VALUE,
    TR_ORDERID,
    TR_AFFILIATION,
    TR_TOTAL,
    TR_TAX,
    TR_SHIPPING,
    TR_CITY,
    TR_STATE,
    TR_COUNTRY,
    TI_ORDERID,
    TI_SKU,
    TI_NAME,
    TI_CATEGORY,
    TI_PRICE,
    TI_QUANTITY,
    PP_XOFFSET_MIN,
    PP_XOFFSET_MAX,
    PP_YOFFSET_MIN,
    PP_YOFFSET_MAX,
    USERAGENT,
    BR_NAME,
    BR_FAMILY,
    BR_VERSION,
    BR_TYPE,
    BR_RENDERENGINE,
    BR_LANG,
    BR_FEATURES_PDF,
    BR_FEATURES_FLASH,
    BR_FEATURES_JAVA,
    BR_FEATURES_DIRECTOR,
    BR_FEATURES_QUICKTIME,
    BR_FEATURES_REALPLAYER,
    BR_FEATURES_WINDOWSMEDIA,
    BR_FEATURES_GEARS,
    BR_FEATURES_SILVERLIGHT,
    BR_COOKIES,
    BR_COLORDEPTH,
    BR_VIEWWIDTH,
    BR_VIEWHEIGHT,
    OS_NAME,
    OS_FAMILY,
    OS_MANUFACTURER,
    OS_TIMEZONE,
    DVCE_TYPE,
    DVCE_ISMOBILE,
    DVCE_SCREENWIDTH,
    DVCE_SCREENHEIGHT,
    DOC_CHARSET,
    DOC_WIDTH,
    DOC_HEIGHT,
    TR_CURRENCY,
    TR_TOTAL_BASE,
    TR_TAX_BASE,
    TR_SHIPPING_BASE,
    TI_CURRENCY,
    TI_PRICE_BASE,
    BASE_CURRENCY,
    GEO_TIMEZONE,
    MKT_CLICKID,
    MKT_NETWORK,
    ETL_TAGS,
    CAST(DATE_TRUNC('MICROSECOND', DVCE_SENT_TSTAMP) AS TIMESTAMP_NTZ(6)) as DVCE_SENT_TSTAMP,
    REFR_DOMAIN_USERID,
    CAST(DATE_TRUNC('MICROSECOND', REFR_DVCE_TSTAMP) AS TIMESTAMP_NTZ(6)) as REFR_DVCE_TSTAMP,
    DOMAIN_SESSIONID,
    CAST(DATE_TRUNC('MICROSECOND', DERIVED_TSTAMP) AS TIMESTAMP_NTZ(6)) as DERIVED_TSTAMP,
    EVENT_VENDOR,
    EVENT_NAME,
    EVENT_FORMAT,
    EVENT_VERSION,
    EVENT_FINGERPRINT,
    CAST(DATE_TRUNC('MICROSECOND', TRUE_TSTAMP) AS TIMESTAMP_NTZ(6)) as TRUE_TSTAMP,
    CAST(DATE_TRUNC('MICROSECOND', LOAD_TSTAMP) AS TIMESTAMP_NTZ(6)) as LOAD_TSTAMP,
    CONTEXTS_COM_SNOWPLOWANALYTICS_SNOWPLOW_UA_PARSER_CONTEXT_1::STRING as CONTEXTS_COM_SNOWPLOWANALYTICS_SNOWPLOW_UA_PARSER_CONTEXT_1,
    CONTEXTS_COM_SNOWPLOWANALYTICS_SNOWPLOW_WEB_PAGE_1::STRING as CONTEXTS_COM_SNOWPLOWANALYTICS_SNOWPLOW_WEB_PAGE_1,
    CONTEXTS_COM_IAB_SNOWPLOW_SPIDERS_AND_ROBOTS_1::STRING as CONTEXTS_COM_IAB_SNOWPLOW_SPIDERS_AND_ROBOTS_1,
    CONTEXTS_NL_BASJES_YAUAA_CONTEXT_1::STRING as CONTEXTS_NL_BASJES_YAUAA_CONTEXT_1
FROM EVENTS_WEB );

This code assumes the initial table with Snowplow events is named EVENTS_WEB. The main differences from the default Snowflake table:

  1. Timestamps are truncated to TIMESTAMP_NTZ(6).
  2. Instead of VARIANT columns, we use STRING columns (see the sanity-check queries below).
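As a quick sanity check after creating the table (assuming both tables live in the current database and schema), you can compare row counts and spot-check the truncated timestamps:

-- compare row counts between the source table and the Iceberg copy
SELECT
    (SELECT COUNT(*) FROM EVENTS_WEB) AS events_web_rows,
    (SELECT COUNT(*) FROM EVENTS_ICEBERG) AS events_iceberg_rows;

-- spot-check that timestamps were truncated to microsecond precision
SELECT COLLECTOR_TSTAMP, LOAD_TSTAMP
FROM EVENTS_ICEBERG
LIMIT 5;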

Modify dbt model to support our Iceberg table

Because we use STRING instead of VARIANT, the workload will fail out of the box. To make it work, you need one change: in the file dbt-snowplow/dbt_packages/snowplow_web/models/base/scratch/snowflake/snowplow_web_base_events_this_run.sql, change line 33 from

a.contexts_com_snowplowanalytics_snowplow_web_page_1[0]:id::varchar

to

parse_json(a.contexts_com_snowplowanalytics_snowplow_web_page_1)[0]:id::varchar
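The reason for the change: on the Iceberg table the context column is a STRING, so it must go through PARSE_JSON before JSON path access works. A standalone query illustrating the same pattern (hypothetical, for illustration only):

-- extract the page view id from the STRING-typed context column
SELECT
    parse_json(contexts_com_snowplowanalytics_snowplow_web_page_1)[0]:id::varchar AS page_view_id
FROM EVENTS_ICEBERG
LIMIT 10;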

You can patch the model with a sed command:

sed -i 's@a\.contexts_com_snowplowanalytics_snowplow_web_page_1\[0\]:id::varchar@parse_json(a.contexts_com_snowplowanalytics_snowplow_web_page_1)[0]:id::varchar@g' ./dbt-snowplow/dbt_packages/snowplow_web/models/base/scratch/snowflake/snowplow_web_base_events_this_run.sql

dbt Snowplow workload

Snowflake credentials

Set the profile-specific environment variables:

export SNOWFLAKE_USER=user
export SNOWFLAKE_SECRET="" 
export SNOWFLAKE_DB=dbname

The workload simulates running the Snowplow web dbt package (the legacy one).

Steps to run:

cd dbt-snowplow/

# create and activate a virtual environment, then install dbt
virtualenv .venv
. .venv/bin/activate
pip install dbt-core dbt-snowflake

# install package dependencies, load seeds, and run the models
dbt deps
dbt seed
dbt run
