dbt/CHANGELOG.md at dev/0.15.1 · f10et/dbt

dbt 0.15.0 (November 25, 2019)

Breaking changes

Support for Python 2.x has been dropped as it will no longer be supported on January 1, 2020
Compilation errors in .yml files are now treated as errors instead of warnings (#1493, #1751)
The 'table_name' field field has been removed from Relations
The existing compile and execute rpc tasks have been renamed to compile_sql and execute_sql (#1779, #1798) (docs)
Custom materializations must now manage dbt's Relation cache (docs)

Installation notes:

dbt v0.15.0 uses the psycopg2-binary dependency (instead of psycopg2) to simplify installation on platforms that do not have a compiler toolchain installed. If you experience segmentation faults, crashes, or installation errors, you can set the DBT_PSYCOPG2_NAME environment variable to psycopg2 to change the dependency that dbt installs. This may require a compiler toolchain and development libraries.

$ DBT_PSYCOPG2_NAME=psycopg2 pip install dbt

You may also install specific dbt plugins directly by name. This has the advantage of only installing the Python requirements needed for your particular database:

$ pip install dbt-postgres
$ pip install dbt-redshift
$ pip install dbt-snowflake
$ pip install dbt-bigquery

Core

Features

Add a JSON logger (#1237, #1791) (docs)
Add structured logging to dbt (#1704, #1799, #1715, #1806)
Add partial parsing option to the profiles.yml file (#1835, #1836, #1487) (docs)
Support configurable query comments in SQL queries (#1643, #1864) (docs)
Support atomic full-refreshes for incremental models (#525, #1682)
Support snapshot configs in dbt_project.yml (#1613, #1759) (docs)
Support cache modifications in materializations (#1683, #1770) (docs)
Support quote parameter to Accepted Values schema tests (#1873, #1876) (docs)
Support Python 3.8 (#1886)
Support filters in sources for dbt source snapshot-freshness invocation (#1495, #1776) (docs)
Support external table configuration in yml source specifications (#1784)
Improve CLI output when running snapshots (#1768, #1769)

Fixes

Fix for unhelpful error message for malformed source/ref inputs (#1660, #1809)
Fix for lingering backup tables when incremental models are full-refreshed (#1933, #1931)
Fix for confusing error message when errors are encountered during compilation (#1807, #1839)
Fix for logic error affecting the two-argument flavor of the ref function (#1504, #1515)
Fix for invalid reference to dbt.exceptions (#1569, #1609)
Fix for "cannot run empty query" error when pre/post-hooks are empty (#1108, #1719)
Fix for confusing error when project names shadow context attributes (#1696, #1748)
Fix for incorrect database logic in docs generation which resulted in columns being "merged" together across tables (#1708, #1774)
Fix for seed errors located in dependency packages (#1723, #1723)
Fix for confusing error when schema tests return unexpected results (#1808, #1903)
Fix for twice-compiled statement block contents (#1717, #1719)
Fix for inaccurate output in dbt run-operation --help (#1767, #1777)
Fix for file rotation issues concerning the logs/dbt.log file (#1863, #1865, #1871)
Fix for missing quotes in incremental model build queries (#1847, #1888)
Fix for incorrect log level in printer.print_run_result_error (#1818, #1823)

Docs

Show seeds and snapshots in the Project and Database views (docs#37, docs#25, docs#52)
Show sources in the Database tree view (docs#20, docs#52)
Show edges in the DAG between models and seeds (docs#15, docs#52)
Show Accepted Values tests and custom schema tests in the column list for models (docs#52)
Fix links for "Refocus on node" and "View documentation" in DAG context menu for seeds (docs#52)

Server

Support docs generation (#1781, #1801)
Support custom tags (#1822, #1828)
Support invoking deps on the rpc server (#1834, #1837)
Support invoking run-operation and snapshot on the rpc server (#1875, #1878)
Suppport --threads argument to cli_args method (#1897, #1909)
Support reloading the manifest when a SIGHUP signal is received (#1684, #1699)
Support invoking compile, run, test, and seed on the rpc server (#1488, #1652)
Support returning compilation logs from the last compile in the status method (#1703, #1775)
Support asyncronous compile_sql and run_sql methods (#1706, #1735)
Improve re-compilation performance (#1824, #1830)

Postgres / Redshift

Support running dbt against schemas which contain materialized views on Postgres (#1698, #1833)
Support distyle AUTO in Redshift model configs (#1882, #1885) (docs)
Fix for internal errors when run against mixed-case logical databases (#1800, #1936)

Snowflake

Support copy grants option in Snowflake model configs (#1744, #1747) (docs)
Support warehouse configuration in Snowflake model configs (#1358, #1899, #1788, #1901) (docs)
Support secure views in Snowflake model configs (#1730, #1743) (docs)
Fix for unclosed connections preventing dbt from exiting when Snowflake is used with client_session_keep_alive (#1271, #1749)
Fix for errors on Snowflake when dbt schemas contain LOCAL TEMPORARY tables (#1869, #1872)

BigQuery

Support KMS Encryption in BigQuery model configs (#1829, #1851) (docs)
Improve docs generation speed by leveraging the information schema (#1576, #1795)
Fix for cache errors on BigQuery when dataset names are capitalized (#1810, #1881)
Fix for invalid query generation when multiple options are provided to a create table|view query (#1786, #1787)
Use client.delete_dataset to drop BigQuery datasets atomically (#1887, #1881)

Under the Hood

Dependencies

Drop support for networkx 1.x (#1577, #1814)
Upgrade werkzeug to 0.15.6 (#1697, #1814)
Pin psycopg2 dependency to 2.8.x to prevent segfaults (#1221, #1898)
Set a strict upper bound for jsonschema dependency (#1817, #1821, #1932)

Everything else

Provide test names and kwargs in the manifest (#1154, #1816)
Replace JSON Schemas with data classes (#1447, #1589)
Include test name and kwargs in test nodes in the manifest (#1154, #1816)
Remove logic around handling archive blocks in the dbt_project.yml file (#1580, #1581)
Remove the APIObject class (#1762, #1780)

Contributors

Thanks all for your contributions to dbt! 🎉

dbt 0.14.4 (November 8, 2019)

This release changes the version ranges of some of dbt's dependencies. These changes address installation issues in 0.14.3 when dbt is installed from pip. You can view the full list of dependency version changes in this commit.

Note: If you are installing dbt into an environment alongside other Python libraries, you can install individual dbt plugins with:

pip install dbt-postgres
pip install dbt-redshift
pip install dbt-snowflake
pip install dbt-bigquery

Installing specific plugins may help mitigate issues regarding incompatible versions of dependencies between dbt and other libraries.

Fixes:

Fix dependency issues caused by a bad release of snowflake-connector-python (#1892, #1895)

dbt 0.14.3 (October 10, 2019)

This is a bugfix release.

Fixes:

Fix for dictionary changed size during iteration race condition (#1740, #1750)
Fix upper bound on jsonschema dependency to 3.1.1 (#1817, #1819)

Under the hood:

Provide a programmatic method for validating profile targets (#1754, #1775)

dbt 0.14.2 (September 13, 2019)

Overview

This is a bugfix release.

Fixes:

Fix for dbt hanging at the end of execution in dbt source snapshot-freshness tasks (#1728, #1729)
Fix for broken "packages" and "tags" selector dropdowns in the dbt Documentation website (docs#47, #1726)

dbt 0.14.1 (September 3, 2019)

Overview

This is primarily a bugfix release which contains a few minor improvements too. Note: this release includes an important change in how the check snapshot strategy works. See #1614 for more information. If you are using snapshots with the check strategy on dbt v0.14.0, it is strongly recommended that you upgrade to 0.14.1 at your soonest convenience.

Breaking changes

The undocumented macros attribute was removed from the graph context variable (#1615)

Features:

Summarize warnings at the end of dbt runs (#1597, #1654)
Speed up catalog generation on postgres by using avoiding use of the information_schema (#1540)
Docs site updates (#1621)
- Fix for incorrect node selection logic in DAG view (docs#38)
- Update page title, meta tags, and favicon (docs#39)
- Bump the version of dbt-styleguide, changing file tree colors from orange to black :)
Add environment variables for macro debugging flags (#1628, #1629)
Speed up node selection by making it linear, rather than quadratic, in complexity (#1611, #1615)
Specify the application field in Snowflake connections (#1622, #1623)
Add support for clustering on Snowflake (#634, #1591, #1689) (docs)
Add support for job priority on BigQuery (#1456, #1673) (docs)
Add node.config and node.tags to the generate_schema_name and generate_alias_name macro context (#1700, #1701)

Fixes:

Fix for reused check_cols values in snapshots (#1614, #1709)
Fix for rendering column descriptions in sources (#1619, #1633)
Fix for is_incremental() returning True for models that are not materialized as incremental models (#1249, #1608)
Fix for serialization of BigQuery results which contain nested or repeated records (#1626, #1638)
Fix for loading seed files which contain non-ascii characters (#1632, #1644)
Fix for creation of user cookies in incorrect directories when --profile-dir or $DBT_PROFILES_DIR is provided (#1645, #1656)
Fix for error handling when transactions are being rolled back (#1647)
Fix for incorrect references to dbt.exceptions in jinja code (#1569, #1609)
Fix for duplicated schema creation due to case-sensitive comparison (#1651, #1663)
Fix for "schema stub" created automatically by dbt (#913, #1663)
Fix for incremental merge query on old versions of postgres (<=9.6) (#1665, #1666)
Fix for serializing results of queries which return TIMESTAMP_TZ columns on Snowflake in the RPC server (#1670)
Fix typo in InternalException (#1640, #1672)
Fix typo in CLI help for snapshot migration subcommand (#1664)
Fix for error handling logic when empty queries are submitted on Snowflake (#1693, #1694)
Fix for non-atomic column expansion logic in Snowflake incremental models and snapshots (#1687, #1690)
Fix for unprojected count(*) expression injected by custom data tests (#1688)
Fix for dbt run and dbt docs generate commands when running against Panoply Redshift (#1479, #1686)

Contributors:

Thanks for your contributions to dbt!

dbt 0.14.0 - Wilt Chamberlain (July 10, 2019)

Overview

Replace Archives with Snapshots (docs, migration guide)
Add three new top-level commands:
- dbt ls (docs)
- dbt run-operation (docs)
- dbt rpc (docs)
Support the specification of severity levels for schema and data tests (docs)
Many new quality of life improvements and bugfixes

Breaking changes

Stub out adapter methods at parse-time to speed up parsing (#1413)
Removed support for the --non-destructive flag (#1419, #1415)
Removed support for the sql_where config to incremental models (#1408, #1351)
Changed expand_target_column_types to take a Relation instead of a string (#1478)
Replaced Archives with Snapshots
- Normalized meta-column names in Snapshot tables (#1361, #251)

Features

Add run-operation command which invokes macros directly from the CLI (#1328) (docs)
Add a dbt ls command which lists resources in your project (#1436, #467) (docs)
Add Snapshots, an improvement over Archives (#1361, #1175) (docs)
- Add the 'check' snapshot strategy (#1361, #706)
- Support Snapshots across logical databases (#1455)
- Implement Snapshots using a merge statement where supported (#1478)
- Support Snapshot selection using --select (#1520, #1512)
Add an RPC server via dbt rpc (#1301, #1274) (docs)
- Add ps and kill commands to the rpc server (#1380, #1369, #1370)
- Add support for ephemeral nodes to the rpc server (#1373, #1368)
- Add support for inline macros to the rpc server (#1375, #1372, #1348)
- Improve error handling in the rpc server (#1341, #1309, #1310)
Made printer width configurable (#1026, #1247) (docs)
Retry package downloads from the hub.getdbt.com (#1451, #1491)
Add a test "severity" level, presented as a keyword argument to schema tests (#1410, #1005) (docs)
Add a generate_alias_name macro to configure alias names dynamically (#1363) (docs)
Add a node argument to generate_schema_name to configure schema names dynamically (#1483, #1463) (docs)
Use create or replace on Snowflake to rebuild tables and views atomically (#1101, #1409)
Use merge statement for incremental models on Snowflake (#1414, #1307, #1409) (docs)
Add support seed CSV files that start with a UTF-8 Byte Order Mark (BOM) (#1452, #1177)
Add a warning when git packages are not pinned to a version (#1453, #1446)
Add logging for on-run-start and on-run-end hooks to console output (#1440, #696)
Add modules and tracking information to the rendering context for configuration files (#1441, #1320)
Add support for null vars, and distinguish null vars from unset vars (#1426, #608)
Add support for the search_path configuration in Postgres/Redshift profiles (#1477, #1476) (docs (postgres), docs (redshift))
Add support for persisting documentation as descriptions for tables and views on BigQuery (#1031, #1285) (docs)
Add a --project-dir path which will invoke dbt in the specified directory (#1549, #1544)

dbt docs Changes

Add searching by tag name (#32)
Add context menu link to export graph viz as a PNG (#34)
Fix for clicking models in left-nav while search results are open (#31)

Fixes

Fix for unduly long timeouts when anonymous event tracking is blocked (#1445, #1063)
Fix for error with mostly-duplicate git urls in packages, picking the one that came first. (#1428, #1084)
Fix for unrendered description field as jinja in top-level Source specification (#1484, #1494)
Fix for API error when very large temp tables are created in BigQuery (#1423, #1478)
Fix for compiler errors that occurred if jinja code was present outside of a docs blocks in .md files (#1513, #988)
Fix TEXT handling on postgres and redshift (#1420, #781)
Fix for compiler error when vars are undefined but only used in disabled models (#1429, #434)
Improved the error message when iterating over the results of a macro that doesn't exist (#1425, #1424)
Improved the error message when tests have invalid parameter definitions (#1427, #1325)
Improved the error message when a user tries to archive a non-existent table (#1361, #1066)
Fix for archive logic which tried to create already-existing destination schemas (#1398, #758)
Fix for incorrect error codes when Operations exit with an error (#1406, #1377)
Fix for missing compiled SQL when the rpc server encounters a database error (#1381, #1371)
Fix for broken link in the profile.yml generated by dbt init (#1366, #1344)
Fix the sample test.env file's redshift password field (#1364)
Fix collisions on models running concurrently that have duplicate names but have distinguishing aliases (#1342, #1321)
Fix for a bad error message when a version is missing from a package spec in packages.yml (#1551, #1546)
Fix for wrong package scope when the two-arg method of ref is used (#1515, #1504)
Fix missing import in test suite (#1572)
Fix for a Snowflake error when an external table exists in a schema that dbt operates on (#1571, #1505)

Under the hood

Use pytest for tests (#1417)
Use flake8 for linting (#1361, #1333)
Added a flag for wrapping models and tests in jinja blocks (#1407, #1400)
Connection management: Bind connections threads rather than to names (#1336, #1312)
Add deprecation warning for dbt users on Python2 (#1534, #1531)
Upgrade networkx to v2.x (#1509, #1496)
Anonymously track adapter type and rpc requests when tracking is enabled (#1575, #1574)
Fix for test warnings and general test suite cleanup (#1578)

Contributors:

Over a dozen contributors wrote code for this release of dbt! Thanks for taking the time, and nice work y'all! :)

dbt 0.13.1 (May 13, 2019)

Overview

This is a bugfix release.

Bugfixes

Add "MaterializedView" relation type to the Snowflake adapter (#1430, #1432) (@adriank-convoy)
Quote databases properly (#1396, #1402)
Use "ilike" instead of "=" for database equality when listing schemas (#1411, #1412)
Pass the model name along in get_relations (#1384, #1388)
Add logging to dbt clean (#1261, #1383, #1391) (@emilieschario)

dbt Docs

Search by columns (dbt-docs#23) (rmgpinto)
Support @ selector (dbt-docs#27)
Fix number formatting on Snowflake and BQ in table stats (dbt-docs#28)

Contributors:

Thanks for your contributions to dbt!

dbt 0.13.0 - Stephen Girard (March 21, 2019)

Overview

This release provides a stable API for building new adapters and reimplements dbt's adapters as "plugins". Additionally, a new adapter for Presto was added using this architecture. Beyond adapters, this release of dbt also includes Sources which can be used to document and test source data tables. See the full list of features added in 0.13.0 below.

Breaking Changes

version 1 schema.yml specs are no longer implemented. Please use the version 2 spec instead (migration guide)
{{this}} is no longer implemented for on-run-start and on-run-end hooks. Use {{ target }} or an on-run-end context variable instead (#1176, implementing #878)
A number of materialization-specific adapter methods have changed in breaking ways. If you use these adapter methods in your macros or materializations, you may need to update your code accordingly.
- query_for_existing - removed, use get_relation instead.
- get_missing_columns - changed to take Relations instead of schemas and identifiers
- expand_target_column_types - changed to take a Relation instead of schema, identifier
- get_relation - added a database argument
- create_schema - added a database argument
- drop_schema - added a database argument

Deprecations

The following adapter methods are now deprecated, and will be removed in a future release:
- get_columns_in_table - deprecated in favor of get_columns_in_relation
- already_exists - deprecated in favor of get_relation

Features

Add sources to dbt, use them to calculate source data freshness (docs ) (#814, #1240)
Add support for Presto (docs, repo) (#1106)
Add require-dbt-version option to dbt_project.yml to state the supported versions of dbt for packages (docs) (#581)
Add an output line indicating the installed version of dbt to every run (#1134)
Add a new model selector (@) which build models, their children, and their children's parents (docs) (#1156)
Add support for Snowflake Key Pair Authentication (docs) (#1232)
Support SSO Authentication for Snowflake (docs) (#1172)
Add support for Snowflake's transient tables (docs) (#946)
Capture build timing data in run_results.json to visualize project performance (#1179)
Add CLI flag to toggle warnings as errors (docs) (#1243)
Add tab completion script for Bash (docs) (#1197)
Added docs on how to build a new adapter (docs) (#560)
Use new logo (#1349)

Fixes

Fix for Postgres character columns treated as string types (#1194)
Fix for hard to reach edge case in which dbt could hang (#1223)
Fix for dbt deps in non-English shells (#1222)
Fix for over eager schema creation when models are run with --models (#1239)
Fix for dbt seed --show (#1288)
Fix for is_incremental() which should only return True if the target relation is a table (#1292)
Fix for error in Snowflake table materializations with custom schemas (#1316)
Fix errored out concurrent transactions on Redshift and Postgres (#1356)
Fix out of order execution on model select (#1354, #1355)
Fix adapter macro namespace issue (#1352, #1353)
Re-add CLI flag to toggle warnings as errors (#1347)
Fix release candidate regression that runs run hooks on test invocations (#1346)
Fix Snowflake source quoting (#1338, #1317, #1332)
Handle unexpected max_loaded_at types (#1330)

Under the hood

Replace all SQL in Python code with Jinja in macros (#1204)
Loosen restrictions of boto3 dependency (#1234)
Rewrote Postgres introspective queries to be faster on large databases (#1192

Contributors:

Thanks for your contributions to dbt!

dbt 0.12.2 - Grace Kelly (January 8, 2019)

Overview

This release reduces the runtime of dbt projects by improving dbt's approach to model running. Additionally, a number of workflow improvements have been added.

Deprecations

Deprecate sql_where (#744) (docs)

Features

More intelligently order and execute nodes in the graph. This significantly speeds up the runtime of most dbt projects (#813)
Add -m flag as an alias for --models (#1160)
Add post_hook and pre_hook as aliases for post-hook and pre-hook, respectively (#1124) (docs)
Better handling of git errors in dbt deps + full support for Windows (#994, #778, #895)
Add support for specifying a location in BigQuery datasets (#969) (docs)
Add support for Jinja expressions using the {% do ... %} block (#1113)
The dbt debug command is actually useful now (#1061)
The config function can now be called multiple times in a model (#558)
Source the latest version of dbt from PyPi instead of GitHub (#1122)
Add a peformance profiling mechnanism to dbt (#1001)
Add caching for dbt's macros-only manifest to speedup parsing (#1098)

Fixes

Fix for custom schemas used alongside the generate_schema_name macro (#801)
Fix for silent failure of tests that reference nonexistent models (#968)
Fix for generate_schema_name macros that return whitespace-padded schema names (#1074)
Fix for incorrect relation type for backup tables on Snowflake (#1103)
Fix for incorrectly cased values in the relation cache (#1140)
Fix for JSON decoding error on Python2 installed with Anaconda (#1155)
Fix for unhandled exceptions that occur in anonymous event tracking (#1180)
Fix for analysis files that contain raw tags (#1152)
Fix for packages which reference the hubsite (#1095)

dbt 0.12.1 - (November 15, 2018)

Overview

This is a bugfix release.

Fixes

Fix for relation caching when views outside of a dbt schema depend on relations inside of a dbt schema (#1119)

dbt 0.12.0 - Guion Bluford (November 12, 2018)

Overview

This release adds caching for some introspective queries on all adapters. Additionally, custom tags can be supplied for models, along with many other minor improvements and bugfixes.

Breaking Changes

Support for the repositories: block in dbt_project.yml (deprecated in 0.10.0) was removed.

tl;dr

Make runs faster by caching introspective queries
Support model tags
Add a list of schemas to the on-run-end context
Set your profiles directory with an environment variable

Features

Cache the existence of relations to speed up dbt runs (#1025)
Add support for tag configuration and selection (#1014)
- Add tags to the model and graph views in the docs UI (#7)
Add the set of schemas that dbt built models into in the on-run-end hook context (#908)
Warn for unused resource config paths in dbt_project.yml (#725)
Add more information to the dbt --help output (#1058)
Add support for configuring the profiles directory with an env var (#1055)
Add support for cli and env vars in most dbt_project.yml and profiles.yml fields (#1033)
Provide a better error message when seed file loading fails on BigQuery (#1079)
Improved error handling and messaging on Redshift (#997)
Include datasets with underscores when listing BigQuery datasets (#954)
Forgo validating the user's profile for dbt deps and dbt clean commands (#947, #1022)
Don't read/parse CSV files outside of the dbt seed command (#1046)

Fixes

Fix for incorrect model selection with the --models CLI flag when projects and directories share the same name (#1023)
Fix for table clustering configuration with multiple columns on BigQuery (#1013)
Fix for incorrect output when a single row fails validation in dbt test (#1040)
Fix for unwieldly Jinja errors regarding undefined variables at parse time (#1086, #1080, #935)
Fix for incremental models that have a line comment on the last line of the file (#1018)
Fix for error messages when ephemeral models fail to compile (#1053)

Under the hood

Create adapters as singleton objects instead of classes (#961)
Combine project and profile into a single, coherent object (#973)
Investigate approaches for providing more complete compilation output (#588)

Contributors

Thanks for contributing!

dbt 0.11.1 - Lucretia Mott (September 18, 2018)

Overview

This is a patch release containing a few bugfixes and one quality of life change for dbt docs.

Features

dbt
- Add --port parameter to dbt docs serve (#987)

Fixes

dbt
- Fix hooks in model configs not running (#985)
- Fix integration test on redshift catalog generation (#977)
- Snowflake: Fix docs generation errors when QUOTED_IDENTIFIER_IGNORE_CASE is set (#998)
- Translate empty strings to null in seeds (#995)
- Filter out null schemas during catalog generation (#992)
- Fix quoting on drop, truncate, and rename (#991)
dbt-docs
- Fix for non-existent column in schema.yml (#3)
- Fixes for missing tests in docs UI when columns are upcased (#2)
- Fix "copy to clipboard" (#4)

dbt 0.11.0 - Isaac Asimov (September 6, 2018)

Overview

This release adds support for auto-generated dbt documentation, adds a new syntax for schema.yml files, and fixes a number of minor bugs. With the exception of planned changes to Snowflake's default quoting strategy, this release should not contain any breaking changes. Check out the blog post for more information about this release.

Breaking Changes

Change default Snowflake quoting strategy to "unquoted" (docs) (#824)

Features

Add autogenerated dbt project documentation (docs) (#375, #863, #941, #815)
Version 2 of schema.yml, which allows users to create table and column comments that end up in the manifest (docs) (#880)
Extend catalog and manifest to also support Snowflake, BigQuery, and Redshift, in addition to existing Postgres support (#866, #857, #849)
Add a 'generated_at' field to both the manifest and the catalog. (#887)
Add docs blocks that users can put into .md files and doc() value for schema v2 description fields (#888)
Write out a 'run_results.json' after dbt invocations. (#904)
Type inference for interpreting CSV data is now less aggressive (#905)
Remove distinction between this.table and this.schema by refactoring materialization SQL (#940)

Fixes

Fix for identifier clashes in BigQuery merge statements (#914)
Fix for unneccessary downloads of bumpversion.cfg, handle failures gracefully (#907)
Fix for incompatible boto3 requirements (#959)
Fix for invalid relationships test when the parent column contains null values (#921)

Contributors

Thanks for contributing!

dbt 0.10.2 - Betsy Ross (August 3, 2018)

Overview

This release makes it possible to alias relation names, rounds out support for BigQuery with incremental, archival, and hook support, adds the IAM Auth method for Redshift, and builds the foundation for autogenerated dbt project documentation, to come in the next release.

Additionally, a number of bugs have been fixed including intermittent BigQuery 404 errors, Redshift "table dropped by concurrent query" errors, and a probable fix for Redshift connection timeout issues.

Contributors

We want to extend a big thank you to our outside contributors for this release! You all are amazing.

Features

BigQuery
- Support incremental models (#856) (docs)
- Support archival (#856) (docs)
- Add pre/post hook support (#836) (docs)
Redshift: IAM Auth (#818) (docs)
Model aliases (#800)(docs)
Write JSON manifest file to disk during compilation (#761)
Add forward and backward graph edges to the JSON manifest file (#762)
Add a 'dbt docs generate' command to generate a JSON catalog file (#774, #808)

Bugfixes

BigQuery: fix concurrent relation loads (#835)
BigQuery: support external relations (#828)
Redshift: set TCP keepalive on connections (#826)
Redshift: fix "table dropped by concurrent query" (#825)
Fix the error handling for profiles.yml validation (#820)
Make the --threads parameter actually change the number of threads used (#819)
Ensure that numeric precision of a column is not None (#796)
Allow for more complex version comparison (#797)

Changes

Use a subselect instead of CTE when building incremental models (#787)
Internals
- Improved dependency selection, rip out some unused dependencies (#848)
- Stop tracking run_error in tracking code (#817)
- Use Mapping instead of dict as the base class for APIObject (#756)
- Split out parsers (#809)
- Fix __all__ parameter in submodules (#780)
- Switch to CircleCI 2.0 (#843, #850)
- Added tox environments that have the user specify what tests should be run (#837)

dbt 0.10.1 (May 18, 2018)

This release focuses on achieving functional parity between all of dbt's adapters. With this release, most dbt functionality should work on every adapter except where noted here.

tl;dr

Configure model schema and name quoting in your dbt_project.yml file (Docs)
Add a Relation object to the context to simplify model quoting Docs
Implement BigQuery materializations using new create table as (...) syntax, support partition by clause (Docs)
Override seed column types (Docs)
Add get_columns_in_table context function for BigQuery (Docs)

Changes

Consistent schema and identifier quoting (#727)
- Configure quoting settings in the dbt_project.yml file (#742)
- Add a Relation object to the context to make quoting consistent and simple (#742)
Use the new create table as (...) syntax on BigQuery (#717)
- Support partition by clause
CSV Updates:
- Use floating point as default seed column type to avoid issues with type inference (#694)
- Provide a mechanism for overriding seed column types in the dbt_project.yml file (#708)
- Fix seeding for files with more than 16k rows on Snowflake (#694)
- Implement seeds using a materialization
Improve get_columns_in_table context function (#709)
- Support numeric types on Redshift, Postgres
- Support BigQuery (including nested columns in struct types)
- Support cross-database information_schema queries for Snowflake
- Retain column ordinal positions

Bugfixes

Fix for incorrect var precendence when using --vars on the CLI (#739)
Fix for closed connections in on-run-end hooks for long-running dbt invocations (#693)
Fix: don't try to run empty hooks (#620, #693)
Fix: Prevent seed data from being serialized into graph.gpickle file (#720)
Fix: Disallow seed and model files with the same name (#737)

dbt 0.10.0 (March 8, 2018)

This release overhauls dbt's package management functionality, makes seeding csv files work across all adapters, and adds date partitioning support for BigQuery.

Upgrading Instructions:

Check out full installation and upgrading instructions here
Transition the repositories: section of your dbt_project.yml file to a packages.yml file as described here
You may need to clear out your dbt_modules directory if you use packages like dbt-utils. Depending how your project is configured, you can do this by running dbt clean.
We're using a new CSV parsing library, agate, so be sure to check that all of your seed tables are parsed as you would expect!

Changes

Support for variables defined on the CLI with --vars (#640) (docs)
Improvements to dbt seed (docs)
- Support seeding csv files on all adapters (#618)
- Make seed csv's ref()-able in models (#668)
- Support seed file configuration (custom schemas, enabled / disabled) in the dbt_project.yml file (#561)
- Support --full-refresh instead of --drop-existing (deprecated) for seed files (#515)
- Add --show argument to dbt seed to display a sample of data in the CLI (#74)
Improvements to package management (docs)
- Deprecated repositories: config option in favor of packages: (#542)
- Deprecated package listing in dbt_project.yml in favor of packages.yml (#681)
- Support stating local file paths as dependencies (#542)
Support date partitioning in BigQuery (#641) (docs)
Move schema creation to after on-run-start hooks (#652)
Replace csvkit dependency with agate (#598)
Switch snowplow endpoint to pipe directly to Fishtown Analytics (#682)

Bugfixes

Throw a compilation exception if a required test macro is not present in the context (#655)
Make the adapter_macro use the return() function (#635)
Fix bug for introspective query on late binding views (redshift) (#647)
Disable any non-dbt log output on the CLI (#663)

dbt 0.9.1 (January 2, 2018)

This release fixes bugs and adds supports for late binding views on Redshift.

Changes

Support late binding views on Redshift (#614) (docs)
Make run_started_at timezone-aware (#553) (Contributed by @mturzanska) (docs)

Bugfixes

Include hook run time in reported model run time (#607)
Add warning for missing test constraints (#600)
Fix for schema tests used or defined in packages (#599)
Run hooks in defined order (#601)
Skip tests that depend on nonexistent models (#617)
Fix for adapter_macro called within a package (#630)

dbt 0.9.0 (October 25, 2017)

This release focuses on improvements to macros, materializations, and package management. Check out the blog post to learn more about what's possible in this new version of dbt.

Installation

Full installation instructions for macOS, Windows, and Linux can be found here. If you use Windows or Linux, installation works the same as with previous versions of dbt. If you use macOS and Homebrew to install dbt, note that installation instructions have changed:

macOS Installation Instructions

brew update
brew tap fishtown-analytics/dbt
brew install dbt

Overview

More powerful macros and materializations
Custom model schemas
BigQuery improvements
Bugfixes
Documentation (0.9.0 docs can be found here)

Breaking Changes

adapter functions must be namespaced to the adapter context variable. To fix this error, use adapter.already_exists instead of just already_exists, or similar for other adapter functions.

Bugfixes

Handle lingering __dbt_tmp relations (#511)
Run tests defined in an ephemeral directory (#509)

Changes

use adapter, ref, and var inside of macros (#466)
Build custom tests and materializations in dbt packages (#466)
Support pre- and post- hooks that run outside of a transaction (#510)
Support table materializations for BigQuery (#507)
Support querying external data sources in BigQuery (#507)
Override which schema models are materialized in (#522) (docs)
Make {{ ref(...) }} return the same type of object as {{ this }} (#530)
Replace schema test CTEs with subqueries to speed them up for Postgres (#536) (@ronnyli)
Bump Snowflake dependency, remove pyasn1 (#570)

Documentation

Document how to create a package
Document how to make a materialization
Document how to make custom schema tests
Document how to use hooks to vacuum
Document all context variables

New Contributors

@ronnyli (#536)

dbt 0.9.0 Alpha 5 (October 24, 2017)

Overview

Bump Snowflake dependency, remove pyasn1 (#570)

dbt 0.9.0 Alpha 4 (October 3, 2017)

Bugfixes

Fix for federated queries on BigQuery with Service Account json credentials (#547)

dbt 0.9.0 Alpha 3 (October 3, 2017)

Overview

Bugfixes
Faster schema tests on Postgres
Fix for broken environment variables

Improvements

Replace schema test CTEs with subqueries to speed them up for Postgres (#536) (@ronnyli)

Bugfixes

Fix broken integration tests (#539)
Fix for --non-destructive on views (#539)
Fix for package models materialized in the wrong schema (#538)
Fix for broken environment variables (#543)

New Contributors

@ronnyli
- https://github.com/fishtown-analytics/dbt/pull/536

dbt 0.9.0 Alpha 2 (September 20, 2017)

Overview

Custom model schemas
BigQuery updates
ref improvements

Bugfixes

Parity for statement interface on BigQuery (#526)

Changes

Override which schema models are materialized in (#522) (docs)
Make {{ ref(...) }} return the same type of object as {{ this }} (#530)

dbt 0.9.0 Alpha 1 (August 29, 2017)

Overview

More powerful macros
BigQuery improvements
Bugfixes
Documentation (0.9.0 docs can be found here)

Breaking Changes

dbt 0.9.0 Alpha 1 introduces a number of new features intended to help dbt-ers write flexible, reusable code. The majority of these changes involve the macro and materialization Jinja blocks. As this is an alpha release, there may exist bugs or incompatibilites, particularly surrounding these two blocks. A list of known breaking changes is provided below. If you find new bugs, or have questions about dbt 0.9.0, please don't hesitate to reach out in slack or open a new issue.

1. Adapter functions must be namespaced to the `adapter` context variable

This will manifest as a compilation error that looks like:

Compilation Error in model {your_model} (models/path/to/your_model.sql)
  'already_exists' is undefined

To fix this error, use adapter.already_exists instead of just already_exists, or similar for other adapter functions.

Bugfixes

Handle lingering __dbt_tmp relations (#511)
Run tests defined in an ephemeral directory (#509)

Changes

use adapter, ref, and var inside of macros (#466)
Build custom tests and materializations in dbt packages (#466)
Support pre- and post- hooks that run outside of a transaction (#510)
Support table materializations for BigQuery (#507)
Support querying external data sources in BigQuery (#507)

Documentation

Document how to create a package
Document how to make a materialization
Document how to make custom schema tests

dbt 0.8.3 (July 14, 2017)

Overview

Add suppport for Google BigQuery
Significant exit codes
Load credentials from environment variables

Bugfixes

Fix errant warning for dbt archive commands (#476)
Show error (instead of backtrace) for failed hook statements (#478)
dbt init no longer leaves the repo in an invalid state (#487)
Fix bug which ignored git tag specs for package repos (#463)

Changes

Support BigQuery as a target (#437) (#438)
Make dbt exit codes significant (0 = success, 1/2 = error) (#297)
Add context function to pull in environment variables (#450)

Documentation

Document target configuration for BigQuery here
Document dbt exit codes here
Document environment variable usage here

dbt 0.8.2 (May 31, 2017)

Overview

UI/UX improvements (colorized output, failures summary, better error messages)
Cancel running queries on ctrl+c
Bugfixes
Docs

Bugfixes

Fix bug for interleaved sort keys on Redshift (#430)

Changes

Don't try to create schema if it already exists (#446)
Summarize failures for dbt invocations (#443)
Colorized dbt output (#441)
Cancel running queries on ctrl-c (#444)
Better error messages for common failure modes (#445)
Upgrade dependencies (#431)
Improvements to dbt init and first time dbt usage experience (#439)

Documentation

Document full-refresh requirements for incremental models (#417)
Document archival (#433)
Document the two-version variant of ref (#432)

dbt 0.8.1 (May 10, 2017)

Overview

Bugfixes
Reintroduce compile command
Moved docs to readme.io

Bugfixes

Fix bug preventing overriding a disabled package model in the current project (#391)
Fix bug which prevented multiple sort keys (provided as an array) on Redshift (#397)
Fix race condition while compiling schema tests in an empty target directory (#398)

Changes

Reintroduce dbt compile command (#407)
Compile on-run-start and on-run-end hooks to a file (#412)

Documentation

Move docs to readme.io (#414)
Add docs for event tracking opt-out (#399)

dbt 0.8.0 (April 17, 2017)

Overview

Bugfixes
True concurrency
More control over "advanced" incremental model configurations more info

Bugfixes

Fix ephemeral load order bug (#292, #285)
Support composite unique key in archivals (#324)
Fix target paths (#331, #329)
Ignore commented-out schema tests (#330, #328)
Fix run levels (#343, #340, #338)
Fix concurrency, open a unique transaction per model (#345, #336)
Handle concurrent DROP ... CASCADEs in Redshift (#349)
Always release connections (use try .. finally) (#354)

Changes

Changed: different syntax for "relationships" schema tests (#339)
Added: already_exists context function (#372)
Graph refactor: fix common issues with load order (#292)
Graph refactor: multiple references to an ephemeral models should share a CTE (#316)
Graph refactor: macros in flat graph (#332)
Refactor: factor out jinja interactions (#309)
Speedup: detect cycles at the end of compilation (#307)
Speedup: write graph file with gpickle instead of yaml (#306)
Clone dependencies with --depth 1 to make them more compact (#277, #342)
Rewrite materializations as macros (#356)

dbt 0.7.1 (February 28, 2017)

Overview

Improved graph selection
A new home for dbt
Snowflake improvements

New Features

improved graph selection for dbt run and dbt test (more information) (#279)
profiles.yml now supports Snowflake role as an option (#291)

A new home for dbt

In v0.7.1, dbt was moved from the analyst-collective org to the fishtown-analytics org (#300)

Bugfixes

nicer error if run-target was not changed to target during upgrade to dbt>=0.7.0

dbt 0.7.0 (February 9, 2017)

Overview

Snowflake Support
Deprecations

Snowflake Support

dbt now supports Snowflake as a target in addition to Postgres and Redshift! All dbt functionality is supported in this new warehouse. There is a sample snowflake profile in sample.profiles.yml -- you can start using it right away.

Deprecations

There are a few deprecations in 0.7:

run-target in profiles.yml is no longer supported. Use target instead.
Project names (name in dbt_project.yml) can now only contain letters, numbers, and underscores, and must start with a letter. Previously they could contain any character.
--dry-run is no longer supported.

Notes

New Features

dbt now supports Snowflake as a warehouse (#259)

Bugfixes

use adapter for sort/dist (#274)

Deprecations

run-target and name validations (#280)
dry-run removed (#281)

Changes

fixed a typo in the docs related to post-run hooks (#271)
refactored tracking code to refresh invocation id in a multi-run context (#273)
added unit tests for the graph (#270)

dbt 0.6.2 (January 16, 2017)

Changes

condense error output when --debug is not set (#265)

dbt 0.6.1 (January 11, 2017)

Bugfixes

respect config options in profiles.yml (#255)
use correct on-run-end option for post-run hooks (#261)

Changes

add --debug flag, replace calls to print() with a global logger (#256)
add pep8 check to continuous integration tests and bring codebase into compliance (#257)

dbt release 0.6.0

tl;dr

Macros
More control over how models are materialized
Minor improvements
Bugfixes
Connor McArthur

Macros

Macros are snippets of SQL that can be called like functions in models. Macros make it possible to re-use SQL between models in keeping with the engineering principle of DRY (Dont Repeat Yourself). Moreover, packages can expose Macros that you can use in your own dbt project.

For detailed information on how to use Macros, check out the pull request here

Runtime Materialization Configs

DBT Version 0.6.0 introduces two new ways to control the materialization of models:

Non-destructive dbt run more info

If you provide the --non-destructive argument to dbt run, dbt will minimize the amount of time during which your models are unavailable. Specfically, dbt will

Ignore models materialized as views
Truncate tables and re-insert data instead of dropping and re-creating

This flag is useful for recurring jobs which only need to update table models and incremental models.

dbt run --non-destructive

Incremental Model Full Refresh more info

If you provide the --full-refresh argument to dbt run, dbt will treat incremental models as table models. This is useful when

An incremental model schema changes and you need to recreate the table accordingly
You want to reprocess the entirety of the incremental model because of new logic in the model code

dbt run --full-refresh

Note that --full-refresh and --non-destructive can be used together!

For more information, run

dbt run --help

Minor improvements more info

Add a `{{ target }}` variable to the dbt runtime more info

Use {{ target }} to interpolate profile variables into your model definitions. For example:

-- only use the last week of data in development
select * from events

{% if target.name == 'dev' %}
where created_at > getdate() - interval '1 week'
{% endif %}

User-specified `profiles.yml` dir more info

DBT looks for a file called profiles.yml in the ~/.dbt/ directory. You can now overide this directory with

$ dbt run --profiles-dir /path/to/my/dir

Add timestamp to console output more info

Informative and pretty

Run dbt from subdirectory of project root more info

A story in three parts:

cd models/snowplow/sessions
vim sessions.sql
dbt run # it works!

Pre and post run hooks more info

# dbt_project.yml
name: ...
version: ...

...

# supply either a string, or a list of strings
on-run-start: "create table public.cool_table (id int)"
on-run-end:
  - insert into public.cool_table (id) values (1), (2), (3)
  - insert into public.cool_table (id) values (4), (5), (6)

Bug fixes

We fixed 10 bugs in this release! See the full list here

dbt release 0.5.4

tl;dr

added support for custom SQL data tests
- SQL returns 0 results --> pass
- SQL returns > 0 results --> fail
dbt-core integration tests
- running in Continuous Integration environments
  - windows (appveyor)
  - linux (circle)
- with code coverage

Custom SQL data tests

Schema tests have proven to be an essential part of a modern analytical workflow. These schema tests validate basic constraints about your data. Namely: not null, unique, accepted value, and foreign key relationship properties can be asserted using schema tests.

With dbt v0.5.4, you can now write your own custom "data tests". These data tests are SQL SELECT statements that return 0 rows on success, or > 0 rows on failure. A typical data test might look like:

-- tests/assert_less_than_5_pct_event_cookie_ids_are_null.sql

-- If >= 5% of cookie_ids are null, then the test returns 1 row (failure).
-- If < 5% of cookie_ids are null, then the test returns 0 rows (success)

with calc as (

    select
      sum(case when cookie_id is null then 1 else 0 end)::float / count(*)::float as fraction
    from {{ ref('events') }}

)

select * from calc where fraction < 0.05

To enable data tests, add the test-paths config to your dbt_project.yml file:

name: 'Vandelay Industries`
version: '1.0'

source-paths: ["models"]
target-path: "target"
test-paths: ["tests"]        # look for *.sql files in the "tests" directory
....

Any .sql file found in the test-paths director(y|ies) will be evaluated as data tests. These tests can be run with:

dbt test # run schema + data tests
dbt test --schema # run only schema tests
dbt test --data # run only data tests
dbt test --data --schema # run schema + data tests

# For more information, try
dbt test -h

DBT-core integration tests

With the dbt 0.5.4 release, dbt now features a robust integration test suite. These integration tests will help mitigate the risk of software regressions, and in so doing, will help us develop dbt more quickly. You can check out the tests here, and the test results here (linux/osx) and here (windows).

The Future

You can check out the DBT roadmap here. In the next few weeks, we'll be working on bugfixes, minor features, improved macro support, and expanded control over runtime materialization configs.

As always, feel free to reach out to us on Slack with any questions or comments!

dbt release 0.5.3

Bugfix release.

Fixes regressions introduced in 0.5.1 and 0.5.2.

Fixed 0.5.1 regressions

Incremental models were broken by the new column expansion feature. Column expansion is implemented as

alter table ... add column tmp_col varchar({new_size});
update ... set tmp_col = existing_col
alter table ... drop column existing_col
alter table ... rename tmp_col to existing_col

This has the side-effect of moving the existing_col to the "end" of the table. When an incremental model tries to

insert into {table} (
   select * from tmp_table
)

suddenly the columns in {table} are incongruent with the columns in tmp_table. This insert subsequently fails.

The fix for this issue is twofold:

If the incremental model table DOES NOT already exist, avoid inserts altogether. Instead, run a create table as (...) statement
If the incremental model table DOES already exist, query for the columns in the existing table and use those to build the insert statement, eg:

insert into "dbt_dbanin"."sessions" ("session_end_tstamp", "session_start_tstamp", ...)
(
    select "session_end_tstamp", "session_start_tstamp", ...
    from "sessions__dbt_incremental_tmp"
);

In this way, the source and destination columns are guaranteed to be in the same order!

Fixed 0.5.2 regressions

We attempted to refactor the way profiles work in dbt. Previously, a default user profile was loaded, and the profiles specified in dbt_project.yml or on the command line (with --profile) would be applied on top of the user config. This implementation is some of the earliest code that was committed to dbt.

As dbt has grown, we found this implementation to be a little unwieldy and hard to maintain. The 0.5.2 release made it so that only one profile could be loaded at a time. This profile needed to be specified in either dbt_project.yml or on the command line with --profile. A bug was errantly introduced during this change which broke the handling of dependency projects.

The future

The additions of automated testing and a more comprehensive manual testing process will go a long way to ensuring the future stability of dbt. We're going to get started on these tasks soon, and you can follow our progress here: https://github.com/fishtown-analytics/dbt/milestone/16 .

As always, feel free to reach out to us on Slack with any questions or concerns:

dbt release 0.5.2

Patch release fixing a bug that arises when profiles are overridden on the command line with the --profile flag.

See https://github.com/fishtown-analytics/dbt/releases/tag/v0.5.1

dbt release 0.5.1

0. tl;dr

Raiders of the Lost Archive -- version your raw data to make historical queries more accurate
Column type resolution for incremental models (no more Value too long for character type errors)
Postgres support
Top-level configs applied to your project + all dependencies
--threads CLI option + better multithreaded output

1. Source table archival https://github.com/fishtown-analytics/dbt/pull/183

Commonly, analysts need to "look back in time" at some previous state of data in their mutable tables. Imagine a users table which is synced to your data warehouse from a production database. This users table is a representation of what your users look like now. Consider what happens if you need to look at revenue by city for each of your users trended over time. Specifically, what happens if a user moved from, say, Philadelphia to New York? To do this correctly, you need to archive snapshots of the users table on a recurring basis. With this release, dbt now provides an easy mechanism to store such snapshots.

To use this new feature, declare the tables you want to archive in your dbt_project.yml file:

archive:
    - source_schema: synced_production_data  # schema to look for tables in (declared below)
      target_schema: dbt_archive             # where to archive the data to
      tables:                                # list of tables to archive
        - source_table: users                # table to archive
          target_table: users_archived       # table to insert archived data into
          updated_at: updated_at             # used to determine when data has changed
          unique_key: id                     # used to generate archival query

        - source_table: some_other_table
           target_table: some_other_table_archive
           updated_at: "updatedAt"
           unique_key: "expressions || work || LOWER(too)"

    - source_schema: some_other_schema
      ....

The archived tables will mirror the schema of the source tables they're generated from. In addition, three fields are added to the archive table:

valid_from: The timestamp when this archived row was inserted (and first considered valid)
valid_to: The timestamp when this archived row became invalidated. The first archived record for a given unique_key has valid_to = NULL. When newer data is archived for that unique_key, the valid_to field of the old record is set to the valid_from field of the new record!
scd_id: A unique key generated for each archive record. Scd = Slowly Changing Dimension.

dbt models can be built on top of these archived tables. The most recent record for a given unique_key is the one where valid_to is null.

To run this archive process, use the command dbt archive. After testing and confirming that the archival works, you should schedule this process through cron (or similar).

2. Incremental column expansion https://github.com/fishtown-analytics/dbt/issues/175

Incremental tables are a powerful dbt feature, but there was at least one edge case which makes working with them difficult. During the first run of an incremental model, Redshift will infer a type for every column in the table. Subsequent runs can insert new data which does not conform to the expected type. One example is a varchar(16) field which is inserted into a varchar(8) field. In practice, this error looks like:

Value too long for character type
DETAIL:
  -----------------------------------------------
  error:  Value too long for character type
  code:      8001
  context:   Value too long for type character varying(8)
  query:     3743263
  location:  funcs_string.hpp:392
  process:   query4_35 [pid=18194]
  -----------------------------------------------

With this release, dbt will detect when column types are incongruent and will attempt to reconcile these different types if possible. Specifically, dbt will alter the incremental model table schema from character varying(x) to character varying(y) for some y > x. This should drastically reduce the occurrence of this class of error.

3. First-class Postgres support https://github.com/fishtown-analytics/dbt/pull/183

With this release, Postgres became a first-class dbt target. You can configure a postgres database target in your ~/.dbt/profiles.yml file:

warehouse:
  outputs:
    dev:
      type: postgres    # configure a target for Postgres
      host: localhost
      user: Drew
      ....
  run-target: dev

While Redshift is built on top of Postgres, the two are subtly different. For instance, Redshift supports sort and dist keys, while Postgres does not! dbt will use the database target type parameter to generate the appropriate SQL for the target database.

4. Root-level configs https://github.com/fishtown-analytics/dbt/issues/161

Configurations in dbt_project.yml can now be declared at the models: level. These configurations will apply to the primary project, as well as any dependency projects. This feature is particularly useful for setting pre- or post- hooks that run for every model. In practice, this looks like:

name: 'My DBT Project'

models:
    post-hook:
        - "grant select on {{this}} to looker_user"     # Applied to 'My DBT Project' and 'Snowplow' dependency
    'My DBT Project':
        enabled: true
    'Snowplow':
        enabled: true

5. --threads CLI option https://github.com/fishtown-analytics/dbt/issues/143

The number of threads that DBT uses can now be overridden with a CLI argument. The number of threads used must be between 1 and 8.

dbt run --threads 1    # fine
# or
dbt run --threads 4    # great
# or
dbt run --threads 42    # too many!

In addition to this new CLI argument, the output from multi-threaded dbt runs should be a little more orderly now. Models won't show as STARTed until they're actually queued to run. Previously, the output here was a little confusing. Happy threading!

Upgrading

To upgrade to version 0.5.1 of dbt, run:

pip install --upgrade dbt

And another thing

Join us on slack with questions or comments

Made with ♥️ by 🐟🏙 📈

0. tl;dr

use a temp table when executing incremental models
arbitrary configuration (using config variables)
specify branches for dependencies
more & better docs

1. new incremental model generation https://github.com/fishtown-analytics/dbt/issues/138

In previous versions of dbt, an edge case existed which caused the sql_where query to select different rows in the delete and insert steps. As a result, it was possible to construct incremental models which would insert duplicate records into the specified table. With this release, DBT uses a temp table which will 1) circumvent this issue and 2) improve query performance. For more information, check out the GitHub issue: https://github.com/fishtown-analytics/dbt/issues/138

2. Arbitrary configuration https://github.com/fishtown-analytics/dbt/issues/146

Configuration in dbt is incredibly powerful: it is what allows models to change their behavior without changing their code. Previously, all configuration was done using built-in parameters, but that actually limits the user in the power of configuration.

With this release, you can inject variables from dbt_project.yml into your top-level and dependency models. In practice, variables work like this:

# dbt_project.yml

models:
  my_project:
    vars:
      exclude_ip: '192.168.1.1'

-- filtered_events.sql

-- source code
select * from public.events where ip_address != '{{ var("exclude_ip") }}'

-- compiles to
select * from public.events where ip_address != '192.168.1.1'

The vars parameter in dbt_project.yml is compiled, so you can use jinja templating there as well! The primary use case for this is specifying "input" models to a dependency.

Previously, dependencies used ref(...) to select from a project's base models. That interface was brittle, and the idea that dependency code had unbridled access to all of your top-level models made us a little uneasy. As of this release, we're deprecating the ability for dependencies to ref(...) top-level models. Instead, the recommended way for this to work is with vars! An example:

-- dbt_modules/snowplow/models/events.sql

select * from {{ var('snowplow_events_table') }}

and

models:
  Snowplow:
    vars:
      snowplow_events_table: "{{ ref('base_events') }}"

This effectively mirrors the previous behavior, but it much more explicit about what's happening under the hood!

3. specify a dependency branch https://github.com/fishtown-analytics/dbt/pull/165

With this release, you can point DBT to a specific branch of a dependency repo. The syntax looks like this:

repositories:
    - https://github.com/fishtown-analytics/dbt-audit.git@development # use the "development" branch

4. More & Better Docs!

Check em out! And let us know if there's anything you think we can improve upon!

Upgrading

To upgrade to version 0.5.0 of dbt, run:

pip install --upgrade dbt

0. tl;dr

--version command
pre- and post- run hooks
windows support
event tracking

1. --version https://github.com/fishtown-analytics/dbt/issues/135

The --version command was added to help aid debugging. Further, organizations can use it to ensure that everyone in their org is up-to-date with dbt.

$ dbt --version
installed version: 0.4.7
   latest version: 0.4.7
Up to date!

2. pre-and-post-hooks https://github.com/fishtown-analytics/dbt/pull/147

With this release, you can now specify pre- and post- hooks that are run before and after a model is run, respectively. Hooks are useful for running grant statements, inserting a log of runs into an audit table, and more! Here's an example of a grant statement implemented using a post-hook:

models:
  my_project:
    post-hook: "grant select on table {{this}} to looker_user"
    my_model:
       materialized: view
    some_model:
      materialized: table
      post-hook: "insert into my_audit_table (model_name, run_at) values ({{this.name}}, getdate())"

Hooks are recursively appended, so the my_model model will only receive the grant select... hook, whereas the some_model model will receive both the grant select... and insert into... hooks.

Finally, note that the grant statement uses the (hopefully familiar) {{this}} syntax whereas the insert statement uses the {{this.name}} syntax. When DBT creates a model:

A temp table is created
The original model is dropped
The temp table is renamed to the final model name

DBT will intelligently uses the right table/view name when you invoke {{this}}, but you have a couple of more specific options available if you need them:

{{this}} : "schema"."table__dbt_tmp"
{{this.schema}}: "schema"
{{this.table}}: "table__dbt_tmp"
{{this.name}}: "table"

3. Event tracking https://github.com/fishtown-analytics/dbt/issues/89

We want to build the best version of DBT possible, and a crucial part of that is understanding how users work with DBT. To this end, we've added some really simple event tracking to DBT (using Snowplow). We do not track credentials, model contents or model names (we consider these private, and frankly none of our business). This release includes basic event tracking that reports 1) when dbt is invoked 2) when models are run, and 3) basic platform information (OS + python version). The schemas for these events can be seen here

You can opt out of event tracking at any time by adding the following to the top of you ~/.dbt/profiles.yml file:

config:
    send_anonymous_usage_stats: False

4. Windows support https://github.com/fishtown-analytics/dbt/pull/154

dbt v0.4.1 provides improvements to incremental models, performance improvements, and ssh support for db connections.

0. tl;dr

slightly modified dbt command structure
unique_key setting for incremental models
connect to your db over ssh
no more model-defaults
multithreaded schema tests

If you encounter an SSL/cryptography error while upgrading to this version of dbt, check that your version of pip is up-to-date

pip install -U pip
pip install -U dbt

1. new dbt command structure https://github.com/fishtown-analytics/dbt/issues/109

# To run models
dbt run # same as before

# to dry-run models
dbt run --dry # previously dbt test

# to run schema tests
dbt test # previously dbt test --validate

2. Incremental model improvements https://github.com/fishtown-analytics/dbt/issues/101

Previously, dbt calculated "new" incremental records to insert by querying for rows which matched some sql_where condition defined in the model configuration. This works really well for atomic datasets like a clickstream event log -- once inserted, these records will never change. Other datasets, like a sessions table comprised of many pageviews for many users, can change over time. Consider the following scenario:

User 1 Session 1 Event 1 @ 12:00 User 1 Session 1 Event 2 @ 12:01 -- dbt run -- User 1 Session 1 Event 3 @ 12:02

In this scenario, there are two possible outcomes depending on the sql_where chosen: 1) Event 3 does not get included in the Session 1 record for User 1 (bad), or 2) Session 1 is duplicated in the sessions table (bad). Both of these outcomes are inadequate!

With this release, you can now add a unique_key expression to an incremental model config. Records matching the unique_key will be deleted from the incremental table, then inserted as usual. This makes it possible to maintain data accuracy without recalculating the entire table on every run.

The unique_key can be any expression which uniquely defines the row, eg:

sessions:
  materialized: incremental
  sql_where: "session_end_tstamp > (select max(session_end_tstamp) from {{this}})"
  unique_key: user_id || session_index

3. Run schema validations concurrently https://github.com/fishtown-analytics/dbt/issues/100

The threads run-target config now applies to schema validations too. Try it with dbt test

4. Connect to database over ssh https://github.com/fishtown-analytics/dbt/issues/93

Add an ssh-host parameter to a run-target to connect to a database over ssh. The ssh-host parameter should be the name of a Host in your ~/.ssh/config file more info

warehouse:
  outputs:
    dev:
      type: redshift
      host: my-redshift.amazonaws.com
      port: 5439
      user: my-user
      pass: my-pass
      dbname: my-db
      schema: dbt_dbanin
      threads: 8
      ssh-host: ssh-host-name  # <------ Add this line
  run-target: dev

Remove the model-defaults config https://github.com/fishtown-analytics/dbt/issues/111

The model-defaults config doesn't make sense in a dbt world with dependencies. To apply default configs to your package, add the configs immediately under the package definition:

models:
    My_Package:
        enabled: true
        materialized: table
        snowplow:
            ...

dbt v0.4.0

dbt v0.4.0 provides new ways to materialize models in your database.

0. tl;dr

new types of materializations: incremental and ephemeral
if upgrading, change materialized: true|false to materialized: table|view|incremental|ephemeral
optionally specify model configs within the SQL file

1. Feature: `{{this}}` template variable https://github.com/fishtown-analytics/dbt/issues/81

The {{this}} template variable expands to the name of the model being compiled. For example:

-- my_model.sql
select 'the fully qualified name of this model is {{ this }}'
-- compiles to
select 'the fully qualified name of this model is "the_schema"."my_model"'

2. Feature: `materialized: incremental` https://github.com/fishtown-analytics/dbt/pull/90

After initially creating a table, incremental models will insert new records into the table on subsequent runs. This drastically speeds up execution time for large, append-only datasets.

Each execution of dbt run will:

create the model table if it doesn't exist
insert new records into the table

New records are identified by a sql_where model configuration option. In practice, this looks like:

sessions:
    materialized: incremental
    sql_where: "session_start_time > (select max(session_start_time) from {{this}})"

There are a couple of new things here. Previously, materialized could either be set to true or false. Now, the valid options include view, table, incremental, and ephemeral (more on this last one below). Also note that incremental models generally require use of the {{this}} template variable to identify new records.

The sql_where field is supplied as a where condition on a subquery containing the model definition. This resultset is then inserted into the target model. This looks something like:

insert into schema.model (
    select * from (
        -- compiled model definition
    ) where {{sql_where}}
)

3. Feature: `materialized: ephemeral` https://github.com/fishtown-analytics/dbt/issues/78

Ephemeral models are injected as CTEs (with statements) into any model that references them. Ephemeral models are part of the dependency graph and generally function like any other model, except ephemeral models are not compiled to their own files or directly created in the database. This is useful for intermediary models which are shared by other downstream models, but shouldn't be queried directly from outside of dbt.

To make a model ephemeral:

employees:
    materialized: ephemeral

Suppose you wanted to exclude employees from your users table, but you don't want to clutter your analytics schema with an employees table.

-- employees.sql
select * from public.employees where is_deleted = false

-- users.sql
select *
from {{ref('users')}}
where email not in (select email from {{ref('employees')}})

The compiled SQL would look something like:

with __dbt__CTE__employees as (
  select * from public.employees where is_deleted = false
)
select *
from users
where email not in (select email from __dbt__CTE__employees)

Ephemeral models play nice with other ephemeral models, incremental models, and regular table/view models. Feel free to mix and match different materialization options to optimize for performance and simplicity.

4. Feature: In-model configs https://github.com/fishtown-analytics/dbt/issues/88

Configurations can now be specified directly inside of models. These in-model configs work exactly the same as configs inside of the dbt_project.yml file.

An in-model-config looks like this:

-- users.sql

-- python function syntax
{{ config(materialized="incremental", sql_where="id > (select max(id) from {{this}})") }}
-- OR json syntax
{{
    config({"materialized:" "incremental", "sql_where" : "id > (select max(id) from {{this}})"})
}}

select * from public.users

The config resolution order is:

dbt_project.yml model-defaults
in-model config
dbt_project.yml models config

5. Fix: dbt seed null values https://github.com/fishtown-analytics/dbt/issues/102

Previously, dbt seed would insert empty CSV cells as "None", whereas they should have been NULL. Not anymore!

dbt v0.3.0

Version 0.3.0 comes with the following updates:

1. Parallel model creation https://github.com/fishtown-analytics/dbt/pull/83

dbt will analyze the model dependency graph and can create models in parallel if possible. In practice, this can significantly speed up the amount of time it takes to complete dbt run. The number of threads dbt uses must be between 1 and 8. To configure the number of threads dbt uses, add the threads key to your dbt target in ~/.dbt/profiles.yml, eg:

user:
  outputs:
    my-redshift:
      type: redshift
      threads: 4         # execute up to 4 models concurrently
      host: localhost
      ...
  run-target: my-redshift

For a complete example, check out a sample profiles.yml file

2. Fail only within a single dependency chain https://github.com/fishtown-analytics/dbt/issues/63

If a model cannot be created, it won't crash the entire dbt run process. The errant model will fail and all of its descendants will be "skipped". Other models which do not depend on the failing model (or its descendants) will still be created.

3. Logging https://github.com/fishtown-analytics/dbt/issues/64, https://github.com/fishtown-analytics/dbt/issues/65

dbt will log output from the dbt run and dbt test commands to a configurable logging directory. By default, this directory is called logs/. The log filename is dbt.log and it is rotated on a daily basic. Logs are kept for 7 days.

To change the name of the logging directory, add the following line to your dbt_project.yml file:

log-path: "my-logging-directory" # will write logs to my-logging-directory/dbt.log

4. Minimize time models are unavailable in the database https://github.com/fishtown-analytics/dbt/issues/68

Previously, dbt would create models by:

dropping the existing model
creating the new model

This resulted in a significant amount of time in which the model was inaccessible to the outside world. Now, dbt creates models by:

creating a temporary model {model-name}__dbt_tmp
dropping the existing model
renaming the tmp model name to the actual model name

5. Arbitrarily deep nesting https://github.com/fishtown-analytics/dbt/issues/50

Previously, all models had to be located in a directory matching models/{model group}/{model_name}.sql. Now, these models can be nested arbitrarily deeply within a given dbt project. For instance, models/snowplow/sessions/transformed/transformed_sessions.sql is a totally valid model location with this release.

To configure these deeply-nested models, just nest the config options within the dbt_project.yml file. The only caveat is that you need to specify the dbt project name as the first key under the models object, ie:

models:
  'Your Project Name':
    snowplow:
      sessions:
        transformed:
          transformed_sessions:
            enabled: true

More information is available on the issue and in the sample dbt_project.yml file

6. don't try to create a schema if it already exists https://github.com/fishtown-analytics/dbt/issues/66

dbt run would execute create schema if not exists {schema}. This would fail if the dbt user didn't have sufficient permissions to create the schema, even if the schema already existed! Now, dbt checks for the schema existence and only attempts to create the schema if it doesn't already exist.

7. Semantic Versioning

The previous release of dbt was v0.2.3.0 which isn't a semantic version. This and all future dbt releases will conform to semantic version in the format `{major}.{minor}.{patch}`.

dbt v0.2.3.0

Version 0.2.3.0 of dbt comes with the following updates:

1. Fix: Flip referential integrity arguments (breaking)

Referential integrity validations in a schema.yml file were previously defined relative to the parent table:

account:
  constraints:
    relationships:
      - {from: id, to: people, field: account_id}

Now, these validations are specified relative to the child table

people:
  constraints:
    relationships:
      - {from: account_id, to: accounts, field: id}

For more information, run dbt test -h

2. Feature: seed tables from a CSV

Previously, auxiliary data needed to be shoehorned into a view comprised of union statements, eg.

select 22 as "type", 'Chat Transcript' as type_name, 'chatted via olark' as event_name union all
select 21, 'Custom Redirect', 'clicked a custom redirect' union all
select 6, 'Email', 'email sent' union all
...

That's not a scalable solution. Now you can load CSV files into your data warehouse:

Add a CSV file (with a header) to the data/ directory
Run dbt seed to create a table from the CSV file!
The table name with be the filename (sans .csv) and it will be placed in your run-target's schema

Subsequent calls to dbt seed will truncate the seeded tables (if they exist) and re-insert the data. If the table schema changes, you can run dbt seed --drop-existing to drop the table and recreate it.

For more information, run dbt seed -h

3. Feature: compile analytical queries

Versioning your SQL models with dbt is a great practice, but did you know that you can also version your analyses? Any SQL files in the analysis/ dir will be compiled (ie. table names will be interpolated) and placed in the target/build-analysis/ directory. These analytical queries will not be run against your data warehouse with dbt run -- you should copy/paste them into the data analysis tool of your choice.

4. Feature: accepted values validation

In your schema.yml file, you can now add accepted-values validations:

accounts:
  constraints:
    accepted-values:
      - {field: type, values: ['paid', 'free']}

This test will determine how many records in the accounts model have a type other than paid or free.

5. Feature: switch profiles and targets on the command line

Switch between profiles with --profile [profile-name] and switch between run-targets with --target [target-name].

Targets should be something like "prod" or "dev" and profiles should be something like "my-org" or "my-side-project"

side-project:
  outputs:
    prod:
      type: redshift
      host: localhost
      port: 5439
      user: Drew
      pass:
      dbname: data_generator
      schema: ac_drew
    dev:
      type: redshift
      host: localhost
      port: 5439
      user: Drew
      pass:
      dbname: data_generator
      schema: ac_drew_dev
  run-target: dev

To compile models using the dev environment of my side-project profile: $ dbt compile --profile side-project --target dev or for prod: $ dbt compile --profile side-project --target prod

You can also add a "profile' config to the dbt_config.yml file to fix a dbt project to a specific profile:

...
test-paths: ["test"]
data-paths: ["data"]

# Fix this project to the "side-project" profile
# You can still use --target to switch between environments!
profile: "side-project"

model-defaults:
....

FilesExpand file tree

CHANGELOG.md

Latest commit

History

CHANGELOG.md

File metadata and controls

dbt 0.15.0 (November 25, 2019)

Breaking changes

Installation notes:

Core

Features

Fixes

Docs

Server

Postgres / Redshift

Snowflake

BigQuery

Under the Hood

Dependencies

Everything else

Contributors

dbt 0.14.4 (November 8, 2019)

Fixes:

dbt 0.14.3 (October 10, 2019)

Fixes:

Under the hood:

dbt 0.14.2 (September 13, 2019)

Overview

Fixes:

dbt 0.14.1 (September 3, 2019)

Overview

Breaking changes

Features:

Fixes:

Contributors:

dbt 0.14.0 - Wilt Chamberlain (July 10, 2019)

Overview

Breaking changes

Features

dbt docs Changes

Fixes

Under the hood

Contributors:

dbt 0.13.1 (May 13, 2019)

Overview

Bugfixes

dbt Docs

Contributors:

dbt 0.13.0 - Stephen Girard (March 21, 2019)

Overview

Breaking Changes

Deprecations

Features

Fixes

Under the hood

Contributors:

dbt 0.12.2 - Grace Kelly (January 8, 2019)

Overview

Deprecations

Features

Fixes

dbt 0.12.1 - (November 15, 2018)

Overview

Fixes

dbt 0.12.0 - Guion Bluford (November 12, 2018)

Overview

Breaking Changes

tl;dr

Features

Fixes

Under the hood

Contributors

dbt 0.11.1 - Lucretia Mott (September 18, 2018)

Overview

Features

Fixes

dbt 0.11.0 - Isaac Asimov (September 6, 2018)

Overview

Breaking Changes

Features

1. Adapter functions must be namespaced to the `adapter` context variable