- Support for Python 2.x has been dropped, as Python 2 reaches its end of life on January 1, 2020
- Compilation errors in .yml files are now treated as errors instead of warnings (#1493, #1751)
- The `table_name` field has been removed from Relations
- The existing `compile` and `execute` rpc tasks have been renamed to `compile_sql` and `execute_sql` (#1779, #1798) (docs)
- Custom materializations must now manage dbt's Relation cache (docs)
dbt v0.15.0 uses the `psycopg2-binary` dependency (instead of `psycopg2`) to simplify installation on platforms that do not have a compiler toolchain installed. If you experience segmentation faults, crashes, or installation errors, you can set the `DBT_PSYCOPG2_NAME` environment variable to `psycopg2` to change the dependency that dbt installs. This may require a compiler toolchain and development libraries.
```bash
$ DBT_PSYCOPG2_NAME=psycopg2 pip install dbt
```

You may also install specific dbt plugins directly by name. This has the advantage of only installing the Python requirements needed for your particular database:

```bash
$ pip install dbt-postgres
$ pip install dbt-redshift
$ pip install dbt-snowflake
$ pip install dbt-bigquery
```

- Add a JSON logger (#1237, #1791) (docs)
- Add structured logging to dbt (#1704, #1799, #1715, #1806)
- Add partial parsing option to the profiles.yml file (#1835, #1836, #1487) (docs) (see the sketch below)
- Support configurable query comments in SQL queries (#1643, #1864) (docs)
- Support atomic full-refreshes for incremental models (#525, #1682)
- Support snapshot configs in dbt_project.yml (#1613, #1759) (docs)
- Support cache modifications in materializations (#1683, #1770) (docs)
- Support `quote` parameter to Accepted Values schema tests (#1873, #1876) (docs)
- Support Python 3.8 (#1886)
- Support filters in sources for `dbt source snapshot-freshness` invocation (#1495, #1776) (docs)
- Support external table configuration in yml source specifications (#1784)
- Improve CLI output when running snapshots (#1768, #1769)
- Fix for unhelpful error message for malformed source/ref inputs (#1660, #1809)
- Fix for lingering backup tables when incremental models are full-refreshed (#1933, #1931)
- Fix for confusing error message when errors are encountered during compilation (#1807, #1839)
- Fix for logic error affecting the two-argument flavor of the `ref` function (#1504, #1515)
- Fix for invalid reference to dbt.exceptions (#1569, #1609)
- Fix for "cannot run empty query" error when pre/post-hooks are empty (#1108, #1719)
- Fix for confusing error when project names shadow context attributes (#1696, #1748)
- Fix for incorrect database logic in docs generation which resulted in columns being "merged" together across tables (#1708, #1774)
- Fix for seed errors located in dependency packages (#1723, #1723)
- Fix for confusing error when schema tests return unexpected results (#1808, #1903)
- Fix for twice-compiled `statement` block contents (#1717, #1719)
- Fix for inaccurate output in `dbt run-operation --help` (#1767, #1777)
- Fix for file rotation issues concerning the `logs/dbt.log` file (#1863, #1865, #1871)
- Fix for missing quotes in incremental model build queries (#1847, #1888)
- Fix for incorrect log level in `printer.print_run_result_error` (#1818, #1823)
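A minimal sketch of the new partial parsing option, assuming the `config.partial_parse` flag described in the linked docs:

```yaml
# profiles.yml
config:
  partial_parse: true  # re-parse only changed files on subsequent invocations
```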
- Show seeds and snapshots in the Project and Database views (docs#37, docs#25, docs#52)
- Show sources in the Database tree view (docs#20, docs#52)
- Show edges in the DAG between models and seeds (docs#15, docs#52)
- Show Accepted Values tests and custom schema tests in the column list for models (docs#52)
- Fix links for "Refocus on node" and "View documentation" in DAG context menu for seeds (docs#52)
- Support docs generation (#1781, #1801)
- Support custom tags (#1822, #1828)
- Support invoking `deps` on the rpc server (#1834, #1837)
- Support invoking `run-operation` and `snapshot` on the rpc server (#1875, #1878)
- Support `--threads` argument to `cli_args` method (#1897, #1909)
- Support reloading the manifest when a SIGHUP signal is received (#1684, #1699)
- Support invoking `compile`, `run`, `test`, and `seed` on the rpc server (#1488, #1652)
- Support returning compilation logs from the last compile in the `status` method (#1703, #1775)
- Support asynchronous `compile_sql` and `run_sql` methods (#1706, #1735)
- Improve re-compilation performance (#1824, #1830)
- Support running dbt against schemas which contain materialized views on Postgres (#1698, #1833)
- Support distyle AUTO in Redshift model configs (#1882, #1885) (docs)
- Fix for internal errors when run against mixed-case logical databases (#1800, #1936)
- Support `copy grants` option in Snowflake model configs (#1744, #1747) (docs)
- Support warehouse configuration in Snowflake model configs (#1358, #1899, #1788, #1901) (docs)
- Support secure views in Snowflake model configs (#1730, #1743) (docs)
- Fix for unclosed connections preventing dbt from exiting when Snowflake is used with client_session_keep_alive (#1271, #1749)
- Fix for errors on Snowflake when dbt schemas contain `LOCAL TEMPORARY` tables (#1869, #1872)
- Support KMS Encryption in BigQuery model configs (#1829, #1851) (docs)
- Improve docs generation speed by leveraging the information schema (#1576, #1795)
- Fix for cache errors on BigQuery when dataset names are capitalized (#1810, #1881)
- Fix for invalid query generation when multiple `options` are provided to a `create table|view` query (#1786, #1787)
- Use `client.delete_dataset` to drop BigQuery datasets atomically (#1887, #1881)
- Drop support for `networkx 1.x` (#1577, #1814)
- Upgrade `werkzeug` to 0.15.6 (#1697, #1814)
- Pin `psycopg2` dependency to 2.8.x to prevent segfaults (#1221, #1898)
- Set a strict upper bound for `jsonschema` dependency (#1817, #1821, #1932)
- Provide test names and kwargs in the manifest (#1154, #1816)
- Replace JSON Schemas with data classes (#1447, #1589)
- Remove logic around handling `archive` blocks in the `dbt_project.yml` file (#1580, #1581)
- Remove the APIObject class (#1762, #1780)
Thanks all for your contributions to dbt! 🎉
- @captainEli (#1809)
- @clausherther (#1876)
- @jtcohen6 (#1784)
- @tbescherer (#1515)
- @aminamos (#1609)
- @JusLarsen (#1903)
- @heisencoder (#1823)
- @tjengel (#1885)
- @Carolus-Holman (#1747, #1743)
- @kconvey (#1851)
- @darrenhaken (#1787)
This release changes the version ranges of some of dbt's dependencies. These changes address installation issues in 0.14.3 when dbt is installed from pip. You can view the full list of dependency version changes in this commit.
Note: If you are installing dbt into an environment alongside other Python libraries, you can install individual dbt plugins with:
```bash
pip install dbt-postgres
pip install dbt-redshift
pip install dbt-snowflake
pip install dbt-bigquery
```
Installing specific plugins may help mitigate issues regarding incompatible versions of dependencies between dbt and other libraries.
This is a bugfix release.
- Fix for `dictionary changed size during iteration` race condition (#1740, #1750)
- Fix upper bound on jsonschema dependency to 3.1.1 (#1817, #1819)
This is a bugfix release.
- Fix for dbt hanging at the end of execution in `dbt source snapshot-freshness` tasks (#1728, #1729)
- Fix for broken "packages" and "tags" selector dropdowns in the dbt Documentation website (docs#47, #1726)
This is primarily a bugfix release that also contains a few minor improvements. Note: this release includes an important change in how the `check` snapshot strategy works. See #1614 for more information. If you are using snapshots with the `check` strategy on dbt v0.14.0, it is strongly recommended that you upgrade to 0.14.1 as soon as possible.
- The undocumented `macros` attribute was removed from the `graph` context variable (#1615)
- Summarize warnings at the end of dbt runs (#1597, #1654)
- Speed up catalog generation on postgres by avoiding use of the `information_schema` (#1540)
- Docs site updates (#1621)
- Add environment variables for macro debugging flags (#1628, #1629)
- Speed up node selection by making it linear, rather than quadratic, in complexity (#1611, #1615)
- Specify the `application` field in Snowflake connections (#1622, #1623)
- Add support for clustering on Snowflake (#634, #1591, #1689) (docs)
- Add support for job priority on BigQuery (#1456, #1673) (docs)
- Add `node.config` and `node.tags` to the `generate_schema_name` and `generate_alias_name` macro context (#1700, #1701)
- Fix for reused `check_cols` values in snapshots (#1614, #1709)
- Fix for rendering column descriptions in sources (#1619, #1633)
- Fix for `is_incremental()` returning True for models that are not materialized as incremental models (#1249, #1608)
- Fix for serialization of BigQuery results which contain nested or repeated records (#1626, #1638)
- Fix for loading seed files which contain non-ascii characters (#1632, #1644)
- Fix for creation of user cookies in incorrect directories when `--profile-dir` or `$DBT_PROFILES_DIR` is provided (#1645, #1656)
- Fix for error handling when transactions are being rolled back (#1647)
- Fix for incorrect references to `dbt.exceptions` in jinja code (#1569, #1609)
- Fix for duplicated schema creation due to case-sensitive comparison (#1651, #1663)
- Fix for "schema stub" created automatically by dbt (#913, #1663)
- Fix for incremental merge query on old versions of postgres (<=9.6) (#1665, #1666)
- Fix for serializing results of queries which return `TIMESTAMP_TZ` columns on Snowflake in the RPC server (#1670)
- Fix typo in InternalException (#1640, #1672)
- Fix typo in CLI help for snapshot migration subcommand (#1664)
- Fix for error handling logic when empty queries are submitted on Snowflake (#1693, #1694)
- Fix for non-atomic column expansion logic in Snowflake incremental models and snapshots (#1687, #1690)
- Fix for unprojected `count(*)` expression injected by custom data tests (#1688)
- Fix for `dbt run` and `dbt docs generate` commands when running against Panoply Redshift (#1479, #1686)
Thanks for your contributions to dbt!
- @levimalott (#1647)
- @aminamos (#1609)
- @elexisvenator (#1540)
- @edmundyan (#1663)
- @vitorbaptista (#1664)
- @sjwhitworth (#1672, #1673)
- @mikaelene (#1688, #1709)
- @bastienboutonnet (#1591, #1689)
- Replace Archives with Snapshots (docs, migration guide)
- Add three new top-level commands: `dbt ls`, `dbt run-operation`, and `dbt rpc`
- Support the specification of severity levels for schema and data tests (docs)
- Many new quality of life improvements and bugfixes
- Stub out adapter methods at parse-time to speed up parsing (#1413)
- Removed support for the `--non-destructive` flag (#1419, #1415)
- Removed support for the `sql_where` config to incremental models (#1408, #1351)
- Changed `expand_target_column_types` to take a Relation instead of a string (#1478)
- Replaced Archives with Snapshots
- Add `run-operation` command which invokes macros directly from the CLI (#1328) (docs)
- Add a `dbt ls` command which lists resources in your project (#1436, #467) (docs)
- Add Snapshots, an improvement over Archives (#1361, #1175) (docs)
- Add an RPC server via `dbt rpc` (#1301, #1274) (docs)
- Made printer width configurable (#1026, #1247) (docs)
- Retry package downloads from hub.getdbt.com (#1451, #1491)
- Add a test "severity" level, presented as a keyword argument to schema tests (#1410, #1005) (docs) (see the sketch below)
- Add a `generate_alias_name` macro to configure alias names dynamically (#1363) (docs)
- Add a `node` argument to `generate_schema_name` to configure schema names dynamically (#1483, #1463) (docs)
- Use `create or replace` on Snowflake to rebuild tables and views atomically (#1101, #1409)
- Use `merge` statement for incremental models on Snowflake (#1414, #1307, #1409) (docs)
- Add support for seed CSV files that start with a UTF-8 Byte Order Mark (BOM) (#1452, #1177)
- Add a warning when git packages are not pinned to a version (#1453, #1446)
- Add logging for `on-run-start` and `on-run-end` hooks to console output (#1440, #696)
- Add modules and tracking information to the rendering context for configuration files (#1441, #1320)
- Add support for `null` vars, and distinguish `null` vars from unset vars (#1426, #608)
- Add support for the `search_path` configuration in Postgres/Redshift profiles (#1477, #1476) (docs (postgres), docs (redshift))
- Add support for persisting documentation as `descriptions` for tables and views on BigQuery (#1031, #1285) (docs)
- Add a `--project-dir` path which will invoke dbt in the specified directory (#1549, #1544)
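As a sketch of the new test severity keyword argument mentioned above (the model and column names here are illustrative):

```yaml
# schema.yml
models:
  - name: orders        # hypothetical model
    columns:
      - name: status    # hypothetical column
        tests:
          - not_null:
              severity: warn  # report failures as warnings instead of errors
```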
- Add searching by tag name (#32)
- Add context menu link to export graph viz as a PNG (#34)
- Fix for clicking models in left-nav while search results are open (#31)
- Fix for unduly long timeouts when anonymous event tracking is blocked (#1445, #1063)
- Fix for error with mostly-duplicate git urls in packages, picking the one that came first. (#1428, #1084)
- Fix for unrendered `description` field as jinja in top-level Source specification (#1484, #1494)
- Fix for API error when very large temp tables are created in BigQuery (#1423, #1478)
- Fix for compiler errors that occurred if jinja code was present outside of docs blocks in .md files (#1513, #988)
- Fix `TEXT` handling on postgres and redshift (#1420, #781)
- Fix for compiler error when vars are undefined but only used in disabled models (#1429, #434)
- Improved the error message when iterating over the results of a macro that doesn't exist (#1425, #1424)
- Improved the error message when tests have invalid parameter definitions (#1427, #1325)
- Improved the error message when a user tries to archive a non-existent table (#1361, #1066)
- Fix for archive logic which tried to create already-existing destination schemas (#1398, #758)
- Fix for incorrect error codes when Operations exit with an error (#1406, #1377)
- Fix for missing compiled SQL when the rpc server encounters a database error (#1381, #1371)
- Fix for broken link in the profile.yml generated by `dbt init` (#1366, #1344)
- Fix the sample test.env file's redshift password field (#1364)
- Fix collisions on models running concurrently that have duplicate names but have distinguishing aliases (#1342, #1321)
- Fix for a bad error message when a `version` is missing from a package spec in `packages.yml` (#1551, #1546)
- Fix for wrong package scope when the two-arg method of `ref` is used (#1515, #1504)
- Fix missing import in test suite (#1572)
- Fix for a Snowflake error when an external table exists in a schema that dbt operates on (#1571, #1505)
- Use pytest for tests (#1417)
- Use flake8 for linting (#1361, #1333)
- Added a flag for wrapping models and tests in jinja blocks (#1407, #1400)
- Connection management: Bind connections to threads rather than to names (#1336, #1312)
- Add deprecation warning for dbt users on Python2 (#1534, #1531)
- Upgrade networkx to v2.x (#1509, #1496)
- Anonymously track adapter type and rpc requests when tracking is enabled (#1575, #1574)
- Fix for test warnings and general test suite cleanup (#1578)
Over a dozen contributors wrote code for this release of dbt! Thanks for taking the time, and nice work y'all! :)
- @nydnarb (#1363)
- @emilieschario (#1366)
- @bastienboutonnet (#1409)
- @kasanchez (#1247)
- @Blakewell (#1307)
- @buremba (#1476)
- @darrenhaken (#1285)
- @tbescherer (#1504)
- @heisencoder (#1509, #1549, #1578)
- @cclauss (#1572)
- @josegalarza (#1571)
- @rmgpinto (docs#31, docs#32)
- @groodt (docs#34)
This is a bugfix release.
- Add "MaterializedView" relation type to the Snowflake adapter (#1430, #1432) (@adriank-convoy)
- Quote databases properly (#1396, #1402)
- Use "ilike" instead of "=" for database equality when listing schemas (#1411, #1412)
- Pass the model name along in get_relations (#1384, #1388)
- Add logging to dbt clean (#1261, #1383, #1391) (@emilieschario)
- Search by columns (dbt-docs#23) (@rmgpinto)
- Support @ selector (dbt-docs#27)
- Fix number formatting on Snowflake and BQ in table stats (dbt-docs#28)
Thanks for your contributions to dbt!
This release provides a stable API for building new adapters and reimplements dbt's adapters as "plugins". Additionally, a new adapter for Presto was added using this architecture. Beyond adapters, this release of dbt also includes Sources which can be used to document and test source data tables. See the full list of features added in 0.13.0 below.
- version 1 schema.yml specs are no longer implemented. Please use the version 2 spec instead (migration guide)
- `{{this}}` is no longer implemented for `on-run-start` and `on-run-end` hooks. Use `{{ target }}` or an `on-run-end` context variable instead (#1176, implementing #878)
- A number of materialization-specific adapter methods have changed in breaking ways. If you use these adapter methods in your macros or materializations, you may need to update your code accordingly.
- query_for_existing - removed, use get_relation instead.
- get_missing_columns - changed to take `Relation`s instead of schemas and identifiers
- expand_target_column_types - changed to take a `Relation` instead of schema, identifier
- get_relation - added a `database` argument
- create_schema - added a `database` argument
- drop_schema - added a `database` argument
- The following adapter methods are now deprecated, and will be removed in a future release:
- get_columns_in_table - deprecated in favor of get_columns_in_relation
- already_exists - deprecated in favor of get_relation
- Add `sources` to dbt, use them to calculate source data freshness (docs) (#814, #1240)
- Add support for Presto (docs, repo) (#1106)
- Add `require-dbt-version` option to `dbt_project.yml` to state the supported versions of dbt for packages (docs) (#581)
- Add an output line indicating the installed version of dbt to every run (#1134)
- Add a new model selector (`@`) which builds models, their children, and their children's parents (docs) (#1156)
- Add support for Snowflake Key Pair Authentication (docs) (#1232)
- Support SSO Authentication for Snowflake (docs) (#1172)
- Add support for Snowflake's transient tables (docs) (#946)
- Capture build timing data in `run_results.json` to visualize project performance (#1179)
- Add CLI flag to toggle warnings as errors (docs) (#1243) (usage sketch below)
- Add tab completion script for Bash (docs) (#1197)
- Added docs on how to build a new adapter (docs) (#560)
- Use new logo (#1349)
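The warnings-as-errors toggle mentioned above is a global CLI flag; a usage sketch:

```bash
# fail the invocation if dbt emits any warnings
dbt --warn-error run
```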
- Fix for Postgres character columns treated as string types (#1194)
- Fix for hard to reach edge case in which dbt could hang (#1223)
- Fix for `dbt deps` in non-English shells (#1222)
- Fix for over-eager schema creation when models are run with `--models` (#1239)
- Fix for `dbt seed --show` (#1288)
- Fix for `is_incremental()` which should only return `True` if the target relation is a `table` (#1292)
- Fix for error in Snowflake table materializations with custom schemas (#1316)
- Fix errored out concurrent transactions on Redshift and Postgres (#1356)
- Fix out of order execution on model select (#1354, #1355)
- Fix adapter macro namespace issue (#1352, #1353)
- Re-add CLI flag to toggle warnings as errors (#1347)
- Fix release candidate regression that runs run hooks on test invocations (#1346)
- Fix Snowflake source quoting (#1338, #1317, #1332)
- Handle unexpected max_loaded_at types (#1330)
- Replace all SQL in Python code with Jinja in macros (#1204)
- Loosen restrictions of boto3 dependency (#1234)
- Rewrote Postgres introspective queries to be faster on large databases (#1192)
Thanks for your contributions to dbt!
- @patrickgoss #1193
- @brianhartsock #1191
- @alexyer #1232
- @adriank-convoy #1224
- @mikekaminsky #1216
- @vijaykiran #1198, #1199
This release reduces the runtime of dbt projects by improving dbt's approach to model running. Additionally, a number of workflow improvements have been added.
- More intelligently order and execute nodes in the graph. This significantly speeds up the runtime of most dbt projects (#813)
- Add `-m` flag as an alias for `--models` (#1160)
- Add `post_hook` and `pre_hook` as aliases for `post-hook` and `pre-hook`, respectively (#1124) (docs)
- Better handling of git errors in `dbt deps` + full support for Windows (#994, #778, #895)
- Add support for specifying a `location` in BigQuery datasets (#969) (docs)
- Add support for Jinja expressions using the `{% do ... %}` block (#1113)
- The `dbt debug` command is actually useful now (#1061)
- The `config` function can now be called multiple times in a model (#558)
- Source the latest version of dbt from PyPI instead of GitHub (#1122)
- Add a performance profiling mechanism to dbt (#1001)
- Add caching for dbt's macros-only manifest to speedup parsing (#1098)
- Fix for custom schemas used alongside the `generate_schema_name` macro (#801)
- Fix for silent failure of tests that reference nonexistent models (#968)
- Fix for `generate_schema_name` macros that return whitespace-padded schema names (#1074)
- Fix for incorrect relation type for backup tables on Snowflake (#1103)
- Fix for incorrectly cased values in the relation cache (#1140)
- Fix for JSON decoding error on Python2 installed with Anaconda (#1155)
- Fix for unhandled exceptions that occur in anonymous event tracking (#1180)
- Fix for analysis files that contain `raw` tags (#1152)
- Fix for packages which reference the hub site (#1095)
This is a bugfix release.
- Fix for relation caching when views outside of a dbt schema depend on relations inside of a dbt schema (#1119)
This release adds caching for some introspective queries on all adapters. Additionally, custom tags can be supplied for models, along with many other minor improvements and bugfixes.
- Support for the `repositories:` block in `dbt_project.yml` (deprecated in 0.10.0) was removed.
- Make runs faster by caching introspective queries
- Support model tags
- Add a list of schemas to the `on-run-end` context
- Set your profiles directory with an environment variable
- Cache the existence of relations to speed up dbt runs (#1025)
- Add support for tag configuration and selection (#1014)
- Add tags to the model and graph views in the docs UI (#7)
- Add the set of schemas that dbt built models into in the `on-run-end` hook context (#908)
- Warn for unused resource config paths in dbt_project.yml (#725)
- Add more information to the `dbt --help` output (#1058)
- Add support for configuring the profiles directory with an env var (#1055)
- Add support for cli and env vars in most `dbt_project.yml` and `profiles.yml` fields (#1033) (see the sketch below)
- Provide a better error message when seed file loading fails on BigQuery (#1079)
- Improved error handling and messaging on Redshift (#997)
- Include datasets with underscores when listing BigQuery datasets (#954)
- Forgo validating the user's profile for `dbt deps` and `dbt clean` commands (#947, #1022)
- Don't read/parse CSV files outside of the `dbt seed` command (#1046)
- Fix for incorrect model selection with the `--models` CLI flag when projects and directories share the same name (#1023)
- Fix for table clustering configuration with multiple columns on BigQuery (#1013)
- Fix for incorrect output when a single row fails validation in `dbt test` (#1040)
- Fix for unwieldy Jinja errors regarding undefined variables at parse time (#1086, #1080, #935)
- Fix for incremental models that have a line comment on the last line of the file (#1018)
- Fix for error messages when ephemeral models fail to compile (#1053)
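A sketch of referencing an env var from profiles.yml, assuming the standard `env_var` context function (the profile layout here is illustrative):

```yaml
# profiles.yml
my-profile:
  outputs:
    dev:
      type: redshift
      password: "{{ env_var('DBT_PASSWORD') }}"  # resolved from the environment at runtime
```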
- Create adapters as singleton objects instead of classes (#961)
- Combine project and profile into a single, coherent object (#973)
- Investigate approaches for providing more complete compilation output (#588)
Thanks for contributing!
- @mikekaminsky (#1049, #1060)
- @joshtemple (#1079)
- @k4y3ff (#954)
- @elexisvenator (#1019)
- @clrcrl (#725)
This is a patch release containing a few bugfixes and one quality of life change for dbt docs.
- dbt
  - Add `--port` parameter to `dbt docs serve` (#987)
  - Fix hooks in model configs not running (#985)
  - Fix integration test on redshift catalog generation (#977)
  - Snowflake: Fix docs generation errors when QUOTED_IDENTIFIER_IGNORE_CASE is set (#998)
  - Translate empty strings to null in seeds (#995)
  - Filter out null schemas during catalog generation (#992)
  - Fix quoting on drop, truncate, and rename (#991)
- dbt-docs
This release adds support for auto-generated dbt documentation, adds a new syntax for schema.yml files, and fixes a number of minor bugs. With the exception of planned changes to Snowflake's default quoting strategy, this release should not contain any breaking changes. Check out the blog post for more information about this release.
- Add autogenerated dbt project documentation (docs) (#375, #863, #941, #815)
- Version 2 of schema.yml, which allows users to create table and column comments that end up in the manifest (docs) (#880)
- Extend catalog and manifest to also support Snowflake, BigQuery, and Redshift, in addition to existing Postgres support (#866, #857, #849)
- Add a 'generated_at' field to both the manifest and the catalog. (#887)
- Add `docs` blocks that users can put into `.md` files and a `doc()` value for schema v2 description fields (#888)
- Write out a 'run_results.json' after dbt invocations. (#904)
- Type inference for interpreting CSV data is now less aggressive (#905)
- Remove distinction between `this.table` and `this.schema` by refactoring materialization SQL (#940)
- Fix for identifier clashes in BigQuery merge statements (#914)
- Fix for unnecessary downloads of `bumpversion.cfg`, handle failures gracefully (#907)
- Fix for incompatible `boto3` requirements (#959)
- Fix for invalid `relationships` test when the parent column contains null values (#921)
Thanks for contributing!
- @rsmichaeldunn (#799)
- @lewish (#915)
- @MartinLue (#872)
This release makes it possible to alias relation names, rounds out support for BigQuery with incremental, archival, and hook support, adds the IAM Auth method for Redshift, and builds the foundation for autogenerated dbt project documentation, to come in the next release.
Additionally, a number of bugs have been fixed including intermittent BigQuery 404 errors, Redshift "table dropped by concurrent query" errors, and a probable fix for Redshift connection timeout issues.
We want to extend a big thank you to our outside contributors for this release! You all are amazing.
- @danielchalef (#818)
- @mjumbewu (#796)
- @abelsonlive (#800)
- @jon-rtr (#800)
- @mturzanska (#797)
- @cpdean (#780)
- BigQuery
- Redshift: IAM Auth (#818) (docs)
- Model aliases (#800) (docs)
- Write JSON manifest file to disk during compilation (#761)
- Add forward and backward graph edges to the JSON manifest file (#762)
- Add a 'dbt docs generate' command to generate a JSON catalog file (#774, #808)
- BigQuery: fix concurrent relation loads (#835)
- BigQuery: support external relations (#828)
- Redshift: set TCP keepalive on connections (#826)
- Redshift: fix "table dropped by concurrent query" (#825)
- Fix the error handling for profiles.yml validation (#820)
- Make the `--threads` parameter actually change the number of threads used (#819)
- Ensure that numeric precision of a column is not `None` (#796)
- Allow for more complex version comparison (#797)
- Use a subselect instead of CTE when building incremental models (#787)
- Internals
- Improved dependency selection, rip out some unused dependencies (#848)
- Stop tracking `run_error` in tracking code (#817)
- Use Mapping instead of dict as the base class for APIObject (#756)
- Split out parsers (#809)
- Fix `__all__` parameter in submodules (#780)
- Switch to CircleCI 2.0 (#843, #850)
- Added tox environments that have the user specify what tests should be run (#837)
This release focuses on achieving functional parity between all of dbt's adapters. With this release, most dbt functionality should work on every adapter except where noted here.
- Configure model schema and name quoting in your `dbt_project.yml` file (Docs)
- Add a `Relation` object to the context to simplify model quoting (Docs)
- Implement BigQuery materializations using new `create table as (...)` syntax, support `partition by` clause (Docs)
- Override seed column types (Docs)
- Add `get_columns_in_table` context function for BigQuery (Docs)
- Consistent schema and identifier quoting (#727)
- Use the new `create table as (...)` syntax on BigQuery (#717)
  - Support `partition by` clause
- CSV Updates:
- Improve `get_columns_in_table` context function (#709)
  - Support numeric types on Redshift, Postgres
  - Support BigQuery (including nested columns in `struct` types)
  - Support cross-database `information_schema` queries for Snowflake
  - Retain column ordinal positions
- Fix for incorrect var precedence when using `--vars` on the CLI (#739)
- Fix for closed connections in `on-run-end` hooks for long-running dbt invocations (#693)
- Fix: don't try to run empty hooks (#620, #693)
- Fix: Prevent seed data from being serialized into `graph.gpickle` file (#720)
- Fix: Disallow seed and model files with the same name (#737)
This release overhauls dbt's package management functionality, makes seeding csv files work across all adapters, and adds date partitioning support for BigQuery.
- Check out full installation and upgrading instructions here
- Transition the `repositories:` section of your `dbt_project.yml` file to a `packages.yml` file as described here
- You may need to clear out your `dbt_modules` directory if you use packages like dbt-utils. Depending how your project is configured, you can do this by running `dbt clean`.
- We're using a new CSV parsing library, `agate`, so be sure to check that all of your seed tables are parsed as you would expect!
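A minimal sketch of the migration, assuming the git package syntax described in the linked docs (the package URL and revision are illustrative):

```yaml
# packages.yml (replaces the repositories: section of dbt_project.yml)
packages:
  - git: https://github.com/fishtown-analytics/dbt-utils.git
    revision: 0.1.0  # pin to the tag or branch you need
```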
- Support for variables defined on the CLI with `--vars` (#640) (docs)
- Improvements to `dbt seed` (docs)
  - Support seeding csv files on all adapters (#618)
  - Make seed csv's `ref()`-able in models (#668)
  - Support seed file configuration (custom schemas, enabled / disabled) in the `dbt_project.yml` file (#561)
  - Support `--full-refresh` instead of `--drop-existing` (deprecated) for seed files (#515)
  - Add `--show` argument to `dbt seed` to display a sample of data in the CLI (#74)
- Improvements to package management (docs)
- Support date partitioning in BigQuery (#641) (docs)
- Move schema creation to after `on-run-start` hooks (#652)
- Replace `csvkit` dependency with `agate` (#598)
- Switch snowplow endpoint to pipe directly to Fishtown Analytics (#682)
- Throw a compilation exception if a required test macro is not present in the context (#655)
- Make the `adapter_macro` use the `return()` function (#635)
- Fix bug for introspective query on late binding views (redshift) (#647)
- Disable any non-dbt log output on the CLI (#663)
This release fixes bugs and adds support for late binding views on Redshift.
- Support late binding views on Redshift (#614) (docs)
- Make `run_started_at` timezone-aware (#553) (Contributed by @mturzanska) (docs)
- Include hook run time in reported model run time (#607)
- Add warning for missing test constraints (#600)
- Fix for schema tests used or defined in packages (#599)
- Run hooks in defined order (#601)
- Skip tests that depend on nonexistent models (#617)
- Fix for `adapter_macro` called within a package (#630)
This release focuses on improvements to macros, materializations, and package management. Check out the blog post to learn more about what's possible in this new version of dbt.
Full installation instructions for macOS, Windows, and Linux can be found here. If you use Windows or Linux, installation works the same as with previous versions of dbt. If you use macOS and Homebrew to install dbt, note that installation instructions have changed:
```bash
brew update
brew tap fishtown-analytics/dbt
brew install dbt
```

- More powerful macros and materializations
- Custom model schemas
- BigQuery improvements
- Bugfixes
- Documentation (0.9.0 docs can be found here)
- `adapter` functions must be namespaced to the `adapter` context variable. To fix this error, use `adapter.already_exists` instead of just `already_exists`, or similar for other adapter functions.
- Use `adapter`, `ref`, and `var` inside of macros (#466)
- Build custom tests and materializations in dbt packages (#466)
- Support pre- and post- hooks that run outside of a transaction (#510)
- Support table materializations for BigQuery (#507)
- Support querying external data sources in BigQuery (#507)
- Override which schema models are materialized in (#522) (docs)
- Make `{{ ref(...) }}` return the same type of object as `{{ this }}` (#530)
- Replace schema test CTEs with subqueries to speed them up for Postgres (#536) (@ronnyli)
- Bump Snowflake dependency, remove pyasn1 (#570)
- Document how to create a package
- Document how to make a materialization
- Document how to make custom schema tests
- Document how to use hooks to vacuum
- Document all context variables
- Fix for federated queries on BigQuery with Service Account json credentials (#547)
- Bugfixes
- Faster schema tests on Postgres
- Fix for broken environment variables
- Fix broken integration tests (#539)
- Fix for `--non-destructive` on views (#539)
- Fix for package models materialized in the wrong schema (#538)
- Fix for broken environment variables (#543)
- Custom model schemas
- BigQuery updates
- `ref` improvements
- Parity for `statement` interface on BigQuery (#526)
- Override which schema models are materialized in (#522) (docs)
- Make `{{ ref(...) }}` return the same type of object as `{{ this }}` (#530)
- More powerful macros
- BigQuery improvements
- Bugfixes
- Documentation (0.9.0 docs can be found here)
dbt 0.9.0 Alpha 1 introduces a number of new features intended to help dbt-ers write flexible, reusable code. The majority of these changes involve the macro and materialization Jinja blocks. As this is an alpha release, there may exist bugs or incompatibilities, particularly surrounding these two blocks. A list of known breaking changes is provided below. If you find new bugs, or have questions about dbt 0.9.0, please don't hesitate to reach out in slack or open a new issue.
This will manifest as a compilation error that looks like:
```
Compilation Error in model {your_model} (models/path/to/your_model.sql)
'already_exists' is undefined
```
To fix this error, use `adapter.already_exists` instead of just `already_exists`, or similar for other adapter functions.
- Use `adapter`, `ref`, and `var` inside of macros (#466)
- Build custom tests and materializations in dbt packages (#466)
- Support pre- and post- hooks that run outside of a transaction (#510)
- Support table materializations for BigQuery (#507)
- Support querying external data sources in BigQuery (#507)
- Document how to create a package
- Document how to make a materialization
- Document how to make custom schema tests
- Add support for Google BigQuery
- Significant exit codes
- Load credentials from environment variables
- Fix errant warning for `dbt archive` commands (#476)
- Show error (instead of backtrace) for failed hook statements (#478)
- `dbt init` no longer leaves the repo in an invalid state (#487)
- Fix bug which ignored git tag specs for package repos (#463)
- Support BigQuery as a target (#437) (#438)
- Make dbt exit codes significant (0 = success, 1/2 = error) (#297)
- Add context function to pull in environment variables (#450)
- Document target configuration for BigQuery here
- Document dbt exit codes here
- Document environment variable usage here
- UI/UX improvements (colorized output, failures summary, better error messages)
- Cancel running queries on ctrl+c
- Bugfixes
- Docs
- Fix bug for interleaved sort keys on Redshift (#430)
- Don't try to create schema if it already exists (#446)
- Summarize failures for dbt invocations (#443)
- Colorized dbt output (#441)
- Cancel running queries on ctrl-c (#444)
- Better error messages for common failure modes (#445)
- Upgrade dependencies (#431)
- Improvements to `dbt init` and first time dbt usage experience (#439)
- Document full-refresh requirements for incremental models (#417)
- Document archival (#433)
- Document the two-version variant of `ref` (#432)
- Bugfixes
- Reintroduce `compile` command
- Moved docs to readme.io
- Fix bug preventing overriding a disabled package model in the current project (#391)
- Fix bug which prevented multiple sort keys (provided as an array) on Redshift (#397)
- Fix race condition while compiling schema tests in an empty `target` directory (#398)
- Bugfixes
- True concurrency
- More control over "advanced" incremental model configurations more info
- Fix ephemeral load order bug (#292, #285)
- Support composite unique key in archivals (#324)
- Fix target paths (#331, #329)
- Ignore commented-out schema tests (#330, #328)
- Fix run levels (#343, #340, #338)
- Fix concurrency, open a unique transaction per model (#345, #336)
- Handle concurrent `DROP ... CASCADE`s in Redshift (#349)
- Always release connections (use `try .. finally`) (#354)
- Changed: different syntax for "relationships" schema tests (#339)
- Added: `already_exists` context function (#372)
- Graph refactor: fix common issues with load order (#292)
- Graph refactor: multiple references to an ephemeral model should share a CTE (#316)
- Graph refactor: macros in flat graph (#332)
- Refactor: factor out jinja interactions (#309)
- Speedup: detect cycles at the end of compilation (#307)
- Speedup: write graph file with gpickle instead of yaml (#306)
- Clone dependencies with `--depth 1` to make them more compact (#277, #342)
- Rewrite materializations as macros (#356)
- Improved graph selection
- A new home for dbt
- Snowflake improvements
- Improved graph selection for `dbt run` and `dbt test` (more information) (#279)
- profiles.yml now supports Snowflake `role` as an option (#291)
In v0.7.1, dbt was moved from the analyst-collective org to the fishtown-analytics org (#300)
- Nicer error if `run-target` was not changed to `target` during upgrade to dbt>=0.7.0
- Snowflake Support
- Deprecations
dbt now supports Snowflake as a target in addition to Postgres and Redshift! All dbt functionality is supported in this new warehouse. There is a sample snowflake profile in sample.profiles.yml -- you can start using it right away.
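A profile sketch modeled on the Postgres and Redshift examples later in these notes; the field values are placeholders, and sample.profiles.yml remains the authoritative reference:

```yaml
warehouse:
  outputs:
    dev:
      type: snowflake
      account: my-account       # placeholder values throughout
      user: my-user
      password: my-pass
      database: analytics
      warehouse: my-warehouse
      schema: dbt_dev
run-target: dev
```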
There are a few deprecations in 0.7:
- `run-target` in profiles.yml is no longer supported. Use `target` instead.
- Project names (`name` in dbt_project.yml) can now only contain letters, numbers, and underscores, and must start with a letter. Previously they could contain any character.
- `--dry-run` is no longer supported.
- use adapter for sort/dist (#274)
- fixed a typo in the docs related to post-run hooks (#271)
- refactored tracking code to refresh invocation id in a multi-run context (#273)
- added unit tests for the graph (#270)
- Condense error output when `--debug` is not set (#265)
- Respect `config` options in profiles.yml (#255)
- Use correct `on-run-end` option for post-run hooks (#261)
- Add `--debug` flag, replace calls to `print()` with a global logger (#256)
- Add pep8 check to continuous integration tests and bring codebase into compliance (#257)
- Macros
- More control over how models are materialized
- Minor improvements
- Bugfixes
- Connor McArthur
Macros are snippets of SQL that can be called like functions in models. Macros make it possible to re-use SQL between models in keeping with the engineering principle of DRY (Don't Repeat Yourself). Moreover, packages can expose Macros that you can use in your own dbt project.
For detailed information on how to use Macros, check out the pull request here
DBT Version 0.6.0 introduces two new ways to control the materialization of models:
Non-destructive dbt run more info
If you provide the --non-destructive argument to dbt run, dbt will minimize the amount of time during which your models are unavailable. Specifically, dbt will:
- Ignore models materialized as `views`
- Truncate tables and re-insert data instead of dropping and re-creating
This flag is useful for recurring jobs which only need to update table models and incremental models.
```bash
dbt run --non-destructive
```

Incremental Model Full Refresh more info
If you provide the --full-refresh argument to dbt run, dbt will treat incremental models as table models. This is useful when
- An incremental model schema changes and you need to recreate the table accordingly
- You want to reprocess the entirety of the incremental model because of new logic in the model code
```bash
dbt run --full-refresh
```

Note that --full-refresh and --non-destructive can be used together!
For more information, run `dbt run --help`.
Minor improvements more info
Add a {{ target }} variable to the dbt runtime more info
Use {{ target }} to interpolate profile variables into your model definitions. For example:
```sql
-- only use the last week of data in development
select * from events
{% if target.name == 'dev' %}
where created_at > getdate() - interval '1 week'
{% endif %}
```

User-specified profiles.yml dir more info
DBT looks for a file called profiles.yml in the ~/.dbt/ directory. You can now override this directory with:
```bash
$ dbt run --profiles-dir /path/to/my/dir
```

Add timestamp to console output more info
Informative and pretty
Run dbt from subdirectory of project root more info
A story in three parts:
```bash
cd models/snowplow/sessions
vim sessions.sql
dbt run # it works!
```

Pre and post run hooks more info
```yaml
# dbt_project.yml
name: ...
version: ...
...

# supply either a string, or a list of strings
on-run-start: "create table public.cool_table (id int)"
on-run-end:
  - insert into public.cool_table (id) values (1), (2), (3)
  - insert into public.cool_table (id) values (4), (5), (6)
```

We fixed 10 bugs in this release! See the full list here
- added support for custom SQL data tests
- SQL returns 0 results --> pass
- SQL returns > 0 results --> fail
- dbt-core integration tests
- running in Continuous Integration environments
- with code coverage
Schema tests have proven to be an essential part of a modern analytical workflow. These schema tests validate basic constraints about your data. Namely: not null, unique, accepted value, and foreign key relationship properties can be asserted using schema tests.
With dbt v0.5.4, you can now write your own custom "data tests". These data tests are SQL SELECT statements that return 0 rows on success, or > 0 rows on failure. A typical data test might look like:
```sql
-- tests/assert_less_than_5_pct_event_cookie_ids_are_null.sql
-- If >= 5% of cookie_ids are null, then the test returns 1 row (failure).
-- If < 5% of cookie_ids are null, then the test returns 0 rows (success)
with calc as (
    select
        sum(case when cookie_id is null then 1 else 0 end)::float / count(*)::float as fraction
    from {{ ref('events') }}
)
select * from calc where fraction < 0.05
```

To enable data tests, add the test-paths config to your dbt_project.yml file:
```yaml
name: 'Vandelay Industries'
version: '1.0'

source-paths: ["models"]
target-path: "target"
test-paths: ["tests"] # look for *.sql files in the "tests" directory
....
```

Any .sql file found in the test-paths director(y|ies) will be evaluated as data tests. These tests can be run with:
```bash
dbt test # run schema + data tests
dbt test --schema # run only schema tests
dbt test --data # run only data tests
dbt test --data --schema # run schema + data tests

# For more information, try
dbt test -h
```

With the dbt 0.5.4 release, dbt now features a robust integration test suite. These integration tests will help mitigate the risk of software regressions, and in so doing, will help us develop dbt more quickly. You can check out the tests here, and the test results here (linux/osx) and here (windows).
You can check out the DBT roadmap here. In the next few weeks, we'll be working on bugfixes, minor features, improved macro support, and expanded control over runtime materialization configs.
As always, feel free to reach out to us on Slack with any questions or comments!
Bugfix release.
Fixes regressions introduced in 0.5.1 and 0.5.2.
Incremental models were broken by the new column expansion feature. Column expansion is implemented as
```sql
alter table ... add column tmp_col varchar({new_size});
update ... set tmp_col = existing_col;
alter table ... drop column existing_col;
alter table ... rename tmp_col to existing_col;
```

This has the side-effect of moving the existing_col to the "end" of the table. When an incremental model tries to
```sql
insert into {table} (
    select * from tmp_table
)
```

suddenly the columns in {table} are incongruent with the columns in tmp_table. This insert subsequently fails.
The fix for this issue is twofold:
- If the incremental model table DOES NOT already exist, avoid inserts altogether. Instead, run a `create table as (...)` statement
- If the incremental model table DOES already exist, query for the columns in the existing table and use those to build the insert statement, eg:
insert into "dbt_dbanin"."sessions" ("session_end_tstamp", "session_start_tstamp", ...)
(
select "session_end_tstamp", "session_start_tstamp", ...
from "sessions__dbt_incremental_tmp"
);In this way, the source and destination columns are guaranteed to be in the same order!
We attempted to refactor the way profiles work in dbt. Previously, a default user profile was loaded, and the profiles specified in dbt_project.yml or on the command line (with --profile) would be applied on top of the user config. This implementation is some of the earliest code that was committed to dbt.
As dbt has grown, we found this implementation to be a little unwieldy and hard to maintain. The 0.5.2 release made it so that only one profile could be loaded at a time. This profile needed to be specified in either dbt_project.yml or on the command line with --profile. A bug was errantly introduced during this change which broke the handling of dependency projects.
The additions of automated testing and a more comprehensive manual testing process will go a long way to ensuring the future stability of dbt. We're going to get started on these tasks soon, and you can follow our progress here: https://github.com/fishtown-analytics/dbt/milestone/16 .
As always, feel free to reach out to us on Slack with any questions or concerns:
Patch release fixing a bug that arises when profiles are overridden on the command line with the --profile flag.
See https://github.com/fishtown-analytics/dbt/releases/tag/v0.5.1
- Raiders of the Lost Archive -- version your raw data to make historical queries more accurate
- Column type resolution for incremental models (no more `Value too long for character type` errors)
- Postgres support
- Top-level configs applied to your project + all dependencies
- --threads CLI option + better multithreaded output
1. Source table archival https://github.com/fishtown-analytics/dbt/pull/183
Commonly, analysts need to "look back in time" at some previous state of data in their mutable tables. Imagine a users table which is synced to your data warehouse from a production database. This users table is a representation of what your users look like now. Consider what happens if you need to look at revenue by city for each of your users trended over time. Specifically, what happens if a user moved from, say, Philadelphia to New York? To do this correctly, you need to archive snapshots of the users table on a recurring basis. With this release, dbt now provides an easy mechanism to store such snapshots.
To use this new feature, declare the tables you want to archive in your dbt_project.yml file:
```yaml
archive:
  - source_schema: synced_production_data # schema to look for tables in (declared below)
    target_schema: dbt_archive            # where to archive the data to
    tables:                               # list of tables to archive
      - source_table: users               # table to archive
        target_table: users_archived      # table to insert archived data into
        updated_at: updated_at            # used to determine when data has changed
        unique_key: id                    # used to generate archival query
      - source_table: some_other_table
        target_table: some_other_table_archive
        updated_at: "updatedAt"
        unique_key: "expressions || work || LOWER(too)"
  - source_schema: some_other_schema
....
```

The archived tables will mirror the schema of the source tables they're generated from. In addition, three fields are added to the archive table:
- `valid_from`: The timestamp when this archived row was inserted (and first considered valid)
- `valid_to`: The timestamp when this archived row became invalidated. The first archived record for a given `unique_key` has `valid_to = NULL`. When newer data is archived for that `unique_key`, the `valid_to` field of the old record is set to the `valid_from` field of the new record!
- `scd_id`: A unique key generated for each archive record. Scd = Slowly Changing Dimension.
dbt models can be built on top of these archived tables. The most recent record for a given unique_key is the one where valid_to is null.
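For example, a downstream model selecting only the current row for each user might look like the following (schema and table names are taken from the config sketch above):

```sql
-- current_users.sql: most recent archived state of each user
select *
from dbt_archive.users_archived
where valid_to is null
```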
To run this archive process, use the command dbt archive. After testing and confirming that the archival works, you should schedule this process through cron (or similar).
2. Incremental column expansion https://github.com/fishtown-analytics/dbt/issues/175
Incremental tables are a powerful dbt feature, but there was at least one edge case which makes working with them difficult. During the first run of an incremental model, Redshift will infer a type for every column in the table. Subsequent runs can insert new data which does not conform to the expected type. One example is a varchar(16) field which is inserted into a varchar(8) field.
In practice, this error looks like:
```
Value too long for character type
DETAIL:
-----------------------------------------------
error: Value too long for character type
code: 8001
context: Value too long for type character varying(8)
query: 3743263
location: funcs_string.hpp:392
process: query4_35 [pid=18194]
-----------------------------------------------
```
With this release, dbt will detect when column types are incongruent and will attempt to reconcile these different types if possible. Specifically, dbt will alter the incremental model table schema from character varying(x) to character varying(y) for some y > x. This should drastically reduce the occurrence of this class of error.
3. First-class Postgres support https://github.com/fishtown-analytics/dbt/pull/183
With this release, Postgres became a first-class dbt target. You can configure a postgres database target in your ~/.dbt/profiles.yml file:
```yaml
warehouse:
  outputs:
    dev:
      type: postgres # configure a target for Postgres
      host: localhost
      user: Drew
      ....
run-target: dev
```

While Redshift is built on top of Postgres, the two are subtly different. For instance, Redshift supports sort and dist keys, while Postgres does not! dbt will use the database target type parameter to generate the appropriate SQL for the target database.
4. Root-level configs https://github.com/fishtown-analytics/dbt/issues/161
Configurations in dbt_project.yml can now be declared at the models: level. These configurations will apply to the primary project, as well as any dependency projects. This feature is particularly useful for setting pre- or post- hooks that run for every model. In practice, this looks like:
```yaml
name: 'My DBT Project'

models:
  post-hook:
    - "grant select on {{this}} to looker_user" # Applied to 'My DBT Project' and 'Snowplow' dependency
  'My DBT Project':
    enabled: true
  'Snowplow':
    enabled: true
```

5. --threads CLI option https://github.com/fishtown-analytics/dbt/issues/143
The number of threads that DBT uses can now be overridden with a CLI argument. The number of threads used must be between 1 and 8.
```bash
dbt run --threads 1  # fine
# or
dbt run --threads 4  # great
# or
dbt run --threads 42 # too many!
```

In addition to this new CLI argument, the output from multi-threaded dbt runs should be a little more orderly now. Models won't show as STARTed until they're actually queued to run. Previously, the output here was a little confusing. Happy threading!
To upgrade to version 0.5.1 of dbt, run:
```bash
pip install --upgrade dbt
```

- Join us on slack with questions or comments
Made with
- use a temp table when executing incremental models
- arbitrary configuration (using config variables)
- specify branches for dependencies
- more & better docs
1. new incremental model generation https://github.com/fishtown-analytics/dbt/issues/138
In previous versions of dbt, an edge case existed which caused the sql_where query to select different rows in the delete and insert steps. As a result, it was possible to construct incremental models which would insert duplicate records into the specified table. With this release, DBT uses a temp table which will 1) circumvent this issue and 2) improve query performance. For more information, check out the GitHub issue: https://github.com/fishtown-analytics/dbt/issues/138
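A rough sketch of the new flow, borrowing the temp table naming shown in the 0.5.3 notes above (the staging predicate, key column, and table names are illustrative):

```sql
-- 1. Stage the new rows once, in a temp table
create temporary table "sessions__dbt_incremental_tmp" as (
    select * from source_data
    where session_end_tstamp > (select max(session_end_tstamp) from "analytics"."sessions")
);

-- 2. Delete rows being replaced, then insert from the staged set,
--    so both steps operate on exactly the same rows
delete from "analytics"."sessions"
where session_id in (select session_id from "sessions__dbt_incremental_tmp");

insert into "analytics"."sessions" (select * from "sessions__dbt_incremental_tmp");
```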
2. Arbitrary configuration https://github.com/fishtown-analytics/dbt/issues/146
Configuration in dbt is incredibly powerful: it is what allows models to change their behavior without changing their code. Previously, all configuration was done using built-in parameters, but that actually limits the user in the power of configuration.
With this release, you can inject variables from dbt_project.yml into your top-level and dependency models. In practice, variables work like this:
```yaml
# dbt_project.yml
models:
  my_project:
    vars:
      exclude_ip: '192.168.1.1'
```

```sql
-- filtered_events.sql

-- source code
select * from public.events where ip_address != '{{ var("exclude_ip") }}'

-- compiles to
select * from public.events where ip_address != '192.168.1.1'
```

The vars parameter in dbt_project.yml is compiled, so you can use jinja templating there as well! The primary use case for this is specifying "input" models to a dependency.
Previously, dependencies used ref(...) to select from a project's base models. That interface was brittle, and the idea that dependency code had unbridled access to all of your top-level models made us a little uneasy. As of this release, we're deprecating the ability for dependencies to ref(...) top-level models. Instead, the recommended way for this to work is with vars! An example:
```sql
-- dbt_modules/snowplow/models/events.sql
select * from {{ var('snowplow_events_table') }}
```

and
```yaml
models:
  Snowplow:
    vars:
      snowplow_events_table: "{{ ref('base_events') }}"
```

This effectively mirrors the previous behavior, but is much more explicit about what's happening under the hood!
3. specify a dependency branch https://github.com/fishtown-analytics/dbt/pull/165
With this release, you can point DBT to a specific branch of a dependency repo. The syntax looks like this:
```yaml
repositories:
  - https://github.com/fishtown-analytics/dbt-audit.git@development # use the "development" branch
```
Check em out! And let us know if there's anything you think we can improve upon!
To upgrade to version 0.5.0 of dbt, run:
```bash
pip install --upgrade dbt
```

- `--version` command
- pre- and post- run hooks
- windows support
- event tracking
1. --version https://github.com/fishtown-analytics/dbt/issues/135
The --version command was added to help aid debugging. Further, organizations can use it to ensure that everyone in their org is up-to-date with dbt.
```bash
$ dbt --version
installed version: 0.4.7
latest version: 0.4.7

Up to date!
```

2. pre-and-post-hooks https://github.com/fishtown-analytics/dbt/pull/147
With this release, you can now specify pre- and post- hooks that are run before and after a model is run, respectively. Hooks are useful for running grant statements, inserting a log of runs into an audit table, and more! Here's an example of a grant statement implemented using a post-hook:
```yaml
models:
  my_project:
    post-hook: "grant select on table {{this}} to looker_user"
    my_model:
      materialized: view
    some_model:
      materialized: table
      post-hook: "insert into my_audit_table (model_name, run_at) values ({{this.name}}, getdate())"
```

Hooks are recursively appended, so the my_model model will only receive the grant select... hook, whereas the some_model model will receive both the grant select... and insert into... hooks.
Finally, note that the grant statement uses the (hopefully familiar) {{this}} syntax whereas the insert statement uses the {{this.name}} syntax. When DBT creates a model:
- A temp table is created
- The original model is dropped
- The temp table is renamed to the final model name
DBT will intelligently use the right table/view name when you invoke {{this}}, but you have a couple of more specific options available if you need them:

```
{{this}}: "schema"."table__dbt_tmp"
{{this.schema}}: "schema"
{{this.table}}: "table__dbt_tmp"
{{this.name}}: "table"
```
3. Event tracking https://github.com/fishtown-analytics/dbt/issues/89
We want to build the best version of DBT possible, and a crucial part of that is understanding how users work with DBT. To this end, we've added some really simple event tracking to DBT (using Snowplow). We do not track credentials, model contents or model names (we consider these private, and frankly none of our business). This release includes basic event tracking that reports 1) when dbt is invoked 2) when models are run, and 3) basic platform information (OS + python version). The schemas for these events can be seen here
You can opt out of event tracking at any time by adding the following to the top of you ~/.dbt/profiles.yml file:
```yaml
config:
  send_anonymous_usage_stats: False
```

4. Windows support https://github.com/fishtown-analytics/dbt/pull/154
dbt v0.4.1 provides improvements to incremental models, performance improvements, and ssh support for db connections.
- slightly modified dbt command structure
- `unique_key` setting for incremental models
- connect to your db over ssh
- no more model-defaults
- multithreaded schema tests
If you encounter an SSL/cryptography error while upgrading to this version of dbt, check that your version of pip is up-to-date:
pip install -U pip
pip install -U dbt

1. new dbt command structure https://github.com/fishtown-analytics/dbt/issues/109
# To run models
dbt run # same as before
# to dry-run models
dbt run --dry # previously dbt test
# to run schema tests
dbt test # previously dbt test --validate

2. Incremental model improvements https://github.com/fishtown-analytics/dbt/issues/101
Previously, dbt calculated "new" incremental records to insert by querying for rows which matched some sql_where condition defined in the model configuration. This works really well for atomic datasets like a clickstream event log -- once inserted, these records will never change. Other datasets, like a sessions table comprised of many pageviews for many users, can change over time. Consider the following scenario:
User 1 Session 1 Event 1 @ 12:00
User 1 Session 1 Event 2 @ 12:01
-- dbt run --
User 1 Session 1 Event 3 @ 12:02
In this scenario, there are two possible outcomes depending on the sql_where chosen: 1) Event 3 does not get included in the Session 1 record for User 1 (bad), or 2) Session 1 is duplicated in the sessions table (bad). Both of these outcomes are inadequate!
With this release, you can now add a unique_key expression to an incremental model config. Records matching the unique_key will be deleted from the incremental table, then inserted as usual. This makes it possible to maintain data accuracy without recalculating the entire table on every run.
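Under the hood, this amounts to a delete followed by the usual insert. Roughly (a sketch using the same placeholders as the insert example in the v0.4.0 notes below; the exact SQL dbt generates may differ):

delete from schema.model
where {{unique_key}} in (
    select {{unique_key}} from (
        -- compiled model definition
    ) where {{sql_where}}
);

insert into schema.model (
    select * from (
        -- compiled model definition
    ) where {{sql_where}}
)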
The unique_key can be any expression which uniquely defines the row, eg:
sessions:
  materialized: incremental
  sql_where: "session_end_tstamp > (select max(session_end_tstamp) from {{this}})"
  unique_key: user_id || session_index

3. Run schema validations concurrently https://github.com/fishtown-analytics/dbt/issues/100
The threads run-target config now applies to schema validations too. Try it with dbt test.
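As a reminder, threads is set on a run-target in your ~/.dbt/profiles.yml file; a minimal sketch (the profile and target names here are illustrative):

user:
  outputs:
    my-redshift:
      type: redshift
      threads: 4 # schema validations will now also run with up to 4 threads
      host: localhost
      ...
  run-target: my-redshift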
4. Connect to database over ssh https://github.com/fishtown-analytics/dbt/issues/93
Add an ssh-host parameter to a run-target to connect to a database over ssh. The ssh-host parameter should be the name of a Host in your ~/.ssh/config file (more info).
warehouse:
  outputs:
    dev:
      type: redshift
      host: my-redshift.amazonaws.com
      port: 5439
      user: my-user
      pass: my-pass
      dbname: my-db
      schema: dbt_dbanin
      threads: 8
      ssh-host: ssh-host-name # <------ Add this line
  run-target: dev

5. Remove the model-defaults config https://github.com/fishtown-analytics/dbt/issues/111
The model-defaults config doesn't make sense in a dbt world with dependencies. To apply default configs to your package, add the configs immediately under the package definition:
models:
  My_Package:
    enabled: true
    materialized: table
    snowplow:
      ...

dbt v0.4.0 provides new ways to materialize models in your database.
- new types of materializations: incremental and ephemeral
- if upgrading, change materialized: true|false to materialized: table|view|incremental|ephemeral
- optionally specify model configs within the SQL file
1. Feature: {{this}} template variable https://github.com/fishtown-analytics/dbt/issues/81
The {{this}} template variable expands to the name of the model being compiled. For example:
-- my_model.sql
select 'the fully qualified name of this model is {{ this }}'
-- compiles to
select 'the fully qualified name of this model is "the_schema"."my_model"'

2. Feature: materialized: incremental https://github.com/fishtown-analytics/dbt/pull/90
After initially creating a table, incremental models will insert new records into the table on subsequent runs. This drastically speeds up execution time for large, append-only datasets.
Each execution of dbt run will:
- create the model table if it doesn't exist
- insert new records into the table
New records are identified by a sql_where model configuration option. In practice, this looks like:
sessions:
  materialized: incremental
  sql_where: "session_start_time > (select max(session_start_time) from {{this}})"

There are a couple of new things here. Previously, materialized could either be set to true or false. Now, the valid options include view, table, incremental, and ephemeral (more on this last one below). Also note that incremental models generally require use of the {{this}} template variable to identify new records.
The sql_where field is supplied as a where condition on a subquery containing the model definition. This resultset is then inserted into the target model. This looks something like:
insert into schema.model (
    select * from (
        -- compiled model definition
    ) where {{sql_where}}
)

3. Feature: materialized: ephemeral https://github.com/fishtown-analytics/dbt/issues/78
Ephemeral models are injected as CTEs (with statements) into any model that references them. Ephemeral models are part of the dependency graph and generally function like any other model, except ephemeral models are not compiled to their own files or directly created in the database. This is useful for intermediary models which are shared by other downstream models, but shouldn't be queried directly from outside of dbt.
To make a model ephemeral:
employees:
  materialized: ephemeral

Suppose you want to exclude employees from your users table, but you don't want to clutter your analytics schema with an employees table.
-- employees.sql
select * from public.employees where is_deleted = false

-- users.sql
select *
from public.users
where email not in (select email from {{ref('employees')}})

The compiled SQL would look something like:
with __dbt__CTE__employees as (
select * from public.employees where is_deleted = false
)
select *
from public.users
where email not in (select email from __dbt__CTE__employees)

Ephemeral models play nice with other ephemeral models, incremental models, and regular table/view models. Feel free to mix and match different materialization options to optimize for performance and simplicity.
4. Feature: In-model configs https://github.com/fishtown-analytics/dbt/issues/88
Configurations can now be specified directly inside of models. These in-model configs work exactly the same as configs inside of the dbt_project.yml file.
An in-model config looks like this:
-- users.sql

-- python function syntax
{{ config(materialized="incremental", sql_where="id > (select max(id) from {{this}})") }}

-- OR json syntax
{{
    config({"materialized": "incremental", "sql_where": "id > (select max(id) from {{this}})"})
}}

select * from public.users

The config resolution order is:
- dbt_project.yml model-defaults
- in-model config
- dbt_project.yml models config
5. Fix: dbt seed null values https://github.com/fishtown-analytics/dbt/issues/102
Previously, dbt seed would insert empty CSV cells as "None", whereas they should have been NULL. Not anymore!
Version 0.3.0 comes with the following updates:
1. Parallel model creation https://github.com/fishtown-analytics/dbt/pull/83
dbt will analyze the model dependency graph and can create models in parallel if possible. In practice, this can significantly speed up the amount of time it takes to complete dbt run. The number of threads dbt uses must be between 1 and 8. To configure the number of threads dbt uses, add the threads key to your dbt target in ~/.dbt/profiles.yml, eg:
user:
  outputs:
    my-redshift:
      type: redshift
      threads: 4 # execute up to 4 models concurrently
      host: localhost
      ...
  run-target: my-redshift

For a complete example, check out a sample profiles.yml file
2. Fail only within a single dependency chain https://github.com/fishtown-analytics/dbt/issues/63
If a model cannot be created, it won't crash the entire dbt run process. The errant model will fail and all of its descendants will be "skipped". Other models which do not depend on the failing model (or its descendants) will still be created.
3. Logging https://github.com/fishtown-analytics/dbt/issues/64, https://github.com/fishtown-analytics/dbt/issues/65
dbt will log output from the dbt run and dbt test commands to a configurable logging directory. By default, this directory is called logs/. The log filename is dbt.log and it is rotated on a daily basis. Logs are kept for 7 days.
To change the name of the logging directory, add the following line to your dbt_project.yml file:
log-path: "my-logging-directory" # will write logs to my-logging-directory/dbt.log

4. Minimize time models are unavailable in the database https://github.com/fishtown-analytics/dbt/issues/68
Previously, dbt would create models by:
- dropping the existing model
- creating the new model
This resulted in a significant amount of time in which the model was inaccessible to the outside world. Now, dbt creates models by:
- creating a temporary model {model-name}__dbt_tmp
- dropping the existing model
- renaming the tmp model to the actual model name (see the sketch below)
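In SQL terms, the new sequence looks roughly like this (a sketch with an illustrative model name; the exact statements vary by database and materialization):

create table "schema"."my_model__dbt_tmp" as (
    -- compiled model definition
);
drop table if exists "schema"."my_model";
alter table "schema"."my_model__dbt_tmp" rename to "my_model";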
5. Arbitrarily deep nesting https://github.com/fishtown-analytics/dbt/issues/50
Previously, all models had to be located in a directory matching models/{model group}/{model_name}.sql. Now, these models can be nested arbitrarily deeply within a given dbt project. For instance, models/snowplow/sessions/transformed/transformed_sessions.sql is a totally valid model location with this release.
To configure these deeply-nested models, just nest the config options within the dbt_project.yml file. The only caveat is that you need to specify the dbt project name as the first key under the models object, ie:
models:
  'Your Project Name':
    snowplow:
      sessions:
        transformed:
          transformed_sessions:
            enabled: true

More information is available on the issue and in the sample dbt_project.yml file
6. don't try to create a schema if it already exists https://github.com/fishtown-analytics/dbt/issues/66
Previously, dbt run would execute create schema if not exists {schema}. This would fail if the dbt user didn't have sufficient permissions to create the schema, even if the schema already existed! Now, dbt checks for the schema's existence and only attempts to create the schema if it doesn't already exist.
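On Postgres and Redshift, this check amounts to something like the following ('analytics' is an illustrative schema name, and the exact query dbt runs may differ):

-- does the schema already exist?
select count(*)
from information_schema.schemata
where schema_name = 'analytics';

-- executed only if the count above is zero
create schema "analytics";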
The previous release of dbt was v0.2.3.0, which isn't a semantic version. This and all future dbt releases will conform to semantic versioning in the format {major}.{minor}.{patch}.
Version 0.2.3.0 of dbt comes with the following updates:
Referential integrity validations in a schema.yml file were previously defined relative to the parent table:
account:
  constraints:
    relationships:
      - {from: id, to: people, field: account_id}

Now, these validations are specified relative to the child table:
people:
  constraints:
    relationships:
      - {from: account_id, to: accounts, field: id}

For more information, run dbt test -h
Previously, auxiliary data needed to be shoehorned into a view comprised of union statements, eg.
select 22 as "type", 'Chat Transcript' as type_name, 'chatted via olark' as event_name union all
select 21, 'Custom Redirect', 'clicked a custom redirect' union all
select 6, 'Email', 'email sent' union all
...

That's not a scalable solution. Now you can load CSV files into your data warehouse:
- Add a CSV file (with a header) to the data/ directory
- Run dbt seed to create a table from the CSV file!
- The table name will be the filename (sans .csv) and it will be placed in your run-target's schema
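For example, the event types from the union query above could be seeded from a hypothetical data/event_types.csv file:

type,type_name,event_name
22,Chat Transcript,chatted via olark
21,Custom Redirect,clicked a custom redirect
6,Email,email sent

Running dbt seed would then create an event_types table in your run-target's schema.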
Subsequent calls to dbt seed will truncate the seeded tables (if they exist) and re-insert the data. If the table schema changes, you can run dbt seed --drop-existing to drop the table and recreate it.
For more information, run dbt seed -h
Versioning your SQL models with dbt is a great practice, but did you know that you can also version your analyses? Any SQL files in the analysis/ dir will be compiled (ie. table names will be interpolated) and placed in the target/build-analysis/ directory. These analytical queries will not be run against your data warehouse with dbt run -- you should copy/paste them into the data analysis tool of your choice.
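For example, a hypothetical analysis/count_events.sql file containing:

select event_name, count(*)
from {{ref('events')}}
group by 1

would be compiled to target/build-analysis/count_events.sql, with {{ref('events')}} replaced by the fully qualified name of your events model.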
In your schema.yml file, you can now add accepted-values validations:
accounts:
  constraints:
    accepted-values:
      - {field: type, values: ['paid', 'free']}

This test will determine how many records in the accounts model have a type other than paid or free.
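Under the hood, this validation runs a query along these lines (a sketch; the exact SQL dbt generates may differ):

select count(*)
from "schema"."accounts"
where type not in ('paid', 'free')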
Switch between profiles with --profile [profile-name] and switch between run-targets with --target [target-name].
Targets should be something like "prod" or "dev" and profiles should be something like "my-org" or "my-side-project".
side-project:
  outputs:
    prod:
      type: redshift
      host: localhost
      port: 5439
      user: Drew
      pass:
      dbname: data_generator
      schema: ac_drew
    dev:
      type: redshift
      host: localhost
      port: 5439
      user: Drew
      pass:
      dbname: data_generator
      schema: ac_drew_dev
  run-target: dev

To compile models using the dev environment of my side-project profile:
$ dbt compile --profile side-project --target dev
or for prod:
$ dbt compile --profile side-project --target prod
You can also add a "profile" config to the dbt_project.yml file to fix a dbt project to a specific profile:
...
test-paths: ["test"]
data-paths: ["data"]
# Fix this project to the "side-project" profile
# You can still use --target to switch between environments!
profile: "side-project"
model-defaults:
....