DataHub v0.14.1 Release Notes
User Experience
- Enhanced Data Propagation UI: New features allow viewing propagated column documentation, source information, and asset-level propagation details. This improves visibility into data lineage and enables better understanding of data flow across the organization. (#11047)
- Improved Search Result Tracking: Added page number to search result click events, enabling better measurement of search ranking performance. This helps users understand and optimize their search experience. (#11151)
- Fixed Display Issues: Resolved issues with displaying "0" values for last ingested data and improved handling of multilingual characters in descriptions. These fixes ensure more accurate and readable information presentation. (#10840, #10975)
Developer Experience
- Performance Improvements:
- Enhanced Search Capabilities:
  - Added support for custom highlighting fields in GraphQL queries, allowing faster and more customizable data retrieval. (#11339)
  - Implemented new search query functionality to filter by parents/children of Domains or Containers. (#11279)
  - Added support for multiple values in 'CONTAIN', 'START_WITH', and 'END_WITH' operators, enabling more flexible and precise searches. (#11068)
- API Improvements:
- Bug Fixes:
  - Resolved issues with forward slash handling in search queries, empty key-value pairs in Elasticsearch mapping, and support for various data types in object fields. These fixes improve search accuracy and data representation. (#10932, #11004, #11066)
  - Addressed Postgres regression by upgrading the ebean library from version 12.x to 15.x, resolving a read lock NPE issue. (#11379)
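
As an illustration of the multi-value operator support mentioned under Enhanced Search Capabilities (#11068), here is a minimal Python sketch of how a filter carrying several values might be evaluated. The `matches` helper and the `OPERATORS` table are hypothetical, and both the OR semantics (a field matches if any one value matches) and the case-sensitive comparison are assumptions made for the example. It does not describe DataHub's actual search implementation.

```python
from typing import Callable, Dict, List

# Hypothetical mapping from operator name to a string predicate.
# The operator names mirror the release note; the semantics are assumed.
OPERATORS: Dict[str, Callable[[str, str], bool]] = {
    "CONTAIN": lambda field, value: value in field,
    "START_WITH": lambda field, value: field.startswith(value),
    "END_WITH": lambda field, value: field.endswith(value),
}


def matches(field_value: str, condition: str, values: List[str]) -> bool:
    """Return True if any value satisfies the operator (assumed OR semantics)."""
    predicate = OPERATORS[condition]
    return any(predicate(field_value, value) for value in values)


# One filter with two prefixes instead of two separate filters.
print(matches("sales_orders", "START_WITH", ["sales_", "mkt_"]))
```

Under these assumptions, a single condition with several values replaces what would otherwise be several single-value conditions joined together.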
Metadata Ingestion
- S3 Integration Enhancements:
- BigQuery Improvements:
  - Implemented query log extractor for BigQuery, creating "Query" entities with usage statistics, lineage, and operation details. (#10994)
  - Added support for filtering GCP project ingestion based on project labels, enabling more targeted data collection. (#11169)
  - Implemented query job retries for transient errors, improving system robustness. (#11162)
- Snowflake Updates:
- New and Updated Connectors:
  - Added ingestion source for SAP Analytics Cloud, expanding DataHub's integration capabilities. (#10958)
  - Enhanced Salesforce connector with customizable API version and improved error messages. (#11145, #11266)
  - Updated Tableau ingestion process with new parameters and improved field type parsing. (#11255, #11202)
- Other Ingestion Improvements:
  - Added support for MongoDB database ingestion as containers. (#11178)
  - Implemented automatic capturing of Snowflake assets with Pandas I/O Manager in Dagster module. (#11189)
  - Enhanced Fivetran ingestion with destination ID filtering capabilities. (#11277)
  - Added support for browse-only tables in Databricks ingestion. (#10766)
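
The idea behind retrying query jobs on transient errors (#11162) can be sketched generically as follows. This is not the BigQuery connector's code: `TransientError`, `run_with_retries`, the attempt count, and the exponential backoff schedule are all hypothetical choices made for illustration.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class TransientError(Exception):
    """Hypothetical stand-in for a retryable failure such as a timeout."""


def run_with_retries(
    job: Callable[[], T],
    max_attempts: int = 3,
    base_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call job(), retrying on TransientError with exponential backoff."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return job()
        except TransientError:
            if attempt == max_attempts:
                raise  # Out of attempts: surface the last error.
            # Wait base_delay, then 2x, 4x, ... before the next attempt.
            sleep(base_delay * 2 ** (attempt - 1))
    raise AssertionError("unreachable")
```

Only errors classified as transient are retried; any other exception propagates immediately, so genuine failures are not masked.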
Other Improvements and Fixes
- Upgraded various dependencies including Kafka, Azure Identity, Acryl-SQLglot, and GraphQL/Spring versions.
- Improved error handling and logging across multiple components.
- Enhanced test coverage and reliability.
- Updated documentation for various features and processes.
Breaking Changes
Notable breaking changes include:
- Removal of `lower` method from `get_db_name` in `SQLAlchemySource`, affecting URNs of related entities.
- Changes to default sink mode and aspect handling that require server version 0.14.0+.
See the full details here.
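
To show why dropping the lowercasing step can affect URNs, here is a small Python sketch. The `make_dataset_urn` helper, the database name `SalesDB`, and the `dbo.orders` table are hypothetical. The sketch also assumes that the database name forms the first part of the dataset name inside the URN and that no other lowercasing option is applied. Treat it as an illustration of the effect, not as the connector's code.

```python
def make_dataset_urn(platform: str, name: str, env: str = "PROD") -> str:
    """Hypothetical helper that formats a dataset URN from its parts."""
    return f"urn:li:dataset:(urn:li:dataPlatform:{platform},{name},{env})"


db_name = "SalesDB"  # Hypothetical mixed-case database name.

# Before the change (assumed): the database name was lowercased.
old_urn = make_dataset_urn("mssql", f"{db_name.lower()}.dbo.orders")

# After the change (assumed): the database name keeps its original casing.
new_urn = make_dataset_urn("mssql", f"{db_name}.dbo.orders")

print(old_urn)
print(new_urn)
print(old_urn == new_urn)
```

Because URNs are compared as exact strings, the two values identify different entities, which is why metadata ingested before the upgrade may not line up with entities produced afterwards for databases with mixed-case names.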
Contributors
We extend our heartfelt thanks to all contributors for their valuable work on this release:
First-Time Contributors
@AaronYang0628, @alexandrebunn, @alisa-aylward-toast, @arpanchakra29, @esselius, @eunseokyang, @ignitz, @milindgupta, @milindgupta9, @Nbagga14, @rohansun, @sakethvarma397, @vignesh-hbk
Repeat Contributors
@deepgarg-visa, @dushayntAW, @feldjay, @filipe-caetano-ovo, @ksrinath, @Masterchen09, @matthew-coudert-cko, @mayurinehate, @nmbryant, @pinakipb2, @prashanthic23, @sagar-salvi-apptware, @siladitya2, @sleeperdeep
DataHub Maintainers
@anshbansal, @asikowitz, @chriscollins3456, @darnaut, @david-leifker, @eboneil, @hsheth2, @jjoyce0510, @maggiehays, @pedro93, @RyanHolstien, @shirshanka, @sid-acryl, @skrydal, @treff7es, @yoonhyejin
Your contributions are invaluable in making DataHub better for everyone. Thank you!
What's Changed
- test(smoke-test): updates to smoke-tests by @david-leifker in #11152
- feat(dbt): support prefer_sql_parser_lineage with sources enabled by @hsheth2 in #11168
- feat(actions): updates to gha workflows by @david-leifker in #11150
- build: fix docker warnings by @anshbansal in #11163
- feat(hooks): Make hook enable flag non-default by @pedro93 in #11159
- fix(ci): smoke-test changes do not need to build images by @david-leifker in #11174
- fix(ci): fix single tag comma split by @david-leifker in #11179
- lint(restore-indices): clean-up restore indices class by @david-leifker in #11176
- fix(ci): typo by @david-leifker in #11180
- fix(ci): additional ci and smoke-test updates by @david-leifker in #11183
- test(smoke-test): minor update to openapi test by @david-leifker in #11184
- feat(ingest): use pre-built dockerize binary by @hsheth2 in #11181
- doc: mark deprecated feature by @anshbansal in #11175
- fix(delete) Fix removing completed/verified forms references by @chriscollins3456 in #11172
- feat(docs): update docs for new release by @RyanHolstien in #11164
- fix(ingest): invalid urn should not fail full batch of changes by @RyanHolstien in #11187
- fix(kafka-setup): add missing script to image by @david-leifker in #11190
- fix(config): fix hash algo config by @david-leifker in #11191
- feat(ingest): allow custom SF API version by @skrydal in #11145
- fix(ingestion/transformer): extend dataset_to_data_product_urns_pattern to support containers by @sagar-salvi-apptware in #11124
- fix(ui) Fix bug with editing entity names by @chriscollins3456 in #11186
- ci(smoke-test): allow smoke-test only PRs by @david-leifker in #11194
- feat(ingestion/lookml): support looker `-- if` comments by @sid-acryl in #11113
- fix(elasticsearch): refactor idHashAlgo setting by @david-leifker in #11193
- fix(ingestion/airflow-plugin): fixed missing inlet/outlets by @dushayntAW in #11101
- docs(readme): add security notes by @david-leifker in #11196
- docs: Update README.md by @prashanthic23 in #11144
- feat(ingest/dbt): skip CLL on sources with `skip_sources_in_lineage` by @hsheth2 in #11195
- fix(graphql): Correct ownership check when removing owners by @pedro93 in #11154
- feat(propagation): UI for rendering propagated column documentation by @jjoyce0510 in #11047
- fix(ui): checks truthy value for last ingested by @pinakipb2 in #10840
- docs(scim): document okta integration with datahub for scim provisioning by @ksrinath in #11120
- fix(ingestion/tableau): Tableau field type parsing by @skrydal in #11202
- feat(analytics): Add page number to SearchResultClickEvent analytics event by @filipe-caetano-ovo in #11151
- fix(graphql) Fix NPE on form actor assignment by @chriscollins3456 in #11203
- fix(tests): Bump databricks-sdk dependency to `>=0.30.0` by @skrydal in #11209
- chore(vulnerability): Log Injection (High) by @pinakipb2 in #11131
- feat(ingestion/bigquery): Add ability to filter GCP project ingestion based on project labels by @sid-acryl in #11169
- chore(kafka): kafka version bump by @david-leifker in #11211
- fix(forms) Fix small bug in createForm graphql endpoint by @chriscollins3456 in #11216
- fix(ingestion/lookml): drop `hive.` from CLL by @sid-acryl in #11210
- feat: separate great-expectations action package by @mayurinehate in #11096
- fix(ingest/lookml): support view inheritance for fields by @sid-acryl in #11148
- feat(ingest/mongodb): Ingest databases as containers by @asikowitz in #11178
- fix(ingest/redshift): avoid asserts in redshift schemas by @hsheth2 in #11219
- feat(ingest/snowflake): allow iceberg tables in lineage and access metadata by @alisa-aylward-toast in #10961
- feat(ingestion/looker): filter Looker dashboards by folder by @sid-acryl in #11205
- fix(ingest/sagemaker): ensure consistent STS token usage with refresh mechanism by @sagar-salvi-apptware in #11170
- feat(ingest/s3): Partition support by @treff7es in #11083
- fix: modify the archived version & update code to download only the a… by @yoonhyejin in #11228
- chore(bump): bump hadoop and dnsjava versions by @david-leifker in #11227
- chore(bump): update graphql & spring version by @david-leifker in #11226
- docs(ingest): update config docs on platform instances by @hsheth2 in #11206
- feat(ingest/dbt): add support for urns in add_owner directive by @hsheth2 in #11221
- fix(ingest/snowflake): propagate table list from main to query extractor by @hsheth2 in #11222
- chore(bump): bump kafka base image by @david-leifker in #11236
- fix(datahub-frontend): remove old test creds by @david-leifker in #11237
- docs: Update confluent-cloud.md by @alexandrebunn in #11212
- docs(update): Security stance docs.md by @david-leifker in #11241
- feat(ingest): add bigquery-queries source by @mayurinehate in #10994
- fix(spark-lineage): enable user with editor role to ingest dataProces… by @deepgarg-visa in #11130
- fix(analytics): index description so analytics are correct by @anshbansal in #11224
- chore(bump): bump azure-identity by @david-leifker in #11235
- feat(docs): Update docs on managing user subscriptions by @pedro93 in #11243
- fix(cli/delete): change filter to include env by @anshbansal in #11246
- Rephrase scope of automated scanning by @darnaut in #11248
- docs(urn): Update urn docs with restrictions by @eboneil in #11213
- feat(ingest): add ingestion source for SAP Analytics Cloud by @Masterchen09 in #10958
- feat(ingest/superset): clickhousedb -> clickhouse mapping in superset ingestion by @esselius in #11201
- feat(ingest/bigquery): Add query job retries for transient errors by @feldjay in #11162
- Replacing ant dropdown 'overlay' with 'menu' by @sakethvarma397 in #11229
- fix(spark-lineage): exclude log4j.xml and log4j2.xml from openlineage… by @deepgarg-visa in #11239
- fix(ingest/snowflake): exclude snowflake excluded tags by @alisa-aylward-toast in #11250
- fix(ingest/kafka): update warning reporting for kafka by @hsheth2 in #11171
- chore(vulnerability): Incomplete string escaping or encoding by @pinakipb2 in #11060
- docs: fix great-expectations doc module_name by @mayurinehate in #11253
- feat(ingest/dagster): Add automatic snowflake_pandas_io_manager asset capture by @treff7es in #11189
- chore: update contributor list by @sakethvarma397 in #11257
- fix: Refactoring the antd Modal `visible` property to `open` by @sakethvarma397 in #11232
- ci(build): update outdated action & pin deepdiff lib by @anshbansal in #11260
- feat(ingestion-base): convert to ubuntu image by @david-leifker in #11263
- ci: update outdated actions for java and python setup by @anshbansal in #11261
- chore(platform): Adding Dagster and Prefect platforms by @treff7es in #11264
- fix(ingestion/prefect-plugin): fixed the unit tests by @dushayntAW in #10643
- fix(build/spark): Add explicit dependency to openlineage-converter by @treff7es in #11268
- ci(flavor): reintroduce flavor suffix by @david-leifker in #11265
- feat(ingest/snowflake): Add cluster formula to dataset properties by @alisa-aylward-toast in #11254
- fix(ingestion-base): add missing util by @david-leifker in #11269
- feat(build): remove base-requirements.txt by @hsheth2 in #11238
- build(deps): bump webpack from 5.91.0 to 5.94.0 in /docs-website by @dependabot in #11258
- build(deps): bump micromatch from 4.0.5 to 4.0.8 in /docs-website by @dependabot in #11242
- feat(ingest/s3): Support reading S3 file type by @asikowitz in #11177
- fix(openlineage): fix jar conflict by @david-leifker in #11278
- fix(ingest): limit number of upstreams generated by sql parsing aggre… by @mayurinehate in #11267
- feat(ingest/fivetran): support filtering on destination ids by @matthew-coudert-cko in #11277
- feat(ingest/bq): integrate bigquery-queries into main source by @mayurinehate in #11247
- doc(acryl cloud): release notes for 0.3.5.x by @anshbansal in #11259
- feat(ingest/databricks): include metadata for browse only tables by @mayurinehate in #10766
- fix(docs): fix logout url by @david-leifker in #11294
- feat(ingest): add python deps for `apk` by @hsheth2 in #11188
- fix(ingest/mssql): remove lower() method from sql_common get_db_name by @sleeperdeep in #10773
- feat(graphql): Lazy dataLoaders by @david-leifker in #11293
- fix(bigquery): followups on bigquery queries v2 integration by @mayurinehate in #11291
- fix(ingest): add custom StrEnum type by @hsheth2 in #11270
- feat(schemaField): populate schemaFields with side effects by @david-leifker in #10928
- fix(ingest/prefect): Temporary pinning Prefect 2.x until we can upgrade to 3.x by @treff7es in #11302
- feat(ingest/athena): Add option to disable partition extraction by @treff7es in #11286
- docs(adoption): Add Inter&Co by @ignitz in #11299
- fix(api/timeline): fix corner cases missed, add tests by @anshbansal in #11288
- config(kafka): clean-up kafka serializer config by @david-leifker in #11303
- fix(ingest/protobuf): Improve String Handling for Multilingual Support in Descriptions by @eunseokyang in #10975
- feat(ingest): Support protobuf description for enum field by @eunseokyang in #11027
- fix(search): Search not returning result if query text contains forward slash by @siladitya2 in #10932
- feat(ingest/salesforce): helpful error messages on failure by @mayurinehate in #11266
- fix(search): fix regression from #10932 by @david-leifker in #11309
- chore(vulnerability): Insecure randomness by @pinakipb2 in #11058
- feat(ingest/sql): add default dialect support to SqlQueriesSource by @rohansun in #11285
- fix : added support for multiple values for CONTAIN, START_WITH and END_WITH operators by @Nbagga14 in #11068
- feat(ingest): enable query usage stats by default by @hsheth2 in #11281
- build(deps): bump micromatch from 4.0.5 to 4.0.8 in /datahub-web-react by @dependabot in #11296
- fix(docs): Add correct link for automations by @jjoyce0510 in #11323
- feat(cli): reject missing urns in `datahub get` by @hsheth2 in #11313
- fix(smoke): fix timeseries delete test's usage of `datahub get` by @hsheth2 in #11330
- feat(ingest): make rest emitter version error messages more clear by @hsheth2 in #11295
- docs(ingest/dbt): clarify dbt ingestion docs by @hsheth2 in #11312
- fix(py): fix issues with AvroException by @hsheth2 in #11311
- fix(ingestion/tableau): restructure the tableau graphql datasource query by @sid-acryl in #11230
- fix(ingest): disable reporting for dry-run pipelines by @hsheth2 in #11306
- feat(ingest): support full urns without owner_type in meta mapping by @hsheth2 in #11298
- feat(ingest/sql): auto extract and use mode query user metadata by @mayurinehate in #11307
- fix(version): forUpdate needed for versioning by @david-leifker in #11328
- fix(ingest): avoid sqlite "too many SQL variables" error by @hsheth2 in #11332
- chore(ingest): bump acryl-sqlglot by @hsheth2 in #11331
- docs(oidc): document azure logout uri by @david-leifker in #11344
- feat(logging): add option to log slow GraphQL queries by @nmbryant in #11308
- docs(ingest/dbt): add docs on hiding sources by @hsheth2 in #11334
- feat(mode/ingest): Add support for missing Mode datasets in lineage by @sagar-salvi-apptware in #11290
- feat(entity-service): fallback logic for aspect version by @david-leifker in #11304
- fix(ingest/bq): fix ordering of queries for use_queries_v2 by @mayurinehate in #11333
- docs(updating-datahub) Bump minor version on v0.14.0 notes by @maggiehays in #11255
- docs(data product): Update example and docs by @eboneil in #11032
- feat(ingest): maintain ordering in file-backed dict by @hsheth2 in #11346
- docs: add signup form in cloud by @yoonhyejin in #11129
- config(retention): update dataHubExecutionRequestResult by @david-leifker in #11348
- feat(grafana): Using v2 metrics update datahub dashboard by @AaronYang0628 in #11208
- chore(links): add attribution by @shirshanka in #11352
- fix(timeline api): adding modification category by @sakethvarma397 in #11345
- Feature/custom highlight on search by @arpanchakra29 in #11339
- fix(gms): filter out runs of a dataJob without any run-events by @ksrinath in #11223
- fix(ingest): followup on bigquery queries v2 ordering by @mayurinehate in #11353
- fix(ingest/databricks): use latest report message format for warning messages by @sid-acryl in #11319
- chore(ingest): improve code formatting by @hsheth2 in #11326
- chore(py): cleanup python CI by @hsheth2 in #11324
- feat(auth): implement session authorization cache by @david-leifker in #11327
- feat(search): search query rewriter by @david-leifker in #11279
- feat(openapi-v3): add additional delete options by @david-leifker in #11347
- perf(search): reduce highlight fragments by @david-leifker in #11349
- feat(throttle): extend throttling to API requests by @david-leifker in #11325
- fix(browse): adjust browse to use full text in line with search by @RyanHolstien in #11367
- Fix: bug fix for empty key values pair in elastic search mapping by @milindgupta9 in #11004
- feat(ingest): make default rest sink mode env-configurable by @hsheth2 in #11335
- feat: add acryl stories by @yoonhyejin in #11351
- fix: add cloud form & fix css by @yoonhyejin in #11362
- fix(NPE): fix NPE in EntityService by @david-leifker in #11373
- feat(ingest/dbt): add `only_include_if_in_catalog` flag for dbt core by @hsheth2 in #11314
- chore(actions): bump actions version in docker profiles by @david-leifker in #11377
- fix(ingest/nifi): add error handling for version by @anshbansal in #11385
- fix(XServiceProvider): fix ebean framework race condition by @david-leifker in #11378
- fix(docs): clarify clean-up of indices when restoring search and graph indices by @Masterchen09 in #11380
- feat(ingest): report ingest run for sample data by @hsheth2 in #11329
- fix(ebean): upgrade ebean library by @david-leifker in #11379
- fix(ingest/snowflake): Update snowflake_utils.py to account for iceberg tables by @alisa-aylward-toast in #11384
- feat(ingest): default to ASYNC_BATCH mode in datahub-rest sink by @hsheth2 in #11369
- feat(graphql): Support START_WITH and END_WITH operator in GraphQL API by @milindgupta in #11026
- fix: support for non-string types in object fields by @vignesh-hbk in #11066
- refactor(search): refactor field type detection by @david-leifker in #11395
New Contributors
- @prashanthic23 made their first contribution in #11144
- @alisa-aylward-toast made their first contribution in #10961
- @alexandrebunn made their first contribution in #11212
- @esselius made their first contribution in #11201
- @sakethvarma397 made their first contribution in #11229
- @ignitz made their first contribution in #11299
- @eunseokyang made their first contribution in #10975
- @rohansun made their first contribution in #11285
- @Nbagga14 made their first contribution in #11068
- @AaronYang0628 made their first contribution in #11208
- @arpanchakra29 made their first contribution in #11339
- @milindgupta9 made their first contribution in #11004
- @milindgupta made their first contribution in #11026
- @vignesh-hbk made their first contribution in #11066
Full Changelog: v0.14.0.2...v0.14.1