Populate the data platform metadata catalog #1355

Open
12 tasks
blarghmatey opened this issue Oct 29, 2024 · 0 comments
Assignees
Labels
Data Engineering product:data-platform Issues related to the Data Platform product


User Story

  • As a data platform engineer, I want all of the system metadata collected in order to improve data discovery and power data governance.

Description/Context

Now that OpenMetadata is deployed, we need to populate it with metadata from all of the platform components. Ingestion is managed with the OpenMetadata ingestion library (https://docs.open-metadata.org/latest/deployment/ingestion/external). The majority of the data sources can be handled with the connector workflows (https://docs.open-metadata.org/latest/connectors). On a connector's page, the "Run The Connector Externally" link displays the YAML configuration details.
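
As a rough illustration of what one of those configurations looks like, here is a minimal sketch that assembles a Trino metadata workflow config as a Python dict and prints it as JSON (which is also valid YAML). The key layout follows the general shape shown on the connector pages, but the exact fields for each connector should be confirmed there. The service name, host values, and the `OPENMETADATA_JWT_TOKEN` environment variable are placeholders chosen for this example, not values from our deployment.

```python
import json
import os


def build_workflow_config(
    service_name: str,
    trino_host_port: str,
    trino_username: str,
    om_host_port: str,
    om_jwt_token: str,
) -> dict:
    """Assemble a workflow config dict for a Trino metadata ingestion run.

    The structure (source / sink / workflowConfig) is assumed from the
    OpenMetadata connector docs; verify field names against the Trino page.
    """
    return {
        "source": {
            "type": "trino",
            "serviceName": service_name,
            "serviceConnection": {
                "config": {
                    "type": "Trino",
                    "hostPort": trino_host_port,
                    "username": trino_username,
                }
            },
            # DatabaseMetadata selects plain metadata ingestion for this source.
            "sourceConfig": {"config": {"type": "DatabaseMetadata"}},
        },
        "sink": {"type": "metadata-rest", "config": {}},
        "workflowConfig": {
            "loggerLevel": "INFO",
            "openMetadataServerConfig": {
                "hostPort": om_host_port,
                "authProvider": "openmetadata",
                "securityConfig": {"jwtToken": om_jwt_token},
            },
        },
    }


# Placeholder values for illustration only.
config = build_workflow_config(
    service_name="starburst_galaxy",
    trino_host_port="trino.example.com:443",
    trino_username="metadata-ingestion",
    om_host_port="http://localhost:8585/api",
    om_jwt_token=os.environ.get("OPENMETADATA_JWT_TOKEN", ""),
)
print(json.dumps(config, indent=2))
```

Building the config in code rather than keeping static YAML files would let secrets come from the environment at run time, though either approach should work with the ingestion library.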

Acceptance Criteria

Metadata from the following systems is ingested and regularly updated in our deployment of OpenMetadata

  • Trino (Starburst Galaxy)
  • dbt
  • Dagster
  • Redash
  • Superset
  • S3
  • Iceberg
  • Airbyte

Lineage information from the following systems is ingested and maintained in OpenMetadata

  • Trino (Starburst Galaxy)
  • dbt

Profiling and quality information is collected from the following sources

  • Trino
  • Iceberg

Plan/Design

For the majority of sources we should be able to use the MetadataWorkflow object to run ingestion with the out-of-the-box connectors (https://docs.open-metadata.org/latest/deployment/ingestion/external). More detailed or custom metadata ingestion will be implemented as custom Dagster assets. All execution will be managed via Dagster pipelines.
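
A minimal sketch of how this could fit together, assuming a recent `openmetadata-ingestion` release where `MetadataWorkflow` is importable from `metadata.workflow.metadata`. The names `run_ingestion` and `make_ingestion_asset` are hypothetical helpers for this example, and the third-party imports are deferred so the module loads even where the ingestion library or Dagster is not installed.

```python
from typing import Any, Callable, Optional


def run_ingestion(
    config: dict,
    workflow_factory: Optional[Callable[[dict], Any]] = None,
) -> None:
    """Run one ingestion workflow from a config dict.

    workflow_factory is a hypothetical injection point for testing; by
    default it resolves to MetadataWorkflow.create.
    """
    if workflow_factory is None:
        # Deferred import: requires the openmetadata-ingestion package.
        from metadata.workflow.metadata import MetadataWorkflow

        workflow_factory = MetadataWorkflow.create

    workflow = workflow_factory(config)
    try:
        workflow.execute()
        # Raise if any step of the workflow reported a failure.
        workflow.raise_from_status()
    finally:
        # Always release the workflow's resources, even on failure.
        workflow.stop()


def make_ingestion_asset(asset_name: str, config: dict):
    """Wrap an ingestion run in a Dagster asset (assumed wiring)."""
    # Deferred import: requires the dagster package.
    from dagster import asset

    @asset(name=asset_name)
    def _ingest() -> None:
        run_ingestion(config)

    return _ingest
```

With something like this, each out-of-the-box source would get one asset built from its connector config, and the resulting assets could be grouped into a scheduled Dagster job so the catalog is refreshed regularly.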
