added the remorph automation docs and notebook #1587

Open
wants to merge 1 commit into
base: main

gokulrenga-db

Changes

What does this PR do?

Relevant implementation details

Caveats/things to watch out for when reviewing:

Linked issues

Resolves #..

Functionality

  • added relevant user documentation
  • added new CLI command
  • modified existing command: databricks labs remorph ...
  • ... +add your own

Tests

  • manually tested
  • added unit tests
  • added integration tests

✅ 15/15 passed, 1 skipped, 16s total

Running from acceptance #720


## Notebook Details

[Link to the notebook](remorph_reconciliation.dbc)

Collaborator

move this to static

Contributor

@vijaypavann-db left a comment

Please check the comments


## Overview

The purpose of this utility is to automate table reconciliation based on provided table configurations. It ensures a streamlined comparison of tables, applying necessary transformations and computing reconciliation results efficiently.

Contributor

Please format the paragraph as a list so it is more readable, something like:

  • The purpose of this utility is to automate table reconciliation based on provided table configurations.
  • It ensures a streamlined comparison of tables, applying necessary transformations and computing reconciliation results efficiently.
  • The utility also provides lookup tables, which can be configured to provide:
    • inputs on the source/target tables,
    • transformations to be applied,
    • thresholds to be set, etc.


## Pre-requisites

- The Remorph tool should be configured through the CLI to create the remorph catalog.

Contributor

The Catalog name could be anything; the default is remorph.

Contributor

The remorph catalog name can be exposed as a variable/parameter, i.e. `remorph_catalog`.
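
One way to read the two comments above is that the documentation and notebook should build table names from parameters rather than hard-coding `remorph`. The following is a minimal sketch of that idea, not the notebook's actual code: the helper `qualified_table_name` and the schema value `reconcile` are hypothetical, and only the default catalog name `remorph` and the table name `table_recon_summary` come from this PR.

```python
def qualified_table_name(catalog: str, schema: str, table: str) -> str:
    """Return a backtick-quoted three-level name: `catalog`.`schema`.`table`."""

    def quote(identifier: str) -> str:
        # Double any embedded backtick so the identifier stays a single token.
        return "`" + identifier.replace("`", "``") + "`"

    return ".".join(quote(part) for part in (catalog, schema, table))


# Hypothetical parameter values. "remorph" is only the default catalog name;
# the schema name here is an assumption made for illustration.
remorph_catalog = "remorph"
remorph_schema = "reconcile"

summary_table = qualified_table_name(
    remorph_catalog, remorph_schema, "table_recon_summary"
)
print(summary_table)
```

In a notebook the two parameter values would typically come from widgets or job parameters, so a differently named catalog needs no change to the SQL text itself.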

databricks_catalog STRING,
databricks_schema STRING,
databricks_table STRING,
pk ARRAY<STRING>,

Contributor

Please spell out the full name:
`primary_key`
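
To illustrate why the longer name reads better in calling code, here is a small sketch that mirrors the four columns visible in the diff hunk above, with `pk` renamed to `primary_key` as suggested. The class name `TableConfigRow`, the `full_name` property, and the sample values are hypothetical and are not part of the PR.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class TableConfigRow:
    """One row of the table-configuration lookup (columns taken from the diff)."""

    databricks_catalog: str
    databricks_schema: str
    databricks_table: str
    # Renamed from `pk`; the DDL column is ARRAY<STRING>, modelled as a tuple.
    primary_key: Tuple[str, ...] = ()

    @property
    def full_name(self) -> str:
        # Three-level name of the Databricks-side table.
        return ".".join(
            (self.databricks_catalog, self.databricks_schema, self.databricks_table)
        )


# Sample values are invented for illustration only.
row = TableConfigRow("main", "sales", "orders", primary_key=("order_id",))
print(row.full_name, row.primary_key)
```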

- Ensure the `table_recon_summary` table is created inside `<remorph_catalog>.<remorph_schema>` with the DDL below. This table stores the summary results of the validated tables.
```sql
CREATE TABLE <remorph_catalog>.<remorph_schema>.table_recon_summary (
timestamp TIMESTAMP,
```

Contributor

As `timestamp` is a reserved keyword, can we rename it to something else, like `run_timestamp`?

@biswadeepupadhyay-db
Contributor

Can the DBC/py files be included in the package, or are we planning on shipping this as part of the documentation?

cc: @sundarshankar89

Contributor

Can we review if this functionality can directly be made available to the package?

Collaborator

That would be the final goal. For now, documentation is the path of least friction; we can move this into the core later.

Labels
None yet
Projects
None yet
4 participants