added the remorph automation docs and notebook #1587
base: main
✅ 15/15 passed, 1 skipped, 16s total
Running from acceptance #720
## Notebook Details
[Link to the notebook](remorph_reconciliation.dbc)
Move this to static.
Please check the comments.
## Overview
The purpose of this utility is to automate table reconciliation based on provided table configurations. It ensures a streamlined comparison of tables, applying necessary transformations and computing reconciliation results efficiently.
Please format the paragraph as shown below to make it more readable:
- The purpose of this utility is to automate table reconciliation based on provided table configurations.
- It ensures a streamlined comparison of tables, applying necessary transformations and computing reconciliation results efficiently.
- The utility also provides lookup tables, which can be configured to provide:
  - inputs on the source/target tables,
  - transformations to be applied,
  - thresholds to be set, etc.
## Prerequisites
- The Remorph tool should be configured through the CLI to create the `remorph` catalog.
The catalog name can be anything; the default is `remorph`.
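Assuming the default catalog name `remorph`, a quick Databricks SQL check that the CLI setup actually created the catalog (and the schemas inside it) might look like the sketch below. Substitute your own catalog name if you chose a different one during setup.

```sql
-- Assumes the default catalog name `remorph`; adjust if a different name was chosen.
SHOW CATALOGS LIKE 'remorph';

-- List the schemas that exist inside that catalog.
SHOW SCHEMAS IN remorph;
```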
The remorph catalog can be made editable as a variable/parameter, i.e. `remorph_catalog`.
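One possible way to treat the catalog as a parameter is sketched below with Databricks SQL session variables (available on recent runtimes) and the `IDENTIFIER()` clause. This is an assumption about how the notebook could be parameterized, not the approach taken in this PR; the schema default `reconcile` and the override value `my_remorph` are hypothetical.

```sql
-- Hypothetical session variables; defaults are placeholders for your actual setup.
DECLARE OR REPLACE VARIABLE remorph_catalog STRING DEFAULT 'remorph';
DECLARE OR REPLACE VARIABLE remorph_schema  STRING DEFAULT 'reconcile';  -- assumed schema name

-- Point every query below at a different catalog by changing one value.
SET VAR remorph_catalog = 'my_remorph';

-- IDENTIFIER() resolves the constant string expression to a qualified table name.
SELECT *
FROM IDENTIFIER(remorph_catalog || '.' || remorph_schema || '.table_recon_summary')
LIMIT 10;
```

Because table names are built from the variables, changing `remorph_catalog` once redirects every statement that uses `IDENTIFIER()`, with no find-and-replace across the notebook.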
```sql
databricks_catalog STRING,
databricks_schema STRING,
databricks_table STRING,
pk ARRAY<STRING>,
```
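Because `pk` is declared as `ARRAY<STRING>`, a composite key is stored as one array element per key column. The hypothetical illustration below builds a row shaped like these config columns in Databricks SQL; the values `main`, `sales`, `orders`, `order_id`, and `line_no` are made up, and the config table's full name is not shown in this excerpt.

```sql
-- Hypothetical values only: a row shaped like the config columns above.
SELECT
  'main'                             AS databricks_catalog,
  'sales'                            AS databricks_schema,
  'orders'                           AS databricks_table,
  ARRAY('order_id', 'line_no')       AS pk,               -- one element per key column
  size(ARRAY('order_id', 'line_no')) AS pk_column_count;  -- number of key columns
```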
Please use the full name: `primary_key`.
- Ensure the `table_recon_summary` table is created inside `<remorph_catalog>.<remorph_schema>` with the DDL below. This table stores the summary results of the validated tables.
```sql
CREATE TABLE <remorph_catalog>.<remorph_schema>.table_recon_summary (
  timestamp TIMESTAMP,
```
As `timestamp` is a reserved keyword, can we rename it to something else, like `run_timestamp`?
Can the DBC/py files be included in the package, or are we planning to ship this as part of the documentation? cc: @sundarshankar89
Can we review whether this functionality can be made available directly in the package?
That is the final goal; for now, documentation is the lowest-friction option, and we can move this into the core later.
Changes
What does this PR do?
Relevant implementation details
Caveats/things to watch out for when reviewing:
Linked issues
Resolves #..
Functionality
databricks labs remorph ...
Tests