-
Notifications
You must be signed in to change notification settings - Fork 116
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Import validation updates #1227
base: master
Are you sure you want to change the base?
Conversation
vish-cs
commented
Feb 4, 2025
- Invoke dc import tool to generate resolved mcf
- Upload validation output files to GCS
- Validation status check for upload
- Default executor type: CLOUD_RUN
63babc5
to
222cd59
Compare
output_file = open(self.validation_output, mode='w', encoding='utf-8') | ||
output_file.write('test,status,message\n') | ||
status = True | ||
for config in self.validation_config: | ||
result = self._run_validation(config) | ||
output_file.write( | ||
f'{result.name},{result.status},{result.message}\n') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess this is a csv row. If yes, comma in result.message might interfere with parsing later. Perhaps consider usign a csv lib. An example code
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We are yet to work on refining the messages. They are currently dummy messages. I have added a TODO to switch to a CSV lib as a follow up when we do that.
config_file) | ||
import_prefix, 'validation') | ||
current_data_path = os.path.join(validation_output_path, '*.mcf') | ||
previous_data_path = latest_version + f'/{import_prefix}/validation/*.mcf' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Previous data path might not contain mcfs (for already onbaorded auto-imports). Should we skip if mcfs are missing?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We are uploading mcfs as part of the import output upload step now. In that case, why shall we not find mcfs in GCS?
- Invoke dc import tool to generate resolved mcf - Upload validation output files to GCS - Validation status check for upload - Default executor type: CLOUD_RUN
) | ||
return | ||
# blob.download_to_filename(previous_data_path) | ||
'Skipping validation due to missing latest version.') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can the summary_report.csv be validated without the differ for the first run?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That will require some logic to only run certain validations (and skip the differ ones). We can do that as a follow up as needed.
self.config.import_metadata_mcf_filename)) | ||
else: | ||
logging.error( | ||
"Skipping latest version update due to validation failure.") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
May be better to upload output to GCS even if validation fails to help with debugging.
We can skip updating latest_version.txt if validation fails so the next diff is with the previous valid output.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That is the case actually. We are uploading validation output files as the last step in _invoke_import_validation but not updating latest_version in case of failure.