-
Notifications
You must be signed in to change notification settings - Fork 140
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Higher order syntactic and semantic profiler transforms #653
Higher order syntactic and semantic profiler transforms #653
Conversation
Signed-off-by: aishwariyachakraborty <[email protected]>
Signed-off-by: aishwariyachakraborty <[email protected]>
eab62a5
to
230c500
Compare
Signed-off-by: aishwariyachakraborty <[email protected]>
Signed-off-by: aishwariyachakraborty <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
please merge your branch with dev and regenerate this file.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have made all the changes you have requested. I do not have an option to merge yet. It shows : Merging is blocked
Merging can be performed automatically once the requested changes are addressed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The merge I'm referring to is one you do locally into your fork.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
please merge your branch with dev and regenerate this file.
transforms/code/higher_order_syntactic_profiler/python/pyproject.toml
Outdated
Show resolved
Hide resolved
transforms/code/higher_order_syntactic_profiler/ray/pyproject.toml
Outdated
Show resolved
Hide resolved
1. Updated makefiles to new the format used in noop 2. Updated author names in toml file Signed-off-by: aishwariyachakraborty <[email protected]>
I have made all the changes you have requested. I do not have an option to merge yet. It shows : Merging is blocked |
|
||
| Parameter | Default | Description | | ||
|------------|----------|--------------| | ||
| `HOSP_METRICS_LIST` | `CCR` | Metrics to be calculated for profiling. Multiple metrics can be entered separated by space. Only valid metric is `CCR` as of now. | |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe the key needs to be in lower case, per the transform implementation. And, the transform seems to look for metrics
not metrics_list
and definitely not hosp_metrics_list
.
the options provided by | ||
the [python launcher](../../../../data-processing-lib/doc/python-launcher-options.md). | ||
``` | ||
--hosp_metrics_list HOSP_METRICS_LIST |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you use https://github.com/IBM/data-prep-kit/blob/dev/transforms/language/pdf2parquet/python/README.md as a template for documenting the configurations and annotations. In particular, you haven't discussed the annotations that are added, but following the format for the configuration would be good too.
input parquet to the output folder, without modification. | ||
""" | ||
self.logger.debug(f"Transforming one table with {len(table)} rows") | ||
if self.metrics_list is not None: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You should check for this error/condition in the initializer and throw an exception there if a bad value, such as None.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It appears your are adding a column whose name is the name of the metric. This is not clear from README.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if you really want to distribute. these, maybe they belong in a data
directory that is a sibling of src
and then include them in the wheel via the toml file. My problem with this is that I'm wondering how someone would specify this file (or other data files) when using the wheel and not the repo project. If they can be referenced from within the wheel, it would be better to specify data/ikb/ikb_model.csv
than src/ikb/ikb_model.csv
I think.
"pipeline": "pipeline_id", | ||
"job details": { | ||
"job category": "preprocessing", | ||
"job name": "NOOP", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this file needs to be regenerated since it seems to have been copied from the NOOP project.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@aishwariyachakraborty: Please slack me when you have change to discuss - Internal ID [email protected]
It was suggested that we combine this with the Syntactic Construct Extractor transform and create a single transform. So I am closing this PR. I will create a new PR after the PR for Syntactic Construct Extractor transform is merged with the dev branch. Hope this is ok? |
Why are these changes needed?
This PR implements the higher order syntactic and semantic profiler transforms as a downstream application of the UBSR transform.
Related issue number (if any).