Higher order syntactic and semantic profiler transforms #653

aishwariyachakraborty · 2024-10-01T18:48:39Z

Why are these changes needed?

This PR implements the higher order syntactic and semantic profiler transforms as a downstream application of the UBSR transform.

Related issue number (if any).

Signed-off-by: aishwariyachakraborty <[email protected]>

transforms/code/higher_order_syntactic_profiler/python/README.md

transforms/code/semantic_profiler/python/src/sp_transform.py

Signed-off-by: aishwariyachakraborty <[email protected]>

daw3rd · 2024-10-04T12:01:19Z

.github/workflows/test-code-semantic_profiler.yml

Please merge your branch with dev and regenerate this file.

I have made all the changes you have requested. I do not have an option to merge yet. It shows: "Merging is blocked. Merging can be performed automatically once the requested changes are addressed."

The merge I'm referring to is one you do locally into your fork.

daw3rd · 2024-10-04T12:01:24Z

.github/workflows/test-code-higher_order_syntactic_profiler.yml

Please merge your branch with dev and regenerate this file.

transforms/code/higher_order_syntactic_profiler/Makefile

transforms/code/higher_order_syntactic_profiler/python/pyproject.toml

transforms/code/higher_order_syntactic_profiler/ray/pyproject.toml

transforms/code/semantic_profiler/python/pyproject.toml

transforms/code/semantic_profiler/ray/pyproject.toml

transforms/code/semantic_profiler/Makefile

1. Updated makefiles to the new format used in noop
2. Updated author names in toml file

Signed-off-by: aishwariyachakraborty <[email protected]>

aishwariyachakraborty · 2024-10-04T13:00:27Z

I have made all the changes you have requested. I do not have an option to merge yet. It shows: "Merging is blocked. Merging can be performed automatically once the requested changes are addressed."

daw3rd · 2024-10-04T12:59:09Z

transforms/code/higher_order_syntactic_profiler/python/README.md

+
+| Parameter  | Default  | Description  |
+|------------|----------|--------------|
+| `HOSP_METRICS_LIST`         | `CCR`        | Metrics to be calculated for profiling. Multiple metrics can be entered separated by space. Only valid metric is `CCR` as of now. |


I believe the key needs to be in lower case, per the transform implementation. Also, the transform seems to look for `metrics`, not `metrics_list`, and definitely not `hosp_metrics_list`.
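To illustrate the kind of mismatch described here, the following is a minimal sketch, not the launcher's actual code. It assumes that a CLI flag such as `--hosp_metrics_list` reaches the transform with a `hosp_` prefix stripped; the helper `capture_prefixed` and the prefix value are hypothetical names introduced only for this example.

```python
import argparse

# Assumed CLI prefix for this transform (hypothetical, for illustration only).
CLI_PREFIX = "hosp_"


def capture_prefixed(args: argparse.Namespace, prefix: str) -> dict:
    """Collect arguments whose names start with `prefix`, keyed without the prefix."""
    return {
        name[len(prefix):]: value
        for name, value in vars(args).items()
        if name.startswith(prefix)
    }


parser = argparse.ArgumentParser()
parser.add_argument("--hosp_metrics_list", nargs="+", default=["CCR"])
args = parser.parse_args(["--hosp_metrics_list", "CCR"])

config = capture_prefixed(args, CLI_PREFIX)

# The captured key is lower case and has no prefix. A transform that reads
# config.get("metrics") would not find it under this name.
metrics_seen_by_transform = config.get("metrics")
```

Under these assumptions the README, the CLI flag, and the key read by the transform each need to agree on one name, whichever one is chosen.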

transforms/code/higher_order_syntactic_profiler/python/pyproject.toml

daw3rd · 2024-10-04T13:05:18Z

transforms/code/higher_order_syntactic_profiler/python/README.md

+the options provided by 
+the [python launcher](../../../../data-processing-lib/doc/python-launcher-options.md).
+```
+  --hosp_metrics_list HOSP_METRICS_LIST


Can you use https://github.com/IBM/data-prep-kit/blob/dev/transforms/language/pdf2parquet/python/README.md as a template for documenting the configurations and annotations? In particular, you haven't discussed the annotations that are added, but following the format for the configuration would be good too.

daw3rd · 2024-10-04T13:06:31Z

transforms/code/higher_order_syntactic_profiler/python/src/hosp_transform.py

+        input parquet to the output folder, without modification.
+        """
+        self.logger.debug(f"Transforming one table with {len(table)} rows")
+        if self.metrics_list is not None:


You should check for this error condition in the initializer and throw an exception there if a bad value, such as None, is given.

It appears you are adding a column whose name is the name of the metric. This is not clear from the README.
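A minimal sketch of the suggestion above: validate the metrics once in the initializer so that `transform()` never needs a `None` check. The class name, the `metrics` config key, and the `SUPPORTED_METRICS` set are assumptions made for this illustration and are not taken from the PR's code.

```python
from typing import Any

# Assumed set of supported metrics; the README under review lists only CCR.
SUPPORTED_METRICS = {"CCR"}


class MetricsConfigError(ValueError):
    """Raised when the metrics configuration is missing or invalid."""


class ProfilerTransformSketch:
    """Illustrative stand-in for the transform class, not the PR's implementation."""

    def __init__(self, config: dict[str, Any]):
        metrics = config.get("metrics")  # key name is an assumption
        if metrics is None:
            raise MetricsConfigError("'metrics' must be provided")
        # Accept either a space-separated string or a list of names.
        if isinstance(metrics, str):
            metrics = metrics.split()
        metrics = list(metrics)
        if not metrics:
            raise MetricsConfigError("'metrics' must not be empty")
        unknown = [m for m in metrics if m not in SUPPORTED_METRICS]
        if unknown:
            raise MetricsConfigError(f"unsupported metrics: {unknown}")
        self.metrics_list = metrics

    def added_columns(self) -> list[str]:
        """Names of the columns the transform would add, one per metric."""
        return list(self.metrics_list)
```

Failing fast in the initializer reports a bad configuration before any table is read, and listing the added column names in one place makes them easy to document in the README.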

transforms/code/semantic_profiler/python/src/examples/examples-i.csv

daw3rd · 2024-10-04T13:23:45Z

transforms/code/semantic_profiler/python/src/ikb/ikb_model.csv

If you really want to distribute these, maybe they belong in a data directory that is a sibling of src, and you can then include them in the wheel via the toml file. My concern is how someone would specify this file (or other data files) when using the wheel and not the repo project. If they can be referenced from within the wheel, I think it would be better to specify data/ikb/ikb_model.csv than src/ikb/ikb_model.csv.
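One possible way to reference a bundled file from within an installed wheel is the standard library's `importlib.resources`. The sketch below assumes the data directory ends up inside an importable package and that the wheel is installed unpacked on the filesystem; the function name `resolve_data_file` and its parameters are hypothetical.

```python
from importlib import resources
from pathlib import Path
from typing import Optional


def resolve_data_file(user_path: Optional[str], package: str, relative: str) -> Path:
    """Return `user_path` if given, otherwise the copy bundled inside `package`.

    `relative` is a path inside the package, e.g. "data/ikb/ikb_model.csv".
    """
    if user_path:
        path = Path(user_path)
    else:
        # resources.files() locates the installed package directory, so the
        # default works from a wheel as well as from the repo checkout.
        path = Path(str(resources.files(package).joinpath(relative)))
    if not path.is_file():
        raise FileNotFoundError(f"data file not found: {path}")
    return path
```

With a helper like this, a user of the wheel can either omit the file argument and get the packaged default, or pass an explicit path to their own file.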

transforms/code/semantic_profiler/ray/pyproject.toml

daw3rd · 2024-10-04T13:29:38Z

transforms/code/semantic_profiler/ray/test-data/expected/metadata.json

+    "pipeline": "pipeline_id",
+    "job details": {
+        "job category": "preprocessing",
+        "job name": "NOOP",


This file needs to be regenerated, since it seems to have been copied from the NOOP project.

touma-I

@aishwariyachakraborty: Please slack me when you have a chance to discuss - Internal ID [email protected]

aishwariyachakraborty · 2024-10-14T10:29:23Z

It was suggested that we combine this with the Syntactic Construct Extractor transform and create a single transform. So I am closing this PR. I will create a new PR after the PR for Syntactic Construct Extractor transform is merged with the dev branch. Hope this is ok?

profiler transforms

3e3f632

Signed-off-by: aishwariyachakraborty <[email protected]>

daw3rd requested changes Oct 2, 2024

transforms/code/higher_order_syntactic_profiler/python/README.md

transforms/code/semantic_profiler/python/src/sp_transform.py

changes to ensure data files are present in wheel

230c500

Signed-off-by: aishwariyachakraborty <[email protected]>

aishwariyachakraborty force-pushed the syntactic-semantic-profiler branch from eab62a5 to 230c500 Compare October 3, 2024 11:24

test-src working

5895f98

Signed-off-by: aishwariyachakraborty <[email protected]>

aishwariyachakraborty requested a review from daw3rd October 3, 2024 11:34

aishwariyachakraborty marked this pull request as draft October 3, 2024 11:35

Updated params in readme

ec3998e

Signed-off-by: aishwariyachakraborty <[email protected]>

aishwariyachakraborty marked this pull request as ready for review October 4, 2024 04:54

daw3rd requested changes Oct 4, 2024


Made the review changes requested

15de249

1. Updated makefiles to the new format used in noop
2. Updated author names in toml file

Signed-off-by: aishwariyachakraborty <[email protected]>

daw3rd requested changes Oct 4, 2024


touma-I self-requested a review October 12, 2024 15:25

touma-I requested changes Oct 12, 2024


aishwariyachakraborty closed this by deleting the head repository Oct 17, 2024
