adding hap transform #638
* --model_name_or_path - specifies the HAP model, which should be compatible with HuggingFace's `AutoModelForSequenceClassification`
* --batch_size - modify it based on the infrastructure capacity.
* --max_length - the maximum length for the tokenizer.
I think you are missing the annotation_column and document_column configurations.
Also, can you change to the format for documenting configuration used here https://github.com/IBM/data-prep-kit/tree/dev/transforms/language/pdf2parquet/python?
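For illustration, the options discussed in this comment could be exposed roughly as in the sketch below. This uses plain `argparse` rather than the project's own launcher, and the flag names for the two column options, the `build_parser` helper, and every default except the model name (which appears later in this thread) are assumptions, not the PR's actual values.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical CLI for the HAP transform options named in the review."""
    parser = argparse.ArgumentParser(description="HAP transform options (sketch)")
    parser.add_argument(
        "--model_name_or_path",
        type=str,
        default="ibm-granite/granite-guardian-hap-38m",
        help="HAP model loadable via AutoModelForSequenceClassification",
    )
    parser.add_argument(
        "--batch_size",
        type=int,
        default=128,  # assumed default; tune to the infrastructure capacity
        help="number of documents scored per batch",
    )
    parser.add_argument(
        "--max_length",
        type=int,
        default=512,  # assumed default
        help="maximum length for the tokenizer",
    )
    parser.add_argument(
        "--annotation_column",
        type=str,
        default="hap_score",  # assumed default
        help="name of the output column holding the HAP score",
    )
    parser.add_argument(
        "--doc_text_column",
        type=str,
        default="contents",  # assumed default
        help="name of the input column holding the document text",
    )
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args()
    print(vars(args))
```

Documenting each option with its type and default in one place, as the `help` strings do here, is the kind of information the requested README format would also carry.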
# distribution versions is the same as image version.
set-versions:
	$(MAKE) TRANSFORM_PYTHON_VERSION=$(HAP_PYTHON_VERSION) TOML_VERSION=$(HAP_PYTHON_VERSION) .transforms.set-versions
These HAP_*VERSION require additions to .make.versions at the top of the repo.
I will make the update in .make.versions but going forward, we should change that: it could be enough to define it in Makefile itself.
fyi, at one point, maybe with kfp, these were required to be available more globally. In addition, the ray/Makefile refers to the HAP_PYTHON_VERSION, so may not be able to change unless we coalesce runtimes into a single project.
venv:: .transforms.python-venv

install:: pip install -r requirements.txt
This is a non-standard rule. You should probably remove it.
Can you also add a launcher-based test, test_hap_python.py?
@ian-cho We can go over my proposed changes together if any of it does not make sense.
Co-authored-by: touma-I <[email protected]>
@blublinsky (cc @daw3rd )
All tests passed. Please check.
@ian-cho There was one typo in pyproject.toml that was causing the build to fail. This is fixed now.
Assuming issues to be created for remaining comments.
super().__init__(config)
self.model_name_or_path = config.get("model_name_or_path")
self.annotation_column = config.get("annotation_column")
self.doc_text_column = config.get("doc_text_column")
These should all have defaults, or raise an exception when no default is defined and the user does not provide a value. In general, we should assume (when possible) that the transform class may be instantiated directly by the user. In that case the CLI is not used, so these keys could be missing from the config.
@daw3rd Agreed, users may instantiate the class directly, in which case there would be empty key values. How about `self.model_name_or_path = config.get("model_name_or_path", "ibm-granite/granite-guardian-hap-38m")`, and the same for the other two parameters? I can add a warning message if the user does not specify a model, notifying them that the default model is being used.
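One way to combine both suggestions in this thread (defaults where a sensible one exists, an exception otherwise) is sketched below. It is a standalone stand-in, not the transform's actual `__init__`: the class name `HAPConfigSketch`, the default annotation column `hap_score`, and the choice to treat `doc_text_column` as required are assumptions made for illustration. Only the default model name comes from the comment above.

```python
import logging
from typing import Any, Dict

logger = logging.getLogger(__name__)

# Default model name as proposed in the review thread.
DEFAULT_MODEL = "ibm-granite/granite-guardian-hap-38m"
# Hypothetical default for the output column.
DEFAULT_ANNOTATION_COLUMN = "hap_score"


class HAPConfigSketch:
    """Illustrative config handling for a directly instantiated transform."""

    def __init__(self, config: Dict[str, Any]):
        model = config.get("model_name_or_path")
        if model is None:
            # Fall back to the default, but tell the user it happened.
            logger.warning(
                "model_name_or_path not provided; using default model %s",
                DEFAULT_MODEL,
            )
            model = DEFAULT_MODEL
        self.model_name_or_path = model

        # A silent default is acceptable for the output column name.
        self.annotation_column = config.get(
            "annotation_column", DEFAULT_ANNOTATION_COLUMN
        )

        # Treated as required in this sketch: fail early with a clear message
        # instead of carrying None into the transform logic.
        doc_text_column = config.get("doc_text_column")
        if doc_text_column is None:
            raise ValueError("doc_text_column must be provided in the config")
        self.doc_text_column = doc_text_column
```

With this shape, `HAPConfigSketch({"doc_text_column": "text"})` picks up the default model and logs a warning, while `HAPConfigSketch({})` raises `ValueError` immediately.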
Why are these changes needed?
Adding a Python implementation of the HAP transform under the `universal` directory for DPK outer. All tests passed when running `make test`. Please review. Thanks.