-
Notifications
You must be signed in to change notification settings - Fork 162
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Build transforms wheel #493
Conversation
Signed-off-by: Maroun Touma <[email protected]>
Signed-off-by: Maroun Touma <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we get a
- Makefile in each new directory and support the build, test, publish, clean and set-versions targets.
- A readme in each directory.
- Extra credit for a pypi-specific readme
Signed-off-by: Maroun Touma <[email protected]>
Signed-off-by: Maroun Touma <[email protected]>
Signed-off-by: Maroun Touma <[email protected]>
transforms/packaging/python/Makefile
Outdated
TRANSFORMS_NAMES = code/code_quality \ | ||
code/code2parquet \ | ||
code/header_cleanser \ | ||
code/code_quality \ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
code2parquet, malware
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
malware transforms need some additional work to be included as a pip install and was intentionally excluded from this initial release... It seem to require a docker container and does not fit in the current use case for running these transforms in a notebook.
* [code2parquet](https://github.com/IBM/data-prep-kit/blob/dev/transforms/code/code2parquet/python/README.md) | ||
* header_cleanser (Not available on MacOS) | ||
* code_quality | ||
* proglang_select |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
malware
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
malware transforms need some additional work to be included as a pip install and was intentionally excluded from this initial release... It seem to require a docker container and does not fit in the current use case for running these transforms in a notebook.
transforms/packaging/ray/README.md
Outdated
* proglang_select | ||
* header_cleanser (Not available on MacOS) | ||
* code_quality | ||
* repo_level_ordering |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
malware
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
malware transforms need some additional work to be included as a pip install and was intentionally excluded from this initial release... It seem to require a docker container and does not fit in the current use case for running these transforms in a notebook.
transforms/packaging/ray/README.md
Outdated
* profiler | ||
* doc_id | ||
* filter | ||
* resize |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
profiler,
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@blublinsky what is meant by this comment?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
missing transform
Signed-off-by: Maroun Touma <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You also need a recursive Makefile in transforms/packaging directory.
@@ -0,0 +1,53 @@ | |||
[project] | |||
name = "data_prep_toolkit_transforms_ray" | |||
version = "0.2.1.dev1" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
dev0
Signed-off-by: Maroun Touma <[email protected]>
Signed-off-by: Maroun Touma <[email protected]>
Signed-off-by: Maroun Touma <[email protected]>
Signed-off-by: Maroun Touma <[email protected]>
Signed-off-by: Maroun Touma <[email protected]>
Signed-off-by: Maroun Touma <[email protected]>
Signed-off-by: Maroun Touma <[email protected]>
Signed-off-by: Maroun Touma <[email protected]>
Signed-off-by: Maroun Touma <[email protected]>
Signed-off-by: Maroun Touma <[email protected]>
@touma-I I think you will also need to update the .github/workflows/test.yml to add a |
Signed-off-by: Maroun Touma <[email protected]>
Signed-off-by: Maroun Touma <[email protected]>
Signed-off-by: Maroun Touma <[email protected]>
Signed-off-by: Maroun Touma <[email protected]>
Signed-off-by: Maroun Touma <[email protected]>
Signed-off-by: Maroun Touma <[email protected]>
Signed-off-by: Maroun Touma <[email protected]>
Signed-off-by: David Wood <[email protected]>
Signed-off-by: David Wood <[email protected]>
Signed-off-by: Maroun Touma <[email protected]>
… into build-transforms-wheel
Signed-off-by: Maroun Touma <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you may need to modify scripts/check-transform-workflows.sh to not include the packaging directory.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm
@@ -480,7 +480,8 @@ endif | |||
if [ -e requirements.txt ]; then \ | |||
echo Installing requirements from requirements.txt; \ | |||
pip install $(PIP_INSTALL_EXTRA_ARGS) $$extra_url -r requirements.txt; \ | |||
elif [ -e pyproject.toml ]; then \ | |||
fi; \ | |||
if [ -e pyproject.toml ]; then \ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
so you want to allow dependencies installation from both requirements.txt
and pyproject.toml
. It can confuse.
I'd add a WARNING message if both files exist. And specify in the message the installation order: first from requirements.txtand after that from
pyproject.toml`
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @roytman . The situation we ran into is the pyproject.toml for the single package is using requirements.txt that was compiled from the different dependencies listed by all transforms. I will also be changing all the transforms pyproject.toml to use requirements.txt to list their dependencies. But agree, I need to watch this one closely.
@@ -587,6 +588,18 @@ MINIO_ADMIN_PWD= localminiosecretkey | |||
> tt.toml; \ | |||
mv tt.toml pyproject.toml; \ | |||
fi | |||
@if [ -e requirements.txt ]; then \ | |||
cat requirements.txt | sed \ | |||
-e 's/data-prep-toolkit-ray\([=><~][=]\).*/data-prep-toolkit-ray\1$(DPK_LIB_VERSION)/' \ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can all these sed
operations be replaced by a single macro? the lines 582-587 are almost identical to lines 593-599.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes. Almost but not exactly. But agree, I will take a deeper look in a followup PR.
@@ -1,6 +1,6 @@ | |||
## Data prep kit | |||
data-prep-toolkit-transforms==0.2.1.dev1 | |||
data-prep-toolkit-transforms-ray==0.2.1.dev1 | |||
#data-prep-toolkit-transforms==0.2.1.dev1 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see that these are comments, but should they be with 0.2.1.dev3
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not sure this notebook will still work with dev3. There were changes to the transforms that broke the notebook. I think Sunjee has a new release he will be checking in soon.
@@ -0,0 +1,5 @@ | |||
**/src |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should we combine it (except src) with the .gitignore in the root directory?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That would work to. I think there may be situations in the future where we want to. add the wheel for specific transforms to git so folks can download it from git. This is still under discussion but yes, once we have consensus, we should move the gitignore to higher level as appropriate.
transforms/packaging/README.md
Outdated
|
||
|
||
```` | ||
cd package-release/transforms/packaging |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think users are in package-release
, so the command should be cd transforms/packaging
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done! Thanks
transforms/packaging/README.md
Outdated
This procedure will run all the UT for each individual transforms using a single package configuration: | ||
|
||
```` | ||
cd package-release/transforms/packaging |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
as above, or write something like, if you are not in the package-release/transforms/packaging
, cd into it
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done! Thanks
transforms/packaging/README.md
Outdated
This procedure will buid two wheels: one for the python transforms and one for the ray transforms. | ||
|
||
```` | ||
cd package-release/transforms/packaging |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
see above
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done! Thanks
* [resize](https://github.com/IBM/data-prep-kit/blob/dev/transforms/code/resize/python/README.md) | ||
* [tokenization](https://github.com/IBM/data-prep-kit/blob/dev/transforms/tokenization/doc_chunk/python/README.md) | ||
* [doc_id](https://github.com/IBM/data-prep-kit/blob/dev/transforms/code/doc_id/python/README.md) | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we cannot keep this list up to date, so it might be worth adding something like the list includes some of the transformers but doesn't all of them.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done! Thanks
Signed-off-by: Maroun Touma <[email protected]>
|
- "*" | ||
paths: | ||
- "transforms/packaging/python/**" | ||
- "data-processing-lib/**" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you don't include the transforms, you should probable not include the core library either.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i pushed a fix for this
- "*" | ||
paths: | ||
- "transforms/packaging/ray/**" | ||
- "data-processing-lib/**" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same, library not needed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i pushed a fix for this.
@@ -0,0 +1,60 @@ | |||
REPOROOT=../../ | |||
# Use make help, to see the available rules | |||
include ../../.make.defaults |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should include $(REPOROOT)/.make.defaults and in general use REPOROOT
# code/repo_level_ordering # Missing implementation | ||
|
||
|
||
TRANSFORMS_NAMES = code/code_quality \ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should build this list automatically by looking for the python directories in transforms//
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i guess this file being created by hand? is the plan to eventually automate this file.
Let's address those in a separate PR. Thanks
Signed-off-by: David Wood <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i fixed workflow paths. others are minor comments.
Why are these changes needed?
Allow users to do pip install of all the transforms without having to clone the project.
This process clones the project behind the scene and installs the various transforms.
Also account for the transforms that do not run on MacOS
Related issue number (if any).