Merge branch 'dev' into transform-pipeline
Showing 33 changed files with 3,145 additions and 4 deletions.
@@ -0,0 +1,26 @@

# Directories in the transforms/universal directory for which we want to generate test workflows
UNIVERSAL_TRANSFORMS=doc_id ededup fdedup filter html2parquet noop profiler resize tokenization
# Directories in the transforms/code directory for which we want to generate test workflows
CODE_TRANSFORMS=code2parquet code_quality header_cleanser malware proglang_select repo_level_ordering
# Directories in the transforms/language directory for which we want to generate test workflows
LANG_TRANSFORMS=doc_chunk doc_quality lang_id pdf2parquet pii_redactor text_encoder


transform-tests:
	$(MAKE) TRANSFORM_SUBDIR=universal .transform-tests
	$(MAKE) TRANSFORM_SUBDIR=language .transform-tests
	$(MAKE) TRANSFORM_SUBDIR=code .transform-tests

# Expects
#    TRANSFORM_SUBDIR transforms subdirectory (such as universal)
# Uses -mindepth/-maxdepth, which both GNU and BSD find accept (-depth 1 is BSD-only).
.transform-tests:
	@for i in $$(find ../../transforms/$(TRANSFORM_SUBDIR) -mindepth 1 -maxdepth 1 -type d); do \
		dir=$$(basename $$i); \
		yml=test-$(TRANSFORM_SUBDIR)-$$dir.yml; \
		echo Generating $$yml; \
		cat test-transform.template | sed -e "s?@TARGET_TRANSFORM_DIR@?transforms/$(TRANSFORM_SUBDIR)/$$dir?g" > $$yml; \
	done

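The `.transform-tests` recipe above can be read as a plain shell function, which may be easier to try out in isolation. The following is only a sketch under assumptions: `generate_workflows` and its four arguments are hypothetical names introduced here, and the output is sorted purely to make the order predictable.

```shell
# Sketch of the per-transform workflow generation done by the .transform-tests recipe.
# generate_workflows and its arguments are illustrative names, not part of the repository.
generate_workflows() {
    transforms_root="$1"   # e.g. ../../transforms
    subdir="$2"            # universal, code, or language
    template="$3"          # e.g. test-transform.template
    out_dir="$4"           # directory that receives the test-*.yml files
    # Select only the immediate child directories of the chosen subdirectory.
    find "$transforms_root/$subdir" -mindepth 1 -maxdepth 1 -type d | sort |
    while read -r path; do
        name=$(basename "$path")
        yml="$out_dir/test-$subdir-$name.yml"
        echo "Generating $yml"
        # Substitute the placeholder with the transform's path, as the Makefile does.
        sed -e "s?@TARGET_TRANSFORM_DIR@?transforms/$subdir/$name?g" "$template" > "$yml"
    done
}
```

Calling `generate_workflows ../../transforms code test-transform.template .` would correspond to `make TRANSFORM_SUBDIR=code .transform-tests`, assuming the same directory layout as the Makefile.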
@@ -0,0 +1,73 @@
# Workflow Management

Here we have the start of a system to automatically generate GitHub workflows (currently only for transforms).
In general, the design is to use templates and `make` to generate/update the workflows.

#### Goals
1. Run only the tests for a given transform when only that transform changes.
    Includes python, ray, spark and kfp_ray as available.
2. When the core DPK library files change, test all transforms.
3. When the shared kfp components change, run the kfp test of one randomly selected transform.
    (We would like to avoid running all transform kfp tests in one PR.)
4. Extra credit: If only .md or other non-code files change, run no tests.

#### Assumptions
1. All transforms will have test workflows. A transform can disable its tests locally
    (temporarily?) by renaming its Makefile. For example,
    `mv transforms/universal/noop/Makefile transforms/universal/noop/Makefile.disabled`.

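Renaming the Makefile disables the tests because each step in the generated workflows first checks that `<transform>/Makefile` exists. A minimal shell sketch of that guard follows; `run_transform_target` and the `MAKE_CMD` override are hypothetical names used only for illustration.

```shell
# Run a make target for a transform only if its Makefile exists.
# run_transform_target and MAKE_CMD are illustrative names, not part of the repository.
run_transform_target() {
    transform_dir="$1"   # e.g. transforms/universal/noop
    target="$2"          # e.g. test-src
    if [ -e "$transform_dir/Makefile" ]; then
        # MAKE_CMD lets a caller substitute another command (for a dry run, say).
        ${MAKE_CMD:-make} -C "$transform_dir" DOCKER=docker "$target"
    else
        echo "$transform_dir/Makefile not found - $target disabled for this transform."
    fi
}
```

With the Makefile renamed to `Makefile.disabled`, the call only prints the "not found" message and returns successfully, so the workflow stays green while the transform is disabled.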
## DPK libraries (`data-processing-lib` directory)
The DPK libraries, in data-processing-lib/{python,ray,spark}, are tested
via the fixed
[test-lib.yml](test-lib.yml)
workflow, which is triggered when any code files in that tree change.

The transform test workflows also depend on this directory tree, so
changes made here will also trigger the transform tests.

## Transforms (`transforms` directory tree)
We define a unique test workflow for each transform, based on a common
template [test-transform.template](test-transform.template).
The [Makefile](Makefile) is used to (re)generate all workflows as necessary.
By design, the workflows for a given transform should run when

* anything of substance affecting operation is modified in the transform's directory tree.
* anything in the core libraries in this repo (e.g., data-processing-lib) is modified, assuming the transform depends on these.

Note that the kfp tests (in kfp_ray/Makefile workflow-test) for a given transform are
**not** currently being run when the transform's tests are run.
Currently they are run only for a randomly selected transform, via [test-kfp.yml](test-kfp.yml).
We expect to fix this in the future.

When a new transform is added to the repository,

1. Run `make` in this directory to create the new test .yml for all transforms found in transforms/{universal,code,language} directories.
1. Commit and push the change to your branch with the new transform.

Something like the following:
```
git clone ....
...
git checkout -b new-branch
make # Creates new test*.yml workflows
git add test-*.yml # New workflow files are untracked, so `commit -a` alone would miss them
git commit -a -s -m "update workflows"
git push --set-upstream origin new-branch
```

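The two trigger conditions listed in this section are implemented in the generated workflows as `paths:` filters with `!` exclusions. The sketch below approximates that filter for a single changed file using shell `case` patterns. It is a rough illustration only: `should_trigger` is a hypothetical helper, and GitHub's own glob matching has more rules than are reproduced here.

```shell
# Rough approximation of the workflow's `paths:` filter for one changed file.
# should_trigger is an illustrative name; GitHub evaluates its own glob syntax.
# Prints "yes" if the file would count toward triggering the transform's tests.
should_trigger() {
    transform_dir="$1"   # e.g. transforms/code/code2parquet
    changed_file="$2"    # path relative to the repository root
    # Exclusions: in the workflow they are listed last, so they win over the inclusions.
    case "$changed_file" in
        "$transform_dir"/kfp_ray/*|"$transform_dir"/*/kfp_ray/*) echo no; return ;;
        data-processing-lib/test/*|data-processing-lib/*/test/*) echo no; return ;;
        data-processing-lib/test-data/*|data-processing-lib/*/test-data/*) echo no; return ;;
        *.md|*.gitignore|doc/*|*/doc/*|images/*|*/images/*) echo no; return ;;
    esac
    # Inclusions: the transform's own tree or the core libraries.
    case "$changed_file" in
        "$transform_dir"/*|data-processing-lib/*) echo yes ;;
        *) echo no ;;
    esac
}
```

For example, `should_trigger transforms/code/code2parquet data-processing-lib/python/src/a.py` prints `yes`, while a change to the transform's `README.md` prints `no`. A workflow runs when at least one changed file passes the filter.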
## KFP (`kfp` directory tree)

Like the DPK core libraries, kfp tests are defined in
[test-kfp.yml](test-kfp.yml) and run whenever changes are made in
the `kfp` directory tree. Tests currently include

1. Test kfp on a randomly selected transform.

Eventually we would like to enable the transform-specific kfp test
when only the transform code is modified, or maybe when only
the `kfp_ray` directory contents are modified.

## Miscellaneous
[test-misc.yml](test-misc.yml) defines some repo consistency tests, including

1. Making sure the `set-versions` make target can be run recursively throughout the repo.
2. Making sure there is a test workflow for each transform in the repo.
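
The second consistency check could be sketched as follows. The check that test-misc.yml actually performs is not shown here, so this is only one possible approach: `missing_workflows` and its arguments are hypothetical, and the `test-<subdir>-<transform>.yml` naming is taken from the Makefile.

```shell
# List transforms that have no generated test workflow.
# missing_workflows and its arguments are illustrative names, not the repository's check.
missing_workflows() {
    transforms_root="$1"   # e.g. transforms
    workflows_dir="$2"     # directory holding the generated test-*.yml files
    for subdir in universal code language; do
        [ -d "$transforms_root/$subdir" ] || continue
        find "$transforms_root/$subdir" -mindepth 1 -maxdepth 1 -type d | sort |
        while read -r path; do
            name=$(basename "$path")
            # Print the transform when its expected workflow file is absent.
            [ -e "$workflows_dir/test-$subdir-$name.yml" ] || echo "$subdir/$name"
        done
    done
}
```

An empty result means every transform directory has a matching workflow; a CI step could fail when the output is non-empty.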
@@ -0,0 +1,122 @@
#
# DO NOT EDIT THIS FILE: it is generated from test-transform.template, Edit there and run make to change these files
#
name: Test - transforms/code/code2parquet

on:
    workflow_dispatch:
    push:
        branches:
            - "dev"
            - "releases/**"
        tags:
            - "*"
        paths:
            - "transforms/code/code2parquet/**"
            - "data-processing-lib/**"
            - "!transforms/code/code2parquet/**/kfp_ray/**" # This is/will be tested in separate workflow
            - "!data-processing-lib/**/test/**"
            - "!data-processing-lib/**/test-data/**"
            - "!**.md"
            - "!**/doc/**"
            - "!**/images/**"
            - "!**.gitignore"
    pull_request:
        branches:
            - "dev"
            - "releases/**"
        paths:
            - "transforms/code/code2parquet/**"
            - "data-processing-lib/**"
            - "!transforms/code/code2parquet/**/kfp_ray/**" # This is/will be tested in separate workflow
            - "!data-processing-lib/**/test/**"
            - "!data-processing-lib/**/test-data/**"
            - "!**.md"
            - "!**/doc/**"
            - "!**/images/**"
            - "!**.gitignore"

jobs:
    check_if_push_image:
        # check whether the Docker images should be pushed to the remote repository
        # The images are pushed if it is a merge to dev branch or a new tag is created.
        # The latter being part of the release process.
        # The images tag is derived from the value of the DOCKER_IMAGE_VERSION variable set in the .make.versions file.
        runs-on: ubuntu-22.04
        outputs:
            publish_images: ${{ steps.version.outputs.publish_images }}
        steps:
            - id: version
              run: |
                  publish_images='false'
                  if [[ ${GITHUB_REF} == refs/heads/dev && ${GITHUB_EVENT_NAME} != 'pull_request' && ${GITHUB_REPOSITORY} == IBM/data-prep-kit ]] ;
                  then
                      publish_images='true'
                  fi
                  if [[ ${GITHUB_REF} == refs/tags/* && ${GITHUB_REPOSITORY} == IBM/data-prep-kit ]] ;
                  then
                      publish_images='true'
                  fi
                  echo "publish_images=$publish_images" >> "$GITHUB_OUTPUT"
    test-src:
        runs-on: ubuntu-22.04
        steps:
            - name: Checkout
              uses: actions/checkout@v4
            - name: Free up space in github runner
              # Free space as indicated here : https://github.com/actions/runner-images/issues/2840#issuecomment-790492173
              run: |
                  df -h
                  sudo rm -rf "/usr/local/share/boost"
                  sudo rm -rf "$AGENT_TOOLSDIRECTORY"
                  sudo rm -rf /usr/share/dotnet /opt/ghc /usr/local/lib/android /usr/local/share/powershell /usr/share/swift /usr/local/.ghcup
                  sudo docker rmi $(docker image ls -aq) >/dev/null 2>&1 || true
                  df -h
            - name: Test transform source in transforms/code/code2parquet
              run: |
                  if [ -e "transforms/code/code2parquet/Makefile" ]; then
                      make -C transforms/code/code2parquet DOCKER=docker test-src
                  else
                      echo "transforms/code/code2parquet/Makefile not found - source testing disabled for this transform."
                  fi
    test-image:
        needs: [check_if_push_image]
        runs-on: ubuntu-22.04
        timeout-minutes: 120
        env:
            DOCKER_REGISTRY_USER: ${{ secrets.DOCKER_REGISTRY_USER }}
            DOCKER_REGISTRY_KEY: ${{ secrets.DOCKER_REGISTRY_KEY }}
        steps:
            - name: Checkout
              uses: actions/checkout@v4
            - name: Free up space in github runner
              # Free space as indicated here : https://github.com/actions/runner-images/issues/2840#issuecomment-790492173
              run: |
                  df -h
                  sudo rm -rf /opt/ghc
                  sudo rm -rf "/usr/local/share/boost"
                  sudo rm -rf "$AGENT_TOOLSDIRECTORY"
                  sudo rm -rf /usr/share/dotnet /opt/ghc /usr/local/lib/android /usr/local/share/powershell /usr/share/swift /usr/lib/jvm /usr/local/.ghcup
                  sudo docker rmi $(docker image ls -aq) >/dev/null 2>&1 || true
                  df -h
            - name: Test transform image in transforms/code/code2parquet
              run: |
                  if [ -e "transforms/code/code2parquet/Makefile" ]; then
                      make -C data-processing-lib/spark DOCKER=docker image
                      make -C transforms/code/code2parquet DOCKER=docker test-image
                  else
                      echo "transforms/code/code2parquet/Makefile not found - testing disabled for this transform."
                  fi
            - name: Print space
              # Free space as indicated here : https://github.com/actions/runner-images/issues/2840#issuecomment-790492173
              run: |
                  df -h
                  docker images
            - name: Publish images
              if: needs.check_if_push_image.outputs.publish_images == 'true'
              run: |
                  if [ -e "transforms/code/code2parquet/Makefile" ]; then
                      make -C transforms/code/code2parquet publish
                  else
                      echo "transforms/code/code2parquet/Makefile not found - publishing disabled for this transform."
                  fi
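
The publish decision made by the `check_if_push_image` job can be isolated into a small function for local experimentation. This is a sketch, not the workflow's code: `decide_publish` is a hypothetical wrapper, and its three arguments stand in for the `GITHUB_REF`, `GITHUB_EVENT_NAME` and `GITHUB_REPOSITORY` environment variables that the job reads.

```shell
# Decide whether images should be published, mirroring the check_if_push_image job.
# decide_publish is an illustrative name; the workflow reads these values from
# GITHUB_REF, GITHUB_EVENT_NAME and GITHUB_REPOSITORY instead of arguments.
decide_publish() {
    ref="$1"
    event="$2"
    repo="$3"
    publish_images='false'
    # Only the upstream repository publishes; forks never do.
    if [ "$repo" = "IBM/data-prep-kit" ]; then
        case "$ref" in
            refs/heads/dev)
                # A push (merge) to dev publishes; a pull request does not.
                if [ "$event" != "pull_request" ]; then publish_images='true'; fi ;;
            refs/tags/*)
                # Any new tag publishes, as part of the release process.
                publish_images='true' ;;
        esac
    fi
    echo "$publish_images"
}
```

For instance, `decide_publish refs/heads/dev push IBM/data-prep-kit` prints `true`, while the same ref in a fork prints `false`.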
@@ -0,0 +1,122 @@
#
# DO NOT EDIT THIS FILE: it is generated from test-transform.template, Edit there and run make to change these files
#
name: Test - transforms/code/code_quality

on:
    workflow_dispatch:
    push:
        branches:
            - "dev"
            - "releases/**"
        tags:
            - "*"
        paths:
            - "transforms/code/code_quality/**"
            - "data-processing-lib/**"
            - "!transforms/code/code_quality/**/kfp_ray/**" # This is/will be tested in separate workflow
            - "!data-processing-lib/**/test/**"
            - "!data-processing-lib/**/test-data/**"
            - "!**.md"
            - "!**/doc/**"
            - "!**/images/**"
            - "!**.gitignore"
    pull_request:
        branches:
            - "dev"
            - "releases/**"
        paths:
            - "transforms/code/code_quality/**"
            - "data-processing-lib/**"
            - "!transforms/code/code_quality/**/kfp_ray/**" # This is/will be tested in separate workflow
            - "!data-processing-lib/**/test/**"
            - "!data-processing-lib/**/test-data/**"
            - "!**.md"
            - "!**/doc/**"
            - "!**/images/**"
            - "!**.gitignore"

jobs:
    check_if_push_image:
        # check whether the Docker images should be pushed to the remote repository
        # The images are pushed if it is a merge to dev branch or a new tag is created.
        # The latter being part of the release process.
        # The images tag is derived from the value of the DOCKER_IMAGE_VERSION variable set in the .make.versions file.
        runs-on: ubuntu-22.04
        outputs:
            publish_images: ${{ steps.version.outputs.publish_images }}
        steps:
            - id: version
              run: |
                  publish_images='false'
                  if [[ ${GITHUB_REF} == refs/heads/dev && ${GITHUB_EVENT_NAME} != 'pull_request' && ${GITHUB_REPOSITORY} == IBM/data-prep-kit ]] ;
                  then
                      publish_images='true'
                  fi
                  if [[ ${GITHUB_REF} == refs/tags/* && ${GITHUB_REPOSITORY} == IBM/data-prep-kit ]] ;
                  then
                      publish_images='true'
                  fi
                  echo "publish_images=$publish_images" >> "$GITHUB_OUTPUT"
    test-src:
        runs-on: ubuntu-22.04
        steps:
            - name: Checkout
              uses: actions/checkout@v4
            - name: Free up space in github runner
              # Free space as indicated here : https://github.com/actions/runner-images/issues/2840#issuecomment-790492173
              run: |
                  df -h
                  sudo rm -rf "/usr/local/share/boost"
                  sudo rm -rf "$AGENT_TOOLSDIRECTORY"
                  sudo rm -rf /usr/share/dotnet /opt/ghc /usr/local/lib/android /usr/local/share/powershell /usr/share/swift /usr/local/.ghcup
                  sudo docker rmi $(docker image ls -aq) >/dev/null 2>&1 || true
                  df -h
            - name: Test transform source in transforms/code/code_quality
              run: |
                  if [ -e "transforms/code/code_quality/Makefile" ]; then
                      make -C transforms/code/code_quality DOCKER=docker test-src
                  else
                      echo "transforms/code/code_quality/Makefile not found - source testing disabled for this transform."
                  fi
    test-image:
        needs: [check_if_push_image]
        runs-on: ubuntu-22.04
        timeout-minutes: 120
        env:
            DOCKER_REGISTRY_USER: ${{ secrets.DOCKER_REGISTRY_USER }}
            DOCKER_REGISTRY_KEY: ${{ secrets.DOCKER_REGISTRY_KEY }}
        steps:
            - name: Checkout
              uses: actions/checkout@v4
            - name: Free up space in github runner
              # Free space as indicated here : https://github.com/actions/runner-images/issues/2840#issuecomment-790492173
              run: |
                  df -h
                  sudo rm -rf /opt/ghc
                  sudo rm -rf "/usr/local/share/boost"
                  sudo rm -rf "$AGENT_TOOLSDIRECTORY"
                  sudo rm -rf /usr/share/dotnet /opt/ghc /usr/local/lib/android /usr/local/share/powershell /usr/share/swift /usr/lib/jvm /usr/local/.ghcup
                  sudo docker rmi $(docker image ls -aq) >/dev/null 2>&1 || true
                  df -h
            - name: Test transform image in transforms/code/code_quality
              run: |
                  if [ -e "transforms/code/code_quality/Makefile" ]; then
                      make -C data-processing-lib/spark DOCKER=docker image
                      make -C transforms/code/code_quality DOCKER=docker test-image
                  else
                      echo "transforms/code/code_quality/Makefile not found - testing disabled for this transform."
                  fi
            - name: Print space
              # Free space as indicated here : https://github.com/actions/runner-images/issues/2840#issuecomment-790492173
              run: |
                  df -h
                  docker images
            - name: Publish images
              if: needs.check_if_push_image.outputs.publish_images == 'true'
              run: |
                  if [ -e "transforms/code/code_quality/Makefile" ]; then
                      make -C transforms/code/code_quality publish
                  else
                      echo "transforms/code/code_quality/Makefile not found - publishing disabled for this transform."
                  fi