Start triggering testing at finer granularity in the repo #595

Merged Sep 19, 2024 (65 commits; changes shown from 63 commits)

Commits
3fe8d2c
disable test workflow when none code files change
daw3rd Sep 13, 2024
86951f8
one more path-ignore in test.yml
daw3rd Sep 16, 2024
d368dc7
one more fix for path-ignore in test.yml
daw3rd Sep 16, 2024
6d7c186
test universal transform separately
daw3rd Sep 17, 2024
35f9f31
rename test universal workflow
daw3rd Sep 17, 2024
37cd7ef
add comments to noop src to trigger new universal test workflow
daw3rd Sep 17, 2024
a7fa50b
fix paths in test universal workflow
daw3rd Sep 17, 2024
300d06a
addj back ignore paths in test universal workflow
daw3rd Sep 17, 2024
0154179
another noop comment
daw3rd Sep 17, 2024
5dd78e6
move ignored paths to paths in univesal test workflow
daw3rd Sep 17, 2024
d91da0d
test-universal workflow name changes
daw3rd Sep 17, 2024
1be3292
noop comments
daw3rd Sep 17, 2024
54d48b6
noop readme change'
daw3rd Sep 17, 2024
288ffdf
change test universal not paths
daw3rd Sep 17, 2024
668b70d
disable all but new noop and doc_id test workflows
daw3rd Sep 17, 2024
aa24ee7
code change in noop
daw3rd Sep 17, 2024
1d72cf5
remake test transforms
daw3rd Sep 17, 2024
142da80
Merge branch 'dev' into cicd-opt
daw3rd Sep 17, 2024
6bee51b
add individual test transform workflows
daw3rd Sep 17, 2024
db18c74
noop README change
daw3rd Sep 17, 2024
42e18fc
better ignore of .md on test transform workflows
daw3rd Sep 17, 2024
ca09d75
noop readme change
daw3rd Sep 17, 2024
f16780b
noop test transform worklow 1 ignore
daw3rd Sep 17, 2024
189bdd7
noop readme
daw3rd Sep 17, 2024
54b3c69
split out the tests into test-kfp/lib/misc and remove test.yml, add r…
daw3rd Sep 17, 2024
d530ac9
test-kfp only on kfp/**
daw3rd Sep 17, 2024
781113f
noop code change to trigger build
daw3rd Sep 17, 2024
f09fbee
comments in workflows
daw3rd Sep 17, 2024
193f439
updated workflow readme
daw3rd Sep 17, 2024
3ed0fbd
only run build-library workflow on data-processing-lib changes
daw3rd Sep 17, 2024
60343d3
try and ignore docs in build-library, test-kfp/lib
daw3rd Sep 17, 2024
dec3aa9
workflow title changes for consistency
daw3rd Sep 17, 2024
e6b1d62
test change on filter source
daw3rd Sep 17, 2024
c40cf1f
change to lib readme
daw3rd Sep 17, 2024
c4853d1
change to lib source
daw3rd Sep 17, 2024
6badd06
Merge branch 'dev' into cicd-opt
daw3rd Sep 17, 2024
c4417fc
minor job name changes in transform workflows
daw3rd Sep 17, 2024
8270c6c
noop readme
daw3rd Sep 17, 2024
dc06b94
test-lib workflow ignores
daw3rd Sep 17, 2024
428833e
top level readme
daw3rd Sep 17, 2024
0bd7992
noop test source
daw3rd Sep 17, 2024
31b7f53
filter source change'
daw3rd Sep 17, 2024
e78df7b
updated all transform tets workflows
daw3rd Sep 17, 2024
684ae7a
fix typo in test template on check_images
daw3rd Sep 18, 2024
f419059
noop src change
daw3rd Sep 18, 2024
a887bf9
check for makefile in test transform workflow
daw3rd Sep 18, 2024
f9399c1
automatically determine transforms in transforms directory for which …
daw3rd Sep 18, 2024
e064e00
worklow readme, transform existence verification, disable build-libra…
daw3rd Sep 18, 2024
3a81d21
workflow readme details on kfp and misc tests
daw3rd Sep 18, 2024
5d957ed
backing out change to dpk lib code
daw3rd Sep 19, 2024
b02b52c
restore filter code
daw3rd Sep 19, 2024
7cb4390
restore noop code
daw3rd Sep 19, 2024
b780cf5
workflow readme
daw3rd Sep 19, 2024
9a5ccdc
really restore noop code
daw3rd Sep 19, 2024
93c56c2
check for makefile in transform test-src testing
daw3rd Sep 19, 2024
8f8592d
don't include lib test dependencies in transform test workflows
daw3rd Sep 19, 2024
40ce888
noop code change
daw3rd Sep 19, 2024
cb0d1f9
disable noop, don't include lib test-data in transform dependencies
daw3rd Sep 19, 2024
d6c5784
use job.id.if on Makefile to enable transform test job
daw3rd Sep 19, 2024
4f6b2f2
use job.id.if on Makefile to enable transform test job
daw3rd Sep 19, 2024
c0f4935
restore noop Makefile
daw3rd Sep 19, 2024
0a09daa
exclude kfp_ray from transfor test workflow and change noop code
daw3rd Sep 19, 2024
1bfb33d
remove if: from test workflows
daw3rd Sep 19, 2024
a80c0fd
backout noop code change
daw3rd Sep 19, 2024
ee08033
backout noop code change
daw3rd Sep 19, 2024
26 changes: 26 additions & 0 deletions .github/workflows/Makefile
@@ -0,0 +1,26 @@

# Directories in the transforms/universal directory for which we want to generate test workflows
UNIVERSAL_TRANSFORMS=doc_id ededup fdedup filter html2parquet noop profiler resize tokenization
# Directories in the transforms/code directory for which we want to generate test workflows
CODE_TRANSFORMS=code2parquet code_quality header_cleanser malware proglang_select repo_level_ordering
# Directories in the transforms/language directory for which we want to generate test workflows
LANG_TRANSFORMS=doc_chunk doc_quality lang_id pdf2parquet pii_redactor text_encoder


transform-tests:
	$(MAKE) TRANSFORM_SUBDIR=universal .transform-tests
	$(MAKE) TRANSFORM_SUBDIR=language .transform-tests
	$(MAKE) TRANSFORM_SUBDIR=code .transform-tests

# Expects
#    TRANSFORM_SUBDIR transforms subdirectory (such as universal)
.transform-tests:
	@for i in $$(find ../../transforms/$(TRANSFORM_SUBDIR) -mindepth 1 -maxdepth 1 -type d); do \
		dir=$$(basename $$i); \
		yml=test-$(TRANSFORM_SUBDIR)-$$dir.yml; \
		echo Generating $$yml; \
		cat test-transform.template | sed -e "s?@TARGET_TRANSFORM_DIR@?transforms/$${TRANSFORM_SUBDIR}/$$dir?g" > $$yml; \
	done



73 changes: 73 additions & 0 deletions .github/workflows/README.md
@@ -0,0 +1,73 @@
# Workflow Management

Here we have the start of a system to automatically generate GitHub workflows (currently only for transforms).
In general, the design is to use templates and `make` to generate/update the workflows.

#### Goals
1. When only a transform changes, run only the tests for that transform.
    This includes python, ray, spark and kfp_ray as available.
2. When the core DPK lib component files change, test all transforms.
3. When the shared kfp components change, run the kfp test for one randomly selected transform.
    (We would like to avoid running all transform kfp tests in one PR.)
4. Extra credit: If only .md or other non-code changes are made, run no tests.

#### Assumptions
1. All transforms will have test workflows. A transform can disable its tests locally
    (temporarily?) by renaming its Makefile. For example,
    `mv transforms/universal/noop/Makefile transforms/universal/noop/Makefile.disabled`.
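
The generated workflows guard each step on the presence of the transform's Makefile, which is why renaming it turns the tests into a no-op. A minimal sketch of that guard follows; `run_transform_tests` is a hypothetical helper (not part of this repo), and `echo` stands in for the real `make` call.

```shell
#!/bin/sh
# Hypothetical helper illustrating the Makefile guard used in the generated workflows.
# Usage: run_transform_tests <transform-dir>
run_transform_tests() {
    dir="$1"
    if [ -e "$dir/Makefile" ]; then
        # The real workflow runs: make -C "$dir" DOCKER=docker test-src
        echo "testing $dir"
    else
        echo "$dir/Makefile not found - source testing disabled for this transform."
    fi
}
```

With the Makefile renamed to `Makefile.disabled`, the function only prints the "not found" message and exits successfully, so the workflow still passes.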

## DPK libraries (`data-processing-lib` directory)
The DPK libraries, in data-processing-lib/{python,ray,spark}, are tested
via the fixed
[test-lib.yml](test-lib.yml)
workflow, which is triggered when any code file in that tree changes.

The transform test workflows also depend on this directory tree, so
changes made here also trigger the transform tests.

## Transforms (`transforms` directory tree)
We define a unique test workflow for each transform, based on a common
template, [test-transform.template](test-transform.template).
The [Makefile](Makefile) is used to (re)generate all workflows as necessary.
By design, the workflow for a given transform should run when

* anything of substance affecting operation is modified in the transform's directory tree, or
* anything changes in the core libraries in this repo (e.g., `data-processing-lib`), assuming the transform depends on them.

Note that the kfp tests (the `workflow-test` target in kfp_ray/Makefile) for a given transform are
**not** currently run when the transform's tests are run.
Currently these are run on a randomly selected transform via [test-kfp.yml](test-kfp.yml).
We expect to fix this in the future.
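
As a rough illustration of the generation step, the sketch below mirrors the `sed` substitution in the Makefile for a single transform. The function name `generate_workflow` and its argument order are assumptions made for this example only.

```shell
#!/bin/sh
# Sketch of the per-transform generation step; mirrors the sed call in the Makefile.
# Usage: generate_workflow <template-file> <transform-dir> <output-file>
generate_workflow() {
    template="$1"
    target="$2"
    out="$3"
    # '?' is the sed delimiter because the replacement text contains '/'.
    sed -e "s?@TARGET_TRANSFORM_DIR@?${target}?g" "$template" > "$out"
}
```

For example, `generate_workflow test-transform.template transforms/universal/noop test-universal-noop.yml` would replace every `@TARGET_TRANSFORM_DIR@` placeholder with `transforms/universal/noop`.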

When a new transform is added to the repository,

1. Run `make` in this directory to create the test .yml files for all transforms found in the transforms/{universal,code,language} directories.
1. Commit and push the change to your branch with the new transform.

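Something like the following:
```
git clone ....
...
git checkout -b new-branch
cd .github/workflows
make                 # Creates new test*.yml workflows
git add test-*.yml   # New workflow files are untracked, so `commit -a` alone would skip them
git commit -a -s -m "update workflows"
git push --set-upstream origin new-branch
```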

## KFP (`kfp` directory tree)

Like the DPK core libs, kfp tests are defined in
[test-kfp.yml](test-kfp.yml) and run whenever changes are made in
the `kfp` directory tree. Tests currently include

1. Test kfp on a randomly selected transform.

Eventually we would like to enable the transform-specific kfp test
when only the transform code is modified, or maybe when only
the `kfp_ray` directory contents are modified.
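
The random selection could look roughly like the sketch below. This is an assumption about one possible approach, not the logic in test-kfp.yml; `pick_random_kfp_transform` is a hypothetical name.

```shell
#!/bin/sh
# Hypothetical sketch: choose one transform that has a kfp_ray directory, at random.
# Usage: pick_random_kfp_transform <transforms-root>
pick_random_kfp_transform() {
    root="$1"
    # Each kfp_ray directory marks a transform with kfp tests; print its parent directory,
    # then let awk keep all candidates and print one picked uniformly at random.
    find "$root" -type d -name kfp_ray | while read -r d; do
        dirname "$d"
    done | awk 'BEGIN { srand() } { lines[NR] = $0 } END { if (NR) print lines[int(rand() * NR) + 1] }'
}
```

Note that `srand()` with no argument seeds from the time of day, so two calls within the same second may pick the same transform.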

## Miscellaneous
[test-misc.yml](test-misc.yml) defines some repo consistency tests, including

1. Making sure the `set-versions` make target can be run recursively throughout the repo.
2. Making sure there is a test workflow for each transform in the repo.
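
The second check can be sketched as follows, assuming the `test-<subdir>-<transform>.yml` naming produced by the Makefile. The function `check_transform_workflows` is hypothetical and is not the code in test-misc.yml.

```shell
#!/bin/sh
# Hypothetical sketch of the consistency check: every transform directory
# should have a matching test-<subdir>-<transform>.yml workflow.
# Usage: check_transform_workflows <transforms-root> <workflows-dir>
# Returns non-zero if any workflow is missing.
check_transform_workflows() {
    transforms_root="$1"
    workflows_dir="$2"
    missing=0
    for subdir in universal code language; do
        for dir in "$transforms_root/$subdir"/*/; do
            # Skip the unexpanded glob when a subdirectory is empty or absent.
            [ -d "$dir" ] || continue
            name=$(basename "$dir")
            yml="$workflows_dir/test-$subdir-$name.yml"
            if [ ! -e "$yml" ]; then
                echo "Missing workflow: $yml"
                missing=1
            fi
        done
    done
    return "$missing"
}
```

Run from the repo root, this might be invoked as `check_transform_workflows transforms .github/workflows`.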
@@ -6,10 +6,20 @@ on:
        branches:
            - "dev"
            - "releases/**"
        paths:
            - "data-processing-lib/**"
            - "!**.md"
            - "!**/doc/**"
            - "!**/.gitignore"
    pull_request:
        branches:
            - "dev"
            - "releases/**"
        paths:
            - "data-processing-lib/**"
            - "!**.md"
            - "!**/doc/**"
            - "!**/.gitignore"
jobs:
    build-python-lib:
        runs-on: ubuntu-22.04
122 changes: 122 additions & 0 deletions .github/workflows/test-code-code2parquet.yml
@@ -0,0 +1,122 @@
#
# DO NOT EDIT THIS FILE: it is generated from test-transform.template, Edit there and run make to change these files
#
name: Test - transforms/code/code2parquet

on:
    workflow_dispatch:
    push:
        branches:
            - "dev"
            - "releases/**"
        tags:
            - "*"
        paths:
            - "transforms/code/code2parquet/**"
            - "data-processing-lib/**"
            - "!transforms/code/code2parquet/**/kfp_ray/**" # This is/will be tested in separate workflow
            - "!data-processing-lib/**/test/**"
            - "!data-processing-lib/**/test-data/**"
            - "!**.md"
            - "!**/doc/**"
            - "!**/images/**"
            - "!**.gitignore"
    pull_request:
        branches:
            - "dev"
            - "releases/**"
        paths:
            - "transforms/code/code2parquet/**"
            - "data-processing-lib/**"
            - "!transforms/code/code2parquet/**/kfp_ray/**" # This is/will be tested in separate workflow
            - "!data-processing-lib/**/test/**"
            - "!data-processing-lib/**/test-data/**"
            - "!**.md"
            - "!**/doc/**"
            - "!**/images/**"
            - "!**.gitignore"

jobs:
    check_if_push_image:
        # check whether the Docker images should be pushed to the remote repository
        # The images are pushed if it is a merge to dev branch or a new tag is created.
        # The latter being part of the release process.
        # The images tag is derived from the value of the DOCKER_IMAGE_VERSION variable set in the .make.versions file.
        runs-on: ubuntu-22.04
        outputs:
            publish_images: ${{ steps.version.outputs.publish_images }}
        steps:
            - id: version
              run: |
                  publish_images='false'
                  if [[ ${GITHUB_REF} == refs/heads/dev && ${GITHUB_EVENT_NAME} != 'pull_request' && ${GITHUB_REPOSITORY} == IBM/data-prep-kit ]] ;
                  then
                      publish_images='true'
                  fi
                  if [[ ${GITHUB_REF} == refs/tags/* && ${GITHUB_REPOSITORY} == IBM/data-prep-kit ]] ;
                  then
                      publish_images='true'
                  fi
                  echo "publish_images=$publish_images" >> "$GITHUB_OUTPUT"
    test-src:
        runs-on: ubuntu-22.04
        steps:
            - name: Checkout
              uses: actions/checkout@v4
            - name: Free up space in github runner
              # Free space as indicated here : https://github.com/actions/runner-images/issues/2840#issuecomment-790492173
              run: |
                  df -h
                  sudo rm -rf "/usr/local/share/boost"
                  sudo rm -rf "$AGENT_TOOLSDIRECTORY"
                  sudo rm -rf /usr/share/dotnet /opt/ghc /usr/local/lib/android /usr/local/share/powershell /usr/share/swift /usr/local/.ghcup
                  sudo docker rmi $(docker image ls -aq) >/dev/null 2>&1 || true
                  df -h
            - name: Test transform source in transforms/code/code2parquet
              run: |
                  if [ -e "transforms/code/code2parquet/Makefile" ]; then
                      make -C transforms/code/code2parquet DOCKER=docker test-src
                  else
                      echo "transforms/code/code2parquet/Makefile not found - source testing disabled for this transform."
                  fi
    test-image:
        needs: [check_if_push_image]
        runs-on: ubuntu-22.04
        timeout-minutes: 120
        env:
            DOCKER_REGISTRY_USER: ${{ secrets.DOCKER_REGISTRY_USER }}
            DOCKER_REGISTRY_KEY: ${{ secrets.DOCKER_REGISTRY_KEY }}
        steps:
            - name: Checkout
              uses: actions/checkout@v4
            - name: Free up space in github runner
              # Free space as indicated here : https://github.com/actions/runner-images/issues/2840#issuecomment-790492173
              run: |
                  df -h
                  sudo rm -rf /opt/ghc
                  sudo rm -rf "/usr/local/share/boost"
                  sudo rm -rf "$AGENT_TOOLSDIRECTORY"
                  sudo rm -rf /usr/share/dotnet /opt/ghc /usr/local/lib/android /usr/local/share/powershell /usr/share/swift /usr/lib/jvm /usr/local/.ghcup
                  sudo docker rmi $(docker image ls -aq) >/dev/null 2>&1 || true
                  df -h
            - name: Test transform image in transforms/code/code2parquet
              run: |
                  if [ -e "transforms/code/code2parquet/Makefile" ]; then
                      make -C data-processing-lib/spark DOCKER=docker image
                      make -C transforms/code/code2parquet DOCKER=docker test-image
                  else
                      echo "transforms/code/code2parquet/Makefile not found - testing disabled for this transform."
                  fi
            - name: Print space
              # Free space as indicated here : https://github.com/actions/runner-images/issues/2840#issuecomment-790492173
              run: |
                  df -h
                  docker images
            - name: Publish images
              if: needs.check_if_push_image.outputs.publish_images == 'true'
              run: |
                  if [ -e "transforms/code/code2parquet/Makefile" ]; then
                      make -C transforms/code/code2parquet publish
                  else
                      echo "transforms/code/code2parquet/Makefile not found - publishing disabled for this transform."
                  fi
122 changes: 122 additions & 0 deletions .github/workflows/test-code-code_quality.yml
@@ -0,0 +1,122 @@
#
# DO NOT EDIT THIS FILE: it is generated from test-transform.template, Edit there and run make to change these files
#
name: Test - transforms/code/code_quality

on:
    workflow_dispatch:
    push:
        branches:
            - "dev"
            - "releases/**"
        tags:
            - "*"
        paths:
            - "transforms/code/code_quality/**"
            - "data-processing-lib/**"
            - "!transforms/code/code_quality/**/kfp_ray/**" # This is/will be tested in separate workflow
            - "!data-processing-lib/**/test/**"
            - "!data-processing-lib/**/test-data/**"
            - "!**.md"
            - "!**/doc/**"
            - "!**/images/**"
            - "!**.gitignore"
    pull_request:
        branches:
            - "dev"
            - "releases/**"
        paths:
            - "transforms/code/code_quality/**"
            - "data-processing-lib/**"
            - "!transforms/code/code_quality/**/kfp_ray/**" # This is/will be tested in separate workflow
            - "!data-processing-lib/**/test/**"
            - "!data-processing-lib/**/test-data/**"
            - "!**.md"
            - "!**/doc/**"
            - "!**/images/**"
            - "!**.gitignore"

jobs:
    check_if_push_image:
        # check whether the Docker images should be pushed to the remote repository
        # The images are pushed if it is a merge to dev branch or a new tag is created.
        # The latter being part of the release process.
        # The images tag is derived from the value of the DOCKER_IMAGE_VERSION variable set in the .make.versions file.
        runs-on: ubuntu-22.04
        outputs:
            publish_images: ${{ steps.version.outputs.publish_images }}
        steps:
            - id: version
              run: |
                  publish_images='false'
                  if [[ ${GITHUB_REF} == refs/heads/dev && ${GITHUB_EVENT_NAME} != 'pull_request' && ${GITHUB_REPOSITORY} == IBM/data-prep-kit ]] ;
                  then
                      publish_images='true'
                  fi
                  if [[ ${GITHUB_REF} == refs/tags/* && ${GITHUB_REPOSITORY} == IBM/data-prep-kit ]] ;
                  then
                      publish_images='true'
                  fi
                  echo "publish_images=$publish_images" >> "$GITHUB_OUTPUT"
    test-src:
        runs-on: ubuntu-22.04
        steps:
            - name: Checkout
              uses: actions/checkout@v4
            - name: Free up space in github runner
              # Free space as indicated here : https://github.com/actions/runner-images/issues/2840#issuecomment-790492173
              run: |
                  df -h
                  sudo rm -rf "/usr/local/share/boost"
                  sudo rm -rf "$AGENT_TOOLSDIRECTORY"
                  sudo rm -rf /usr/share/dotnet /opt/ghc /usr/local/lib/android /usr/local/share/powershell /usr/share/swift /usr/local/.ghcup
                  sudo docker rmi $(docker image ls -aq) >/dev/null 2>&1 || true
                  df -h
            - name: Test transform source in transforms/code/code_quality
              run: |
                  if [ -e "transforms/code/code_quality/Makefile" ]; then
                      make -C transforms/code/code_quality DOCKER=docker test-src
                  else
                      echo "transforms/code/code_quality/Makefile not found - source testing disabled for this transform."
                  fi
    test-image:
        needs: [check_if_push_image]
        runs-on: ubuntu-22.04
        timeout-minutes: 120
        env:
            DOCKER_REGISTRY_USER: ${{ secrets.DOCKER_REGISTRY_USER }}
            DOCKER_REGISTRY_KEY: ${{ secrets.DOCKER_REGISTRY_KEY }}
        steps:
            - name: Checkout
              uses: actions/checkout@v4
            - name: Free up space in github runner
              # Free space as indicated here : https://github.com/actions/runner-images/issues/2840#issuecomment-790492173
              run: |
                  df -h
                  sudo rm -rf /opt/ghc
                  sudo rm -rf "/usr/local/share/boost"
                  sudo rm -rf "$AGENT_TOOLSDIRECTORY"
                  sudo rm -rf /usr/share/dotnet /opt/ghc /usr/local/lib/android /usr/local/share/powershell /usr/share/swift /usr/lib/jvm /usr/local/.ghcup
                  sudo docker rmi $(docker image ls -aq) >/dev/null 2>&1 || true
                  df -h
            - name: Test transform image in transforms/code/code_quality
              run: |
                  if [ -e "transforms/code/code_quality/Makefile" ]; then
                      make -C data-processing-lib/spark DOCKER=docker image
                      make -C transforms/code/code_quality DOCKER=docker test-image
                  else
                      echo "transforms/code/code_quality/Makefile not found - testing disabled for this transform."
                  fi
            - name: Print space
              # Free space as indicated here : https://github.com/actions/runner-images/issues/2840#issuecomment-790492173
              run: |
                  df -h
                  docker images
            - name: Publish images
              if: needs.check_if_push_image.outputs.publish_images == 'true'
              run: |
                  if [ -e "transforms/code/code_quality/Makefile" ]; then
                      make -C transforms/code/code_quality publish
                  else
                      echo "transforms/code/code_quality/Makefile not found - publishing disabled for this transform."
                  fi