Fix kfp pipelines testing in github workflow. #611

Merged
merged 6 commits into from
Sep 30, 2024
29 changes: 25 additions & 4 deletions .github/workflows/Makefile
@@ -7,20 +7,41 @@ CODE_TRANSFORMS=code2parquet code_quality header_cleanser malware proglang_selec
LANG_TRANSFORMS=doc_chunk doc_quality lang_id pdf2parquet pii_redactor text_encoder


# A list that holds transforms that should not be tested with KFP
KFP_BLACK_LIST="doc_chunk,pdf2parquet,pii_redactor"

transform-tests:
	$(MAKE) TRANSFORM_SUBDIR=universal .transform-tests
	$(MAKE) TRANSFORM_SUBDIR=language .transform-tests
	$(MAKE) TRANSFORM_SUBDIR=code .transform-tests
	$(MAKE) TRANSFORM_SUBDIR=universal .transform-tests .transform-kfp-tests
	$(MAKE) TRANSFORM_SUBDIR=universal .transform-kfp-tests
	$(MAKE) TRANSFORM_SUBDIR=language .transform-tests
	$(MAKE) TRANSFORM_SUBDIR=language .transform-kfp-tests
	$(MAKE) TRANSFORM_SUBDIR=code .transform-tests
	$(MAKE) TRANSFORM_SUBDIR=code .transform-kfp-tests

# Expects
# TRANSFORM_SUBDIR transforms subdirectory (such as universal)
.transform-tests:
	@for i in $$(find ../../transforms/$(TRANSFORM_SUBDIR) -depth 1 -type d); do \
	@for i in $$(find ../../transforms/$(TRANSFORM_SUBDIR) -mindepth 1 -maxdepth 1 -type d); do \
		dir=$$(basename $$i); \
		yml=test-$(TRANSFORM_SUBDIR)-$$dir.yml; \
		echo Generating $$yml; \
		cat test-transform.template | sed -e "s?@TARGET_TRANSFORM_DIR@?transforms/$${TRANSFORM_SUBDIR}/$$dir?g" > $$yml; \
	done

.transform-kfp-tests:
	@for i in $$(find ../../transforms/$(TRANSFORM_SUBDIR) -mindepth 1 -maxdepth 1 -type d); do \
		dir=$$(basename $$i); \
		z=$$(echo ${KFP_BLACK_LIST} | grep -v $$dir); \
		if [ ! -d ../../transforms/$(TRANSFORM_SUBDIR)/$$dir/kfp_ray ] || [ -z "$$z" ]; then \
			continue; \
		fi; \
		yml=test-$(TRANSFORM_SUBDIR)-$$dir-kfp.yml; \
		echo Generating $$yml; \
		cat test-kfp-transform.template | sed -e "s?@TARGET_TRANSFORM_DIR@?transforms/$${TRANSFORM_SUBDIR}/$$dir?g" > $$yml; \
	done
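
Note on the `grep -v $$dir` check in `.transform-kfp-tests`: it is a substring match, so a transform whose name is contained in a blacklisted entry (for example a hypothetical `doc` directory against `doc_chunk`) would be skipped as well. A minimal sketch of an exact-match alternative, assuming the list stays comma-separated; the helper name `is_blacklisted` and the sample directory names `doc` and `noop` are illustrative and not part of this PR:

```shell
#!/bin/sh
# Hypothetical helper (not in the PR): succeed only if $2 is an exact
# entry of the comma-separated list $1.
is_blacklisted() {
    # Wrapping both sides in commas turns the test into a whole-entry match.
    case ",$1," in
        *",$2,"*) return 0 ;;
        *) return 1 ;;
    esac
}

KFP_BLACK_LIST="doc_chunk,pdf2parquet,pii_redactor"
for dir in doc_chunk doc noop; do
    if is_blacklisted "$KFP_BLACK_LIST" "$dir"; then
        echo "$dir: skipped"
    else
        echo "$dir: kfp workflow generated"
    fi
done
```

With this check only `doc_chunk` is skipped; `doc` and `noop` would still get a kfp workflow.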






36 changes: 17 additions & 19 deletions .github/workflows/README.md
@@ -1,20 +1,23 @@
# Workflow Management

Here we have the start of a system to automatically generated github workflows (currently only for transforms).
Here we have the start of a system to automatically generate GitHub workflows.
In general, the design is to use templates and `make` to generate/update the workflows.
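
The template step can be sketched in isolation as follows. This is only an illustration of the `sed` substitution used in the Makefile; the function name `render_workflow` and the one-line template are hypothetical stand-ins for the real template files.

```shell
#!/bin/sh
# Hypothetical helper: fill the @TARGET_TRANSFORM_DIR@ placeholder.
#   $1: template text (stand-in for the contents of a *.template file)
#   $2: transform directory relative to the repo root
render_workflow() {
    # '?' is used as the sed delimiter so the '/' in paths needs no escaping.
    printf '%s\n' "$1" | sed -e "s?@TARGET_TRANSFORM_DIR@?$2?g"
}

render_workflow 'name: Test - @TARGET_TRANSFORM_DIR@' transforms/universal/noop
# prints: name: Test - transforms/universal/noop
```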

#### Goals
1. Run only tests for a given transform when only the transform changes.
Includes python, ray, spark and kfp_ray as available.
2. When the core dpk lib components files changes, test all transforms
3. When the shared kfp components changes, test a randomly selected transform test
(We would like to avoid running all transform kfp tests in one PR)
3. When the shared kfp components or the core dpk lib component files change,
test a randomly selected transform. Otherwise run the kfp tests for the changed transforms.
4. Extra credit: If .md or other non-code changes are made, run no tests.
Review comment: I think item 4 can be removed.


#### Assumptions
1. All transforms will have test workflows. A transform can disable its tests locally
(temporarily?) by renaming its Makefile. For example,
`mv transforms/universal/noop/Makefile transforms/universal/noop/Makefile.disabled`.
A GitHub action for kfp testing will not be generated for a transform that appears in `KFP_BLACK_LIST`
in the [Makefile](./Makefile).


## DPK libraries (`data-processing-lib` directory)
The DPK libraries, in data-processing-lib/{python,ray,spark}, are tested
@@ -26,18 +29,18 @@ The transforms test workflows also depend on this directory tree and so
changes made here will trigger transform tests.

## Transforms (`transforms` directory tree)
We define a unique test workflow for each transform, based on a common
template [test-transform.template](test-transform.template).
The [Makefile](Makefile) is used to (re)generate all workflows a necessary.
By design, workflows for a given transform should run when
We define two test workflows for each transform: one is based on a common
template [test-transform.template](test-transform.template) and the other, for kfp testing,
is based on a common template [test-kfp-transform.template](test-kfp-transform.template).
The [Makefile](Makefile) is used to (re)generate all workflows as necessary.
By design, non-kfp workflows for a given transform should run when

* anything of substance affecting operation is modified in the transform's directory tree.
* anything in the core libraries in this repo (e.g., data-processing-lib) is modified, assuming the transform depends on these.

Note that the kfp tests (in kfp_ray/Makefile workflow-test) for a given transform are
**not** currently being run when the transform's tests are run.
Currently these are run randomly via the [test-kfp.yml](test-kfp.yml).
We expect to fix this is in the future.
The generated kfp workflows should run when anything of substance affecting operation is modified in the transform's directory tree
and none of the core libraries in this repo or the kfp components were changed.
Otherwise a randomly chosen transform will undergo KFP testing, triggered by the [test-kfp.yml](test-kfp.yml) workflow.
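
The rule stated above can be sketched as a small decision function. This models only the intent as written here, not how GitHub evaluates `paths` filters, and it ignores the "of substance" exclusions such as `.md` files; the function name `select_kfp_test` and the sample paths are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print which kfp test the stated rule selects.
#   $1: transform directory, e.g. transforms/code/code2parquet
#   remaining arguments: paths changed in the commit or PR
select_kfp_test() {
    transform_dir=$1
    shift
    shared=0
    local_change=0
    for path in "$@"; do
        case "$path" in
            kfp/*|data-processing-lib/*) shared=1 ;;
            "$transform_dir"/*) local_change=1 ;;
        esac
    done
    if [ "$shared" -eq 1 ]; then
        echo "random"       # shared code changed: test-kfp.yml picks a transform
    elif [ "$local_change" -eq 1 ]; then
        echo "transform"    # only the transform changed: its own kfp workflow
    else
        echo "none"
    fi
}

select_kfp_test transforms/code/code2parquet transforms/code/code2parquet/kfp_ray/Makefile
select_kfp_test transforms/code/code2parquet kfp/kfp_support_lib/setup.py
```

The first call prints `transform` and the second prints `random`.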

When a new transform is added to the repository,

@@ -58,16 +61,11 @@ git push --set-upstream origin new-branch

Like DPK core libs, kfp tests are defined in
[test-kfp.yml](test-kfp.yml) and run whenever changes are made in
the `kfp` directory tree. Tests currently include

1. test kfp on randomly selected transform.

Eventually we would like to enable the transform-specific kfp test
when only the transform code is modified or maybe when only
the `kfp_ray` directory contents is modified.
the `kfp` directory tree as well as in the DPK core libs. Tests currently include
a kfp test on a randomly selected transform.

## Miscellaneous
[test-misc.yml](test-misc.yml) defines some repo consistency tests including

1. Make sure `set-versions` make target can be run recursively throughout the repo
2. Makes sure there is a test workflow for each transform in the repo.
59 changes: 0 additions & 59 deletions .github/workflows/build-library.yml.old

This file was deleted.

3 changes: 3 additions & 0 deletions .github/workflows/runs.h
@@ -0,0 +1,3 @@

echo ${PWD}
find ../../transforms/universal -mindepth 1 -maxdepth 1 -type d
114 changes: 114 additions & 0 deletions .github/workflows/test-code-code2parquet-kfp.yml
@@ -0,0 +1,114 @@
#
# DO NOT EDIT THIS FILE: it is generated from test-kfp-transform.template. Edit there and run make to change these files
#
name: Test KFP - transforms/code/code2parquet

on:
  workflow_dispatch:
  push:
    branches:
      - "dev"
      - "releases/**"
    tags:
      - "*"
    paths:
      - "transforms/code/code2parquet/**"
      - "!kfp/**" # This is tested in separate workflow
      - "!data-processing-lib/**" # This is tested in separate workflow
      - "!**.md"
      - "!**/doc/**"
      - "!**/images/**"
      - "!**.gitignore"
  pull_request:
    branches:
      - "dev"
      - "releases/**"
    paths:
      - "transforms/code/code2parquet/**"
      - "!data-processing-lib/**" # This is tested in separate workflow
      - "!kfp/**" # This is tested in separate workflow
      - "!**.md"
      - "!**/doc/**"
      - "!**/images/**"
      - "!**.gitignore"


jobs:
  test-kfp-v1:
    runs-on: ubuntu-22.04
    steps:
      - name: Checkout
        uses: actions/checkout@v4
      - name: Free up space in github runner
        # Free space as indicated here : https://github.com/actions/runner-images/issues/2840#issuecomment-790492173
        run: |
          df -h
          sudo rm -rf "/usr/local/share/boost"
          sudo rm -rf "$AGENT_TOOLSDIRECTORY"
          sudo rm -rf /usr/share/dotnet /opt/ghc /usr/local/lib/android /usr/local/share/powershell /usr/share/swift /usr/lib/jvm /usr/local/.ghcup
          sudo docker rmi $(docker image ls -aq) >/dev/null 2>&1 || true
          df -h
      - name: Test KFP libs (shared and v1) and run a workflow
        timeout-minutes: 120
        run: |
          export REPOROOT=$PWD
          export K8S_SETUP_SCRIPTS=$PWD/scripts/k8s-setup
          source $K8S_SETUP_SCRIPTS/requirements.env
          export PATH=$PATH:/tmp/
          curl -Lo /tmp/kind https://kind.sigs.k8s.io/dl/v${KIND_VERSION}/kind-linux-amd64
          chmod 777 /tmp/kind
          curl -fsSL -o /tmp/get_helm.sh https://raw.githubusercontent.com/helm/helm/master/scripts/get-helm-3
          chmod 700 /tmp/get_helm.sh
          HELM_INSTALL_DIR=/tmp/ /tmp/get_helm.sh -v v${HELM_VERSION} --no-sudo
          chmod 777 /tmp/helm
          curl -L https://dl.k8s.io/release/v${KUBECTL_VERSION}/bin/linux/amd64/kubectl -o /tmp/kubectl
          chmod 777 /tmp/kubectl
          curl https://dl.min.io/client/mc/release/linux-amd64/mc --create-dirs -o /tmp/mc
          chmod +x /tmp/mc
          export DEPLOY_KUBEFLOW=1
          make -C $K8S_SETUP_SCRIPTS setup
          make -C kfp/kfp_support_lib test
          make -C transforms workflow-build
          source $K8S_SETUP_SCRIPTS/common.sh
          make -C transforms/code/code2parquet workflow-test
          echo "Run transforms/code/code2parquet completed"

  test-kfp-v2:
    runs-on: ubuntu-22.04
    steps:
      - name: Checkout
        uses: actions/checkout@v4
      - name: Free up space in github runner
        # Free space as indicated here : https://github.com/actions/runner-images/issues/2840#issuecomment-790492173
        run: |
          df -h
          sudo rm -rf "/usr/local/share/boost"
          sudo rm -rf "$AGENT_TOOLSDIRECTORY"
          sudo rm -rf /usr/share/dotnet /opt/ghc /usr/local/lib/android /usr/local/share/powershell /usr/share/swift /usr/lib/jvm /usr/local/.ghcup
          sudo docker rmi $(docker image ls -aq) >/dev/null 2>&1 || true
          df -h
      - name: Test KFP libs (shared and v2) and run a workflow
        timeout-minutes: 120
        run: |
          export REPOROOT=$PWD
          export K8S_SETUP_SCRIPTS=$PWD/scripts/k8s-setup
          source $K8S_SETUP_SCRIPTS/requirements.env
          export PATH=$PATH:/tmp/
          curl -Lo /tmp/kind https://kind.sigs.k8s.io/dl/v${KIND_VERSION}/kind-linux-amd64
          chmod 777 /tmp/kind
          curl -fsSL -o /tmp/get_helm.sh https://raw.githubusercontent.com/helm/helm/master/scripts/get-helm-3
          chmod 700 /tmp/get_helm.sh
          HELM_INSTALL_DIR=/tmp/ /tmp/get_helm.sh -v v${HELM_VERSION} --no-sudo
          chmod 777 /tmp/helm
          curl -L https://dl.k8s.io/release/v${KUBECTL_VERSION}/bin/linux/amd64/kubectl -o /tmp/kubectl
          chmod 777 /tmp/kubectl
          curl https://dl.min.io/client/mc/release/linux-amd64/mc --create-dirs -o /tmp/mc
          chmod +x /tmp/mc
          export DEPLOY_KUBEFLOW=1
          export KFPv2=1
          make -C $K8S_SETUP_SCRIPTS setup
          make -C kfp/kfp_support_lib test
          make -C transforms workflow-build
          source $K8S_SETUP_SCRIPTS/common.sh
          make -C transforms/code/code2parquet workflow-test
          header_text "Run transforms/code/code2parquet completed"