Rework Image Building Process (#547)
# Description

This PR reworks and simplifies how Merlin builds model images: the
user's dependencies and Merlin's dependencies are merged, so the Conda
environment is created only once.

# Modifications

1. Publish merlin-pyfunc-server to PyPI
2. Add merlin-pyfunc-server to the user's conda.yaml via merlin-sdk
and pyfunc-server/docker/base.Dockerfile
3. Refactor pyfunc-server/docker/base.Dockerfile and
pyfunc-server/docker/Dockerfile
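
The merge described in item 2 can be pictured with a short sketch. This is not the merlin-sdk or `process_conda_env.sh` implementation: `add_merlin_dependency` and the `==1.2.3` constraint are hypothetical, and the environment is assumed to be already parsed into a dict with conda's usual `dependencies` list and nested `pip` section.

```python
import copy
import re


def _requirement_name(requirement):
    # "merlin-pyfunc-server<0.1" -> "merlin-pyfunc-server"
    return re.split(r"[=<>!~\[; @]", requirement.strip(), maxsplit=1)[0].lower()


def add_merlin_dependency(conda_env, package, constraint=""):
    """Return a copy of a parsed conda env with `package` in its pip section.

    Hypothetical helper for illustration only.
    """
    env = copy.deepcopy(conda_env)  # leave the caller's dict untouched
    deps = env.setdefault("dependencies", [])

    # Reuse the existing {"pip": [...]} entry, or create one.
    pip_section = next(
        (d for d in deps if isinstance(d, dict) and "pip" in d), None
    )
    if pip_section is None:
        pip_section = {"pip": []}
        deps.append(pip_section)

    # Drop any entry the user already had for the same package, then add ours.
    kept = [r for r in pip_section["pip"] if _requirement_name(r) != package.lower()]
    pip_section["pip"] = kept + [f"{package}{constraint}"]
    return env


user_env = {
    "name": "user-model",
    "dependencies": ["python=3.8", {"pip": ["scikit-learn==1.1.2"]}],
}
merged = add_merlin_dependency(user_env, "merlin-pyfunc-server", "==1.2.3")
print(merged["dependencies"][-1]["pip"])
```

Because the server package ends up in the same file as the user's packages, a single `conda env create` can resolve both sets together.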

# Tests

- [x] Process conda environment from Merlin SDK
- [x] Deploy pyfunc server model to Gojek dev environment
- [x] Create prediction job in Gojek dev environment

# Checklist
- [x] Added PR label
- [x] Added unit test, integration, and/or e2e tests
- [x] Tested locally
- [ ] Updated documentation
- [ ] Update Swagger spec if the PR introduces API changes
- [ ] Regenerated Golang and Python client if the PR introduces API
changes

# Release Notes

```release-note

``` 

# Future Improvements

These tasks will be picked up in separate PRs:
- [ ] Update README.md for batch-predictor and pyfunc-server
- [ ] Update run pyfunc server locally
ariefrahmansyah authored Mar 20, 2024
1 parent 308a0d6 commit 06f121c
Showing 72 changed files with 1,257 additions and 445 deletions.
8 changes: 4 additions & 4 deletions .github/workflows/external.yml
@@ -1,17 +1,17 @@
name: External Dependencies CI Workflow
on:
push:
branches:
- main
branches:
- main
pull_request:

jobs:
publish-mlflow-docker:
runs-on: ubuntu-latest
permissions:
packages: write
steps:
- uses: actions/checkout@v2
- uses: actions/checkout@v4
- name: Build and push MLflow Docker image
uses: docker/build-push-action@v1
with:
63 changes: 39 additions & 24 deletions .github/workflows/merlin.yml
@@ -19,7 +19,7 @@ jobs:
outputs:
version: ${{ steps.create_version.outputs.version }}
steps:
- uses: actions/checkout@v2
- uses: actions/checkout@v4
with:
fetch-depth: 0
- id: create_version
@@ -47,8 +47,8 @@ jobs:
env:
PIPENV_DEFAULT_PYTHON_VERSION: ${{ matrix.python-version }}
steps:
- uses: actions/checkout@v2
- uses: actions/setup-python@v4
- uses: actions/checkout@v4
- uses: actions/setup-python@v5
with:
python-version: ${{ matrix.python-version }}
- uses: actions/cache@v3
@@ -79,8 +79,8 @@ jobs:
env:
PIPENV_DEFAULT_PYTHON_VERSION: ${{ matrix.python-version }}
steps:
- uses: actions/checkout@v2
- uses: actions/setup-python@v4
- uses: actions/checkout@v4
- uses: actions/setup-python@v5
with:
python-version: ${{ matrix.python-version }}
- uses: actions/cache@v3
@@ -111,8 +111,8 @@ jobs:
env:
PIPENV_DEFAULT_PYTHON_VERSION: ${{ matrix.python-version }}
steps:
- uses: actions/checkout@v2
- uses: actions/setup-python@v4
- uses: actions/checkout@v4
- uses: actions/setup-python@v5
with:
python-version: ${{ matrix.python-version }}
- uses: actions/cache@v3
@@ -162,7 +162,7 @@ jobs:
ports:
- 5432:5432
steps:
- uses: actions/checkout@v2
- uses: actions/checkout@v4
- uses: actions/setup-go@v2
with:
go-version: ${{ env.GO_VERSION }}
@@ -186,15 +186,14 @@ jobs:
POSTGRES_PASSWORD: ${{ secrets.DB_PASSWORD }}
run: make it-test-api-ci


test-observation-publisher:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v2
- uses: actions/setup-python@v4
- uses: actions/checkout@v4
- uses: actions/setup-python@v5
id: setup-python
with:
python-version: '3.10'
python-version: "3.10"
- uses: actions/cache@v3
with:
path: ~/.cache/pip
@@ -213,7 +212,7 @@ jobs:
runs-on: ubuntu-latest
steps:
- name: Checkout to the target branch
uses: actions/checkout@v2
uses: actions/checkout@v4
- uses: actions/setup-node@v2
with:
node-version: 16
@@ -266,7 +265,7 @@ jobs:
- build-ui
- test-api
steps:
- uses: actions/checkout@v2
- uses: actions/checkout@v4
- name: Download UI Dist
uses: actions/download-artifact@v2
with:
@@ -283,17 +282,25 @@ jobs:
path: merlin.${{ needs.create-version.outputs.version }}.tar
retention-days: ${{ env.ARTIFACT_RETENTION_DAYS }}

build-batch-predictor-base:
build-batch-predictor-base-image:
runs-on: ubuntu-latest
needs:
- create-version
- test-batch-predictor
env:
DOCKER_IMAGE_TAG: "ghcr.io/${{ github.repository }}/merlin-pyspark-base:${{ needs.create-version.outputs.version }}"
steps:
- uses: actions/checkout@v2
- uses: actions/checkout@v4
- name: Build Batch Predictor Base Docker
run: docker build -t ${{ env.DOCKER_IMAGE_TAG }} -f python/batch-predictor/docker/base.Dockerfile python
- name: Test Build Batch Predictor Docker
run: |
docker build -t "test-batch-predictor:1" \
-f python/batch-predictor/docker/local-app.Dockerfile \
--build-arg BASE_IMAGE=${{ env.DOCKER_IMAGE_TAG }} \
--build-arg MODEL_DEPENDENCIES_URL=batch-predictor/test-model/conda.yaml \
--build-arg MODEL_ARTIFACTS_URL=batch-predictor/test-model \
python
- name: Save Batch Predictor Base Docker
run: docker image save --output merlin-pyspark-base.${{ needs.create-version.outputs.version }}.tar ${{ env.DOCKER_IMAGE_TAG }}
- name: Publish Batch Predictor Base Docker Artifact
@@ -303,7 +310,7 @@ jobs:
path: merlin-pyspark-base.${{ needs.create-version.outputs.version }}.tar
retention-days: ${{ env.ARTIFACT_RETENTION_DAYS }}

build-pyfunc-server-base:
build-pyfunc-server-base-image:
runs-on: ubuntu-latest
needs:
- create-version
@@ -312,9 +319,17 @@ jobs:
DOCKER_REGISTRY: ghcr.io
DOCKER_IMAGE_TAG: "ghcr.io/${{ github.repository }}/merlin-pyfunc-base:${{ needs.create-version.outputs.version }}"
steps:
- uses: actions/checkout@v2
- uses: actions/checkout@v4
- name: Build Pyfunc Server Base Docker
run: docker build -t ${{ env.DOCKER_IMAGE_TAG }} -f python/pyfunc-server/docker/base.Dockerfile python
- name: Test Build Pyfunc Server Docker
run: |
docker build -t "test-pyfunc-server:1" \
-f python/pyfunc-server/docker/local.Dockerfile \
--build-arg BASE_IMAGE=${{ env.DOCKER_IMAGE_TAG }} \
--build-arg MODEL_DEPENDENCIES_URL=pyfunc-server/test/local-artifacts/conda.yaml \
--build-arg MODEL_ARTIFACTS_URL=pyfunc-server/test/local-artifacts \
python
- name: Save Pyfunc Server Base Docker
run: docker image save --output merlin-pyfunc-base.${{ needs.create-version.outputs.version }}.tar ${{ env.DOCKER_IMAGE_TAG }}
- name: Publish Pyfunc Server Base Docker Artifact
@@ -329,7 +344,7 @@ jobs:
needs:
- create-version
steps:
- uses: actions/checkout@v2
- uses: actions/checkout@v4
- uses: actions/setup-go@v2
with:
go-version: ${{ env.GO_VERSION }}
@@ -387,7 +402,7 @@ jobs:
DOCKER_REGISTRY: ghcr.io
DOCKER_IMAGE_TAG: "ghcr.io/${{ github.repository }}/merlin-observation-publisher:${{ needs.create-version.outputs.version }}"
steps:
- uses: actions/checkout@v2
- uses: actions/checkout@v4
- name: Build Observation Publisher Docker
env:
OBSERVATION_PUBLISHER_IMAGE_TAG: ${{ env.DOCKER_IMAGE_TAG }}
@@ -417,10 +432,10 @@ jobs:
E2E_PYTHON_VERSION: "3.10.6"
K3S_VERSION: v1.26.7-k3s1
steps:
- uses: actions/checkout@v2
- uses: actions/checkout@v4
with:
path: merlin
- uses: actions/setup-python@v4
- uses: actions/setup-python@v5
with:
python-version: ${{ env.E2E_PYTHON_VERSION }}
- uses: actions/cache@v3
@@ -502,8 +517,8 @@ jobs:
needs:
- create-version
- build-api
- build-batch-predictor-base
- build-pyfunc-server-base
- build-batch-predictor-base-image
- build-pyfunc-server-base-image
- build-observation-publisher
- test-python-sdk
- e2e-test
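
The rename of `build-batch-predictor-base` and `build-pyfunc-server-base` above has to be mirrored in every `needs` list that refers to them, as the last hunk does. A minimal sketch of that consistency check, assuming the workflow's `jobs` mapping has already been parsed into a dict; `find_dangling_needs` and the job name `release` in the example are hypothetical and not part of this repository.

```python
def find_dangling_needs(jobs):
    """Map each job name to the `needs` entries that match no defined job.

    Hypothetical helper; `jobs` is the parsed `jobs:` mapping of a workflow.
    """
    dangling = {}
    for job_name, job in jobs.items():
        needs = job.get("needs", [])
        if isinstance(needs, str):  # `needs` may be a single string
            needs = [needs]
        missing = [name for name in needs if name not in jobs]
        if missing:
            dangling[job_name] = missing
    return dangling


# A job was renamed, but one dependent still uses the old name.
jobs = {
    "create-version": {},
    "build-pyfunc-server-base-image": {"needs": ["create-version"]},
    "release": {"needs": ["create-version", "build-pyfunc-server-base"]},
}
print(find_dangling_needs(jobs))
```

An empty result means every `needs` entry points at a job that exists in the same workflow.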
65 changes: 57 additions & 8 deletions .github/workflows/release.yml
@@ -21,15 +21,14 @@ jobs:
if: ${{ startsWith(github.ref, 'refs/tags/') }}
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v2
- uses: actions/setup-python@v2
- uses: actions/checkout@v4
- uses: actions/setup-python@v5
with:
python-version: 3.8
cache: pip
- name: Install dependencies
working-directory: ./python/sdk
run: |
python -m pip install --upgrade pip
pip install setuptools wheel twine
run: pip install "setuptools>=64" "setuptools_scm>=8" twine wheel
- name: Build and publish
env:
TWINE_USERNAME: ${{ secrets.pypi_username }}
@@ -41,6 +40,56 @@ jobs:
python setup.py sdist bdist_wheel
twine upload dist/*
publish-merlin-pyfunc-server:
if: ${{ startsWith(github.ref, 'refs/tags/') }}
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v4
with:
fetch-depth: 0
- uses: actions/setup-python@v5
with:
python-version: 3.8
cache: pip
- name: Install dependencies
working-directory: ./python/pyfunc-server
run: pip install "setuptools>=64" "setuptools_scm>=8" twine wheel
- name: Build and publish
env:
TWINE_USERNAME: ${{ secrets.pypi_username }}
TWINE_PASSWORD: ${{ secrets.pypi_password }}
working-directory: ./python/pyfunc-server
run: |
tag=$(python -m setuptools_scm -r ../.. | sed 's/\+.*//')
sed -i -e "s|VERSION = \".*\"|VERSION = \"`echo "${tag}"`\"|g" ./pyfuncserver/version.py
python setup.py sdist bdist_wheel
twine upload dist/*
publish-merlin-batch-predictor:
if: ${{ startsWith(github.ref, 'refs/tags/') }}
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v4
with:
fetch-depth: 0
- uses: actions/setup-python@v5
with:
python-version: 3.8
cache: pip
- name: Install dependencies
working-directory: ./python/batch-predictor
run: pip install "setuptools>=64" "setuptools_scm>=8" twine wheel
- name: Build and publish
env:
TWINE_USERNAME: ${{ secrets.pypi_username }}
TWINE_PASSWORD: ${{ secrets.pypi_password }}
working-directory: ./python/batch-predictor
run: |
tag=$(python -m setuptools_scm -r ../.. | sed 's/\+.*//')
sed -i -e "s|VERSION = \".*\"|VERSION = \"`echo "${tag}"`\"|g" ./merlinpyspark/version.py
python setup.py sdist bdist_wheel
twine upload dist/*
publish-python-sdk-docker:
runs-on: ubuntu-latest
strategy:
@@ -50,7 +99,7 @@ jobs:
needs:
- publish-python-sdk
steps:
- uses: actions/checkout@v2
- uses: actions/checkout@v4
- name: Log in to the Container registry
uses: docker/login-action@v1
with:
@@ -117,7 +166,7 @@ jobs:
docker tag merlin-logger:${{ inputs.version }} ${IMAGE_TAG}
docker push ${IMAGE_TAG}
publish-batch-predictor:
publish-batch-predictor-base-image:
runs-on: ubuntu-latest
env:
DOCKER_IMAGE_TAG: "ghcr.io/${{ github.repository }}/merlin-pyspark-base:${{ inputs.version }}"
@@ -137,7 +186,7 @@ jobs:
docker image load --input merlin-pyspark-base.${{ inputs.version }}.tar
docker push ${{ env.DOCKER_IMAGE_TAG }}
publish-pyfunc:
publish-pyfunc-server-base-image:
runs-on: ubuntu-latest
env:
DOCKER_IMAGE_TAG: "ghcr.io/${{ github.repository }}/merlin-pyfunc-base:${{ inputs.version }}"
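
The two new PyPI publish jobs in release.yml stamp the package version in two steps: they take the version reported by `setuptools_scm`, cut everything from the first `+` onward (the local version segment), and rewrite the `VERSION = "..."` line in `version.py`. A rough Python equivalent of those two `sed` calls, for illustration only; the function names and the sample version strings are made up.

```python
import re


def public_version(scm_version):
    # Same effect as the first sed call: delete the first "+" and what follows.
    return re.sub(r"\+.*", "", scm_version)


def stamp_version(source, version):
    # Same effect as the second sed call: replace each VERSION = "..." match.
    # A function is used as the replacement so that backslashes in `version`
    # are never interpreted as escapes.
    return re.sub(r'VERSION = ".*"', lambda _match: f'VERSION = "{version}"', source)


version = public_version("1.2.3.dev4+gabcdef0")  # hypothetical scm output
print(stamp_version('VERSION = "0.0.0"\n', version))
```

Dropping the local segment matters for the upload step, since PyPI does not accept versions that carry one.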
3 changes: 1 addition & 2 deletions config.yaml
@@ -31,13 +31,12 @@ ImageBuilderConfig:
DockerfilePath: docker/Dockerfile
BuildContextURI: git://github.com/gojek/merlin.git#refs/tags/v0.37.0
BuildContextSubPath: python
MainAppPath: /merlin-spark-app/main.py
PredictionJobBaseImage:
ImageName: ghcr.io/caraml-dev/merlin-pyspark-base:v0.37.0
DockerfilePath: docker/app.Dockerfile
BuildContextURI: git://github.com/gojek/merlin.git#refs/tags/v0.37.0
BuildContextSubPath: python
MainAppPath: /merlin-spark-app/main.py
MainAppPath: /home/spark/main.py
BuildNamespace: mlp
DockerRegistry: ghcr.io/caraml-dev
BuildTimeout: 10m
4 changes: 2 additions & 2 deletions python/batch-predictor/Pipfile
@@ -4,7 +4,7 @@ url = "https://pypi.org/simple"
verify_ssl = true

[dev-packages]
merlin-pyspark-app = {editable = true,extras = ["test"],path = "."}
merlin-batch-predictor = {editable = true,extras = ["test"],path = "."}

[packages]
merlin-pyspark-app = {editable = true, extras = ["test"], path = "."}
merlin-batch-predictor = {editable = true, extras = ["test"], path = "."}
4 changes: 2 additions & 2 deletions python/batch-predictor/README.md
@@ -1,6 +1,6 @@
# Merlin PySpark App
# Merlin Batch Predictor

Pyspark application for running batch prediction job in merlin.
Merlin Batch Predictor is a PySpark application for running batch prediction job in Merlin system.

## Usage

17 changes: 5 additions & 12 deletions python/batch-predictor/docker/app.Dockerfile
@@ -24,21 +24,14 @@ RUN if [ ! -z "${GOOGLE_APPLICATION_CREDENTIALS}" ]; \
# Download and install user model dependencies
ARG MODEL_DEPENDENCIES_URL
RUN gsutil cp ${MODEL_DEPENDENCIES_URL} conda.yaml
RUN conda env create --name merlin-model --file conda.yaml

# Copy and install batch predictor dependencies
COPY --chown=${UID}:${GID} batch-predictor ${HOME}/merlin-spark-app
COPY --chown=${UID}:${GID} sdk ${HOME}/sdk
ENV SDK_PATH=${HOME}/sdk

RUN /bin/bash -c ". activate merlin-model && pip uninstall -y merlin-sdk && pip install -r ${HOME}/merlin-spark-app/requirements.txt"
ARG MERLIN_DEP_CONSTRAINT
RUN process_conda_env.sh conda.yaml "merlin-batch-predictor" "${MERLIN_DEP_CONSTRAINT}"
RUN conda env create --name merlin-model --file conda.yaml

# Download and dry-run user model artifacts and code
ARG MODEL_ARTIFACTS_URL
RUN gsutil -m cp -r ${MODEL_ARTIFACTS_URL} .
RUN /bin/bash -c ". activate merlin-model && python ${HOME}/merlin-spark-app/main.py --dry-run-model ${HOME}/model"

# Copy batch predictor application entrypoint to protected directory
COPY batch-predictor/merlin-entrypoint.sh /opt/merlin-entrypoint.sh
RUN /bin/bash -c ". activate merlin-model && merlin-batch-predictor --dry-run-model ${HOME}/model"

ENTRYPOINT [ "/opt/merlin-entrypoint.sh" ]
ENTRYPOINT ["merlin_entrypoint.sh"]
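
The Dockerfile above takes its inputs as build arguments (`MODEL_DEPENDENCIES_URL`, `MERLIN_DEP_CONSTRAINT`, `MODEL_ARTIFACTS_URL`). The sketch below only assembles a `docker build` command line from such arguments. It is not how Merlin's image builder invokes the build; the helper name, the tag, the bucket URLs, and the constraint value are placeholders, and the real constraint format is whatever `process_conda_env.sh` expects.

```python
import shlex


def docker_build_command(tag, dockerfile, context, build_args):
    """Build the argv for `docker build` with one --build-arg per entry."""
    cmd = ["docker", "build", "-t", tag, "-f", dockerfile]
    for key, value in build_args.items():
        cmd += ["--build-arg", f"{key}={value}"]
    cmd.append(context)  # the build context comes last
    return cmd


cmd = docker_build_command(
    tag="example-batch-predictor:1",  # placeholder tag
    dockerfile="python/batch-predictor/docker/app.Dockerfile",
    context="python",
    build_args={
        # Placeholder values, not real locations or constraints.
        "MODEL_DEPENDENCIES_URL": "gs://example-bucket/model/conda.yaml",
        "MERLIN_DEP_CONSTRAINT": "==1.2.3",
        "MODEL_ARTIFACTS_URL": "gs://example-bucket/model",
    },
)
print(shlex.join(cmd))
```

Returning a list instead of a string keeps each argument intact if the command is later passed to `subprocess.run`.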