# PIM stack for triton inference server to run AI/ML applications (#25)
**Containerfile**

```dockerfile
FROM quay.io/powercloud/pim:base

# First-boot configuration script and the systemd unit that runs it
COPY tritonserver_config.sh /usr/bin/
COPY tritonserver_config.service /etc/systemd/system
RUN systemctl unmask tritonserver_config.service
RUN systemctl enable tritonserver_config.service

# Quadlet definition that runs the Triton server container under systemd
COPY tritonserver.container /usr/share/containers/systemd
```
**README.md**
# Triton

Triton inference server can be used to serve machine learning or deep learning models, such as classification or regression models, on CPU and GPU platforms.
Triton inference server is built on top of the base image [here](../../base-image/).
## Build PIM triton server
**Step 1: Build the base image**
Follow the steps provided [here](../../base-image/README.md) to build the base image.

**Step 2: Build the triton server PIM image**
The bootc PIM-based triton server image brings up an AI partition that can serve trained machine learning models to AI applications.
Ensure you replace the `FROM` image in the [Containerfile](Containerfile) with the base image you built before building this image.
```shell
podman build . -t <your_registry>/pim:triton-server

podman push <your_registry>/pim:triton-server
```
## Steps to set up the e2e inference flow

### Step 1: Prepare the model and config file
As mentioned earlier, triton inference server can serve any machine learning model whose model and configuration files are stored in a model repository. You can build the model and config file for your own use case.
To showcase the e2e flow of a triton inference server deployment from PIM, we will utilise the existing [fraud-detection](https://github.com/PDeXchange/ai-demos/tree/main/02_Fraud_Detection) application. Follow the steps below to build the model and config file.
#### Step I: Build the image
To make it easy to train the model with the provided python application, we provide a Containerfile with the packages, environment, and tools needed to run the python application that trains the model for you. The source files for training are volume mounted during training so that the container can be reused across the AI example applications.

Build the container image for the AI example applications covered in [ai-demos](https://github.com/PDeXchange/ai-demos) using the [build-steps](app/README.md).

To consume the already built and hosted container image, use `quay.io/powercloud/build_env`.
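If you take the pre-built route, pulling the image looks like this (assuming the host can reach the public quay.io registry):

```shell
podman pull quay.io/powercloud/build_env
```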
#### Step II: Train the model
The model with the ONNX runtime can be trained by running the container image built in Step I. Follow the [training steps](app/README.md).
After training completes successfully, the model (model.onnx) and config (config.pbtxt) files will be available under **<current_dir>/app/model_repository/fraud_detection**.
### Step 2: Store model artifacts in a model repository
Store both the model file (model.onnx) and the config file (config.pbtxt) on a simple HTTP server.

#### Steps to start the http server and copy the model artifacts
```shell
# Install httpd
yum install httpd -y
systemctl enable httpd
systemctl start httpd
# Copy AI app specific artifacts, i.e. the model file and the model config file
mkdir -p /var/www/html/fraud_detection/
cp <current_dir>/model_repository/fraud_detection/config.pbtxt /var/www/html/fraud_detection/
cp <current_dir>/model_repository/fraud_detection/1/model.onnx /var/www/html/fraud_detection/
```
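Before moving on, it is worth confirming both artifacts are reachable over HTTP (the host below is a placeholder for your server):

```shell
curl -I http://<Host/IP>/fraud_detection/model.onnx
curl -I http://<Host/IP>/fraud_detection/config.pbtxt
```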
### Step 3: Set up the PIM partition
Follow the [deployer section](../../README.md#deployer-steps) to set up the PIM cli, configure your AI partition, and launch it.

To configure the AI application served from triton server, provide the generated model artifacts, i.e. the model file and config file, to the PIM partition in the `ai.config-json` section as shown below.
```ini
config-json = """
{
    "modelSource": "http://<Host/IP>/fraud_detection/model.onnx",
    "configSource": "http://<Host/IP>/fraud_detection/config.pbtxt",
    "aiApp": "fraud_detection"
}
"""
```
`modelSource` and `configSource` are the URI paths to the model artifacts stored in the model repository set up in Step 2. `aiApp` specifies the name of the AI application whose model and config files need to be pulled from the model repository.
### Step 4: Validate AI application functionality
To verify the AI example application served from Triton server, fill the ai.validation section with the application specific REST schema: URL, headers, and payload. If you have built and trained the model for the fraud detection usecase, apply the configuration specified below in [config.ini](../../config.ini).
```ini
[[validation]]
# yes, no - set yes to make the request that validates the AI app deployed as part of the PIM partition
request = "yes"
url = "http://<PIM_LPAR_IP>:8000/v2/models/fraud/infer"
method = "POST" # GET, POST
# provide the headers to use, in json format inside triple quotes
headers = """
{
  "Content-Type": "application/json"
}
"""
# provide the payload to use, in json format inside triple quotes.
# The JSON payload below is used when the fraud-detection example is served from triton server
payload = """
{
  "inputs": [
    {
      "name": "float_input",
      "shape": [1, 7],
      "datatype": "FP32",
      "data": [[20, 0.5, 2, 1.0, 1.0, 1.0, 1.0]]
    }
  ],
  "outputs": [
    { "name": "label" },
    { "name": "probabilities" }
  ]
}
"""
```
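The same validation request can also be issued by hand with curl once the partition is up; a sketch, assuming the model is registered under the name `fraud` as in the sample response below:

```shell
curl -X POST "http://<PIM_LPAR_IP>:8000/v2/models/fraud/infer" \
  -H "Content-Type: application/json" \
  -d '{"inputs":[{"name":"float_input","shape":[1,7],"datatype":"FP32","data":[[20,0.5,2,1.0,1.0,1.0,1.0]]}],"outputs":[{"name":"label"},{"name":"probabilities"}]}'
```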
Once the PIM partition is deployed with triton server serving the model of the configured AI application (fraud-detection in the example above), you should observe output like the following:
```json
{
  "model_name": "fraud",
  "model_version": "1",
  "outputs": [
    {
      "name": "label",
      "datatype": "INT64",
      "shape": [1, 1],
      "data": [1]
    },
    {
      "name": "probabilities",
      "datatype": "FP32",
      "shape": [1, 2],
      "data": [4.172325134277344e-7, 0.9999995827674866]
    }
  ]
}
```
**app/README.md**

# Triton server

[Triton server](https://github.com/triton-inference-server/server) can be used to run inference for AI workloads using machine learning models. Some pre-built example AI workloads, such as fraud detection and Iris classification, are covered in the [ai-demos repo](https://github.com/PDeXchange/ai-demos). Users can utilise them to try out the triton inference server.

Users can deploy AI workloads with their choice of model and configuration by supplying the trained model file (model.onnx) and configuration file (config.pbtxt) to an http server, to be used by Triton server when it runs on a PIM partition.
## Fraud detection/Iris usecase with ONNX runtime
### Pre-requisites
The following pre-requisites are needed to build the container image for the fraud detection example (an install sketch follows the list):
- podman
- a container registry to push the built fraud detection container image
- protobuf
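On a Fedora/RHEL-style host the tools can typically be installed as shown below; the package names are an assumption and may differ on your distribution:

```shell
sudo dnf install -y podman protobuf
```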
### Build application container image
The [script](build_and_train.sh) builds the base container image for the AI example applications given in [ai-demos](https://github.com/PDeXchange/ai-demos).
```shell
bash build_and_train.sh build
```
### Training the model with ONNX runtime
Run the `build_env` base container image built above to train the model and generate the model configuration for the AI usecase. Provide both the AI application name and the container image built above as arguments to the script. The command below demonstrates training for the fraud detection usecase.
```shell
bash build_and_train.sh train fraud_detection localhost/build_env
```

After successful execution, the **model.onnx** file will be available at `ai-demos/fraud_detection/model_repository/fraud_detection/1/model.onnx`. The script also persists the model configuration **config.pbtxt** at `ai-demos/fraud_detection/model_repository/fraud_detection/config.pbtxt`.
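A quick way to confirm both artifacts were produced, using the output paths above:

```shell
ls -l ai-demos/fraud_detection/model_repository/fraud_detection/config.pbtxt \
      ai-demos/fraud_detection/model_repository/fraud_detection/1/model.onnx
```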
**app/build_and_train.sh**
```shell
#!/bin/bash

AI_DEMOS_REPO="https://github.com/PDeXchange/ai-demos"
REPO_NAME="ai-demos"
REGISTRY="localhost"
CONTAINER_IMAGE="$REGISTRY/build_env"

show_help() {
    cat << EOF
Usage: $(basename "$0") [build|train] [options]

This is a bash script to build the AI application container image and train its machine learning/deep learning model.

Available commands:
  build   Build the container image for AI applications.
  train   Train a model by passing the AI application container image as an argument.
  help    Display the help message.

EOF
}

build_image() {
    if [ ! -d "$REPO_NAME" ]; then
        echo "Cloning source code from $AI_DEMOS_REPO"
        git clone "$AI_DEMOS_REPO"
    fi

    cd "$REPO_NAME" || exit 1

    echo "Building container image: $CONTAINER_IMAGE"
    podman build . -t "$CONTAINER_IMAGE"
}

train_model() {
    # Drop the 'train' subcommand; the remaining args are app name and image
    shift

    if [ "$#" -ne 2 ]; then
        echo "Error: 'train' command requires exactly two arguments: application_name and container_image" >&2
        echo "Usage: $0 train <app_name> <container_image>" >&2
        exit 1
    fi

    local APP="$1"
    local CONTAINER_IMAGE="$2"

    echo "Executing 'train' command..."
    echo "  APPLICATION: $APP"
    echo "  CONTAINER IMAGE: $CONTAINER_IMAGE"

    if [ ! -d "$REPO_NAME" ]; then
        echo "Cloning source code from $AI_DEMOS_REPO"
        git clone "$AI_DEMOS_REPO"
    fi

    cd "$REPO_NAME" || exit 1

    # Locate the application directory matching the app name
    app_dir=$(find . -maxdepth 1 -type d -iname "*$APP*" | head -n 1)
    echo "app dir: $app_dir"

    mkdir -p "$(pwd)/${app_dir}/model_repository"

    echo "Train the model using $CONTAINER_IMAGE container"
    # Run the app image to generate the model file
    podman run --rm --name "$APP" -v "$(pwd)/$app_dir:/app:Z" -v "$(pwd)/Makefile:/app/Makefile:Z" \
        --entrypoint="/bin/sh" "$CONTAINER_IMAGE" -c "cd /app && make train APP=$APP"
    echo "Model has been trained successfully and is available at: $(pwd)/$app_dir/model_repository/$APP/1"

    # Clean up redundant volume-hosted directories
    rm -rf "$(pwd)/$APP/$APP"

    echo "Generate model config file for app: $APP"
    make generate-config APP=$APP || { echo "Failed to generate model config.pbtxt file for $APP" >&2; exit 1; }
    echo "Model config file config.pbtxt has been generated for app: $APP"
}

# If no subcommands or args passed, display help
if [ $# -eq 0 ]; then
    show_help
    exit 1
fi

SUBCOMMAND="$1"
case "$SUBCOMMAND" in
    build)
        build_image "$@"
        ;;
    train)
        train_model "$@"
        ;;
    help)
        show_help
        ;;
    *)
        echo "Error: Unknown command '$SUBCOMMAND'" >&2
        show_help
        exit 1
        ;;
esac
```
**tritonserver.container**
```ini
[Unit]
Description=Run tritonserver with ONNX runtime to serve deep learning/machine learning models
Requires=tritonserver_config.service
After=tritonserver_config.service

[Service]
Restart=on-failure
RestartSec=60
EnvironmentFile=/etc/pim/env.conf

[Container]
Image=quay.io/powercloud/tritonserver:latest
ContainerName=tritonserver
EnvironmentFile=/etc/pim/tritonserver.conf
Network=host
PublishPort=8000-8002:8000-8002
Volume=/var/models/model_repository:/models:Z
Exec=/bin/sh -c 'tritonserver --model-repository=/models --'
SecurityLabelType=unconfined_t

[Install]
WantedBy=multi-user.target default.target
```
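This file is a Podman Quadlet unit: once copied to /usr/share/containers/systemd (as the Containerfile does), the podman-systemd generator turns it into a regular systemd service at boot. On a running partition it can be managed with standard systemd commands:

```shell
systemctl daemon-reload
systemctl start tritonserver.service
systemctl status tritonserver.service
```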
**tritonserver_config.service**
```ini
[Unit]
Description=Mount and setup triton server config
Requires=network-online.target cloud-config.target
After=network-online.target cloud-config.target

[Service]
Type=oneshot
ExecStart=/usr/bin/env /bin/bash /usr/bin/tritonserver_config.sh
RemainAfterExit=yes
TimeoutSec=0
StandardOutput=journal+console

[Install]
WantedBy=multi-user.target default.target
```
**tritonserver_config.sh**
```shell
#!/bin/bash

set -x

# Make sure the environment file consumed by the tritonserver container exists
[ -f /etc/pim/tritonserver.conf ] || touch /etc/pim/tritonserver.conf

AI_APP=$(jq -r '.aiApp' /etc/pim/pim_config.json)
echo "Application: ${AI_APP}"

mkdir -p /var/models/model_repository/${AI_APP}/1

# Fetch the model artifacts referenced in the PIM config.
# jq -r prints the literal string "null" for missing keys, so guard against it.
ONNX_MODEL_SOURCE=$(jq -r '.modelSource' /etc/pim/pim_config.json)
if [[ -n "$ONNX_MODEL_SOURCE" && "$ONNX_MODEL_SOURCE" != "null" ]]; then
    curl --fail "$ONNX_MODEL_SOURCE" --output /var/models/model_repository/${AI_APP}/1/model.onnx
fi

CONFIG_FILE=$(jq -r '.configSource' /etc/pim/pim_config.json)
if [[ -n "$CONFIG_FILE" && "$CONFIG_FILE" != "null" ]]; then
    curl --fail "$CONFIG_FILE" --output /var/models/model_repository/${AI_APP}/config.pbtxt
fi

# Record the model repository path, replacing any stale entry
var_to_add=MODEL_REPOSITORY=/var/models/model_repository
sed -i "/^MODEL_REPOSITORY=.*/d" /etc/pim/tritonserver.conf && echo "$var_to_add" >> /etc/pim/tritonserver.conf
```
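For reference, a minimal /etc/pim/pim_config.json that this script consumes, mirroring the `ai.config-json` section shown in the README above:

```json
{
  "modelSource": "http://<Host/IP>/fraud_detection/model.onnx",
  "configSource": "http://<Host/IP>/fraud_detection/config.pbtxt",
  "aiApp": "fraud_detection"
}
```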