# Hopsworks Model Serving REST API

## Introduction

Hopsworks provides model serving capabilities by leveraging [KServe](https://kserve.github.io/website/) as the model serving platform and [Istio](https://istio.io/) as the ingress gateway to the model deployments.

This document explains how to interact with a model deployment via REST API.

## Base URL

Deployed models are accessible through the Istio ingress gateway. The URL to interact with a model deployment is provided on the model deployment page in the Hopsworks UI.

The URL follows the format `http://<ISTIO_GATEWAY_IP>/<RESOURCE_PATH>`, where `RESOURCE_PATH` depends on the [model server](https://docs.hopsworks.ai/latest/user_guides/mlops/serving/predictor/#model-server) (e.g. vLLM, TensorFlow Serving, SKLearn ModelServer).
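
For example, for deployments served over the KServe v1 protocol (e.g. TensorFlow Serving or SKLearn ModelServer), the resource path typically takes the form below, as in the request examples further down this page. This is a sketch; the exact URL for your deployment is always displayed in the Hopsworks UI:

```text
http://<ISTIO_GATEWAY_IP>/v1/models/<MODEL_NAME>:predict
```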

<p align="center">
  <figure>
    <img style="max-width: 100%; margin: 0 auto" src="../../../../assets/images/guides/mlops/serving/deployment_endpoints.png" alt="Endpoints">
    <figcaption>Deployment Endpoints</figcaption>
  </figure>
</p>

## Authentication

All requests must include an API key for authentication. You can create an API key by following this [guide](../../projects/api_key/create_api_key.md).

Include the key in the `Authorization` header:
```text
Authorization: ApiKey <API_KEY_VALUE>
```

## Headers

| Header          | Description                                 | Example Value             |
| --------------- | ------------------------------------------- | ------------------------- |
| `Host`          | Model’s hostname, provided in Hopsworks UI. | `fraud.test.hopsworks.ai` |
| `Authorization` | API key for authentication.                 | `ApiKey <your_api_key>`   |
| `Content-Type`  | Request payload type (always JSON).         | `application/json`        |

## Request Format

The request format depends on the model server being used.

For predictive inference (i.e. TensorFlow, SKLearn, or Python serving), the request must be sent as a JSON object containing an `inputs` or `instances` field. You can find more information on the request format [here](https://kserve.github.io/website/docs/concepts/architecture/data-plane/v1-protocol#request-format). Examples are given below.

=== "Python"

    !!! example "REST API example for Predictive Inference (TensorFlow, SKLearn, or Python Serving)"
        ```python
        import requests

        data = {
            "inputs": [
                [
                    4641025220953719,
                    4920355418495856
                ]
            ]
        }

        headers = {
            "Host": "fraud.test.hopsworks.ai",
            "Authorization": "ApiKey 8kDOlnRlJU4kiV1Y.RmFNJY3XKAUSqmJZ03kbUbXKMQSHveSBgMIGT84qrM5qXMjLib7hdlfGeg8fBQZp",
            "Content-Type": "application/json"
        }

        response = requests.post(
            "http://10.87.42.108/v1/models/fraud:predict",
            headers=headers,
            json=data
        )
        print(response.json())
        ```

=== "Curl"

    !!! example "REST API example for Predictive Inference (TensorFlow, SKLearn, or Python Serving)"
        ```bash
        curl -X POST "http://10.87.42.108/v1/models/fraud:predict" \
          -H "Host: fraud.test.hopsworks.ai" \
          -H "Authorization: ApiKey 8kDOlnRlJU4kiV1Y.RmFNJY3XKAUSqmJZ03kbUbXKMQSHveSBgMIGT84qrM5qXMjLib7hdlfGeg8fBQZp" \
          -H "Content-Type: application/json" \
          -d '{
                "inputs": [
                  [
                    4641025220953719,
                    4920355418495856
                  ]
                ]
              }'
        ```

For generative inference (e.g. vLLM), requests and responses instead follow the [OpenAI API specification](https://platform.openai.com/docs/api-reference/chat/create).
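
As a minimal sketch (the gateway IP, hostname, model name, and resource path below are illustrative placeholders; use the exact URL and values shown for your deployment in the Hopsworks UI), a chat completion request could look like this:

```python
import requests

# NOTE: all values below are placeholders; the exact URL, Host header,
# and model name are shown on the deployment page in the Hopsworks UI.
headers = {
    "Host": "llm.test.hopsworks.ai",
    "Authorization": "ApiKey <API_KEY_VALUE>",
    "Content-Type": "application/json"
}

data = {
    "model": "llm",
    "messages": [
        {"role": "user", "content": "Summarize what model serving is."}
    ]
}

response = requests.post(
    "http://10.87.42.108/v1/chat/completions",  # assumed OpenAI-compatible path
    headers=headers,
    json=data
)
print(response.json()["choices"][0]["message"]["content"])
```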

## Response

The model returns predictions in a JSON object. The response format depends on the model server implementation; you can find more information regarding specific model servers in the [KServe documentation](https://kserve.github.io/website/docs/intro).
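
For example, for predictive inference over the KServe v1 protocol (as in the request examples above), predictions are returned under a `predictions` key. An illustrative response (the values are made up):

```json
{
    "predictions": [0, 1]
}
```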