
Commit 147b097

manu-sj and javierdlrm authored
[FSTORE-1820] Documentation for REST API model deployments (#504) (#506)
Co-authored-by: Javier de la Rúa Martínez <[email protected]>
1 parent 7de43d2 commit 147b097

File tree

4 files changed: +103 −0 lines changed

docs/assets/images/guides/mlops/serving/deployment_endpoints.png (548 KB, binary image file)

docs/user_guides/mlops/serving/index.md

Lines changed: 4 additions & 0 deletions
@@ -24,6 +24,10 @@ Configure the predictor to batch inference requests, see the [Inference Batcher

Configure the predictor to log inference requests and predictions, see the [Inference Logger Guide](inference-logger.md).

+### REST API
+
+Send inference requests to deployed models using the REST API, see the [REST API Guide](rest-api.md).
+
### Troubleshooting

Inspect the model server logs to troubleshoot your model deployments, see the [Troubleshooting Guide](troubleshooting.md).
docs/user_guides/mlops/serving/rest-api.md

Lines changed: 98 additions & 0 deletions
@@ -0,0 +1,98 @@
# Hopsworks Model Serving REST API

## Introduction

Hopsworks provides model serving capabilities by leveraging [KServe](https://kserve.github.io/website/) as the model serving platform and [Istio](https://istio.io/) as the ingress gateway to the model deployments.

This document explains how to interact with a model deployment via REST API.

## Base URL

Deployed models are accessible through the Istio ingress gateway. The URL to interact with a model deployment is provided on the model deployment page in the Hopsworks UI.

The URL follows the format `http://<ISTIO_GATEWAY_IP>/<RESOURCE_PATH>`, where `RESOURCE_PATH` depends on the [model server](https://docs.hopsworks.ai/latest/user_guides/mlops/serving/predictor/#model-server) (e.g. vLLM, TensorFlow Serving, SKLearn ModelServer).

<p align="center">
  <figure>
    <img style="max-width: 100%; margin: 0 auto" src="../../../../assets/images/guides/mlops/serving/deployment_endpoints.png" alt="Endpoints">
    <figcaption>Deployment Endpoints</figcaption>
  </figure>
</p>

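For a predictive deployment served over the KServe v1 protocol, the full URL is simply the gateway IP followed by the resource path. Below is a minimal sketch of composing it; the gateway IP and model name are the placeholder values used in the examples later in this guide, so take the real values from your deployment's page in the Hopsworks UI.

```python
# Minimal sketch: composing the inference URL for a predictive deployment
# served over the KServe v1 protocol. The gateway IP and model name are
# placeholders; use the values shown on your deployment's page.
istio_gateway_ip = "10.87.42.108"                   # Istio ingress gateway IP
model_name = "fraud"                                # deployment name
resource_path = f"/v1/models/{model_name}:predict"  # KServe v1 predict path

url = f"http://{istio_gateway_ip}{resource_path}"
print(url)  # -> http://10.87.42.108/v1/models/fraud:predict
```
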
## Authentication

All requests must include an API key for authentication. You can create an API key by following this [guide](../../projects/api_key/create_api_key.md).

Include the key in the `Authorization` header:

```text
Authorization: ApiKey <API_KEY_VALUE>
```

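To verify that a key works before sending inference requests, one option is to call the KServe v1 model-ready endpoint (`GET /v1/models/<model_name>`) with the key attached. This is a minimal sketch under the assumption that your deployment is predictive and exposes that endpoint through the gateway; the gateway IP and hostname are the placeholder values used elsewhere in this guide.

```python
import requests

# Placeholder values; use the gateway IP, hostname, and API key of your own deployment.
url = "http://10.87.42.108/v1/models/fraud"   # KServe v1 model-ready endpoint (assumed exposed)
headers = {
    "Host": "fraud.test.hopsworks.ai",
    "Authorization": "ApiKey <API_KEY_VALUE>",
}

response = requests.get(url, headers=headers)
# 401/403 indicates an authentication problem; 200 means the key was accepted
# and the model is reachable.
print(response.status_code, response.text)
```
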
## Headers

| Header          | Description                                  | Example Value             |
| --------------- | -------------------------------------------- | ------------------------- |
| `Host`          | Model's hostname, provided in Hopsworks UI.  | `fraud.test.hopsworks.ai` |
| `Authorization` | API key for authentication.                  | `ApiKey <your_api_key>`   |
| `Content-Type`  | Request payload type (always JSON).          | `application/json`        |

## Request Format

The request format depends on the model server being used.

For predictive inference (i.e. TensorFlow, SKLearn, or Python serving), the request must be sent as a JSON object containing an `inputs` or `instances` field. You can find more information on the request format [here](https://kserve.github.io/website/docs/concepts/architecture/data-plane/v1-protocol#request-format). Examples are given below.

=== "Python"

    !!! example "REST API example for Predictive Inference (Tensorflow or SkLearn or Python Serving)"
        ```python
        import requests

        data = {
            "inputs": [
                [
                    4641025220953719,
                    4920355418495856
                ]
            ]
        }

        headers = {
            "Host": "fraud.test.hopsworks.ai",
            "Authorization": "ApiKey 8kDOlnRlJU4kiV1Y.RmFNJY3XKAUSqmJZ03kbUbXKMQSHveSBgMIGT84qrM5qXMjLib7hdlfGeg8fBQZp",
            "Content-Type": "application/json"
        }

        response = requests.post(
            "http://10.87.42.108/v1/models/fraud:predict",
            headers=headers,
            json=data
        )
        print(response.json())
        ```

=== "Curl"

    !!! example "REST API example for Predictive Inference (Tensorflow or SkLearn or Python Serving)"
        ```bash
        curl -X POST "http://10.87.42.108/v1/models/fraud:predict" \
          -H "Host: fraud.test.hopsworks.ai" \
          -H "Authorization: ApiKey 8kDOlnRlJU4kiV1Y.RmFNJY3XKAUSqmJZ03kbUbXKMQSHveSBgMIGT84qrM5qXMjLib7hdlfGeg8fBQZp" \
          -H "Content-Type: application/json" \
          -d '{
            "inputs": [
              [
                4641025220953719,
                4920355418495856
              ]
            ]
          }'
        ```

For generative inference (i.e. vLLM), requests and responses follow the [OpenAI specification](https://platform.openai.com/docs/api-reference/chat/create).

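As a rough sketch of what such a request can look like with the `requests` library, assuming an OpenAI-compatible chat-completions deployment: the gateway IP, hostname, resource path, and model name below are placeholders, so use the endpoint shown on your deployment's page in the Hopsworks UI.

```python
import requests

# Placeholder values: take the actual gateway IP, Host header, resource path
# and model name from the deployment page in the Hopsworks UI.
url = "http://10.87.42.108/openai/v1/chat/completions"   # assumed resource path
headers = {
    "Host": "llm.test.hopsworks.ai",
    "Authorization": "ApiKey <API_KEY_VALUE>",
    "Content-Type": "application/json",
}
data = {
    "model": "llm",  # deployment/model name
    "messages": [
        {"role": "user", "content": "Summarize what model serving is in one sentence."}
    ],
}

response = requests.post(url, headers=headers, json=data)
print(response.json())
```
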
## Response

The model returns predictions in a JSON object. The response depends on the model server implementation. You can find more information regarding specific model servers in the [KServe documentation](https://kserve.github.io/website/docs/intro).
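
For example, a deployment served over the KServe v1 protocol (as in the predictive examples above) wraps its outputs in a `predictions` field. A minimal sketch of reading it is shown below; the URL, headers, and payload are the placeholder values from the Request Format section, and the shape of each prediction depends entirely on the model being served.

```python
import requests

# Re-sends the predictive-inference request from the examples above
# (placeholder URL, headers, and payload) and reads the KServe v1 response,
# which has the form {"predictions": [...]}.
response = requests.post(
    "http://10.87.42.108/v1/models/fraud:predict",
    headers={
        "Host": "fraud.test.hopsworks.ai",
        "Authorization": "ApiKey <API_KEY_VALUE>",
        "Content-Type": "application/json",
    },
    json={"inputs": [[4641025220953719, 4920355418495856]]},
)

result = response.json()                  # e.g. {"predictions": [[0.97, 0.03]]}
for prediction in result.get("predictions", []):
    print(prediction)
```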

mkdocs.yml

Lines changed: 1 addition & 0 deletions
@@ -203,6 +203,7 @@ nav:
  - Inference Logger: user_guides/mlops/serving/inference-logger.md
  - Inference Batcher: user_guides/mlops/serving/inference-batcher.md
  - API Protocol: user_guides/mlops/serving/api-protocol.md
+ - REST API: user_guides/mlops/serving/rest-api.md
  - Troubleshooting: user_guides/mlops/serving/troubleshooting.md
  - External Access: user_guides/mlops/serving/external-access.md
  - Vector Database: user_guides/mlops/vector_database/index.md
