Commit 22b6494

[Frontend][last/5] Make pooling entrypoints request schema consensus. (vllm-project#31127)
Signed-off-by: wang.yuqi <yuqi.wang@daocloud.io>
1 parent 7c233db commit 22b6494

24 files changed

Lines changed: 658 additions & 612 deletions

.buildkite/test-amd.yaml

Lines changed: 1 addition & 1 deletion
@@ -514,7 +514,7 @@ steps:
  - python3 offline_inference/vision_language_multi_image.py --seed 0
  - python3 offline_inference/encoder_decoder_multimodal.py --model-type whisper --seed 0
  # for pooling models
- - python3 pooling/pooling/vision_language_pooling.py --seed 0
+ - python3 pooling/embed/vision_embedding_offline.py --seed 0
  # for features demo
  - python3 offline_inference/prefix_caching.py
  - python3 offline_inference/llm_engine_example.py

.buildkite/test-pipeline.yaml

Lines changed: 1 addition & 1 deletion
@@ -453,7 +453,7 @@ steps:
  - python3 offline_inference/vision_language_multi_image.py --seed 0
  - python3 offline_inference/encoder_decoder_multimodal.py --model-type whisper --seed 0
  # for pooling models
- - python3 pooling/pooling/vision_language_pooling.py --seed 0
+ - python3 pooling/embed/vision_embedding_offline.py --seed 0
  # for features demo
  - python3 offline_inference/prefix_caching.py
  - python3 offline_inference/llm_engine_example.py

.buildkite/test_areas/misc.yaml

Lines changed: 1 addition & 1 deletion
@@ -72,7 +72,7 @@ steps:
  - python3 offline_inference/vision_language_multi_image.py --seed 0
  - python3 offline_inference/encoder_decoder_multimodal.py --model-type whisper --seed 0
  # for pooling models
- - python3 pooling/pooling/vision_language_pooling.py --seed 0
+ - python3 pooling/embed/vision_embedding_offline.py --seed 0
  # for features demo
  - python3 offline_inference/prefix_caching.py
  - python3 offline_inference/llm_engine_example.py

docs/features/multimodal_inputs.md

Lines changed: 1 addition & 1 deletion
@@ -510,7 +510,7 @@ Our OpenAI-compatible server accepts multi-modal data via the [Chat Completions
  If no fallback is available, an error is raised and you have to provide the chat template manually via the `--chat-template` argument.

  For certain models, we provide alternative chat templates inside [examples](../../examples).
- For example, VLM2Vec uses [examples/template_vlm2vec_phi3v.jinja](../../examples/template_vlm2vec_phi3v.jinja) which is different from the default one for Phi-3-Vision.
+ For example, VLM2Vec uses [examples/pooling/embed/template/vlm2vec_phi3v.jinja](../../examples/pooling/embed/template/vlm2vec_phi3v.jinja) which is different from the default one for Phi-3-Vision.

  ### Image Inputs
docs/serving/openai_compatible_server.md

Lines changed: 31 additions & 29 deletions
@@ -311,15 +311,15 @@ and passing a list of `messages` in the request. Refer to the examples below for
  vllm serve TIGER-Lab/VLM2Vec-Full --runner pooling \
      --trust-remote-code \
      --max-model-len 4096 \
-     --chat-template examples/template_vlm2vec_phi3v.jinja
+     --chat-template examples/pooling/embed/template/vlm2vec_phi3v.jinja
  ```

  !!! important
      Since VLM2Vec has the same model architecture as Phi-3.5-Vision, we have to explicitly pass `--runner pooling`
      to run this model in embedding mode instead of text generation mode.

      The custom chat template is completely different from the original one for this model,
-     and can be found here: [examples/template_vlm2vec_phi3v.jinja](../../examples/template_vlm2vec_phi3v.jinja)
+     and can be found here: [examples/pooling/embed/template/vlm2vec_phi3v.jinja](../../examples/pooling/embed/template/vlm2vec_phi3v.jinja)

  Since the request schema is not defined by OpenAI client, we post a request to the server using the lower-level `requests` library:
@@ -359,14 +359,14 @@ and passing a list of `messages` in the request. Refer to the examples below for
  vllm serve MrLight/dse-qwen2-2b-mrl-v1 --runner pooling \
      --trust-remote-code \
      --max-model-len 8192 \
-     --chat-template examples/template_dse_qwen2_vl.jinja
+     --chat-template examples/pooling/embed/template/dse_qwen2_vl.jinja
  ```

  !!! important
      Like with VLM2Vec, we have to explicitly pass `--runner pooling`.

      Additionally, `MrLight/dse-qwen2-2b-mrl-v1` requires an EOS token for embeddings, which is handled
-     by a custom chat template: [examples/template_dse_qwen2_vl.jinja](../../examples/template_dse_qwen2_vl.jinja)
+     by a custom chat template: [examples/pooling/embed/template/dse_qwen2_vl.jinja](../../examples/pooling/embed/template/dse_qwen2_vl.jinja)

  !!! important
      `MrLight/dse-qwen2-2b-mrl-v1` requires a placeholder image of the minimum image size for text query embeddings. See the full code
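A hedged sketch of what such a text-query request with a placeholder image might look like. The stand-in bytes, the data-URL scheme, the query text, and the endpoint path are all assumptions; a real client would encode an actual minimal PNG, and the full referenced example is authoritative.

```python
import base64

# NOTE: hypothetical stand-in bytes, used here only to show the payload shape.
# A real client base64-encodes an actual minimal-size PNG image instead.
placeholder_b64 = base64.b64encode(b"\x89PNG\r\n\x1a\nplaceholder").decode("utf-8")

payload = {
    "model": "MrLight/dse-qwen2-2b-mrl-v1",
    "messages": [
        {
            "role": "user",
            "content": [
                # Placeholder image slot required even for text-only queries.
                {
                    "type": "image_url",
                    "image_url": {"url": f"data:image/png;base64,{placeholder_b64}"},
                },
                {"type": "text", "text": "Query: example text query"},
            ],
        }
    ],
}

# With a server running, the query would be posted roughly like this:
# import requests
# requests.post("http://localhost:8000/v1/embeddings", json=payload)
```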
@@ -532,15 +532,15 @@ The following [sampling parameters](../api/README.md#inference-parameters) are s
  ??? code

      ```python
-     --8<-- "vllm/entrypoints/openai/protocol.py:transcription-sampling-params"
+     --8<-- "vllm/entrypoints/openai/speech_to_text/protocol.py:transcription-sampling-params"
      ```

  The following extra parameters are supported:

  ??? code

      ```python
-     --8<-- "vllm/entrypoints/openai/protocol.py:transcription-extra-params"
+     --8<-- "vllm/entrypoints/openai/speech_to_text/protocol.py:transcription-extra-params"
      ```

  ### Translations API
@@ -560,13 +560,13 @@ Code example: [examples/online_serving/openai_translation_client.py](../../examp
  The following [sampling parameters](../api/README.md#inference-parameters) are supported.

  ```python
- --8<-- "vllm/entrypoints/openai/protocol.py:translation-sampling-params"
+ --8<-- "vllm/entrypoints/openai/speech_to_text/protocol.py:translation-sampling-params"
  ```

  The following extra parameters are supported:

  ```python
- --8<-- "vllm/entrypoints/openai/protocol.py:translation-extra-params"
+ --8<-- "vllm/entrypoints/openai/speech_to_text/protocol.py:translation-extra-params"
  ```

  ### Realtime API
@@ -954,28 +954,34 @@ You can pass multi-modal inputs to scoring models by passing `content` including

  ```python
  import requests

  response = requests.post(
      "http://localhost:8000/v1/score",
      json={
          "model": "jinaai/jina-reranker-m0",
          "queries": "slm markdown",
-         "documents": {
-             "content": [
-                 {
-                     "type": "image_url",
-                     "image_url": {
-                         "url": "https://raw.githubusercontent.com/jina-ai/multimodal-reranker-test/main/handelsblatt-preview.png"
-                     },
-                 },
-                 {
-                     "type": "image_url",
-                     "image_url": {
-                         "url": "https://raw.githubusercontent.com/jina-ai/multimodal-reranker-test/main/paper-11.png"
-                     },
-                 },
-             ],
-         },
+         "documents": [
+             {
+                 "content": [
+                     {
+                         "type": "image_url",
+                         "image_url": {
+                             "url": "https://raw.githubusercontent.com/jina-ai/multimodal-reranker-test/main/handelsblatt-preview.png"
+                         },
+                     }
+                 ],
+             },
+             {
+                 "content": [
+                     {
+                         "type": "image_url",
+                         "image_url": {
+                             "url": "https://raw.githubusercontent.com/jina-ai/multimodal-reranker-test/main/handelsblatt-preview.png"
+                         },
+                     }
+                 ]
+             },
+         ],
      },
  )
  response.raise_for_status()
@@ -1001,15 +1007,13 @@ The following Score API parameters are supported:

  ```python
  --8<-- "vllm/entrypoints/pooling/base/protocol.py:pooling-common-params"
- --8<-- "vllm/entrypoints/pooling/score/protocol.py:score-extra-params"
  ```

  The following extra parameters are supported:

  ```python
  --8<-- "vllm/entrypoints/pooling/base/protocol.py:pooling-common-extra-params"
  --8<-- "vllm/entrypoints/pooling/base/protocol.py:classify-extra-params"
- --8<-- "vllm/entrypoints/pooling/score/protocol.py:score-extra-params"
  ```

  ### Re-rank API
@@ -1092,15 +1096,13 @@ The following Re-rank API parameters are supported:
  ```python
  --8<-- "vllm/entrypoints/pooling/base/protocol.py:pooling-common-params"
  --8<-- "vllm/entrypoints/pooling/base/protocol.py:classify-extra-params"
- --8<-- "vllm/entrypoints/pooling/score/protocol.py:score-extra-params"
  ```

  The following extra parameters are supported:

  ```python
  --8<-- "vllm/entrypoints/pooling/base/protocol.py:pooling-common-extra-params"
  --8<-- "vllm/entrypoints/pooling/base/protocol.py:classify-extra-params"
- --8<-- "vllm/entrypoints/pooling/score/protocol.py:rerank-extra-params"
  ```

  ## Ray Serve LLM
Lines changed: 110 additions & 0 deletions
@@ -0,0 +1,110 @@
+ # SPDX-License-Identifier: Apache-2.0
+ # SPDX-FileCopyrightText: Copyright contributors to the vLLM project
+ # ruff: noqa: E501
+ """Example Python client for multimodal classification API using vLLM API server
+
+ NOTE:
+     start a supported multimodal classification model server with `vllm serve`, e.g.
+     vllm serve muziyongshixin/Qwen2.5-VL-7B-for-VideoCls \
+         --runner pooling \
+         --max-model-len 5000 \
+         --limit-mm-per-prompt '{"video": 1}' \
+         --hf-overrides '{"text_config": {"architectures": ["Qwen2_5_VLForSequenceClassification"]}}'
+ """
+
+ import argparse
+ import pprint
+
+ import requests
+
+ from vllm.multimodal.utils import encode_image_url, fetch_image
+
+ input_text = "This product was excellent and exceeded my expectations"
+ image_url = "https://vllm-public-assets.s3.us-west-2.amazonaws.com/multimodal_asset/cat_snow.jpg"
+ image_base64 = {"url": encode_image_url(fetch_image(image_url))}
+ video_url = "https://www.bogotobogo.com/python/OpenCV_Python/images/mean_shift_tracking/slow_traffic_small.mp4"
+
+
+ def parse_args():
+     parse = argparse.ArgumentParser()
+     parse.add_argument("--host", type=str, default="localhost")
+     parse.add_argument("--port", type=int, default=8000)
+     return parse.parse_args()
+
+
+ def main(args):
+     base_url = f"http://{args.host}:{args.port}"
+     models_url = base_url + "/v1/models"
+     classify_url = base_url + "/classify"
+
+     response = requests.get(models_url)
+     model_name = response.json()["data"][0]["id"]
+
+     print("Text classification output:")
+     messages = [
+         {
+             "role": "assistant",
+             "content": "Please classify this text request.",
+         },
+         {
+             "role": "user",
+             "content": input_text,
+         },
+     ]
+     response = requests.post(
+         classify_url,
+         json={"model": model_name, "messages": messages},
+     )
+     pprint.pprint(response.json())
+
+     print("Image url classification output:")
+     messages = [
+         {
+             "role": "user",
+             "content": [
+                 {"type": "text", "text": "Please classify this image."},
+                 {"type": "image_url", "image_url": {"url": image_url}},
+             ],
+         }
+     ]
+     response = requests.post(
+         classify_url,
+         json={"model": model_name, "messages": messages},
+     )
+     pprint.pprint(response.json())
+
+     print("Image base64 classification output:")
+     messages = [
+         {
+             "role": "user",
+             "content": [
+                 {"type": "text", "text": "Please classify this image."},
+                 {"type": "image_url", "image_url": image_base64},
+             ],
+         }
+     ]
+     response = requests.post(
+         classify_url,
+         json={"model": model_name, "messages": messages},
+     )
+     pprint.pprint(response.json())
+
+     print("Video url classification output:")
+     messages = [
+         {
+             "role": "user",
+             "content": [
+                 {"type": "text", "text": "Please classify this video."},
+                 {"type": "video_url", "video_url": {"url": video_url}},
+             ],
+         }
+     ]
+     response = requests.post(
+         classify_url,
+         json={"model": model_name, "messages": messages},
+     )
+     pprint.pprint(response.json())
+
+
+ if __name__ == "__main__":
+     args = parse_args()
+     main(args)
File renamed without changes.
File renamed without changes.
File renamed without changes.
