@@ -311,15 +311,15 @@ and passing a list of `messages` in the request. Refer to the examples below for
311311 vllm serve TIGER-Lab/VLM2Vec-Full --runner pooling \
312312 --trust-remote-code \
313313 --max-model-len 4096 \
314- --chat-template examples/template_vlm2vec_phi3v.jinja
314+ --chat-template examples/pooling/embed/template/vlm2vec_phi3v.jinja
315315 ```
316316
317317 !!! important
318318 Since VLM2Vec has the same model architecture as Phi-3.5-Vision, we have to explicitly pass `--runner pooling`
319319 to run this model in embedding mode instead of text generation mode.
320320
321321 The custom chat template is completely different from the original one for this model,
322- and can be found here: [examples/template_vlm2vec_phi3v.jinja](../../examples/template_vlm2vec_phi3v.jinja)
322+ and can be found here: [examples/pooling/embed/template/vlm2vec_phi3v.jinja](../../examples/pooling/embed/template/vlm2vec_phi3v.jinja)
323323
324324 Since the request schema is not defined by OpenAI client, we post a request to the server using the lower-level `requests` library:
325325
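A minimal sketch of such a request (the endpoint and `messages` payload shape follow the Chat API convention shown above; the image URL is a placeholder, and the server is assumed to be running locally on the default port):

```python
import json

# Hypothetical payload for the VLM2Vec server started above.
# Replace the image URL with a real, reachable image.
payload = {
    "model": "TIGER-Lab/VLM2Vec-Full",
    "messages": [
        {
            "role": "user",
            "content": [
                {"type": "image_url",
                 "image_url": {"url": "https://example.com/image.jpg"}},
                {"type": "text", "text": "Represent the given image."},
            ],
        }
    ],
}

# Send with the lower-level `requests` library (third-party; the server
# must be running for this to succeed).
try:
    import requests

    response = requests.post(
        "http://localhost:8000/v1/embeddings", json=payload, timeout=10
    )
    response.raise_for_status()
    print(json.dumps(response.json(), indent=2))
except Exception as exc:  # requests missing or server unreachable
    print(f"Request not sent: {exc}")
```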
@@ -359,14 +359,14 @@ and passing a list of `messages` in the request. Refer to the examples below for
359359 vllm serve MrLight/dse-qwen2-2b-mrl-v1 --runner pooling \
360360 --trust-remote-code \
361361 --max-model-len 8192 \
362- --chat-template examples/template_dse_qwen2_vl.jinja
362+ --chat-template examples/pooling/embed/template/dse_qwen2_vl.jinja
363363 ```
364364
365365 !!! important
366366 Like with VLM2Vec, we have to explicitly pass `--runner pooling`.
367367
368368 Additionally, `MrLight/dse-qwen2-2b-mrl-v1` requires an EOS token for embeddings, which is handled
369- by a custom chat template: [examples/template_dse_qwen2_vl.jinja](../../examples/template_dse_qwen2_vl.jinja)
369+ by a custom chat template: [examples/pooling/embed/template/dse_qwen2_vl.jinja](../../examples/pooling/embed/template/dse_qwen2_vl.jinja)
370370
371371 !!! important
372372 `MrLight/dse-qwen2-2b-mrl-v1` requires a placeholder image of the minimum image size for text query embeddings. See the full code
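A sketch of the placeholder-image trick for text queries (the hardcoded 1x1 PNG below is a stand-in assumption; consult the linked example for the exact minimum size this model accepts):

```python
import base64

# A 1x1 transparent PNG, base64-encoded. This is a hypothetical minimal
# placeholder; the model may require a larger minimum image size.
PLACEHOLDER_PNG_B64 = (
    "iVBORw0KGgoAAAANSUhEUgAAAAEAAAABCAYAAAAfFcSJ"
    "AAAADUlEQVR42mNkYPhfDwAChwGA60e6kgAAAABJRU5ErkJggg=="
)

# Sanity-check that the placeholder really is a PNG before sending it.
assert base64.b64decode(PLACEHOLDER_PNG_B64)[:4] == b"\x89PNG"

# Text queries still send a `messages` list: the placeholder image comes
# first, followed by the actual text to embed.
query_message = {
    "role": "user",
    "content": [
        {"type": "image_url",
         "image_url": {"url": f"data:image/png;base64,{PLACEHOLDER_PNG_B64}"}},
        {"type": "text", "text": "Query: What is the capital of France?"},
    ],
}
```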
@@ -532,15 +532,15 @@ The following [sampling parameters](../api/README.md#inference-parameters) are s
532532??? code
533533
534534 ```python
535- --8<-- "vllm/entrypoints/openai/protocol.py:transcription-sampling-params"
535+ --8<-- "vllm/entrypoints/openai/speech_to_text/protocol.py:transcription-sampling-params"
536536 ```
537537
538538The following extra parameters are supported:
539539
540540??? code
541541
542542 ```python
543- --8<-- "vllm/entrypoints/openai/protocol.py:transcription-extra-params"
543+ --8<-- "vllm/entrypoints/openai/speech_to_text/protocol.py:transcription-extra-params"
544544 ```
545545
546546### Translations API
@@ -560,13 +560,13 @@ Code example: [examples/online_serving/openai_translation_client.py](../../examp
560560The following [sampling parameters](../api/README.md#inference-parameters) are supported.
561561
562562```python
563- --8<-- "vllm/entrypoints/openai/protocol.py:translation-sampling-params"
563+ --8<-- "vllm/entrypoints/openai/speech_to_text/protocol.py:translation-sampling-params"
564564```
565565
566566The following extra parameters are supported:
567567
568568```python
569- --8<-- "vllm/entrypoints/openai/protocol.py:translation-extra-params"
569+ --8<-- "vllm/entrypoints/openai/speech_to_text/protocol.py:translation-extra-params"
570570```
571571
572572### Realtime API
@@ -954,28 +954,34 @@ You can pass multi-modal inputs to scoring models by passing `content` including
954954
955955 ```python
956956 import requests
957-
957+
958958 response = requests.post(
959959 "http://localhost:8000/v1/score",
960960 json={
961961 "model": "jinaai/jina-reranker-m0",
962962 "queries": "slm markdown",
963- "documents": {
964- "content": [
965- {
966- "type": "image_url",
967- "image_url": {
968- "url": "https://raw.githubusercontent.com/jina-ai/multimodal-reranker-test/main/handelsblatt-preview.png"
969- },
970- },
971- {
972- "type": "image_url",
973- "image_url": {
974- "url": "https://raw.githubusercontent.com/jina-ai/multimodal-reranker-test/main/paper-11.png"
975- },
976- },
977- ],
978- },
963+ "documents": [
964+ {
965+ "content": [
966+ {
967+ "type": "image_url",
968+ "image_url": {
969+ "url": "https://raw.githubusercontent.com/jina-ai/multimodal-reranker-test/main/handelsblatt-preview.png"
970+ },
971+ }
972+ ],
973+ },
974+ {
975+ "content": [
976+ {
977+ "type": "image_url",
978+ "image_url": {
979+ "url": "https://raw.githubusercontent.com/jina-ai/multimodal-reranker-test/main/paper-11.png"
980+ },
981+ }
982+ ]
983+ },
984+ ],
979985 },
980986 )
981987 response.raise_for_status()
@@ -1001,15 +1007,13 @@ The following Score API parameters are supported:
10011007
10021008```python
10031009--8<-- "vllm/entrypoints/pooling/base/protocol.py:pooling-common-params"
1004- --8<-- "vllm/entrypoints/pooling/score/protocol.py:score-extra-params"
10051010```
10061011
10071012The following extra parameters are supported:
10081013
10091014```python
10101015--8<-- "vllm/entrypoints/pooling/base/protocol.py:pooling-common-extra-params"
10111016--8<-- "vllm/entrypoints/pooling/base/protocol.py:classify-extra-params"
1012- --8<-- "vllm/entrypoints/pooling/score/protocol.py:score-extra-params"
10131017```
10141018
10151019### Re-rank API
@@ -1092,15 +1096,13 @@ The following Re-rank API parameters are supported:
10921096```python
10931097--8<-- "vllm/entrypoints/pooling/base/protocol.py:pooling-common-params"
10941098--8<-- "vllm/entrypoints/pooling/base/protocol.py:classify-extra-params"
1095- --8<-- "vllm/entrypoints/pooling/score/protocol.py:score-extra-params"
10961099```
10971100
10981101The following extra parameters are supported:
10991102
11001103```python
11011104--8<-- "vllm/entrypoints/pooling/base/protocol.py:pooling-common-extra-params"
11021105--8<-- "vllm/entrypoints/pooling/base/protocol.py:classify-extra-params"
1103- --8<-- "vllm/entrypoints/pooling/score/protocol.py:rerank-extra-params"
11041106```
11051107
11061108## Ray Serve LLM