PLLM provides a 100% OpenAI-compatible API, allowing you to use existing OpenAI SDKs and tools without modification.
- Main API: http://localhost:8080/v1/
- Alt API: http://localhost:8080/api/v1/ (API key auth)

Authorization: Bearer your-api-key
X-API-Key: your-api-key

Endpoint: POST /v1/chat/completions
Create a completion for chat messages with streaming support.
curl http://localhost:8080/v1/chat/completions \
-H "Content-Type: application/json" \
-H "Authorization: Bearer your-api-key" \
-d '{
"model": "my-gpt-4",
"messages": [
{"role": "system", "content": "You are a helpful assistant."},
{"role": "user", "content": "Hello!"}
],
"temperature": 0.7,
"max_tokens": 150,
"stream": false
}'

| Parameter | Type | Required | Description |
|---|---|---|---|
| model | string | Yes | Model name (e.g., "my-gpt-4") |
| messages | array | Yes | Array of message objects |
| temperature | number | No | Sampling temperature (0-2) |
| max_tokens | integer | No | Maximum tokens to generate |
| stream | boolean | No | Enable streaming responses |
| stop | string/array | No | Stop sequences |
| presence_penalty | number | No | Presence penalty (-2 to 2) |
| frequency_penalty | number | No | Frequency penalty (-2 to 2) |
| top_p | number | No | Nucleus sampling parameter |
| n | integer | No | Number of completions to generate |
| user | string | No | User identifier |
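
The types and ranges in the table can be checked on the client before a request is sent. The following is a minimal sketch, not part of PLLM: the function name `validate_chat_params` is hypothetical, and it enforces only the constraints stated in the table, so the server may still reject a request that passes here.

```python
from typing import Any


def validate_chat_params(params: dict[str, Any]) -> list[str]:
    """Return a list of problems found in a chat completion request body.

    Only the constraints listed in the parameter table are checked.
    """
    problems: list[str] = []

    # Required parameters.
    if not isinstance(params.get("model"), str) or not params.get("model"):
        problems.append("model: required string")
    if not isinstance(params.get("messages"), list):
        problems.append("messages: required array")

    # Optional numeric parameters with documented inclusive ranges.
    ranges = {
        "temperature": (0.0, 2.0),
        "presence_penalty": (-2.0, 2.0),
        "frequency_penalty": (-2.0, 2.0),
    }
    for name, (low, high) in ranges.items():
        if name in params:
            value = params[name]
            # bool is a subclass of int in Python, so exclude it explicitly.
            if isinstance(value, bool) or not isinstance(value, (int, float)):
                problems.append(f"{name}: must be a number")
            elif not low <= value <= high:
                problems.append(f"{name}: must be between {low} and {high}")

    # Optional parameters for which the table documents only a type.
    if "stream" in params and not isinstance(params["stream"], bool):
        problems.append("stream: must be a boolean")
    for name in ("max_tokens", "n"):
        if name in params:
            value = params[name]
            if isinstance(value, bool) or not isinstance(value, int):
                problems.append(f"{name}: must be an integer")

    return problems


request_body = {
    "model": "my-gpt-4",
    "messages": [{"role": "user", "content": "Hello!"}],
    "temperature": 2.5,
}
print(validate_chat_params(request_body))
# ['temperature: must be between 0.0 and 2.0']
```

Returning a list of problems rather than raising on the first one lets a caller report every invalid field at once.
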
{
"id": "chatcmpl-abc123",
"object": "chat.completion",
"created": 1677652288,
"model": "my-gpt-4",
"choices": [
{
"index": 0,
"message": {
"role": "assistant",
"content": "Hello! How can I help you today?"
},
"finish_reason": "stop"
}
],
"usage": {
"prompt_tokens": 13,
"completion_tokens": 7,
"total_tokens": 20
}
}

Set "stream": true to enable Server-Sent Events (SSE) streaming:
curl http://localhost:8080/v1/chat/completions \
-H "Content-Type: application/json" \
-H "Authorization: Bearer your-api-key" \
-d '{
"model": "my-gpt-4",
"messages": [{"role": "user", "content": "Count to 10"}],
"stream": true
}'

Streaming response format:
data: {"id":"chatcmpl-abc","object":"chat.completion.chunk","created":1677652288,"model":"my-gpt-4","choices":[{"index":0,"delta":{"content":"1"},"finish_reason":null}]}
data: {"id":"chatcmpl-abc","object":"chat.completion.chunk","created":1677652288,"model":"my-gpt-4","choices":[{"index":0,"delta":{"content":" 2"},"finish_reason":null}]}
data: [DONE]
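
A client can rebuild the full message from a stream by reading each `data:` line, stopping at `[DONE]`, and concatenating the `delta.content` fragments. The sketch below assumes chunks shaped like the ones above; the function name `collect_stream_text` is hypothetical, and its input is any iterable of already-decoded lines, so no network access is needed to try it.

```python
import json
from typing import Iterable


def collect_stream_text(lines: Iterable[str]) -> str:
    """Concatenate delta.content from SSE lines until the [DONE] sentinel."""
    parts: list[str] = []
    for raw in lines:
        line = raw.strip()
        # Skip blank separator lines and anything that is not a data field.
        if not line.startswith("data:"):
            continue
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break
        chunk = json.loads(payload)
        for choice in chunk.get("choices", []):
            # Follow only the first completion; with n > 1 the other
            # choices would need their own buffers.
            if choice.get("index", 0) != 0:
                continue
            content = choice.get("delta", {}).get("content")
            if content:
                parts.append(content)
    return "".join(parts)


sample = [
    'data: {"id":"chatcmpl-abc","object":"chat.completion.chunk","choices":[{"index":0,"delta":{"content":"1"},"finish_reason":null}]}',
    "",
    'data: {"id":"chatcmpl-abc","object":"chat.completion.chunk","choices":[{"index":0,"delta":{"content":" 2"},"finish_reason":null}]}',
    "",
    "data: [DONE]",
]
print(collect_stream_text(sample))  # prints "1 2"
```

Whitespace is stripped only around the JSON payload, never inside it, so a leading space in a fragment such as `" 2"` survives.
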
Endpoint: POST /v1/completions
Legacy text completion endpoint:
curl http://localhost:8080/v1/completions \
-H "Content-Type: application/json" \
-H "Authorization: Bearer your-api-key" \
-d '{
"model": "my-gpt-35-turbo",
"prompt": "Say hello",
"max_tokens": 50
}'

Endpoint: POST /v1/embeddings
Create embeddings for text:
curl http://localhost:8080/v1/embeddings \
-H "Content-Type: application/json" \
-H "Authorization: Bearer your-api-key" \
-d '{
"model": "text-embedding-ada-002",
"input": "Hello world"
}'

Endpoint: GET /v1/models
List available models:
curl http://localhost:8080/v1/models \
-H "Authorization: Bearer your-api-key"

Response:
{
"object": "list",
"data": [
{
"id": "my-gpt-4",
"object": "model",
"created": 1677610602,
"owned_by": "openai"
},
{
"id": "my-gpt-35-turbo",
"object": "model",
"created": 1677610602,
"owned_by": "openai"
}
]
}

Endpoint: GET /v1/models/{model}
Get specific model details:
curl http://localhost:8080/v1/models/my-gpt-4 \
-H "Authorization: Bearer your-api-key"

Endpoint: POST /v1/images/generations
Generate images (if supported by configured models):
curl http://localhost:8080/v1/images/generations \
-H "Content-Type: application/json" \
-H "Authorization: Bearer your-api-key" \
-d '{
"prompt": "A cute cat",
"n": 1,
"size": "1024x1024"
}'

Endpoint: POST /v1/audio/transcriptions
Transcribe audio to text:
curl http://localhost:8080/v1/audio/transcriptions \
-H "Authorization: Bearer your-api-key" \
-F file="@audio.mp3" \
-F model="whisper-1"

Endpoint: POST /v1/audio/speech
Generate speech from text:
curl http://localhost:8080/v1/audio/speech \
-H "Content-Type: application/json" \
-H "Authorization: Bearer your-api-key" \
-d '{
"model": "tts-1",
"input": "Hello world",
"voice": "alloy"
}' \
--output speech.mp3

Endpoint: GET /health
Basic health check (no authentication required):
curl http://localhost:8080/health

Response:
{
"status": "healthy",
"timestamp": "2024-01-15T10:30:00Z"
}

Endpoint: GET /ready
Readiness check including dependencies:
curl http://localhost:8080/ready

Endpoint: GET /v1/admin/models/stats
Get model performance statistics (requires authentication):
curl http://localhost:8080/v1/admin/models/stats \
-H "Authorization: Bearer your-api-key"

All errors follow OpenAI format:
{
"error": {
"message": "The request is invalid",
"type": "invalid_request_error",
"code": "bad_request"
}
}

- invalid_request_error - Invalid request parameters
- authentication_error - Invalid or missing API key
- permission_error - Insufficient permissions
- rate_limit_error - Rate limit exceeded
- server_error - Internal server error
- service_unavailable_error - Service temporarily unavailable

- 200 - Success
- 400 - Bad Request
- 401 - Unauthorized
- 403 - Forbidden
- 404 - Not Found
- 429 - Too Many Requests
- 500 - Internal Server Error
- 503 - Service Unavailable
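
Because every error shares one envelope, a client can turn any non-200 response into a single exception type. The sketch below is illustrative only: `PLLMError`, `parse_error`, the `unknown_error` fallback, and the choice to treat 429 and 503 as retryable are assumptions made for this example, not behavior documented by PLLM.

```python
import json
from typing import Optional

# Assumption: rate limiting and temporary unavailability are worth retrying.
RETRYABLE_STATUSES = {429, 503}


class PLLMError(Exception):
    """Hypothetical exception carrying the fields of the error envelope."""

    def __init__(self, status: int, type_: str, message: str,
                 code: Optional[str] = None) -> None:
        super().__init__(f"{status} {type_}: {message}")
        self.status = status
        self.type = type_
        self.message = message
        self.code = code

    @property
    def retryable(self) -> bool:
        return self.status in RETRYABLE_STATUSES


def parse_error(status: int, body: str) -> PLLMError:
    """Build a PLLMError from an HTTP status code and a response body."""
    try:
        error = json.loads(body).get("error", {})
    except (json.JSONDecodeError, AttributeError):
        # Body is not JSON, or its top level is not an object.
        error = {}
    if not isinstance(error, dict):
        error = {}
    return PLLMError(
        status=status,
        # "unknown_error" is a placeholder for bodies without the envelope.
        type_=error.get("type", "unknown_error"),
        message=error.get("message", body),
        code=error.get("code"),
    )


err = parse_error(
    400,
    '{"error": {"message": "The request is invalid", '
    '"type": "invalid_request_error", "code": "bad_request"}}',
)
print(err)            # 400 invalid_request_error: The request is invalid
print(err.retryable)  # False
```

Falling back to the raw body as the message keeps the original text visible when a proxy or load balancer returns a non-JSON error page.
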
PLLM is compatible with official OpenAI SDKs:
from openai import OpenAI
client = OpenAI(
api_key="your-api-key",
base_url="http://localhost:8080/v1"
)
response = client.chat.completions.create(
model="my-gpt-4",
messages=[{"role": "user", "content": "Hello!"}]
)

import OpenAI from 'openai';
const openai = new OpenAI({
apiKey: 'your-api-key',
baseURL: 'http://localhost:8080/v1'
});
const response = await openai.chat.completions.create({
model: 'my-gpt-4',
messages: [{ role: 'user', content: 'Hello!' }]
});

All examples use cURL for simplicity but work with any HTTP client or OpenAI SDK by changing the base URL.
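
For a client without an SDK, the same chat completion request can be assembled with Python's standard library. This is a minimal sketch: `build_chat_request` is a hypothetical helper, its `use_x_api_key` switch simply selects between the two authentication headers listed earlier, and the request is only constructed here, not sent.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8080/v1"


def build_chat_request(api_key, model, messages,
                       base_url=BASE_URL, use_x_api_key=False):
    """Build (but do not send) a POST request to the chat completions endpoint."""
    headers = {"Content-Type": "application/json"}
    if use_x_api_key:
        headers["X-API-Key"] = api_key
    else:
        headers["Authorization"] = f"Bearer {api_key}"
    body = json.dumps({"model": model, "messages": messages}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url.rstrip('/')}/chat/completions",
        data=body,
        headers=headers,
        method="POST",
    )


request = build_chat_request(
    "your-api-key",
    "my-gpt-4",
    [{"role": "user", "content": "Hello!"}],
)
print(request.get_method(), request.full_url)
# POST http://localhost:8080/v1/chat/completions

# To send it against a running PLLM instance:
# with urllib.request.urlopen(request) as response:
#     print(json.load(response)["choices"][0]["message"]["content"])
```

Pointing the helper at the alternative route only requires passing `base_url="http://localhost:8080/api/v1"`.
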