Elephas uses Ollama's OpenAI-compatible API (`/v1/embeddings`), which does not accept runtime `options` parameters. As a result, every request uses the global `OLLAMA_CONTEXT_LENGTH` setting (131072), even for embedding models trained with an 8192-token context.
The proxy translates between API formats:
1. **Receive OpenAI request**
   `POST /v1/embeddings {"model": "snowflake-arctic-embed2", "input": ["text"]}`
2. **Fetch model metadata**
   - Query `/api/show` for the model's `n_ctx_train`
   - Cache the result for performance
3. **Translate to Ollama native API**
   `POST /api/embed {"model": "snowflake-arctic-embed2", "input": ["text"], "options": {"num_ctx": 8192}, "truncate": true}`
4. **Ollama processes with the correct context**
   - Uses `num_ctx: 8192` from the request
   - Ignores the global `OLLAMA_CONTEXT_LENGTH`
5. **Translate the response back**
   Ollama: `{"embeddings": [[...]]}` → OpenAI: `{"object": "list", "data": [{"embedding": [...]}]}`
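The translation step can be sketched in Rust. This is a minimal, dependency-free illustration of building the native `/api/embed` body with `options.num_ctx` injected; the struct and function names are hypothetical, not the proxy's actual code, and a real implementation would use a JSON library such as serde_json rather than string formatting.

```rust
/// Minimal view of a parsed OpenAI /v1/embeddings request (illustrative).
struct OpenAiEmbeddingsRequest {
    model: String,
    input: Vec<String>,
}

/// Build the Ollama-native /api/embed body, injecting options.num_ctx with
/// the model's training context so the global OLLAMA_CONTEXT_LENGTH is not
/// used. (No string escaping here; serde_json would handle that for real.)
fn to_ollama_embed_body(req: &OpenAiEmbeddingsRequest, n_ctx_train: u32) -> String {
    let inputs = req
        .input
        .iter()
        .map(|s| format!("\"{}\"", s))
        .collect::<Vec<_>>()
        .join(",");
    format!(
        "{{\"model\":\"{}\",\"input\":[{}],\"options\":{{\"num_ctx\":{}}},\"truncate\":true}}",
        req.model, inputs, n_ctx_train
    )
}

fn main() {
    let req = OpenAiEmbeddingsRequest {
        model: "snowflake-arctic-embed2".to_string(),
        input: vec!["text".to_string()],
    };
    // Prints the translated native-API body with num_ctx set to 8192.
    println!("{}", to_ollama_embed_body(&req, 8192));
}
```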
- `src/translator.rs` - API format conversion
  - Request translation: OpenAI → Ollama
  - Response translation: Ollama → OpenAI
  - Endpoint mapping
- `src/proxy.rs` - Request routing
  - Detects OpenAI endpoints
  - Routes to the translation handler
  - Handles standard pass-through
- `src/model_metadata.rs` - Model info caching
  - Fetches `n_ctx_train` from Ollama
  - Caches per model
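The caching idea in the metadata module can be sketched as follows. This is an assumption-laden illustration: the type and method names are hypothetical, and the closure stands in for the HTTP call to `/api/show`.

```rust
use std::collections::HashMap;

/// Per-model cache of n_ctx_train values (names illustrative).
struct ModelMetadataCache {
    n_ctx_train: HashMap<String, u32>,
}

impl ModelMetadataCache {
    fn new() -> Self {
        Self { n_ctx_train: HashMap::new() }
    }

    /// Return the cached n_ctx_train for `model`, calling `fetch` once on a
    /// miss. `fetch` stands in for the POST /api/show round trip.
    fn get_or_fetch<F: FnOnce(&str) -> u32>(&mut self, model: &str, fetch: F) -> u32 {
        *self
            .n_ctx_train
            .entry(model.to_string())
            .or_insert_with(|| fetch(model))
    }
}

fn main() {
    let mut cache = ModelMetadataCache::new();
    // First lookup invokes the fetch stub; the second is served from cache.
    let a = cache.get_or_fetch("snowflake-arctic-embed2", |_| 8192);
    let b = cache.get_or_fetch("snowflake-arctic-embed2", |_| panic!("cache miss"));
    println!("{a} {b}");
}
```

Caching matters here because the proxy would otherwise pay an extra `/api/show` round trip on every embedding request.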
OpenAI-compatible endpoints (`/v1/*`) in Ollama:
- ❌ Ignore runtime `options` parameters
- ❌ Only respect global environment variables

Native Ollama endpoints (`/api/*`):
- ✅ Accept per-request `options`
- ✅ Override global settings
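This endpoint split is what drives the routing decision. A minimal sketch of a path-prefix check, under the simplifying assumption that every `/v1/*` request is translated (the real proxy may translate only specific endpoints; names are illustrative):

```rust
/// How a request should be handled (illustrative, not the proxy's actual types).
enum Route {
    /// OpenAI-compatible path: rewrite to the native API so options apply.
    TranslateToNative,
    /// Already native: forward unchanged.
    PassThrough,
}

/// Decide handling from the request path alone (simplified assumption).
fn route_for(path: &str) -> Route {
    if path.starts_with("/v1/") {
        Route::TranslateToNative
    } else {
        Route::PassThrough
    }
}

fn main() {
    assert!(matches!(route_for("/v1/embeddings"), Route::TranslateToNative));
    assert!(matches!(route_for("/api/embed"), Route::PassThrough));
    println!("routing ok");
}
```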
By translating between formats, we get the best of both worlds:
- Elephas continues using the OpenAI API (no config change)
- The proxy controls `num_ctx` per request (via the native API)
- Each model gets an appropriate context length
- No client changes - Elephas works as-is
- No global setting changes - Keep 131072 for chat models
- Per-model control - Each model uses its training context
- Extensible - Framework supports future translations
Run the proxy and check the logs:
```
Incoming request: POST /v1/embeddings
Detected model: snowflake-arctic-embed2:latest
Model metadata - n_ctx_train: 8192
Translating OpenAI request to Ollama native API
Added options.num_ctx: 8192
Translated request: {...}
Translated response back to OpenAI format
```
Then verify with `ollama ps` - the context size should show 8192, not 131072.