-
Notifications
You must be signed in to change notification settings - Fork 16
Switch for dissalowed special tokens #114
Copy link
Copy link
Open
Description
I'm testing Agent-Brain with Ollama right now, embedding model is
The indexing job fails if somewhere in the document special tokens are found:
Encountered text corresponding to disallowed special token '<|endoftext|>'
F.e. here https://docs.vllm.ai/en/latest/examples/offline_inference/vision_language/
I couldn't find anything in the configuration to fix this.
Btw. Ollama with "mxbai-embed-large" always fails with "context length too small", or something like this.
agent-brain, version 6.0.3
# Embedding configuration
embedding:
provider: "ollama" # openai, ollama, cohere, gemini
model: "nomic-embed-text"
api_key: "ollama-local" # Direct API key
# api_key_env: "OPENAI_API_KEY" # OR read from env var
base_url: "http://localhost:11434/v1" # Custom endpoint (for Ollama: http://localhost:11434/v1)
ENV variables:
# agent-brain
DOC_SERVE_STATE_DIR=/home/pi/.agent-brain-state
DOC_SERVE_MODE=shared
CHROMA_PERSIST_DIR=/home/pi/.agent-brain-chroma
BM25_INDEX_PATH=/home/pi/.agent-brain-bm25
ENABLE_GRAPH_INDEX=true
GRAPH_STORE_TYPE=kuzu
GRAPH_INDEX_PATH=/home/pi/.agent-brain-graph
GRAPH_USE_CODE_METADATA=true
GRAPH_USE_LLM_EXTRACTION=false
OLLAMA_API_KEY=ollama-local
OLLAMA_FLASH_ATTENTION=1
OLLAMA_KV_CACHE_TYPE=q8_0
Reactions are currently unavailable
Metadata
Metadata
Assignees
Labels
No labels