Skip to content

Switch for dissalowed special tokens #114

@gitgotcha77

Description

@gitgotcha77

I'm testing Agent-Brain with Ollama right now, embedding model is

The indexing job fails if somewhere in the document special tokens are found:

Encountered text corresponding to disallowed special token '<|endoftext|>'

F.e. here https://docs.vllm.ai/en/latest/examples/offline_inference/vision_language/

I couldn't find anything in the configuration to fix this.

Btw. Ollama with "mxbai-embed-large" always fails with "context length too small", or something like this.

agent-brain, version 6.0.3

# Embedding configuration
embedding:
  provider: "ollama"        # openai, ollama, cohere, gemini
  model: "nomic-embed-text"
  api_key: "ollama-local"   # Direct API key
  # api_key_env: "OPENAI_API_KEY"  # OR read from env var
  base_url: "http://localhost:11434/v1"  # Custom endpoint (for Ollama: http://localhost:11434/v1)

ENV variables:

# agent-brain
DOC_SERVE_STATE_DIR=/home/pi/.agent-brain-state
DOC_SERVE_MODE=shared
CHROMA_PERSIST_DIR=/home/pi/.agent-brain-chroma
BM25_INDEX_PATH=/home/pi/.agent-brain-bm25
ENABLE_GRAPH_INDEX=true
GRAPH_STORE_TYPE=kuzu
GRAPH_INDEX_PATH=/home/pi/.agent-brain-graph
GRAPH_USE_CODE_METADATA=true
GRAPH_USE_LLM_EXTRACTION=false

OLLAMA_API_KEY=ollama-local
OLLAMA_FLASH_ATTENTION=1
OLLAMA_KV_CACHE_TYPE=q8_0

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions