| title | MemReader |
|---|---|
| desc | MemReader is MemOS's “memory translator”. It translates messy user inputs (chat, documents, images) into structured memory fragments the system can understand. |
When building AI applications, we often run into this problem: users send all kinds of things—casual chat messages, PDF documents, and images. MemReader turns these raw inputs (Raw Data) into standard memory blocks (Memory Items) with embeddings and metadata.
In short, it does three things:
- Normalization: Whether you send a string or JSON, it first converts everything into a standard format.
- Chunking: It splits long conversations or documents into appropriately sized chunks for downstream processing.
- Extraction: It calls an LLM to extract unstructured information into structured knowledge points (Fine mode), or directly generates snapshots (Fast mode).
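The three stages above can be sketched as a toy pipeline. Everything here is illustrative: `normalize`, `chunk`, and `extract` are hypothetical helper names, not MemOS APIs, and the LLM call in Fine mode is stubbed.

```python
# Toy sketch of the normalize -> chunk -> extract pipeline.
# Function names are illustrative, NOT actual MemOS internals.

def normalize(raw):
    """Coerce a plain string or a dict message into a standard message dict."""
    if isinstance(raw, str):
        return {"role": "user", "content": raw}
    return {"role": raw.get("role", "user"), "content": raw["content"]}

def chunk(messages, max_chars=60):
    """Greedily pack messages into windows bounded by a size budget."""
    windows, current, size = [], [], 0
    for m in messages:
        if current and size + len(m["content"]) > max_chars:
            windows.append(current)
            current, size = [], 0
        current.append(m)
        size += len(m["content"])
    if current:
        windows.append(current)
    return windows

def extract(window, mode="fast"):
    """Fast mode keeps a raw snapshot; Fine mode would call an LLM (stubbed)."""
    text = " ".join(m["content"] for m in window)
    if mode == "fast":
        return [{"memory": text, "type": "snapshot"}]
    return [{"memory": f"[LLM-extracted facts from] {text}", "type": "fact"}]

raw_inputs = ["I have a meeting tomorrow at 3pm", "Discussing the Q4 deadline"]
messages = [normalize(r) for r in raw_inputs]
windows = chunk(messages)
memories = [extract(w, mode="fast") for w in windows]
```

The nested output shape (`windows` of memory items) mirrors what `get_memory` returns, as described below.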
MemReader provides two modes, corresponding to the needs for “speed” and “accuracy”:

- **Fast mode**
  - Characteristics: does not call an LLM; only performs chunking and embedding.
  - Use cases:
    - Users are sending messages quickly and the system needs millisecond-level responses.
    - You only need to keep “snapshots” of the conversation, without deep understanding.
  - Output: raw text chunks + vector index.
- **Fine mode**
  - Characteristics: calls an LLM for deeper analysis.
  - Use cases:
    - Long-term memory writing (key facts need to be extracted).
    - Document analysis (core ideas need to be summarized).
    - Multimodal understanding (image content needs to be understood).
  - Output: structured facts + summary (Background) + provenance tracking (Provenance).
MemReader’s code structure is straightforward and mainly includes:

- `base.py`: defines the interface contract that all Readers must follow.
- `simple_struct.py`: the most commonly used implementation. Focuses on pure-text conversations and local documents; lightweight and efficient.
- `multi_modal_struct.py`: the all-rounder. Handles images, file URLs, tool calls, and other complex inputs.
- `read_multi_modal/`: contains parsers for multimodal chat messages, e.g. `ImageParser`, `FileContentParser`, `ToolParser`, and role-based parsers.
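To make the contract in `base.py` concrete, here is a hedged sketch of what such an interface can look like. The class and method bodies below are illustrative assumptions, not MemOS source; only `get_memory` and `fine_transfer_simple_mem` are method names this document actually uses, and their real signatures may differ.

```python
from abc import ABC, abstractmethod

class BaseMemReaderSketch(ABC):
    """Hypothetical sketch of the contract base.py enforces;
    the real MemOS signatures may differ."""

    @abstractmethod
    def get_memory(self, scene_data, type, info, mode="fine"):
        """Turn raw inputs into nested memory items (list[list[...]])."""

    def fine_transfer_simple_mem(self, fast_memories, type):
        """Optional upgrade path from Fast to Fine memories."""
        raise NotImplementedError

class EchoReader(BaseMemReaderSketch):
    """Toy implementation: wraps the input as a single Fast snapshot."""
    def get_memory(self, scene_data, type, info, mode="fine"):
        return [[{"memory": str(scene_data), "mode": mode}]]

out = EchoReader().get_memory(["hi"], type="chat", info={}, mode="fast")
```

Any concrete Reader (simple or multimodal) then only has to implement this one extraction entry point.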
| Your need | Recommended choice | Why |
|---|---|---|
| Only process plain text chats | SimpleStructMemReader (backend="simple_struct") | Simple, direct, and performant. |
| Need to handle images and file links | MultiModalStructMemReader (backend="multimodal_struct") | Built-in multimodal parsing. |
| Upgrade from Fast to Fine | Any Reader’s fine_transfer method | Supports a progressive “store first, refine later” strategy. |
Don’t instantiate readers directly; using the factory pattern is best practice:

```python
from memos.configs.mem_reader import MemReaderConfigFactory
from memos.mem_reader.factory import MemReaderFactory

# Create a Reader from configuration
cfg = MemReaderConfigFactory.model_validate({...})
reader = MemReaderFactory.from_config(cfg)
```

`get_memory` is the method you will call most often:
```python
memories = reader.get_memory(
    scene_data,      # your input data
    type="chat",     # type: chat or doc
    info=user_info,  # user info (user_id, session_id)
    mode="fine"      # mode: fast or fine (highly recommended to specify explicitly!)
)
```

- Return value: `list[list[TextualMemoryItem]]`.
- Why a nested list? A long conversation may be split into multiple windows: the outer list represents windows, and the inner list holds the memory items extracted from each window.
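Because the return value is nested by window, a common first step before storage is flattening it. A minimal sketch, with plain dicts standing in for `TextualMemoryItem` objects:

```python
# get_memory() returns memories nested by conversation window;
# plain dicts stand in for TextualMemoryItem objects here.
memories = [
    [{"memory": "User has a meeting at 3pm"}],                           # window 1
    [{"memory": "Meeting topic: Q4 deadline"}, {"memory": "Q4 risk"}],   # window 2
]

# Flatten windows into a single list of memory items
flat = [item for window in memories for item in window]
print(len(flat))  # 3 items across 2 windows
```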
This is the most basic usage, with SimpleStructMemReader (backend="simple_struct").
```python
# 1. Prepare input: standard OpenAI-style conversation format
conversation = [
    [
        {"role": "user", "content": "I have a meeting tomorrow at 3pm"},
        {"role": "assistant", "content": "What is the meeting about?"},
        {"role": "user", "content": "Discussing the Q4 project deadline"},
    ]
]

# 2. Extract memory (Fine mode)
memories = reader.get_memory(
    conversation,
    type="chat",
    mode="fine",
    info={"user_id": "u1", "session_id": "s1"}
)

# 3. Result
# memories will contain the extracted `TextualMemoryItem`s (nested by window)
```

When users send images or file links, switch to MultiModalStructMemReader (backend="multimodal_struct").
```python
# 1. Prepare input: a complex message containing files and images
scene_data = [
    [
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "Check this file and image"},
                # Files support automatic download and parsing via URL
                {"type": "file", "file": {"file_data": "https://example.com/readme.md"}},
                # Images support URLs
                {"type": "image_url", "image_url": {"url": "https://example.com/chart.png"}},
            ]
        }
    ]
]

# 2. Extract memory
memories = multimodal_reader.get_memory(
    scene_data,
    type="chat",
    mode="fine",  # Only Fine mode invokes the vision model to parse images
    info={"user_id": "u1", "session_id": "s1"}
)
```

For better UX, you can store the conversation quickly in Fast mode first, then “refine” it into Fine memories when the system is idle.
```python
# 1. Store quickly first (millisecond-level)
fast_memories = reader.get_memory(conversation, mode="fast", ...)
# ... store into the database ...

# 2. Refine asynchronously in the background
refined_memories = reader.fine_transfer_simple_mem(
    fast_memories,  # Note: fine_transfer_simple_mem expects nested windows (list[list[TextualMemoryItem]])
    type="chat"
)

# 3. Replace the original fast_memories with refined_memories
```

In .env or configuration files, you can adjust these key parameters:
- `chat_chunker`: chat chunking strategy configuration. (Chunker behavior is configured via `chunker`/`chat_chunker` in `MemReaderConfigFactory`.) It determines how much context is packed together for processing: too small may lose context; too large may exceed the LLM token limit.
- `remove_prompt_example`: whether to remove examples from the prompt. Set it to True to save tokens; set it to False (keeping the few-shot examples) if extraction quality is poor.
- `direct_markdown_hostnames` (multimodal only): a hostname allowlist. If a file URL’s hostname is in this list (e.g., `raw.githubusercontent.com`), the Reader treats the file as Markdown text directly instead of attempting OCR or conversion, which is more efficient.
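As an illustration, these parameters can be set in the dict you validate with `MemReaderConfigFactory`. The exact nesting and the `chunk_size`/chunker backend values below are assumptions for illustration and may differ by MemOS version; only the field names `backend`, `remove_prompt_example`, and `chat_chunker` come from this document.

```python
# Illustrative configuration shape; exact nesting may differ by MemOS version.
reader_config = {
    "backend": "simple_struct",
    "config": {
        "remove_prompt_example": False,    # keep few-shot examples for quality
        "chat_chunker": {
            "backend": "sentence",         # hypothetical chunker backend name
            "config": {"chunk_size": 512}  # context packed per window (assumed key)
        },
    },
}
```

Check the config schema shipped with your MemOS version before copying these keys verbatim.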
Additional config fields (from BaseMemReaderConfig):

- `general_llm`: optional general-purpose LLM for non chat/doc tasks (falls back to `llm`).
- `image_parser_llm`: optional vision LLM for image parsing (falls back to `general_llm`).