Skip to content

Latest commit

 

History

History
56 lines (37 loc) · 2.84 KB

system_overview.mdx

File metadata and controls

56 lines (37 loc) · 2.84 KB
title description
System Overview
Explanation of different system components and flows

This page discusses how Danswer works from a high level. The aim is to give transparency into our designs. That way you can have peace of mind in using Danswer.

Alternatively if you're looking to customize the system or become an open source contributors, this is a great place to start.

System Architecture

DanswerArchitecture

Whether deploying Danswer on a single instance or a container orchestration platform, the flow of data is the same. Documents are pulled and processed via connectors and then persisted in Vespa/Postgres which run in containers within your system.

Danswer has no call-home features so we never have access to your data. We don't receive any communications at all, including even telemetry. So if you're using Danswer, please tell us as we have no way of knowing!

The only time any data leaves your Danswer setup is when it makes a call to an LLM to generate an answer. The data persistence there depends on which LLM hosting service you are using. Some users have even opted to host Llama2 70B on-prem, in which case Danswer is fully airgapped and data is never sent outside the deployment.

Embedding Flow

DanswerEmbeddingFlow

Each document is split into smaller sections called "Chunks".

By passing chunks to the LLM instead of full documents, we are able to reduce noise to the model by only passing in the relevant sections of the document. Additionally this is significantly more cost efficient as LLM services generally charge per token. Finally by embedding chunks rather than full documents, we are able to retain more detail as each vector embedding is only capable of encoding a limited amount of information.

The addition of mini-chunks takes this concept even further. By embedding at different sizes, Danswer can retrieve both high level context and details. Mini-chunks can also be turned on/off via environment variables as generating multiple vectors per chunk can slow down document indexing if deployed on less capable hardware.

In choosing our embedding model, we use the latest state-of-the-art biencoder that is small enough to run on CPU while maintaining subsecond document retrieval times.

Query Flow

DanswerQueryFlow

This flow is often updated as we're constantly striving to push the capabilities of the retrieval pipeline with the most recent advancements coming out of research and the opensource community. Also note that many of the parameters of this flow such as how much documents to retrieve, how many to rerank, what models to use, which chunks are passed to the LLM, etc. are all configurable.

If there are any questions, don't hesitate to reach out to the maintainers!