feat: Vector search #1792

richiejp · 2024-03-04T09:55:26Z

Unless we have an open-source LLM with a 1M+ token context length, we need vector search for the assistant API: #1273 (comment)

Even with a very large context length it is still far cheaper to use vector search with embeddings. It can all easily be done on CPU.

Implementation

I see three main options for adding vector search:

1. Simple in-memory brute force search. We regenerate the embeddings instead of saving them to storage.
2. Add one or more vector databases as a backend.
3. Connect to an external database.

The first is easy to implement and needs no upkeep because everything is flushed on restart. Changing the chunking size or any other hyperparameter then costs the same as a restart, since the embeddings are regenerated either way. There is plenty of prior art in Go:

https://github.com/marekgalovic/anndb
https://github.com/aws-samples/gofast-hnsw/?tab=readme-ov-file#brute-search-performance
Milvus, Weaviate, Gorse

Even implementing HNSW or Annoy would not be difficult. The main problems I see are the classic database issues, so I am in favor of doing 1 or 3, with nothing in between. Saving embeddings to a flat file could be OK, just not in the first iteration.
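
As a rough illustration of option 1, here is a minimal sketch of brute force search in Go. It assumes cosine similarity as the metric, which the discussion above does not specify, and the names `Entry`, `Match`, and `BruteForceSearch` are hypothetical, not taken from any of the linked projects.

```go
package main

import (
	"fmt"
	"math"
	"sort"
)

// Entry pairs an embedding with the text it was computed from (hypothetical type).
type Entry struct {
	Embedding []float32
	Text      string
}

// Match is one search hit (hypothetical type).
type Match struct {
	Text       string
	Similarity float64
}

// cosine returns the cosine similarity of a and b. It returns 0 when the
// lengths differ, the vectors are empty, or either vector has zero norm.
func cosine(a, b []float32) float64 {
	if len(a) != len(b) || len(a) == 0 {
		return 0
	}
	var dot, na, nb float64
	for i := range a {
		x, y := float64(a[i]), float64(b[i])
		dot += x * y
		na += x * x
		nb += y * y
	}
	if na == 0 || nb == 0 {
		return 0
	}
	return dot / (math.Sqrt(na) * math.Sqrt(nb))
}

// BruteForceSearch scores every entry against the query and returns the
// topK most similar ones, best first. Cost is linear in the number of entries.
func BruteForceSearch(entries []Entry, query []float32, topK int) []Match {
	matches := make([]Match, 0, len(entries))
	for _, e := range entries {
		matches = append(matches, Match{Text: e.Text, Similarity: cosine(e.Embedding, query)})
	}
	sort.SliceStable(matches, func(i, j int) bool {
		return matches[i].Similarity > matches[j].Similarity
	})
	if topK < 0 {
		topK = 0
	}
	if topK < len(matches) {
		matches = matches[:topK]
	}
	return matches
}

func main() {
	entries := []Entry{
		{Embedding: []float32{1, 0, 0}, Text: "alpha"},
		{Embedding: []float32{0, 1, 0}, Text: "beta"},
		{Embedding: []float32{0.9, 0.1, 0}, Text: "gamma"},
	}
	for _, m := range BruteForceSearch(entries, []float32{1, 0, 0}, 2) {
		fmt.Printf("%s %.3f\n", m.Text, m.Similarity)
	}
}
```

Nothing here is persisted, which matches the idea of regenerating the embeddings after a restart.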

I did make an experiment using BadgerDB, but talked myself out of it: https://github.com/richiejp/badger-cybertron-vector/blob/main/main.go. The problem is that it complicates comparing the vectors and then we also have to maintain state between restarts.

API

Obviously we will follow the OpenAI API as in #1273, but I think it would also make sense to have some API for simple search without an LLM, just so people can do fuzzy search with LocalAI instead of reaching for another tool. Suggestions for how this API should look are welcome.

dave-gray101 · 2024-03-04T19:05:50Z

Personally, my thought is that we should aim for something like 2... in order to get both 1 and 3. I think we should set up an interface that we require from a vector search system first - and then allow the user to select their vector search backend via configuration. I'll definitely need to do some research to see if what I'm proposing even makes sense - but I assume that no matter the vector search backend, the interface we'll need to interact with should be fairly constant.

I'm assuming that in many production cases, people will want to use an external vector search database, as they will definitely have better performance than anything we make :D

However, for the sake of our tests and quick development cycles, I like the idea of a really quick "in memory" backend - the fewer external dependencies in that case the better.

Notably, I don't think these should be exactly the same as our gRPC generation backends - this might be better accomplished with a simple Go interface.
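
To make the suggestion concrete, the following is a minimal sketch of what such a Go interface with a configuration-selected backend might look like. Every name in it (`VectorStore`, `Hit`, `memoryStore`, `NewStore`, the `"memory"` backend string) is hypothetical and not part of LocalAI. The in-memory backend assumes normalized embeddings so that a dot product can stand in for cosine similarity.

```go
package main

import (
	"errors"
	"fmt"
	"sort"
)

// Hit is one search result. All names in this sketch are hypothetical.
type Hit struct {
	Key        []float32
	Value      []byte
	Similarity float32
}

// VectorStore is the minimal contract every backend would have to satisfy.
type VectorStore interface {
	Set(keys [][]float32, values [][]byte) error
	Find(query []float32, topK int) ([]Hit, error)
}

// memoryStore keeps everything in two parallel slices and forgets it on exit.
type memoryStore struct {
	keys   [][]float32
	values [][]byte
}

// Set appends the pairs; for brevity it does not overwrite duplicate keys.
func (m *memoryStore) Set(keys [][]float32, values [][]byte) error {
	if len(keys) != len(values) {
		return errors.New("keys and values must have the same length")
	}
	m.keys = append(m.keys, keys...)
	m.values = append(m.values, values...)
	return nil
}

// Find scores every stored key with a dot product, which equals cosine
// similarity only if the embeddings are normalized (assumed here).
func (m *memoryStore) Find(query []float32, topK int) ([]Hit, error) {
	hits := make([]Hit, 0, len(m.keys))
	for i, k := range m.keys {
		if len(k) != len(query) {
			return nil, errors.New("dimension mismatch")
		}
		var sim float32
		for j := range k {
			sim += k[j] * query[j]
		}
		hits = append(hits, Hit{Key: k, Value: m.values[i], Similarity: sim})
	}
	sort.SliceStable(hits, func(a, b int) bool { return hits[a].Similarity > hits[b].Similarity })
	if topK < 0 {
		topK = 0
	}
	if topK < len(hits) {
		hits = hits[:topK]
	}
	return hits, nil
}

// NewStore picks a backend from a configuration string. An external
// database client would be added as another case.
func NewStore(backend string) (VectorStore, error) {
	switch backend {
	case "memory":
		return &memoryStore{}, nil
	default:
		return nil, fmt.Errorf("unknown vector store backend %q", backend)
	}
}

func main() {
	store, err := NewStore("memory")
	if err != nil {
		panic(err)
	}
	if err := store.Set([][]float32{{1, 0}, {0, 1}}, [][]byte{[]byte("x"), []byte("y")}); err != nil {
		panic(err)
	}
	hits, err := store.Find([]float32{0, 1}, 1)
	if err != nil {
		panic(err)
	}
	fmt.Println(string(hits[0].Value))
}
```

Callers only see `VectorStore`, so tests could use the in-memory backend while a production setup swaps in an external database behind the same methods.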

richiejp · 2024-03-04T23:10:39Z

I went ahead and added it to the gRPC backend before seeing your post (I'll create a WIP PR shortly). Possibly it's too much of a break from the existing backends and is overloading the interface. However, my feeling is also that a simple interface can cover most use cases, as you say. I created an interface that is similar to a basic key-value store and is column-oriented, like most of the vector databases I have seen.

It won't cover hybrid (i.e. non vector) searches and creating indexes. I can see uses for that, but for now I hope it is enough just to split the entries into groups.

service Backend {
  ...
  rpc StoresSet(StoresSetOptions) returns (Result) {}
  rpc StoresDelete(StoresDeleteOptions) returns (Result) {}
  rpc StoresGet(StoresGetOptions) returns (StoresGetResult) {}
  rpc StoresFind(StoresFindOptions) returns (StoresFindResult) {}
}

message StoresKey {
  // TODO: Add shard/database/file ID to separate unrelated embeddings
  repeated float Floats = 1;
}

message StoresValue {
  bytes Bytes = 1;
}

message StoresSetOptions {
  repeated StoresKey Keys = 1;
  repeated StoresValue Values = 2;
}

message StoresDeleteOptions {
  repeated StoresKey Keys = 1;
}

message StoresGetOptions {
  repeated StoresKey Key = 1;
}

message StoresGetResult {
  repeated StoresKey Keys = 1;
  repeated StoresValue Values = 2;
}

message StoresFindOptions {
  StoresKey Key = 1;
  int32 TopK = 2;
}

message StoresFindResult {
  repeated StoresKey Keys = 1;
  repeated StoresValue Values = 2;
  repeated float Similarities = 3;
}
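
To illustrate what "column-oriented" means for these messages, here is a small Go sketch that transposes row-style records into the parallel `Keys` and `Values` lists of a set request. The structs are hand-written stand-ins that mirror the message names above; real code would use the types generated from the `.proto` file. The `Row` type and both helpers are hypothetical, and the rule that all keys share one dimensionality is an assumption rather than something the messages enforce.

```go
package main

import (
	"errors"
	"fmt"
)

// Hand-written stand-ins for the messages above, not protoc output.
type StoresKey struct{ Floats []float32 }
type StoresValue struct{ Bytes []byte }
type StoresSetOptions struct {
	Keys   []StoresKey
	Values []StoresValue
}

// Row is a hypothetical row-oriented record: one embedding and its payload.
type Row struct {
	Embedding []float32
	Payload   []byte
}

// ToSetOptions transposes rows into columns, so that Keys[i] and Values[i]
// describe the same entry.
func ToSetOptions(rows []Row) StoresSetOptions {
	opts := StoresSetOptions{
		Keys:   make([]StoresKey, 0, len(rows)),
		Values: make([]StoresValue, 0, len(rows)),
	}
	for _, r := range rows {
		opts.Keys = append(opts.Keys, StoresKey{Floats: r.Embedding})
		opts.Values = append(opts.Values, StoresValue{Bytes: r.Payload})
	}
	return opts
}

// Validate checks the invariants a column-oriented request relies on:
// both columns have the same length and (assumed) all keys have the same
// number of dimensions.
func Validate(opts StoresSetOptions) error {
	if len(opts.Keys) != len(opts.Values) {
		return errors.New("keys and values must have the same length")
	}
	for i, k := range opts.Keys {
		if want := len(opts.Keys[0].Floats); len(k.Floats) != want {
			return fmt.Errorf("key %d has %d dimensions, want %d", i, len(k.Floats), want)
		}
	}
	return nil
}

func main() {
	opts := ToSetOptions([]Row{
		{Embedding: []float32{0.1, 0.2}, Payload: []byte("first chunk")},
		{Embedding: []float32{0.3, 0.4}, Payload: []byte("second chunk")},
	})
	if err := Validate(opts); err != nil {
		panic(err)
	}
	fmt.Println(len(opts.Keys), len(opts.Values))
}
```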

richiejp · 2024-03-08T14:21:46Z

The PR now implements an internal gRPC API with vector search. My current thinking is that the next step is an HTTP API which mirrors the gRPC one. After that, some e2e testing can be done, either with an external script or with the HTTP Go tests.

richiejp · 2024-03-11T17:49:26Z

I added an HTTP API which mirrors the gRPC API and some very basic tests for that.

richiejp · 2024-03-13T09:26:15Z

Ah, and now I see colBERT: https://jina.ai/news/what-is-colbert-and-late-interaction-and-why-they-matter-in-search/

richiejp · 2024-03-26T13:58:50Z

Probably at the very least an ID field is needed so that the embedding vector is not being used as an ID.
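
One way to picture the ID idea is the sketch below, where entries are addressed by a string ID and the embedding is just data. This is a hypothetical illustration, not the design that was adopted; `Record`, `IDStore`, and the `"doc-1"` ID are made up. It shows that re-embedding a document replaces its entry instead of creating a second one, and that lookups no longer depend on exact float equality.

```go
package main

import "fmt"

// Record is a hypothetical entry addressed by ID rather than by its vector.
type Record struct {
	ID        string
	Embedding []float32
	Value     []byte
}

// IDStore indexes records by ID, so the embedding can change freely.
type IDStore struct {
	byID map[string]Record
}

func NewIDStore() *IDStore {
	return &IDStore{byID: make(map[string]Record)}
}

// Set inserts the record or replaces the one with the same ID.
func (s *IDStore) Set(r Record) { s.byID[r.ID] = r }

// Get looks a record up by ID.
func (s *IDStore) Get(id string) (Record, bool) {
	r, ok := s.byID[id]
	return r, ok
}

// Delete removes the record with the given ID, if any.
func (s *IDStore) Delete(id string) { delete(s.byID, id) }

// Len reports how many records are stored.
func (s *IDStore) Len() int { return len(s.byID) }

func main() {
	s := NewIDStore()
	s.Set(Record{ID: "doc-1", Embedding: []float32{0.1, 0.2}, Value: []byte("v1")})
	// Re-embedding the same document with a different model or chunking
	// yields a different vector, but the ID still identifies the entry.
	s.Set(Record{ID: "doc-1", Embedding: []float32{0.3, 0.4}, Value: []byte("v2")})
	r, _ := s.Get("doc-1")
	fmt.Println(s.Len(), string(r.Value))
}
```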

richiejp added the enhancement New feature or request label Mar 4, 2024

richiejp mentioned this issue Mar 4, 2024: feat(stores): Vector store backend #1795 (Merged, 8 tasks)

richiejp mentioned this issue Mar 26, 2024: feat: Retrieval #1900 (Open)

richiejp mentioned this issue Apr 18, 2024: Replace own vector search with Chromem-go #2065 (Closed, 5 tasks)
