CrateDBVectorSearch does not throw error when dimensions of docs to add and vector table do not align #24

@andnig

Description

System Info

latest cratedb branch, langchain 0.0.339rc1

Who can help?

@Amot

Information

  • The official example notebooks/scripts
  • My own modified scripts

Related Components

  • LLMs/Chat Models
  • Embedding Models
  • Prompts / Prompt Templates / Prompt Selectors
  • Output Parsers
  • Document Loaders
  • Vector Stores / Retrievers
  • Memory
  • Agents / Agent Executors
  • Tools / Toolkits
  • Chains
  • Callbacks/Tracing
  • Async

Reproduction

  1. Create an embedding table with 1024 dimensions:
CREATE TABLE IF NOT EXISTS "repro"."embedding" (
   "collection_id" TEXT,
   "embedding" FLOAT_VECTOR(1024),
   "document" TEXT,
   "cmetadata" OBJECT(DYNAMIC),
   "custom_id" TEXT,
   "uuid" TEXT NOT NULL,
   PRIMARY KEY ("uuid")
)
  2. Use the CrateDBVectorSearch interface to add documents with 1536 dimensions to this embedding table:
from langchain.schema import Document
from langchain.vectorstores.cratedb import CrateDBVectorSearch
from langchain.embeddings.openai import OpenAIEmbeddings

doc = Document(page_content="this is such a nice text")
doc1 = Document(page_content="this is such a nice text")
vector_store = CrateDBVectorSearch.from_documents(
    [doc, doc1],
    OpenAIEmbeddings(api_key="<your-api-key>"),
    collection_name="wow_such_nice",
    connection_string="crate://localhost:4200?schema=repro",
)

No exception is thrown, even though the OpenAI embeddings have 1536 dimensions and therefore can't be inserted into the 1024-dimension column. It looks as if everything worked as expected.

IMPORTANT: You need to have at least 2 documents to add (see the list of doc and doc1 above). With only one document, the exception is thrown as expected.

(Note: I would not expect anyone to insert mismatched dimension sizes on purpose. However, this could happen by accident, so it would be good to notify the user instead of swallowing the exception.)

Expected behavior

An error should be raised if the embeddings can't be inserted. Additionally, the interface should behave the same for one or many documents.
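
To illustrate the kind of check I have in mind, here is a minimal sketch of a guard that runs before the insert and fails the same way for one or many documents. The helper name `check_embedding_dimensions` and its parameters are hypothetical and not part of CrateDBVectorSearch or langchain; the expected dimension would have to come from the table definition or from configuration.

```python
from typing import Sequence


def check_embedding_dimensions(
    embeddings: Sequence[Sequence[float]], expected_dim: int
) -> None:
    """Raise ValueError if any embedding does not have `expected_dim` entries.

    Hypothetical helper for illustration only; it is not an existing
    langchain or CrateDB function.
    """
    for index, vector in enumerate(embeddings):
        if len(vector) != expected_dim:
            raise ValueError(
                f"Embedding {index} has {len(vector)} dimensions, "
                f"but the vector column expects {expected_dim}."
            )


# Two 1536-dimension vectors against a FLOAT_VECTOR(1024) column,
# mirroring the reproduction above.
vectors = [[0.0] * 1536, [0.0] * 1536]
try:
    check_embedding_dimensions(vectors, expected_dim=1024)
except ValueError as exc:
    print(f"Rejected before insert: {exc}")
```

A check like this only covers dimension mismatches. Surfacing whatever error the database reports for the multi-document insert would still be the more general fix.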
