
Hoping for confirmation of a few high-level ideas? #921

Open
@plbremer


Hi,

Thanks for putting together this very compelling tooling. I was hoping to ask a few specific questions about what is going on to make sure that everything is working as we expect before trying to productionize :)

  1. We can/should build a classic document retrieval index with Tantivy up front when we have between 10,000 and 100,000 documents. This index does not involve a vector store at all.
  2. In the publication's Figure 1a, the Tantivy document store is the tool that the Paper Search agent is interacting with.
  3. Any vectorization that occurs happens on the fly with the Gather Evidence Agent. Where is this vectorization stored? Is it possible to slowly accumulate vectors somewhere? I recognize that we can save a Docs object; however, every query will probably retrieve a unique set of documents, so it is not clear whether we can meaningfully aggregate previous vectorizations. (Obviously the system works even if we can't accumulate these meaningfully.)
  4. The README mentions options for larger-than-memory vector stores. Is this relevant for anything other than opting for a tremendously large k? Can we parametrically avoid this?
  5. If you have custom citations, or no citations, will the Citation Traversal agent simply not operate? Where does the citation graph come from? If I have internal documents, can I provide my own?
  6. It looks like my answers triggered the creation of an index. Is there any documentation around interacting with that SearchIndex?
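
To make question 3 concrete, here is a minimal sketch of the kind of accumulation I have in mind: a persistent cache keyed by a hash of each chunk's text, so that a chunk retrieved by more than one query is only embedded once. This is not PaperQA's API. `EmbeddingCache`, `get_or_embed`, and `toy_embed` are hypothetical names, and `toy_embed` is a stand-in for whatever embedding model is actually configured.

```python
import hashlib
import json
import sqlite3
from typing import Callable, List, Sequence

Vector = List[float]


class EmbeddingCache:
    """Hypothetical cache mapping sha256(chunk text) -> embedding vector."""

    def __init__(self, path: str = ":memory:") -> None:
        # Pass a file path instead of ":memory:" to persist across runs.
        self.conn = sqlite3.connect(path)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS vectors (key TEXT PRIMARY KEY, vec TEXT)"
        )

    @staticmethod
    def _key(text: str) -> str:
        return hashlib.sha256(text.encode("utf-8")).hexdigest()

    def get_or_embed(
        self,
        texts: Sequence[str],
        embed: Callable[[List[str]], List[Vector]],
    ) -> List[Vector]:
        keys = [self._key(t) for t in texts]
        found = {}
        # Look up vectors that were stored by earlier queries.
        for k in set(keys):
            row = self.conn.execute(
                "SELECT vec FROM vectors WHERE key = ?", (k,)
            ).fetchone()
            if row is not None:
                found[k] = json.loads(row[0])
        # Collect cache misses, deduplicated, in first-seen order.
        missing = {}
        for k, t in zip(keys, texts):
            if k not in found and k not in missing:
                missing[k] = t
        if missing:
            # Embed only the misses, then store them for future queries.
            new_vecs = embed(list(missing.values()))
            for k, v in zip(missing, new_vecs):
                found[k] = list(v)
                self.conn.execute(
                    "INSERT OR REPLACE INTO vectors VALUES (?, ?)",
                    (k, json.dumps(found[k])),
                )
            self.conn.commit()
        return [found[k] for k in keys]


def toy_embed(texts: List[str]) -> List[Vector]:
    """Stand-in embedder (assumption): counts how many texts it embeds."""
    toy_embed.calls += len(texts)
    return [[float(len(t)), float(sum(map(ord, t)) % 97)] for t in texts]


toy_embed.calls = 0

cache = EmbeddingCache()
first = cache.get_or_embed(["chunk A", "chunk B"], toy_embed)
# The second query overlaps on "chunk B", so only "chunk C" is embedded.
second = cache.get_or_embed(["chunk B", "chunk C"], toy_embed)
```

In this sketch the two queries retrieve different document sets, but the overlapping chunk is served from the cache, which is the sort of aggregation I am asking about.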

Thanks for your time.


Labels: documentation, question
