Hi,
Thanks for putting together this very compelling tooling. I was hoping to ask a few specific questions about what is going on under the hood, to make sure everything is working as we expect before we try to productionize :)
- We can/should build a classic document retrieval index with Tantivy up-front for corpora in the 10,000–100,000 document range. This index does not involve a vector store at all, correct?
- In the publication's Figure 1a, the Tantivy document store is the tool that the Paper Search agent interacts with?
- Any vectorization happens on-the-fly in the Gather Evidence agent. Where are those vectors stored? Is it possible to slowly accumulate vectors somewhere? I recognize that we can save a Docs object; however, every query will probably retrieve a unique set of documents, so it is not clear whether we can meaningfully aggregate previous vectorizations. (Obviously the system works even if we can't accumulate these meaningfully.)
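To make that question concrete, here is the kind of flow we are imagining. This is pure pseudocode: `load_pickled_docs`, `save_pickled_docs`, and the caching behavior are our guesses, not the real API.

```
# Hypothetical accumulation of embeddings across queries (pseudocode):
docs = load_pickled_docs("docs_cache.pkl") or Docs()

for paper in papers_retrieved_for_this_query:
    docs.add(paper)   # are embeddings computed here and kept on the Docs object?

answer = docs.query(question)

# Do the embeddings persist with the pickle, so the next query
# skips re-embedding any previously seen document?
save_pickled_docs(docs, "docs_cache.pkl")
```

If something along these lines is already supported (or is a bad idea), we would love to know.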
- The README mentions options for larger-than-memory vector stores. Is this relevant to anything other than choosing a very large k? Can we avoid it parametrically?
- If we have custom citations, or no citations at all, will the Citation Traversal agent simply not operate? Where does the citation graph come from? If we have internal documents, can we provide our own graph?
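For the internal-documents case, we are picturing something like the sketch below. Again, `CitationTraversal` and its `graph` parameter are hypothetical names, just to illustrate the shape of what we would want to supply:

```
# Hypothetical: supplying our own citation graph for internal documents (pseudocode)
citation_graph = {
    "design_doc_a.pdf": ["rfc_b.pdf", "postmortem_c.pdf"],
    "rfc_b.pdf": ["postmortem_c.pdf"],
}
agent = CitationTraversal(graph=citation_graph)  # does any hook like this exist?
```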
- It looks like my answers triggered the creation of an index. Is there any documentation on interacting with that SearchIndex?
Thanks for your time.