epic: Index #2

Open
farhoud opened this issue Jul 16, 2024 · 2 comments

@farhoud
Contributor

farhoud commented Jul 16, 2024

A module that takes a graph and indexes it in a database.

@farhoud farhoud changed the title epic: Indexer epic: Index Jul 16, 2024
@amrhssn amrhssn added the Epic label Jul 22, 2024
@amrhssn amrhssn assigned amrhssn and mehdibalouchi and unassigned amrhssn Jul 22, 2024
@mehdibalouchi

mehdibalouchi commented Jul 30, 2024

@amrhssn I had a meeting with Farhoud and Ramin about the interfaces. The conclusion was to base the interfaces on how the Digest and Retrieve sides use the Index.
There will be two interfaces, one for Retrieve and one for Digest:

  • The interface with Digest will be a pub/sub model for producing and consuming nodes of the lattice
    • The traversal starts with a file name and a function name
    • On each step, Digest will produce a Node, an Edge, and a path to the root.
    • The Index, on the other side, consumes each node and extends the necessary indices.
  • The interface with Retrieve will be a flat representation of a subset of all nodes
    • On each query, the Index returns a set of nodes with their embeddings.
    • The Retrieve module can request expansion of nodes. The Index module will respond with a set of neighbors and edges.
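
To make the two interfaces above concrete, here is a minimal in-process sketch. Everything in it is an assumption for discussion, not an agreed design: the names `Node`, `Edge`, `DigestEvent`, `Channel`, and `Index`, the `file::function` node ids, the tuple form of the path to the root, and the predicate-based `query` are all hypothetical, and a plain callback list stands in for whatever pub/sub transport we end up with.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Set, Tuple


@dataclass(frozen=True)
class Node:
    id: str                            # hypothetical id format: "file::function"
    embedding: Tuple[float, ...] = ()


@dataclass(frozen=True)
class Edge:
    source: str
    target: str


@dataclass(frozen=True)
class DigestEvent:
    """One traversal step: a node, the edge that reached it, and the path to the root."""
    node: Node
    edge: Optional[Edge]               # assumed to be None for the root node
    path_to_root: Tuple[str, ...]


class Channel:
    """Tiny in-process pub/sub stand-in (assumption: the real transport is undecided)."""

    def __init__(self) -> None:
        self._subscribers: List[Callable[[DigestEvent], None]] = []

    def subscribe(self, handler: Callable[[DigestEvent], None]) -> None:
        self._subscribers.append(handler)

    def publish(self, event: DigestEvent) -> None:
        for handler in self._subscribers:
            handler(event)


class Index:
    def __init__(self) -> None:
        self.nodes: Dict[str, Node] = {}
        self.edges: Set[Edge] = set()
        self.neighbors: Dict[str, Set[str]] = {}

    # Digest side: consume one event and extend the indices.
    def consume(self, event: DigestEvent) -> None:
        self.nodes[event.node.id] = event.node
        if event.edge is not None:
            self.edges.add(event.edge)
            self.neighbors.setdefault(event.edge.source, set()).add(event.edge.target)
            self.neighbors.setdefault(event.edge.target, set()).add(event.edge.source)

    # Retrieve side: a flat subset of nodes, each carrying its embedding.
    def query(self, select: Callable[[Node], bool]) -> List[Node]:
        return [n for n in self.nodes.values() if select(n)]

    # Retrieve side: expand one node into its neighbors and incident edges.
    def expand(self, node_id: str) -> Tuple[List[Node], List[Edge]]:
        ids = sorted(self.neighbors.get(node_id, set()))
        nodes = [self.nodes[i] for i in ids if i in self.nodes]
        edges = sorted(
            (e for e in self.edges if node_id in (e.source, e.target)),
            key=lambda e: (e.source, e.target),
        )
        return nodes, edges


# Usage: Digest publishes two steps, Index consumes them.
channel = Channel()
index = Index()
channel.subscribe(index.consume)

root = Node("main.py::main", (1.0, 0.0))
child = Node("main.py::helper", (0.0, 1.0))
channel.publish(DigestEvent(root, None, (root.id,)))
channel.publish(DigestEvent(child, Edge(root.id, child.id), (child.id, root.id)))
```

Here `path_to_root` is carried on the event but not indexed yet; what the Index does with it is still open.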

Feel free to make fun of it.

@amrhssn

amrhssn commented Jul 30, 2024

@mehdibalouchi
Great stuff! Thanks for the update.
We should have a meeting about enrichment, and also about node and edge attributes.

I did some initial experiments, and as expected, the naive approach of returning the items with the top similarity score between the query embedding and the data embeddings doesn't work well.
We should enrich the data in a smart way in the Index module. We should also do some post-processing and re-ranking after computing the top-k similar nodes/edges.
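
For reference, a rough sketch of the two stages mentioned above: naive top-k by cosine similarity, followed by a re-ranking pass. This is not the code from my experiments. The function names, the plain-Python cosine, and the additive `bonus` used for re-ranking are placeholders I am assuming here; the actual re-ranking signal is exactly what we still need to figure out.

```python
import math
from typing import Callable, Dict, List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(
    query: Sequence[float],
    embeddings: Dict[str, Sequence[float]],
    k: int,
) -> List[Tuple[str, float]]:
    """Naive baseline: score every item against the query and keep the k best."""
    scored = [(item_id, cosine(query, emb)) for item_id, emb in embeddings.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


def rerank(
    candidates: List[Tuple[str, float]],
    bonus: Callable[[str], float],
) -> List[Tuple[str, float]]:
    """Post-processing: adjust each score with a hypothetical per-item bonus, then re-sort."""
    adjusted = [(item_id, score + bonus(item_id)) for item_id, score in candidates]
    adjusted.sort(key=lambda pair: pair[1], reverse=True)
    return adjusted


# Usage with toy 2-d embeddings.
embeddings = {"a": (1.0, 0.0), "b": (0.9, 0.1), "c": (0.0, 1.0)}
candidates = top_k((1.0, 0.0), embeddings, k=2)
reranked = rerank(candidates, bonus=lambda item_id: 0.5 if item_id == "b" else 0.0)
```

The `bonus` callback is where attributes from enrichment could feed in, which is why I need the other node and edge attributes on the Retrieve side.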

The first interface for the Retrieve module is good, but I also need all the other attributes for pre/post-processing the initial results.

About the second interface, I'd say we wait and put off the extra engineering until we're happy with one end-to-end cycle of the app. We need to spend time on the enrichment and the indexing process itself.

Let me know when you're free to talk 🙌🏻


3 participants