
Conversation

@heaven00 heaven00 commented Nov 2, 2025

This is an experimental testing strategy that uses the descriptions of the tools plus a set of curated queries for each tool. It will let us measure:

  • an indicator of whether a tool is likely to be picked for a certain kind of query
  • possible overlaps between tool descriptions that can reduce the chance of the right tool being used

A rough sketch of the idea follows below.
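A minimal sketch of what those two measurements could look like, assuming a placeholder similarity function (the PR itself may score queries differently, e.g. via an LLM); the tool names, descriptions, and queries here are made up for illustration:

```python
# Hypothetical sketch of the evaluation idea described above.
# TOOLS (name -> description) and CURATED_QUERIES (name -> example queries)
# stand in for whatever the actual test suite defines.
from difflib import SequenceMatcher

TOOLS = {
    "list_pipelines": "List all dlt pipelines available in the current project.",
    "inspect_schema": "Show the schema of a dlt pipeline's dataset.",
}
CURATED_QUERIES = {
    "list_pipelines": ["what pipelines do I have?", "show me every pipeline"],
    "inspect_schema": ["what columns are in the orders table?"],
}

def score(query: str, description: str) -> float:
    # Placeholder similarity; the real strategy could use an LLM or embeddings.
    return SequenceMatcher(None, query.lower(), description.lower()).ratio()

def best_tool(query: str) -> str:
    # Pick the tool whose description scores highest for the query.
    return max(TOOLS, key=lambda name: score(query, TOOLS[name]))

# 1) Is the right tool likely to be picked for its curated queries?
for name, queries in CURATED_QUERIES.items():
    hits = sum(best_tool(q) == name for q in queries)
    print(f"{name}: {hits}/{len(queries)} queries routed correctly")

# 2) Do any two descriptions overlap enough to confuse routing?
names = list(TOOLS)
for i, a in enumerate(names):
    for b in names[i + 1:]:
        overlap = score(TOOLS[a], TOOLS[b])
        if overlap > 0.5:  # arbitrary threshold for illustration
            print(f"possible overlap: {a} <-> {b} ({overlap:.2f})")
```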

There are probably better approaches, such as using https://sbert.net/examples/sentence_transformer/applications/semantic-search/README.html and improving our tool descriptions to contain examples, etc., but this PR is meant to showcase the strategy.
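For reference, a minimal semantic-search sketch with sentence-transformers along the lines of that link; the model name, descriptions, and query below are illustrative and not part of this PR:

```python
# Rank tool descriptions against a query by embedding cosine similarity.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")

descriptions = [
    "List all dlt pipelines available in the current project.",
    "Show the schema of a dlt pipeline's dataset.",
]
desc_embeddings = model.encode(descriptions, convert_to_tensor=True)

query = "what columns are in the orders table?"
query_embedding = model.encode(query, convert_to_tensor=True)

# semantic_search returns the best-matching descriptions for each query.
hits = util.semantic_search(query_embedding, desc_embeddings, top_k=len(descriptions))[0]
for hit in hits:
    print(descriptions[hit["corpus_id"]], round(hit["score"], 3))
```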

It's also functional in its current form, so we can improve on it and have a discussion :)

@heaven00 heaven00 requested a review from zilto November 2, 2025 15:57
@heaven00 heaven00 self-assigned this Nov 2, 2025
@heaven00 heaven00 marked this pull request as draft November 2, 2025 15:57

zilto commented Nov 12, 2025

I really like the direction of this! I think it's a great occasion for dog-fooding dlt-hub/dlt. I'll open a separate repository to build the eval pipelines :)

@zilto zilto closed this Nov 12, 2025