For evaluating my RAG pipeline in real time, I am implementing the 'Answer Relevance' score. But I found that for the same question, answer, and context it generates different scores, varying by 10-15%. I understand that since it is a probabilistic measure (using an LLM to re-engineer questions from the answer, etc.), the score may not always be identical, but a 10-15% swing is a trust issue. Is there any solution to this?
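
For context on where I think the randomness comes from: my understanding (an assumption on my part, based on how answer-relevance metrics like RAGAS' `answer_relevancy` are usually described) is that the metric asks an LLM to regenerate candidate questions from the answer and then scores the mean embedding similarity against the original question, so every LLM sampling step injects variance. Two mitigations I'm considering are lowering the generation temperature (and fixing a seed where the API supports one) and averaging several independent runs so I can report a mean with a spread. Here is a minimal sketch of the averaging idea, where `generate_questions` and `embed` are hypothetical stand-ins for the actual LLM and embedding calls:

```python
import statistics
from typing import Callable, List, Tuple

import numpy as np


def cosine(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two embedding vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))


def answer_relevance_once(
    question: str,
    answer: str,
    generate_questions: Callable[[str, int], List[str]],  # hypothetical LLM call
    embed: Callable[[str], np.ndarray],                    # hypothetical embedder
    n_candidates: int = 3,
) -> float:
    """One (stochastic) evaluation: regenerate candidate questions from the
    answer and compare them to the original question in embedding space."""
    candidates = generate_questions(answer, n_candidates)
    q_vec = embed(question)
    return statistics.mean(cosine(q_vec, embed(c)) for c in candidates)


def answer_relevance_stable(
    question: str,
    answer: str,
    generate_questions: Callable[[str, int], List[str]],
    embed: Callable[[str], np.ndarray],
    n_candidates: int = 5,
    n_runs: int = 5,
) -> Tuple[float, float]:
    """Average several independent runs; report the mean plus the spread,
    so the variance is visible and bounded instead of trusted blindly."""
    scores = [
        answer_relevance_once(question, answer, generate_questions, embed, n_candidates)
        for _ in range(n_runs)
    ]
    return statistics.mean(scores), statistics.pstdev(scores)
```

Averaging `n_runs` independent evaluations should shrink the run-to-run standard error roughly by a factor of 1/sqrt(n_runs), at the cost of proportionally more LLM calls; temperature 0 helps too, though hosted LLM APIs are generally still not bit-for-bit deterministic. Is this the right direction, or is there a built-in way to stabilize the score?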