For evaluating my RAG pipeline in real time, I am implementing the 'Answer Relevance' score. But I found that for the same question, answer, and context it generates different scores, varying by 10-15%. I understand that since it is a probabilistic measure (using an LLM to re-engineer questions from the answer, etc.), the score may not always be identical, but a 10-15% swing is a trust issue. Is there any solution to this?
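
For context on where I think the randomness comes from: my understanding (an assumption on my part, based on how answer-relevance metrics like RAGAS' `answer_relevancy` are usually described) is that the metric asks an LLM to regenerate candidate questions from the answer and then scores the mean embedding similarity against the original question, so every LLM sampling step injects variance. Two mitigations I'm considering are lowering the generation temperature (and fixing a seed where the API supports one) and averaging several independent runs so I can report a mean with a spread. Here is a minimal sketch of the averaging idea, where `generate_questions` and `embed` are hypothetical stand-ins for the actual LLM and embedding calls:

```python
import statistics
from typing import Callable, List, Tuple

import numpy as np


def cosine(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two embedding vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))


def answer_relevance_once(
    question: str,
    answer: str,
    generate_questions: Callable[[str, int], List[str]],  # hypothetical LLM call
    embed: Callable[[str], np.ndarray],                    # hypothetical embedder
    n_candidates: int = 3,
) -> float:
    """One (stochastic) evaluation: regenerate candidate questions from the
    answer and compare them to the original question in embedding space."""
    candidates = generate_questions(answer, n_candidates)
    q_vec = embed(question)
    return statistics.mean(cosine(q_vec, embed(c)) for c in candidates)


def answer_relevance_stable(
    question: str,
    answer: str,
    generate_questions: Callable[[str, int], List[str]],
    embed: Callable[[str], np.ndarray],
    n_candidates: int = 5,
    n_runs: int = 5,
) -> Tuple[float, float]:
    """Average several independent runs; report the mean plus the spread,
    so the variance is visible and bounded instead of trusted blindly."""
    scores = [
        answer_relevance_once(question, answer, generate_questions, embed, n_candidates)
        for _ in range(n_runs)
    ]
    return statistics.mean(scores), statistics.pstdev(scores)
```

Averaging `n_runs` independent evaluations should shrink the run-to-run standard error roughly by a factor of 1/sqrt(n_runs), at the cost of proportionally more LLM calls; temperature 0 helps too, though hosted LLM APIs are generally still not bit-for-bit deterministic. Is this the right direction, or is there a built-in way to stabilize the score?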