
feat(metrics): update prompt of context precision #393

Conversation

@yuukidach (Contributor) commented on Dec 19, 2023

Update the prompt of context precision as discussed in #365.

Here are the results on a test set of 97 medical-related Q&A questions, using GPT-4 as the evaluator:
context_precision_test.xlsx

  • human_eval: evaluated by a human
  • ragas_before: evaluated by ragas with only the question and context
  • ragas_update: evaluated by ragas with the question, ground truth (gt), and context

Take the absolute difference between the human_eval scores and the ragas_before / ragas_update scores, respectively:

                 mean     std      number of zeros (exact matches with human)
update - human   0.0676   0.1687   74
before - human   0.1796   0.2884    7
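
For clarity, a minimal sketch of how these numbers can be reproduced from the attached spreadsheet, assuming it contains the three columns listed above (the exact column names are my assumption):

```python
import pandas as pd

df = pd.read_excel("context_precision_test.xlsx")

for col in ("ragas_update", "ragas_before"):
    # Absolute deviation of each ragas score from the human rating.
    diff = (df[col] - df["human_eval"]).abs()
    print(
        f"{col}: mean={diff.mean():.4f}, std={diff.std():.4f}, "
        f"zeros={(diff == 0).sum()}"  # rows where ragas agrees exactly with the human
    )
```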

Other notes:

  1. The use of <></> is because, if the context/answer has multiple fragments, GPT-4 cannot effectively determine the text range, which reduces accuracy (see the sketch after this list).
  2. Although the prompt looks much longer now, it actually only uses 750 tokens (measured with tiktoken), which is acceptable for most LLMs. However, the running time is indeed longer than before.
  3. The prompt was only tested on medical-related questions with GPT-4, so it might overfit that situation. Maybe we can add a reflection agent, like the evolver or filter in the test generator, to generalize to different scenarios.
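
A minimal sketch of the delimiting idea in point 1; the exact tag format used in the PR prompt is my assumption:

```python
def wrap_fragments(fragments: list[str]) -> str:
    # Enclose each fragment in its own <></> pair so the text range of every
    # fragment is unambiguous to the model.
    return "\n".join(f"<>{fragment}</>" for fragment in fragments)

print(wrap_fragments(["First context chunk.", "Second context chunk."]))
# <>First context chunk.</>
# <>Second context chunk.</>
```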

@yuukidach (Contributor, Author)

@shahules786 I see you have already created PR #391. Sorry for sitting on this for so long. Please have a look and see whether the two PRs can be merged into one.

@yuukidach changed the title from feat(metric): update prompt of context precision to feat(metrics): update prompt of context precision on Dec 19, 2023
@shahules786 (Member)

Thank you @yuukidach. No worries, I'll take a look and see how we can merge both.

@shahules786 self-requested a review on December 19, 2023 06:59
@shahules786 (Member)

First of all, thanks a lot for the great work you have put in. I should have left a note before taking up the issue myself. Apologies.

Overall, both of us have solved the problem in a very similar fashion. Some of the differences I can see:

  1. In Context precision with ground truth #391, I accept either answer or ground_truths. My thought was that for any sample with a question and context, it is very easy to generate an answer and evaluate it, and most users will have at least these three attributes. Usage of the answer will eventually be deprecated in context_precision, though, and can instead be used with the context_utilization metric (see the illustrative sketch after this list).
  2. Changes in the tests and evaluation mode to handle exceptional situations.
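
For illustration, a hypothetical sketch of the "accept either answer or ground_truths" behavior described in point 1; the function name and sample shape here are assumptions, not the actual #391 implementation:

```python
def pick_reference(sample: dict) -> str:
    # Prefer the human-written ground truth when present; otherwise fall back
    # to the generated answer, which most users will already have.
    ground_truths = sample.get("ground_truths")
    if ground_truths:
        return ground_truths[0]
    return sample["answer"]

pick_reference({"question": "q", "contexts": ["c"], "answer": "a"})  # -> "a"
```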

I couldn't think of a way to merge both. Do you have any ideas? Would you be open to pushing your documentation to #391?

@shahules786 (Member)

Also, can we modify context_relevancy in a similar way to improve performance? That is, it should use (question, context, answer) triplets to determine how much information (number of sentences/tokens/etc.) from the given context is used to arrive at the answer. This can help users optimise their chunk size. @yuukidach
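
To make the proposal concrete, a rough sketch of the idea: score a sample by the fraction of the context actually needed for the answer. The lexical-overlap heuristic below is purely illustrative; the suggestion above is to have the LLM make this judgement:

```python
def used_fraction(context: str, answer: str) -> float:
    # Fraction of context sentences that share at least one word with the
    # answer; a real implementation would ask the LLM for this judgement.
    sentences = [s.strip() for s in context.split(".") if s.strip()]
    if not sentences:
        return 0.0
    answer_words = set(answer.lower().split())
    used = [s for s in sentences if answer_words & set(s.lower().split())]
    return len(used) / len(sentences)
```

A consistently low fraction would suggest that the retriever's chunk size can be reduced.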

@shahules786 (Member)

Sure @yuukidach. Regarding prompts, we are refactoring and introducing a new idea of a prompt object (#388). Prompts, being non-trivial parts of the software, deserve to be more than just strings. My view of prompts is to write simple, generalizable instructions that are then followed up with good demonstrations, rather than overfitting instructions to a particular model.
Up next, we will also introduce automated prompt language adaptation so that users can adapt ragas prompts to any target language. I'll read the paper you shared, thank you.
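
A hypothetical sketch of what a prompt object could look like under that philosophy (a simple instruction plus demonstrations); the actual design in #388 may differ:

```python
from dataclasses import dataclass, field

@dataclass
class Prompt:
    instruction: str  # simple, generalizable task description
    examples: list[dict] = field(default_factory=list)  # few-shot demonstrations

    def format(self, **inputs) -> str:
        # Render instruction, then demonstrations, then the actual query,
        # each as "key: value" lines.
        demos = "\n\n".join(
            "\n".join(f"{k}: {v}" for k, v in ex.items()) for ex in self.examples
        )
        query = "\n".join(f"{k}: {v}" for k, v in inputs.items())
        return f"{self.instruction}\n\n{demos}\n\n{query}"
```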

@yuukidach closed this on Dec 19, 2023
@yuukidach deleted the feat/metrics/context_precision branch on December 26, 2023 04:11