
feat(metrics): update prompt of context precision #393

Conversation

@yuukidach (Contributor) commented on Dec 19, 2023

Update the prompt of context precision as discussed in #365.

Here are the results on a test set of 97 medical-related Q&A questions, using GPT-4 as the evaluator:
context_precision_test.xlsx

  • human_eval: evaluated by a human
  • ragas_before: evaluated by ragas with only the question and context
  • ragas_update: evaluated by ragas with the question, ground truth (gt), and context

Take the absolute difference between the human_eval scores and the ragas_before / ragas_update scores, respectively:

                 mean     std      number of zeros (exact matches with human)
update - human   0.0676   0.1687   74
before - human   0.1796   0.2884    7
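
For clarity, a minimal sketch of how these numbers can be reproduced from the attached spreadsheet, assuming it contains the three columns listed above (the exact column names are my assumption):

```python
import pandas as pd

df = pd.read_excel("context_precision_test.xlsx")

for col in ("ragas_update", "ragas_before"):
    # Absolute deviation of each ragas score from the human rating.
    diff = (df[col] - df["human_eval"]).abs()
    print(
        f"{col}: mean={diff.mean():.4f}, std={diff.std():.4f}, "
        f"zeros={(diff == 0).sum()}"  # rows where ragas agrees exactly with the human
    )
```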

Other notes:

  1. The use of <></> is because, if the context/answer has multiple fragments, GPT-4 cannot effectively determine the text range, which reduces accuracy (see the sketch after this list).
  2. Although the prompt looks much longer now, it actually only uses 750 tokens (measured with tiktoken), which is acceptable for most LLMs. However, the running time is indeed longer than before.
  3. The prompt was only tested on medical-related questions with GPT-4, so it might overfit that situation. Maybe we can add a reflection agent, like the evolver or filter in the test generator, to generalize to different scenarios.
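
A minimal sketch of the delimiting idea in point 1; the exact tag format used in the PR prompt is my assumption:

```python
def wrap_fragments(fragments: list[str]) -> str:
    # Enclose each fragment in its own <></> pair so the text range of every
    # fragment is unambiguous to the model.
    return "\n".join(f"<>{fragment}</>" for fragment in fragments)

print(wrap_fragments(["First context chunk.", "Second context chunk."]))
# <>First context chunk.</>
# <>Second context chunk.</>
```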

@yuukidach (Contributor, Author)

@shahules786 I see you have already created PR #391. Sorry for sitting on this for so long. Please have a look and see whether the two PRs can be merged into one.

@yuukidach changed the title from feat(metric): update prompt of context precision to feat(metrics): update prompt of context precision on Dec 19, 2023
@shahules786 (Member)

Thank you @yuukidach. No worries, I'll take a look and see how we can merge both.

@shahules786 self-requested a review on December 19, 2023 06:59
@shahules786 (Member)

First of all, thanks a lot for the great work you have put in. I should have left a note before taking up the issue myself. Apologies.

Overall, both of us have solved the problem in a very similar fashion. Some of the differences I can see:

  1. In Context precision with ground truth #391, I accept either answer or ground_truths. My thought was that for any sample with a question and context, it is very easy to generate an answer and evaluate it, and most users will have at least these three attributes. Usage of the answer will eventually be deprecated in context_precision, though, and can instead be used with the context_utilization metric (see the illustrative sketch after this list).
  2. Changes in the tests and evaluation mode to handle exceptional situations.
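
For illustration, a hypothetical sketch of the "accept either answer or ground_truths" behavior described in point 1; the function name and sample shape here are assumptions, not the actual #391 implementation:

```python
def pick_reference(sample: dict) -> str:
    # Prefer the human-written ground truth when present; otherwise fall back
    # to the generated answer, which most users will already have.
    ground_truths = sample.get("ground_truths")
    if ground_truths:
        return ground_truths[0]
    return sample["answer"]

pick_reference({"question": "q", "contexts": ["c"], "answer": "a"})  # -> "a"
```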

I couldn't think of a way to merge both. Do you have any ideas? Would you be open to pushing your documentation to #391?

@shahules786 (Member)

Also, can we modify context_relevancy in a similar way to improve performance? That is, it should use (question, context, answer) triplets to determine how much information (number of sentences/tokens/etc.) from the given context is used to arrive at the answer. This can help users optimise their chunk size. @yuukidach
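
To make the proposal concrete, a rough sketch of the idea: score a sample by the fraction of the context actually needed for the answer. The lexical-overlap heuristic below is purely illustrative; the suggestion above is to have the LLM make this judgement:

```python
def used_fraction(context: str, answer: str) -> float:
    # Fraction of context sentences that share at least one word with the
    # answer; a real implementation would ask the LLM for this judgement.
    sentences = [s.strip() for s in context.split(".") if s.strip()]
    if not sentences:
        return 0.0
    answer_words = set(answer.lower().split())
    used = [s for s in sentences if answer_words & set(s.lower().split())]
    return len(used) / len(sentences)
```

A consistently low fraction would suggest that the retriever's chunk size can be reduced.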

@shahules786 (Member)

Sure @yuukidach. Regarding prompts, we are refactoring and introducing a new idea of a prompt object (#388). Prompts, being non-trivial parts of the software, deserve to be more than just strings. My view of prompts is to write simple, generalizable instructions that are then followed up with good demonstrations, rather than overfitting instructions to a particular model.
Up next, we will also introduce automated prompt language adaptation so that users can adapt ragas prompts to any target language. I'll read the paper you shared, thank you.
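
A hypothetical sketch of what a prompt object could look like under that philosophy (a simple instruction plus demonstrations); the actual design in #388 may differ:

```python
from dataclasses import dataclass, field

@dataclass
class Prompt:
    instruction: str  # simple, generalizable task description
    examples: list[dict] = field(default_factory=list)  # few-shot demonstrations

    def format(self, **inputs) -> str:
        # Render instruction, then demonstrations, then the actual query,
        # each as "key: value" lines.
        demos = "\n\n".join(
            "\n".join(f"{k}: {v}" for k, v in ex.items()) for ex in self.examples
        )
        query = "\n".join(f"{k}: {v}" for k, v in inputs.items())
        return f"{self.instruction}\n\n{demos}\n\n{query}"
```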

@yuukidach closed this on Dec 19, 2023
@yuukidach deleted the feat/metrics/context_precision branch on December 26, 2023 04:11