feat(metrics): update prompt of context precision #393

Closed
4 changes: 3 additions & 1 deletion docs/concepts/metrics/context_precision.md
@@ -1,6 +1,8 @@
# Context Precision

Context Precision is a metric that evaluates whether all of the ground-truth relevant items present in the `contexts` are ranked higher or not. Ideally all the relevant chunks must appear at the top ranks. This metric is computed using the `question` and the `contexts`, with values ranging between 0 and 1, where higher scores indicate better precision.
Context Precision is a metric that evaluates whether all of the ground-truth relevant items present in the `contexts` are ranked highly; ideally, all the relevant chunks should appear at the top ranks. This metric is computed using the `question`, `ground_truths`, and the `contexts`, with values ranging between 0 and 1, where higher scores indicate better precision.

In scenarios where ground truths are unavailable for retrieval tasks, the metric relies on the `question` and `contexts` alone for computation. However, it's important to note that the resulting precision may be less accurate than it would be for the same task with ground truths.

```{math}
\text{Context Precision@k} = {\sum {\text{precision@k}} \over \text{total number of relevant items in the top K results}}
```
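For intuition, here is a small worked sketch (not part of this PR) of how the formula above behaves once each retrieved chunk has been judged relevant or not; the helper name `context_precision_at_k` is hypothetical and used only for illustration.

```python
# A worked sketch (illustrative, not part of this PR) of Context Precision@k,
# assuming each retrieved chunk already has a binary relevance verdict (1/0).
def context_precision_at_k(verdicts: list[int]) -> float:
    """Average of precision@k taken at every rank k holding a relevant chunk."""
    numerator = sum(
        sum(verdicts[:k]) / k  # precision@k at this relevant rank
        for k, v in enumerate(verdicts, start=1)
        if v
    )
    total_relevant = sum(verdicts)
    return numerator / total_relevant if total_relevant else 0.0

print(context_precision_at_k([1, 1, 0]))  # 1.0  -> relevant chunks ranked first
print(context_precision_at_k([0, 1, 1]))  # ~0.58 -> same chunks ranked last
```

Placing the relevant chunks at the top yields the maximum score of 1, while pushing them toward the bottom lowers it.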
80 changes: 64 additions & 16 deletions src/ragas/metrics/_context_precision.py
@@ -16,23 +16,63 @@

CONTEXT_PRECISION = HumanMessagePromptTemplate.from_template(
"""\
Verify if the information in the given context is useful in answering the question. Use only "Yes" (1) or "No" (0) as a binary verdict.
Please verify if the information in the given context is useful in answering the question. Here are guidelines to help you make the decision:

question: What are the health benefits of green tea?
context:
This article explores the rich history of tea cultivation in China, tracing its roots back to the ancient dynasties. It discusses how different regions have developed their unique tea varieties and brewing techniques. The article also delves into the cultural significance of tea in Chinese society and how it has become a symbol of hospitality and relaxation.
verification:
{{"reason":"The context, while informative about the history and cultural significance of tea in China, does not provide specific information about the health benefits of green tea. Thus, it is not useful for answering the question about health benefits.", "verdict":"0"}}
1. If no answer is provided, use your own judgement.
2. If an answer is provided, use both the question and the answer to make the decision; give a "Yes" verdict only when all of the following conditions are met:
a. Partial or complete answer statements can be obtained from the context.
b. When you find useful information in the context, make sure the entity it refers to is the same entity mentioned in the answer.

question: How does photosynthesis work in plants?
context:
Photosynthesis in plants is a complex process involving multiple steps. This paper details how chlorophyll within the chloroplasts absorbs sunlight, which then drives the chemical reaction converting carbon dioxide and water into glucose and oxygen. It explains the role of light and dark reactions and how ATP and NADPH are produced during these processes.
verification:
{{"reason":"This context is extremely relevant and useful for answering the question. It directly addresses the mechanisms of photosynthesis, explaining the key components and processes involved.", "verdict":"1"}}
Use only "Yes" (1) or "No" (0) as a binary verdict. Output JSON with reason.

question:{question}
context:
{context}
<question> What are the health benefits of green tea? </question>
<answer> None </answer>
<context> The article explores the history of tea in China, tracing its roots to ancient dynasties. It discusses regional tea varieties, brewing techniques, and the cultural significance of tea as a symbol of hospitality and relaxation. </context>
verification:
{{
"reason":"The context, while informative about the history and cultural significance of tea in China, does not provide specific information about the health benefits of green tea. Thus, it is not useful for answering the question about health benefits.",
"verdict":"0"
}}

<question> How does photosynthesis work in plants? </question>
<answer> None </answer>
<context> Photosynthesis in plants is a complex process where chlorophyll absorbs sunlight in chloroplasts, driving a chemical reaction converting carbon dioxide and water into glucose and oxygen. The process involves light and dark reactions, producing ATP and NADPH. </context>
verification:
{{
"reason":"This context is extremely relevant and useful for answering the question. It directly addresses the mechanisms of photosynthesis, explaining the key components and processes involved.",
"verdict":"1"
}}

<question> What factors should cancer patients consider in their dietary choices? </question>
<answer> Cancer patients need to avoid calcium supplementation. </answer>
<context> For cancer patients, the intake of refined sugar should be limited, because the starch in food can prevent colon and rectal cancer. A high-fiber diet may also prevent colon, rectal, breast, and pancreatic cancer. </context>
verification:
{{
reason: "The answer only mentions calcium supplementation, which is not addressed in the context. Therefore, the context is not useful for answering the question.",
"verdict": "0"
}}

<question> Who was Albert Einstein? </question>
<answer> He was a German-born theoretical physicist. </answer>
<context> Albert Einstein was a German-born theoretical physicist who developed the theory of relativity, one of the two pillars of modern physics. </context>
verification:
{{
"reason": "In the context, Albert Einstein is described as a theoretical physicist who developed the theory of relativity. This is consistent with the answer, which states that he was a theoretical physicist. Therefore, the context is useful for answering the question.",
"verdict": "1"
}}

<question> When did Qi Tian go to the United States? </question>
<answer> Qi Tian went to the United States in 1991. </answer>
<context> In 1991, Qing Tian went to the United States. </context>
verification:
{{
"reason": "Altough the context mentioned the year 1991, it did not mention the person Qi Tian. Therefore, the context is not useful for answering the question.",
"verdict": "0"
}}

<question> {question} </question>
<answer> {answer} </answer>
<context> {context} </context>
verification:""" # noqa: E501
)
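For illustration (not part of the diff), the template above can be rendered for a single question/answer/context triple; the sample values below are invented, and `format` is LangChain's standard `HumanMessagePromptTemplate.format`, which returns a filled-in message.

```python
# Illustrative only: render the CONTEXT_PRECISION template for one chunk.
# The question/answer/context values are made up for demonstration.
message = CONTEXT_PRECISION.format(
    question="What are the health benefits of green tea?",
    answer="Green tea is rich in antioxidants.",
    context="Green tea contains catechins, antioxidants linked to reduced inflammation.",
)
print(message.content)  # the verification prompt sent to the LLM for this chunk
```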

@@ -62,15 +102,19 @@ def _score_batch(
) -> list:
prompts = []
questions, contexts = dataset["question"], dataset["contexts"]
if "ground_truths" in dataset.column_names:
ground_truths = dataset["ground_truths"]
else:
ground_truths = [None] * len(questions)

cb = CallbackManager.configure(inheritable_callbacks=callbacks)
with trace_as_chain_group(
callback_group_name, callback_manager=cb
) as batch_group:
for qstn, ctx in zip(questions, contexts):
for qstn, ctx, gt in zip(questions, contexts, ground_truths):
human_prompts = [
ChatPromptTemplate.from_messages(
[CONTEXT_PRECISION.format(question=qstn, context=c)]
[CONTEXT_PRECISION.format(question=qstn, context=c, answer=gt)]
)
for c in ctx
]
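A minimal sketch of the `ground_truths` fallback above, assuming a Hugging Face `datasets.Dataset` like the one `_score_batch` receives (the sample rows are invented): when the column is missing, each question is paired with `None`, so the prompt's `<answer>` slot renders as "None" and the verdict depends on the question and context alone.

```python
# Minimal sketch of the ground-truth fallback; sample rows are invented.
from datasets import Dataset

ds = Dataset.from_dict({
    "question": ["Who was Albert Einstein?"],
    "contexts": [["Albert Einstein was a German-born theoretical physicist."]],
    # "ground_truths" column deliberately omitted
})
ground_truths = (
    ds["ground_truths"]
    if "ground_truths" in ds.column_names
    else [None] * len(ds["question"])
)
print(ground_truths)  # [None] -> <answer> renders as "None" in the prompt
```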
@@ -93,6 +137,10 @@
]
scores = []

for response in grouped_responses:
response = [
json_loader.safe_load(item, self.llm) for item in sum(response, [])
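For context, a hedged sketch of what one parsed verification looks like after `json_loader.safe_load`: a dict with `reason` and `verdict` keys, whose `"1"`/`"0"` verdicts per chunk feed the downstream score; plain `json.loads` stands in for the safe loader here.

```python
# Illustrative stand-in for json_loader.safe_load(item, self.llm): parse one
# LLM verification into a binary verdict. The raw string is a fabricated example.
import json

raw = '{"reason": "The context directly addresses the question.", "verdict": "1"}'
parsed = json.loads(raw)
verdict = int(parsed.get("verdict", "0"))
print(verdict)  # 1 -> this chunk counts as relevant for Context Precision@k
```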