20 changes: 18 additions & 2 deletions docs/examples/intrinsics/README.md
@@ -31,6 +31,19 @@ Detects when model outputs contain hallucinated information.
### query_rewrite.py
Rewrites queries for better retrieval or understanding.

### uncertainty.py
Estimates the model's certainty about answering a question.

### requirement_check.py
Detects whether text adheres to provided requirements.

### factuality_detection.py
Detects if the model's output is factually incorrect relative to context.

### factuality_correction.py
Corrects a factually incorrect response relative to context.


## Concepts Demonstrated

- **Intrinsic Functions**: Specialized model capabilities beyond text generation
@@ -48,7 +61,7 @@ from mellea.stdlib.components import Intrinsic
import mellea.stdlib.functional as mfuncs

# Create backend and adapter
backend = LocalHFBackend(model_id="ibm-granite/granite-3.3-8b-instruct")
backend = LocalHFBackend(model_id="ibm-granite/granite-4.0-micro")
adapter = IntrinsicAdapter("requirement_check",
base_model_name=backend.base_model_name)
backend.add_adapter(adapter)
@@ -71,9 +84,12 @@ out, new_ctx = mfuncs.act(
- **context_relevance**: Assess context-query relevance
- **hallucination_detection**: Detect hallucinated content
- **query_rewrite**: Improve query formulation
- **uncertainty**: Estimate certainty about answering a question
- **factuality_detection**: Detect factually incorrect responses
- **factuality_correction**: Correct factually incorrect responses

## Related Documentation

- See `mellea/stdlib/components/intrinsic/` for intrinsic implementations
- See `mellea/backends/adapters/` for adapter system
- See `docs/dev/intrinsics_and_adapters.md` for architecture details
- See `docs/dev/intrinsics_and_adapters.md` for architecture details
45 changes: 45 additions & 0 deletions docs/examples/intrinsics/factuality_correction.py
@@ -0,0 +1,45 @@
# pytest: huggingface, requires_heavy_ram, llm

"""Example usage of the factuality correction intrinsic.

To run this script from the root of the Mellea source tree, use the command:
```
uv run python docs/examples/intrinsics/factuality_correction.py
```
"""

from mellea.backends.huggingface import LocalHFBackend
from mellea.stdlib.components import Document, Message
from mellea.stdlib.context import ChatContext
from mellea.stdlib.components.intrinsic import guardian

user_text = "Is Ozzy Osbourne still alive?"
response_text = "Yes, Ozzy Osbourne is alive in 2025 and preparing for another world tour, continuing to amaze fans with his energy and resilience."
document = Document(
# Context says Ozzy Osbourne is dead, but the response says he is alive.
"Ozzy Osbourne passed away on July 22, 2025, at the age of 76 from a heart attack. "
"He died at his home in Buckinghamshire, England, with contributing conditions "
"including coronary artery disease and Parkinson's disease. His final "
"performance took place earlier that month in Birmingham."
)

# Create the backend.
backend = LocalHFBackend(model_id="ibm-granite/granite-4.0-micro")
context = (
ChatContext()
.add(document)
.add(Message("user", user_text))
.add(Message("assistant", response_text))
)

result = guardian.factuality_correction(context, backend)
print(f"Result of factuality correction: {result}") # corrected response string
37 changes: 37 additions & 0 deletions docs/examples/intrinsics/factuality_detection.py
@@ -0,0 +1,37 @@
# pytest: huggingface, requires_heavy_ram, llm

"""Example usage of the factuality detection intrinsic.

To run this script from the root of the Mellea source tree, use the command:
```
uv run python docs/examples/intrinsics/factuality_detection.py
```
"""

from mellea.backends.huggingface import LocalHFBackend
from mellea.stdlib.components import Document, Message
from mellea.stdlib.context import ChatContext
from mellea.stdlib.components.intrinsic import guardian

user_text = "Is Ozzy Osbourne still alive?"
response_text = "Yes, Ozzy Osbourne is alive in 2025 and preparing for another world tour, continuing to amaze fans with his energy and resilience."

document = Document(
# Context says Ozzy Osbourne is dead, but the response says he is alive.
"Ozzy Osbourne passed away on July 22, 2025, at the age of 76 from a heart attack. "
"He died at his home in Buckinghamshire, England, with contributing conditions "
"including coronary artery disease and Parkinson's disease. His final "
"performance took place earlier that month in Birmingham."
)

# Create the backend.
backend = LocalHFBackend(model_id="ibm-granite/granite-4.0-micro")
context = (
ChatContext()
.add(document)
.add(Message("user", user_text))
.add(Message("assistant", response_text))
)

result = guardian.factuality_detection(context, backend)
print(f"Result of factuality detection: {result}") # string "yes" or "no"
51 changes: 51 additions & 0 deletions docs/examples/intrinsics/requirement_check.py
@@ -0,0 +1,51 @@
# pytest: huggingface, requires_heavy_ram, llm

"""Example usage of the requirement check intrinsic.

Intrinsic function that determines if the text satisfies the given requirements.

To run this script from the root of the Mellea source tree, use the command:
```
uv run python docs/examples/intrinsics/requirement_check.py
```
"""

from mellea.backends.huggingface import LocalHFBackend
from mellea.stdlib.components import Message
from mellea.stdlib.components.intrinsic import core
from mellea.stdlib.context import ChatContext

user_text = "Invite for an IBM office party."
response_text = """
Dear Team,

To celebrate our recent successes and take a well-deserved moment to recharge,
you are cordially invited to a team social. Please join us for an evening of
live music, appetizers, and drinks as we recognize our collective wins.

Event Details
* **Date:** Saturday, April 25, 2026
* **Time:** 6:00 PM
* **Location:** Ryan’s Bar, Chelsea, NY
* **Highlights:** Live entertainment and refreshments

RSVP
To ensure we have an accurate headcount for catering, please confirm your
attendance by **Friday, April 10, 2026**.

We look forward to seeing everyone there and celebrating our hard work together.

**Best regards,**
[Your Name/Management Team]
"""
requirement = "Use a professional tone."

backend = LocalHFBackend(model_id="ibm-granite/granite-4.0-micro")
context = (
ChatContext()
.add(Message("user", user_text))
.add(Message("assistant", response_text))
)

result = core.requirement_check(context, backend, requirement)
print(f"Requirements Satisfied: {result}") # float between 0.0 and 1.0
27 changes: 27 additions & 0 deletions docs/examples/intrinsics/uncertainty.py
@@ -0,0 +1,27 @@
# pytest: huggingface, requires_heavy_ram, llm

"""Example usage of the uncertainty/certainty intrinsic.

Evaluates how certain the model is about its response to a user question.
The context should contain a user question followed by an assistant answer.

To run this script from the root of the Mellea source tree, use the command:
```
uv run python docs/examples/intrinsics/uncertainty.py
```
"""

from mellea.backends.huggingface import LocalHFBackend
from mellea.stdlib.components import Message
from mellea.stdlib.components.intrinsic import core
from mellea.stdlib.context import ChatContext

backend = LocalHFBackend(model_id="ibm-granite/granite-4.0-micro")
context = (
ChatContext()
.add(Message("user", "What is the square root of 4?"))
.add(Message("assistant", "The square root of 4 is 2."))
)

result = core.check_certainty(context, backend)
print(f"Certainty score: {result}")
5 changes: 3 additions & 2 deletions mellea/backends/adapters/catalog.py
@@ -55,14 +55,15 @@ class IntriniscsCatalogEntry(pydantic.BaseModel):

_RAG_REPO = "ibm-granite/granite-lib-rag-r1.0"
_CORE_REPO = "ibm-granite/rag-intrinsics-lib"
_CORE_R1_REPO = "ibm-granite/granitelib-core-r1.0"


_INTRINSICS_CATALOG_ENTRIES = [
############################################
# Core Intrinsics
############################################
IntriniscsCatalogEntry(name="requirement_check", repo_id=_CORE_REPO),
IntriniscsCatalogEntry(name="uncertainty", repo_id=_CORE_REPO),
IntriniscsCatalogEntry(name="requirement_check", repo_id=_CORE_R1_REPO),
IntriniscsCatalogEntry(name="uncertainty", repo_id=_CORE_R1_REPO),
############################################
# RAG Intrinsics
############################################
57 changes: 57 additions & 0 deletions mellea/stdlib/components/intrinsic/_util.py
@@ -0,0 +1,57 @@
"""Shared utilities for intrinsic convenience wrappers."""

import json

from ....backends import ModelOption
from ....backends.adapters import AdapterMixin, AdapterType, IntrinsicAdapter
from ....stdlib import functional as mfuncs
from ...context import ChatContext
from .intrinsic import Intrinsic


def call_intrinsic(
intrinsic_name: str,
context: ChatContext,
backend: AdapterMixin,
/,
kwargs: dict | None = None,
):
"""Shared code for invoking intrinsics.

:returns: Result of the call in JSON format.
"""
# The adapter must be registered with the backend before it can be invoked.
# Build the Adapter object first so that we can compute its qualified name
# and check whether it is already registered.
base_model_name = backend.base_model_name
if base_model_name is None:
raise ValueError("Backend has no model ID")
adapter = IntrinsicAdapter(
intrinsic_name, adapter_type=AdapterType.LORA, base_model_name=base_model_name
)
if adapter.qualified_name not in backend.list_adapters():
backend.add_adapter(adapter)

# Create the AST node for the action we wish to perform.
intrinsic = Intrinsic(intrinsic_name, intrinsic_kwargs=kwargs)

# Execute the AST node.
model_output_thunk, _ = mfuncs.act(
intrinsic,
context,
backend,
model_options={ModelOption.TEMPERATURE: 0.0},
# No rejection sampling, please
strategy=None,
)

# act() can return a future; this synchronous wrapper requires the result
# to already be computed.
assert model_output_thunk.is_computed()

# Output of an Intrinsic action is the string representation of the output of the
# intrinsic. Parse the string.
result_str = model_output_thunk.value
if result_str is None:
raise ValueError("Model output is None.")
result_json = json.loads(result_str)
return result_json
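The JSON contract between `call_intrinsic` and its wrappers can be sketched in isolation. The payload shapes below mirror the fields that the wrappers in `core.py` index into (`certainty`, `requirement_check.score`), but the values are illustrative samples, not a guaranteed schema:

```python
import json

# Simulated model_output_thunk.value strings for two intrinsics.
# Field names mirror what the core.py wrappers read; the numeric
# values are made up for illustration.
uncertainty_raw = '{"certainty": 0.87}'
requirement_raw = '{"requirement_check": {"score": 0.95}}'

# call_intrinsic() runs json.loads on the thunk's string value;
# each wrapper then pulls its field out of the parsed structure.
certainty = json.loads(uncertainty_raw)["certainty"]
req_score = json.loads(requirement_raw)["requirement_check"]["score"]

print(certainty, req_score)  # 0.87 0.95
```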
52 changes: 52 additions & 0 deletions mellea/stdlib/components/intrinsic/core.py
@@ -0,0 +1,52 @@
"""Intrinsic functions for core model capabilities."""

from ....backends.adapters import AdapterMixin
from ...components import Message
from ...context import ChatContext
from ._util import call_intrinsic


def check_certainty(context: ChatContext, backend: AdapterMixin) -> float:
"""Estimate the model's certainty about its last response.

Intrinsic function that evaluates how certain the model is about the
assistant's response to a user's question. The context should end with
a user question followed by an assistant answer.

:param context: Chat context containing user question and assistant answer.
:param backend: Backend instance that supports LoRA/aLoRA adapters.

:return: Certainty score as a float (higher = more certain).
"""
result_json = call_intrinsic("uncertainty", context, backend)
return result_json["certainty"]


_EVALUATION_PROMPT = (
"Please verify if the assistant's generation satisfies the user's "
"requirements or not and reply with a binary label accordingly. "
'Respond with a json {"score": "yes"} if the constraints are '
'satisfied or respond with {"score": "no"} if the constraints are not '
"satisfied."
)


def requirement_check(
context: ChatContext, backend: AdapterMixin, requirement: str
) -> float:
"""Detect if text adheres to provided requirements.

Intrinsic function that determines if the text satisfies the given
requirements. Appends an evaluation prompt to the context following
the format specified by the Granite Guardian requirement checker model card.

:param context: Chat context containing user question and assistant answer.
:param backend: Backend instance that supports LoRA/aLoRA adapters.
:param requirement: Set of requirements to satisfy.

:return: Score as a float between 0.0 and 1.0 (higher = more likely satisfied).
"""
eval_message = f"<requirements>: {requirement}\n{_EVALUATION_PROMPT}"
context = context.add(Message("user", eval_message))
result_json = call_intrinsic("requirement_check", context, backend)
return result_json["requirement_check"]["score"]
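`requirement_check` hands back a float in [0.0, 1.0] rather than a hard label, so callers choose the cutoff themselves. A minimal sketch of thresholding the score (the 0.5 cutoff and the helper name are illustrative assumptions, not part of the library):

```python
def requirement_passes(score: float, threshold: float = 0.5) -> bool:
    """Interpret a requirement_check score as pass/fail.

    The 0.5 default threshold is an illustrative choice; tune it
    per application.
    """
    return score >= threshold

print(requirement_passes(0.92))  # True
print(requirement_passes(0.12))  # False
```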
57 changes: 57 additions & 0 deletions mellea/stdlib/components/intrinsic/guardian.py
@@ -0,0 +1,57 @@
"""Intrinsic functions for the Guardian component."""

from ....backends.adapters import AdapterMixin
from ...components import Message
from ...context import ChatContext
from ._util import call_intrinsic


def factuality_detection(context: ChatContext, backend: AdapterMixin) -> str:
"""Determine if the last response is factually incorrect.

Intrinsic function that evaluates the factuality of the
assistant's response to a user's question. The context should end with
a user question followed by an assistant answer.

:param context: Chat context containing user question and assistant answer.
:param backend: Backend instance that supports LoRA/aLoRA adapters.

:return: Factuality label as a "yes"/"no" string (yes = factually incorrect).
"""

detector_message = """
<guardian>As a judge agent, your role is to help assess whether the provided text meets the given judging criteria, utilizing all available information, including conversations, documents, and tools.
### Criteria: A factually incorrect response occurs when the assistant's message contains one or more factual claims that are unsupported by, inconsistent with, or directly contradicted by the information provided in the documents or context. This includes situations where the assistant: introduces details not grounded in the context, misstates or distorts facts contained within the context, misinterprets the meaning or implications of the context, supplies erroneous or conflicting information relative to the context. Even if only a small portion of the response contains such inaccuracies, the overall message is considered factually incorrect.
### Scoring Schema: If the last assistant's text meets the criteria, return 'yes'; otherwise, return 'no'.
"""

context = context.add(Message("user", detector_message))
result_json = call_intrinsic("factuality_detection", context, backend)
return result_json["score"]

def factuality_correction(context: ChatContext, backend: AdapterMixin) -> str:
"""Correct the last response so that it is factually accurate relative
to the given contextual information.

Intrinsic function that corrects the assistant's response to a user's
question relative to the given context.

:param context: Chat context containing user question and assistant answer.
:param backend: Backend instance that supports LoRA/aLoRA adapters.

:return: Corrected assistant response, or "none" if no correction is needed.
"""

corrector_message = """
<guardian>As a judge agent, your role is to help assess whether the provided text meets the given judging criteria, utilizing all available information, including conversations, documents, and tools.
### Criteria: A factually incorrect response occurs when the assistant's message contains one or more factual claims that are unsupported by, inconsistent with, or directly contradicted by the information provided in the documents or context. This includes situations where the assistant: introduces details not grounded in the context, misstates or distorts facts contained within the context, misinterprets the meaning or implications of the context, supplies erroneous or conflicting information relative to the context. Even if only a small portion of the response contains such inaccuracies, the overall message is considered factually incorrect.
### Scoring Schema: If the last assistant's text meets the criteria, return a corrected version of the assistant's message based on the given context; otherwise, return 'none'.
"""

context = context.add(Message("user", corrector_message))
result_json = call_intrinsic("factuality_correction", context, backend)
return result_json["correction"]
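The two guardian wrappers compose naturally: run detection first and only invoke correction when the detector flags the response. A minimal sketch with the intrinsic calls stubbed out (the stubs stand in for `guardian.factuality_detection` and `guardian.factuality_correction`, which need a live `ChatContext` and backend):

```python
def correct_if_needed(context, backend, detect, correct):
    """Run detection; only when it flags the response ("yes")
    run the more expensive correction step."""
    if detect(context, backend) == "yes":
        return correct(context, backend)
    return None  # response was already factually consistent

# Stubbed intrinsic calls for illustration; real code would pass
# guardian.factuality_detection and guardian.factuality_correction.
detect_stub = lambda ctx, be: "yes"
correct_stub = lambda ctx, be: "Ozzy Osbourne passed away on July 22, 2025."

print(correct_if_needed(None, None, detect_stub, correct_stub))
```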