Retrieve attention score for all input tokens per generated token #1141

@parallaxe

Description

Is your feature request related to a problem? Please describe.
In RAG scenarios, I think it would be a great help in differentiating whether an LLM is hallucinating or retrieving its information from the given context if we could get an attention score for all input tokens per generated token.

Describe the solution you'd like
A callback mechanism invoked for every generated token, similar to the LogitsProcessor, that receives a list of scores.
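To illustrate the idea, here is a minimal sketch of what such a callback could look like. This is purely hypothetical: the `AttentionProcessor` protocol and the `trace_attention` function are illustrative names, not an existing llama.cpp or llama-cpp-python API, and the 0.2 threshold is an arbitrary example value.

```python
from typing import List, Protocol

class AttentionProcessor(Protocol):
    """Hypothetical interface, modeled on LogitsProcessor (names are illustrative)."""

    def __call__(self, token_id: int, input_scores: List[float]) -> None:
        """Called once per generated token with one attention score
        per input (prompt) token."""
        ...

def trace_attention(token_id: int, input_scores: List[float]) -> None:
    # Flag generated tokens whose attention mass on the prompt is low:
    # a possible hallucination signal in a RAG pipeline.
    # The threshold 0.2 is an illustrative choice, not a recommendation.
    if sum(input_scores) < 0.2:
        print(f"token {token_id}: weak grounding in context")
```

The backend would call the registered processor after sampling each token, passing the attention mass that token placed on every prompt position.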

Describe alternatives you've considered
Calculating the scores myself, but my knowledge of transformers is not sufficient for that.
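For reference, one common way to reduce raw attention weights to a single score per input token is to average over heads and layers, then keep only the prompt positions. The sketch below assumes you already have, for one generated token, its attention row from every layer (each row softmax-normalized, so it sums to 1); how llama.cpp would expose those rows is exactly what this request is about.

```python
import numpy as np

def input_attention_scores(attn_per_layer, n_prompt_tokens):
    """Aggregate attention weights for one generated token into a single
    score per prompt token.

    attn_per_layer: list of arrays, one per layer, each of shape
        (n_heads, past_len) -- the attention row of the newly generated
        token, softmax-normalized over past_len.
    Returns an array of length n_prompt_tokens; each entry is the mean
    (over heads and layers) attention weight on that prompt position.
    """
    # Mean over heads within each layer, then mean over layers.
    stacked = np.stack([layer.mean(axis=0) for layer in attn_per_layer])
    avg = stacked.mean(axis=0)      # shape: (past_len,)
    return avg[:n_prompt_tokens]    # keep only the prompt positions
```

Averaging over heads and layers is a simple baseline; more elaborate schemes (e.g. attention rollout, or weighting later layers more heavily) exist, but even this reduction would be enough to flag generated tokens that barely attend to the retrieved context.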

Additional context
I would like to build something like the "Attention tracing" in this repository, but with llama.cpp as backend.

Metadata

Labels: enhancement (New feature or request), question (Further information is requested)
