Retrieve attention score for all input tokens per generated token #1141

@parallaxe

Description

Is your feature request related to a problem? Please describe.
In RAG scenarios, I think it would be a great help in differentiating whether an LLM is hallucinating or retrieving its information from the given context if we could get an attention score for all input tokens per generated token.

Describe the solution you'd like
A callback mechanism invoked for every generated token, similar to the LogitsProcessor, that receives a list of scores.
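To illustrate the idea, here is a minimal sketch of what such a callback could look like. This is purely hypothetical: the `AttentionProcessor` protocol and the `trace_attention` function are illustrative names, not an existing llama.cpp or llama-cpp-python API, and the 0.2 threshold is an arbitrary example value.

```python
from typing import List, Protocol

class AttentionProcessor(Protocol):
    """Hypothetical interface, modeled on LogitsProcessor (names are illustrative)."""

    def __call__(self, token_id: int, input_scores: List[float]) -> None:
        """Called once per generated token with one attention score
        per input (prompt) token."""
        ...

def trace_attention(token_id: int, input_scores: List[float]) -> None:
    # Flag generated tokens whose attention mass on the prompt is low:
    # a possible hallucination signal in a RAG pipeline.
    # The threshold 0.2 is an illustrative choice, not a recommendation.
    if sum(input_scores) < 0.2:
        print(f"token {token_id}: weak grounding in context")
```

The backend would call the registered processor after sampling each token, passing the attention mass that token placed on every prompt position.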

Describe alternatives you've considered
Calculating the scores myself, but my knowledge of transformers is not sufficient for that.
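For reference, one common way to reduce raw attention weights to a single score per input token is to average over heads and layers, then keep only the prompt positions. The sketch below assumes you already have, for one generated token, its attention row from every layer (each row softmax-normalized, so it sums to 1); how llama.cpp would expose those rows is exactly what this request is about.

```python
import numpy as np

def input_attention_scores(attn_per_layer, n_prompt_tokens):
    """Aggregate attention weights for one generated token into a single
    score per prompt token.

    attn_per_layer: list of arrays, one per layer, each of shape
        (n_heads, past_len) -- the attention row of the newly generated
        token, softmax-normalized over past_len.
    Returns an array of length n_prompt_tokens; each entry is the mean
    (over heads and layers) attention weight on that prompt position.
    """
    # Mean over heads within each layer, then mean over layers.
    stacked = np.stack([layer.mean(axis=0) for layer in attn_per_layer])
    avg = stacked.mean(axis=0)      # shape: (past_len,)
    return avg[:n_prompt_tokens]    # keep only the prompt positions
```

Averaging over heads and layers is a simple baseline; more elaborate schemes (e.g. attention rollout, or weighting later layers more heavily) exist, but even this reduction would be enough to flag generated tokens that barely attend to the retrieved context.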

Additional context
I would like to build something like the "Attention tracing" in this repository, but with llama.cpp as backend.

Metadata

Labels: enhancement (New feature or request), question (Further information is requested)
