I tried running both models (ZS and Conv) on the RAGTruth dataset (https://github.com/ParticleMedia/RAGTruth).
The steps: I filtered the RAGTruth dataset down to the summarization task and fed those examples to the models:
```python
model_zs = SummaCZS(granularity="sentence", model_name="vitc", device="cuda")
model_conv = SummaCConv(models=["vitc"], bins='percentile', granularity="sentence", nli_labels="e", device="cuda", start_file="default", agg="mean")
```
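For reference, this is roughly how I filter and score the pairs. The column names (`task_type`, `source_info`, `response`) come from my own preprocessing of the RAGTruth JSON files into a dataframe, so treat them as assumptions rather than the dataset's exact schema:

```python
import pandas as pd

# Keep only the summarization-task examples; ragtruth_df is my dataframe built
# from the RAGTruth source/response files (column names are assumptions).
result_df = ragtruth_df[ragtruth_df["task_type"] == "Summary"].copy()

# SummaC scoring: score(sources, generations) returns a dict with a "scores" list.
result_df["zs_pred_score"] = model_zs.score(
    result_df["source_info"].tolist(), result_df["response"].tolist()
)["scores"]
result_df["conv_pred_score"] = model_conv.score(
    result_df["source_info"].tolist(), result_df["response"].tolist()
)["scores"]
```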
I considered an example in the RAGTruth dataset hallucinated if it had any hallucination labels (annotated spans) reported against it. Since the models output a consistency score, I used 1 - hallucination label as the ground-truth label, so that 1 means consistent and 0 means hallucinated.
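A minimal sketch of that label construction, assuming the annotated spans for each response live in a list column I call `labels` (hypothetical name from my preprocessing):

```python
# Any non-empty list of annotated spans counts as a hallucinated response.
result_df["hallucinated"] = result_df["labels"].apply(lambda spans: int(len(spans) > 0))

# Flip it so the ground truth matches the direction of the consistency score:
# 1 = consistent (no annotated hallucination), 0 = hallucinated.
result_df["label"] = 1 - result_df["hallucinated"]
```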
Later on I used the util code to choose the best threshold, e.g.:

```python
best_thresholds_conv = choose_best_threshold(result_df['label'], result_df['conv_pred_score'])
```
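To turn that threshold into the reported F1, I binarize the scores and score them with scikit-learn. This is a sketch assuming `best_thresholds_conv` is (or contains) a single scalar threshold; adjust the unpacking if the util returns a tuple:

```python
from sklearn.metrics import f1_score

# Predict "consistent" (1) when the consistency score clears the chosen threshold,
# then compare against the ground-truth labels built above.
conv_preds = (result_df["conv_pred_score"] >= best_thresholds_conv).astype(int)
print("Conv F1:", f1_score(result_df["label"], conv_preds))
```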
I am getting an F1 score of around 0.6 on this dataset. I will paste the exact results as a comment.