Authors: Rose E. Wang, Pawan Wirawarn, Omar Khattab, Noah Goodman, Dorottya Demszky
Findings of EACL, Long Paper, 2024.
Many online content portals allow users to ask questions to supplement their understanding (e.g., of lectures or news articles). While information retrieval (IR) systems may provide answers to such user queries, they do not directly help content creators identify the segments that caused a user to ask those questions, which can be useful for purposes such as improving their content. We introduce the task of backtracing, in which systems retrieve the text segment that most likely provoked a user query.
In this repository, you will find:
- The first benchmark for backtracing, composed of three heterogeneous datasets and causal retrieval tasks: understanding the cause of (a) student confusion in the Lecture domain, (b) reader curiosity in the News Article domain, and (c) user emotion in the Conversation domain.
- Evaluations of a suite of retrieval systems on backtracing, including: BM25, bi-encoder methods, cross-encoder methods, re-ranker methods, gpt-3.5-turbo-16k, and several likelihood-based methods that use pre-trained language models to estimate the probability of the query conditioned on variations of the corpus.
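To make the retrieval setup concrete, here is a minimal sketch of backtracing with a BM25-style lexical scorer: given a query and the ordered segments of a source document, rank the segments by how likely each is to have provoked the query. This is an illustration only, not the implementation used in this repository; the function names, the whitespace-and-punctuation tokenizer, and the `k1` and `b` defaults are assumptions.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Lowercased word tokens; a deliberately simple tokenizer (assumption).
    return re.findall(r"\w+", text.lower())


def bm25_rank(query, segments, k1=1.5, b=0.75):
    """Return segment indices ordered from highest to lowest BM25 score."""
    if not segments:
        return []
    docs = [tokenize(s) for s in segments]
    n_docs = len(docs)
    avg_len = sum(len(d) for d in docs) / n_docs
    # Document frequency: number of segments containing each term.
    df = Counter(term for d in docs for term in set(d))
    query_terms = tokenize(query)

    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for term in query_terms:
            if term not in tf:
                continue
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            norm = tf[term] + k1 * (1 - b + b * len(d) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return sorted(range(n_docs), key=lambda i: scores[i], reverse=True)


# Hypothetical lecture sentences and student question.
segments = [
    "the professor introduces linear regression",
    "gradient descent updates the weights iteratively",
    "thank you all for coming today",
]
ranking = bm25_rank("why do the weights change in gradient descent?", segments)
print(ranking[0])  # index of the segment predicted to have caused the query
```

A lexical scorer like this only works when the query reuses words from the segment that caused it, which is one reason similarity-based methods can fall short on this task.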
Our results reveal several limitations of these methods; for example, bi-encoder methods struggle when the query and target segment share limited similarity, and likelihood-based methods struggle with modeling what may be unknown information to a user. Overall, these results suggest that backtracing is a challenging task that requires new retrieval approaches.
We hope our benchmark serves to evaluate and improve future retrieval systems for backtracing, and ultimately spawns systems that empower content creators to understand user queries, refine their content, and provide users with better experiences.
If you find our work useful or interesting, please consider citing it!
@inproceedings{wang2024backtracing,
title = {Backtracing: Retrieving the Cause of the Query},
booktitle = {Findings of the Association for Computational Linguistics: EACL 2024},
publisher = {Association for Computational Linguistics},
year = {2024},
author = {Wang, Rose E. and Wirawarn, Pawan and Khattab, Omar and Goodman, Noah and Demszky, Dorottya},
}
We ran our experiments with Python 3.11 and on A6000 machines. To reproduce the results in our work, please run the following commands:
>> conda create -n backtracing python=3.11
>> conda activate backtracing
>> pip install -r requirements.txt # install all of our requirements
>> source run_table_evaluations.sh
run_table_evaluations.sh outputs text files under results/<dataset>/. The text files contain the results reported in Tables 2 and 3. Here is an example of what the result should look like:
>> cat results/sight/semantic_similarity.txt
Query dirs: ['data/sight/query/annotated']
Source dirs: ['data/sight/sources/annotated']
Output fname: results/sight/annotated/semantic_similarity.csv
Output fname: results/sight/annotated/semantic_similarity.csv
Accuracy top 1: 0.23
Min distance top 1: 91.85
Accuracy top 3: 0.37
Min distance top 3: 35.22
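As a rough guide to reading these numbers, the sketch below shows one plausible way such metrics could be computed. It assumes that "Accuracy top k" is the fraction of queries whose ground-truth segment appears among the top k retrieved segments, and that "Min distance top k" is the smallest index distance between the ground-truth segment and any of the top k retrieved segments, averaged over queries. These definitions are inferred from the metric names and are not taken from the repository's evaluation code; the function names and example data are hypothetical.

```python
def top_k_accuracy(rankings, gold, k):
    """Fraction of queries whose gold segment index is in the top-k ranking."""
    hits = sum(1 for ranked, g in zip(rankings, gold) if g in ranked[:k])
    return hits / len(gold)


def min_distance_top_k(rankings, gold, k):
    """Mean over queries of the smallest |predicted index - gold index| in the top k."""
    distances = [
        min(abs(i - g) for i in ranked[:k]) for ranked, g in zip(rankings, gold)
    ]
    return sum(distances) / len(distances)


# Two hypothetical queries: ranked segment indices and the gold segment index.
rankings = [[4, 2, 9], [0, 7, 3]]
gold = [2, 5]

print(top_k_accuracy(rankings, gold, k=1))      # 0.0
print(top_k_accuracy(rankings, gold, k=3))      # 0.5
print(min_distance_top_k(rankings, gold, k=1))  # 3.5
print(min_distance_top_k(rankings, gold, k=3))  # 1.0
```

Under these assumed definitions, a higher accuracy and a lower minimum distance both indicate that the retrieved segments land closer to the true cause of the query.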
The datasets are located under the data directory. Each dataset contains the query directory (e.g., student question) and the sources directory (e.g., the lecture transcript sentences).
└── data                # Backtracing Datasets
    ├── sight           # Lecture Domain, SIGHT, derived from https://github.com/rosewang2008/sight
    │   ├── query
    │   └── sources
    ├── inquisitive     # News Article Domain, Inquisitive, derived from https://github.com/wjko2/INQUISITIVE
    │   ├── query
    │   └── sources
    └── reccon          # Conversation Domain, RECCON, derived from https://github.com/declare-lab/RECCON
        ├── query
        └── sources
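To check that the data layout matches the tree above, a small helper like the following can enumerate the files in each dataset's query and sources directories. It is a sketch that assumes only the directory names shown in the tree; it makes no assumption about file names or formats, and the helper name `list_dataset_files` is hypothetical.

```python
from pathlib import Path

DATASETS = ("sight", "inquisitive", "reccon")
PARTS = ("query", "sources")


def list_dataset_files(data_root="data"):
    """Map each dataset and part to the sorted list of files found beneath it."""
    root = Path(data_root)
    listing = {}
    for name in DATASETS:
        listing[name] = {}
        for part in PARTS:
            part_dir = root / name / part
            # Search recursively, since files may sit in nested subdirectories.
            files = (
                sorted(p for p in part_dir.rglob("*") if p.is_file())
                if part_dir.is_dir()
                else []
            )
            listing[name][part] = files
    return listing


if __name__ == "__main__":
    for dataset, parts in list_dataset_files().items():
        for part, files in parts.items():
            print(f"{dataset}/{part}: {len(files)} file(s)")
```

A missing directory is reported as an empty list rather than raising an error, which makes it easy to spot an incomplete checkout.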
The run_table_evaluations.sh commands above use cached scores. If you want to run the retrieval from scratch, run:
>> export OPENAI_API_KEY='yourkey' # if you want to run the gpt-3.5-turbo-16k results as well. otherwise skip.
>> source run_inference.sh