Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add UnQovering dataset support (QA) #495

Open
dcecchini opened this issue Jun 5, 2023 · 5 comments
Open

Add UnQovering dataset support (QA) #495

dcecchini opened this issue Jun 5, 2023 · 5 comments
Assignees
Labels
⏭️ Next Release Issues or Request for the next release

Comments

@dcecchini
Copy link
Contributor

dcecchini commented Jun 5, 2023

Add this dataset for QA tests (bias).

Reference: https://github.com/allenai/unqover

@JulesBelveze JulesBelveze added the ⏭️ Next Release Issues or Request for the next release label Jul 17, 2023
@ArshaanNazir ArshaanNazir added v2.1.0 Issue or request to be done in v2.1.0 release and removed ⏭️ Next Release Issues or Request for the next release labels Jul 19, 2023
@alytarik
Copy link
Contributor

UnQover is a fairly unique dataset in the sense that it does not have "correct" labels and it uses the answers directly to check bias. Below is an example data sample. Our approach is to change some stuff in the input and then test the model, this samples and dataset is not very suitable for that. We can skip this for now or maybe you have some ideas @dcecchini
image

@dcecchini
Copy link
Contributor Author

Hi @alytarik, I think what is important on this dataset is more on the process they used to generate those questions. They already identified cases in which they can automatically generate the questions that have high probability of containing bias. The lists of adjectives, templates, etc are present on files:

image

They also have scripts to create and fill the templates. Checking their visualization demo, we can see nice examples that could be a new feature of LangTest -- not giving a score on specific test, but an analysis tool to help researchers visualize how the model behaves given some inputs that may contain bias. For example:

image

What do you think, @JulesBelveze ?

@JulesBelveze
Copy link
Contributor

I also think it would be a great addition.

I would argue that in the setting of "underspecified context" the model should not answer anything. @alytarik in the example you shared the model should actually produce something like "I don't know" or "I am lacking context" but it shouldn't answer "Alice" nor "Justin", right?

@dcecchini I also really like what's under the "Under-specified Question" section of the demo you shared. We could definitely let the user choose a bias category (say "ethnicity"), use the templates to generate samples, and compute a bias score. Basically, let the user perform exactly what their demo does. What do you think?

@alytarik I can give you a hand on how to design a solution to integrate to langtest

@JulesBelveze
Copy link
Contributor

@alytarik how is that going?

@alytarik
Copy link
Contributor

@JulesBelveze i was focused on #579 for a while. I will be working on this after i finish up its tests etc.

@ArshaanNazir ArshaanNazir added ⏭️ Next Release Issues or Request for the next release and removed v2.1.0 Issue or request to be done in v2.1.0 release labels Sep 4, 2023
@ArshaanNazir ArshaanNazir added v2.1.0 Issue or request to be done in v2.1.0 release and removed ⏭️ Next Release Issues or Request for the next release labels Sep 6, 2023
@chakravarthik27 chakravarthik27 added ⏭️ Next Release Issues or Request for the next release and removed v2.1.0 Issue or request to be done in v2.1.0 release labels Oct 5, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
⏭️ Next Release Issues or Request for the next release
Projects
None yet
Development

No branches or pull requests

5 participants