-
Notifications
You must be signed in to change notification settings - Fork 43
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add UnQovering dataset support (QA) #495
Comments
UnQover is a fairly unique dataset in the sense that it does not have "correct" labels and it uses the answers directly to check bias. Below is an example data sample. Our approach is to change some stuff in the input and then test the model, this samples and dataset is not very suitable for that. We can skip this for now or maybe you have some ideas @dcecchini |
Hi @alytarik, I think what is important on this dataset is more on the process they used to generate those questions. They already identified cases in which they can automatically generate the questions that have high probability of containing bias. The lists of adjectives, templates, etc are present on files: ![]() They also have scripts to create and fill the templates. Checking their visualization demo, we can see nice examples that could be a new feature of LangTest -- not giving a score on specific test, but an analysis tool to help researchers visualize how the model behaves given some inputs that may contain bias. For example: ![]() What do you think, @JulesBelveze ? |
I also think it would be a great addition. I would argue that in the setting of "underspecified context" the model should not answer anything. @alytarik in the example you shared the model should actually produce something like "I don't know" or "I am lacking context" but it shouldn't answer "Alice" nor "Justin", right? @dcecchini I also really like what's under the "Under-specified Question" section of the demo you shared. We could definitely let the user choose a bias category (say "ethnicity"), use the templates to generate samples, and compute a bias score. Basically, let the user perform exactly what their demo does. What do you think? @alytarik I can give you a hand on how to design a solution to integrate to |
@alytarik how is that going? |
@JulesBelveze i was focused on #579 for a while. I will be working on this after i finish up its tests etc. |
Add this dataset for QA tests (bias).
Reference: https://github.com/allenai/unqover
The text was updated successfully, but these errors were encountered: