
Explore BYOD library #560

Open
dcecchini opened this issue Jun 26, 2023 · 3 comments
dcecchini commented Jun 26, 2023

Explore the BYOD repository for additional tests or datasets to add to nlptest.

Examples:

  • Should we add a self-evaluator test?
  • Should we add a dictionary-based toxicity model (searching for predefined terms in English or other languages that indicate toxicity)?
  • Anything else?
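A dictionary-based toxicity check could be as simple as matching tokens against a predefined term list. The sketch below is purely illustrative: the term set and function name are hypothetical placeholders, not part of nlptest or the BYOD repo.

```python
import re

# Placeholder term list for illustration only; a real test would load a
# curated, per-language dictionary of toxic terms.
TOXIC_TERMS = {"idiot", "stupid", "moron"}

def contains_toxic_term(text: str, terms: set = TOXIC_TERMS) -> bool:
    """Return True if any predefined toxic term appears as a whole word."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return any(tok in terms for tok in tokens)

print(contains_toxic_term("You are an idiot"))   # True
print(contains_toxic_term("What a lovely day"))  # False
```

Because it is pure string matching, a test like this would run fast and carry no dependency on an external ML model.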
@JulesBelveze
My personal takeaways:

  • Really interesting to use a similarity metric and an invariance score (similar to what we talked about @dcecchini @ArshaanNazir)
  • The tests they set up are quite simplistic (e.g., for "word ordering" they simply swap two random words of the text)
  • The toxicity approach and the metric they use are interesting: regardless of the LLM's input (containing toxic words or not), you don't want the model to output any toxic word
  • We could add a "broken tokenization" test
  • Really like their "radar" chart

Even though there's nothing groundbreaking in the repo and paper, I do think it is really interesting to have an approach in which the model is evaluated against itself.
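The "word ordering" perturbation and the invariance idea discussed above can be sketched in a few lines. This is a toy illustration, not the repo's actual implementation: the Jaccard similarity used here is a hypothetical stand-in for whatever similarity metric the paper uses.

```python
import random

def swap_two_words(text: str, seed: int = None) -> str:
    """Word-ordering perturbation: swap two randomly chosen words."""
    rng = random.Random(seed)
    words = text.split()
    if len(words) < 2:
        return text
    i, j = rng.sample(range(len(words)), 2)
    words[i], words[j] = words[j], words[i]
    return " ".join(words)

def invariance_score(original_output: str, perturbed_output: str) -> float:
    """Toy similarity metric: Jaccard overlap of output tokens.

    An invariance test would assert this stays above some threshold when
    the model is run on the original vs. the perturbed input.
    """
    a, b = set(original_output.split()), set(perturbed_output.split())
    return len(a & b) / len(a | b) if a | b else 1.0

print(swap_two_words("the quick brown fox jumps", seed=0))
print(invariance_score("a b c", "a b d"))  # 0.5
```

The same perturb-then-compare shape would fit a "broken tokenization" test as well, with the perturbation replaced by one that merges or splits tokens.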

@dcecchini

I agree, some of the tests are very simple, but they are also easy to implement and fast to run. So maybe we could add ones like the toxicity test as quick checks without any dependency on an external library to run an ML model...

Let's make a list of what is worth bringing into nlptest and add it to the roadmap.

@dcecchini

I just found a paper about self-evaluation; it would be interesting to read it and check whether we can implement it.

https://arxiv.org/abs/2306.13651?utm_source=substack&utm_medium=email

@dcecchini dcecchini added the ⏭️ Next Release Issues or Request for the next release label Aug 7, 2023
@ArshaanNazir ArshaanNazir added v2.1.0 Issue or request to be done in v2.1.0 release and removed ⏭️ Next Release Issues or Request for the next release labels Sep 6, 2023
This was referenced Sep 11, 2023
@Prikshit7766 Prikshit7766 linked a pull request Sep 12, 2023 that will close this issue
@ArshaanNazir ArshaanNazir removed a link to a pull request Sep 18, 2023
@ArshaanNazir ArshaanNazir removed the v2.1.0 Issue or request to be done in v2.1.0 release label Sep 21, 2023