
Explore BYOD library #560

Open
dcecchini opened this issue Jun 26, 2023 · 3 comments
dcecchini commented Jun 26, 2023

Explore the BYOD repository for additional tests or datasets to add to nlptest.

Examples:

  • Should we add a self-evaluator test?
  • Should we add a dictionary-based toxicity model (searching for predefined terms in English or other languages that indicate toxicity)?
  • Anything else?
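A dictionary-based toxicity check could be as simple as matching tokens against a predefined term list. The sketch below is purely illustrative: the term set and function name are hypothetical placeholders, not part of nlptest or the BYOD repo.

```python
import re

# Placeholder term list for illustration only; a real test would load a
# curated, per-language dictionary of toxic terms.
TOXIC_TERMS = {"idiot", "stupid", "moron"}

def contains_toxic_term(text: str, terms: set = TOXIC_TERMS) -> bool:
    """Return True if any predefined toxic term appears as a whole word."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return any(tok in terms for tok in tokens)

print(contains_toxic_term("You are an idiot"))   # True
print(contains_toxic_term("What a lovely day"))  # False
```

Because it is pure string matching, a test like this would run fast and carry no dependency on an external ML model.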
@JulesBelveze
My personal takeaways:

  • Really interesting to use a similarity metric and an invariance score (similar to what we talked about @dcecchini @ArshaanNazir)
  • The tests they set up are quite simplistic (e.g., for "word ordering" they simply swap two random words of the text)
  • The toxicity approach and the metric they use are interesting: regardless of the LLM's input (containing toxic words or not), you don't want the model to output any toxic word
  • We could add a "broken tokenization" test
  • Really like their "radar" chart

Even though there's nothing groundbreaking in the repo and paper, I do think it is really interesting to have an approach in which the model is evaluated against itself.
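The "word ordering" perturbation and the invariance idea discussed above can be sketched in a few lines. This is a toy illustration, not the repo's actual implementation: the Jaccard similarity used here is a hypothetical stand-in for whatever similarity metric the paper uses.

```python
import random

def swap_two_words(text: str, seed: int = None) -> str:
    """Word-ordering perturbation: swap two randomly chosen words."""
    rng = random.Random(seed)
    words = text.split()
    if len(words) < 2:
        return text
    i, j = rng.sample(range(len(words)), 2)
    words[i], words[j] = words[j], words[i]
    return " ".join(words)

def invariance_score(original_output: str, perturbed_output: str) -> float:
    """Toy similarity metric: Jaccard overlap of output tokens.

    An invariance test would assert this stays above some threshold when
    the model is run on the original vs. the perturbed input.
    """
    a, b = set(original_output.split()), set(perturbed_output.split())
    return len(a & b) / len(a | b) if a | b else 1.0

print(swap_two_words("the quick brown fox jumps", seed=0))
print(invariance_score("a b c", "a b d"))  # 0.5
```

The same perturb-then-compare shape would fit a "broken tokenization" test as well, with the perturbation replaced by one that merges or splits tokens.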

@dcecchini

I agree, some of the tests are very simple, but they are also easy to implement and fast to run. So maybe we could add ones like the toxicity test as quick checks without any dependency on an external library to run an ML model...

Let's make a list of what is worth bringing into nlptest and add it to the roadmap.

@dcecchini

I just found a paper about self-evaluation; it would be interesting to read it and check whether we can implement it.

https://arxiv.org/abs/2306.13651?utm_source=substack&utm_medium=email

@dcecchini dcecchini added the ⏭️ Next Release Issues or Request for the next release label Aug 7, 2023
@ArshaanNazir ArshaanNazir added v2.1.0 Issue or request to be done in v2.1.0 release and removed ⏭️ Next Release Issues or Request for the next release labels Sep 6, 2023
This was referenced Sep 11, 2023
@Prikshit7766 Prikshit7766 linked a pull request Sep 12, 2023 that will close this issue
@ArshaanNazir ArshaanNazir removed a link to a pull request Sep 18, 2023
@ArshaanNazir ArshaanNazir removed the v2.1.0 Issue or request to be done in v2.1.0 release label Sep 21, 2023