Detoxify doesn't work well on Emojis #27

Open · 1 of 2 tasks
laurahanu opened this issue Aug 23, 2021 · 1 comment
laurahanu commented Aug 23, 2021

Currently, all Detoxify models seem not to recognise emojis that are meant to be toxic/hateful, whether in context or on their own (#26). While the BERT tokenizer returns the same output for different emojis, RoBERTa-based tokenizers seem to differentiate between different emoji inputs.
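A quick way to see this tokenizer difference, as a minimal sketch with Hugging Face `transformers` (the checkpoint names below are illustrative assumptions, not necessarily the exact backbones Detoxify uses):

```python
from transformers import AutoTokenizer

# Assumed checkpoints for illustration; the actual Detoxify backbones may differ.
bert = AutoTokenizer.from_pretrained("bert-base-uncased")
roberta = AutoTokenizer.from_pretrained("roberta-base")

for emoji_char in ["🖕", "😊", "💊"]:
    # BERT's WordPiece vocabulary has no entries for most emojis, so they all
    # collapse to the same [UNK] id; RoBERTa's byte-level BPE encodes the raw
    # UTF-8 bytes, so different emojis produce different token ids.
    print(emoji_char,
          bert.encode(emoji_char, add_special_tokens=False),
          roberta.encode(emoji_char, add_special_tokens=False))
```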

Some potential solutions:

  • replacement method (fast): use an emoji library (e.g. demoji) and replace emojis with their text description (e.g. 🖕 -> 'middle finger'). This would work when emojis are used with their literal meaning, but in some cases the description wouldn't make the intended meaning any clearer, e.g. drug- or sexually-related emojis. We would also need to be careful about how/when emojis are used as keywords (we could check for key emojis first and then replace); a minimal sketch of this approach follows after this list.
  • training method (slow): train models to recognise various emojis under different contexts; this might also emerge naturally from training on lots of data containing emojis. It would likely work for the common use cases, but less well for lesser-used emojis. It would not work with the BERT tokenizer.
  • hybrid method: train with emoji descriptions directly and replace them at inference time.
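A minimal sketch of the replacement method, assuming the demoji package (older demoji versions may require `demoji.download_codes()` before use):

```python
import demoji
from detoxify import Detoxify

def replace_emojis(text: str) -> str:
    # demoji.findall returns {emoji: description}, e.g. {"🖕": "middle finger"}.
    for emoji_char, description in demoji.findall(text).items():
        text = text.replace(emoji_char, f" {description} ")
    return text

model = Detoxify("original")
raw = "you deserve this 🖕"
print(model.predict(raw))                  # emoji is effectively invisible to BERT
print(model.predict(replace_emojis(raw)))  # description is now visible to the model
```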

To dos:

  • investigate how well the replacement method works on a dataset like Hatemoji
  • fine-tune Detoxify on the Hatemoji train set and compare
laurahanu (Collaborator, Author) commented:
Detoxify results on the Hatemoji test set.
Random-chance accuracy (guessing according to the class distribution): 0.561473
Majority-class accuracy (always predicting 1 in this case): 0.675318

  • without emoji replacement

|                   | original | unbiased | multilingual |
|-------------------|----------|----------|--------------|
| f1                | 0.365546 | 0.391728 | 0.643069     |
| accuracy          | 0.462087 | 0.476081 | 0.60229      |
| precision         | 0.89823  | 0.906977 | 0.816232     |
| recall            | 0.229465 | 0.249812 | 0.53052      |
| average_precision | 0.726469 | 0.733189 | 0.750076     |

  • with emoji replacement

|                   | original | unbiased | multilingual |
|-------------------|----------|----------|--------------|
| f1                | 0.432323 | 0.48832  | 0.727654     |
| accuracy          | 0.499491 | 0.531807 | 0.66972      |
| precision         | 0.923551 | 0.932059 | 0.821023     |
| recall            | 0.282216 | 0.330821 | 0.653353     |
| average_precision | 0.745373 | 0.760254 | 0.770515     |

*Took the identity_hate scores for original and unbiased, and the toxicity scores for multilingual, since that is the only label the multilingual model was trained on.
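For reference, a sketch of how metrics like these could be computed with scikit-learn (the file name, column names, and 0.5 threshold below are assumptions, not the exact evaluation setup used here):

```python
import pandas as pd
from sklearn.metrics import (accuracy_score, average_precision_score,
                             f1_score, precision_score, recall_score)
from detoxify import Detoxify

# Hypothetical file and column names for the Hatemoji test split.
df = pd.read_csv("hatemoji_test.csv")
texts, labels = df["text"].tolist(), df["label_gold"].values

# Use the identity_hate score for the original/unbiased models
# (toxicity for multilingual), as in the tables above.
scores = Detoxify("original").predict(texts)["identity_hate"]
preds = [int(s >= 0.5) for s in scores]

print("f1", f1_score(labels, preds))
print("accuracy", accuracy_score(labels, preds))
print("precision", precision_score(labels, preds))
print("recall", recall_score(labels, preds))
print("average_precision", average_precision_score(labels, scores))
```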
