Currently, none of the Detoxify models seem to recognize emojis that are meant to be toxic/hateful, either in context or on their own (#26). While the BERT tokenizer returns the same output for different emojis, RoBERTa-based tokenizers do differentiate between different emoji inputs.
Some potential solutions:
- replacement method (fast): use an emoji library (e.g. demoji) and replace emojis with their text description (e.g. 🖕 -> 'middle finger'). This would work where emojis are used with their literal meaning, but in other cases (e.g. drug- or sexually-related emojis) the description wouldn't make the intended meaning any clearer. We would also need to be careful with how/when we're using emojis as keywords (we could check for key emojis first and then replace); see the sketch after this list.
- training method (slow): train models to recognize various emojis in different contexts; this might also emerge naturally from training on lots of data containing emojis. It would likely work for common use cases but less well for rarely used emojis, and it would not work with the BERT tokenizer.
- hybrid method: train with emoji descriptions directly and replace emojis with their descriptions at inference time.
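A minimal sketch of the replacement idea, assuming the demoji library; the keyword list and separator choice are placeholders, not a settled design:

```python
# Minimal sketch of the replacement method, assuming the demoji library.
# (Older demoji releases need a one-off demoji.download_codes() to fetch the emoji data.)
import demoji

KEY_EMOJIS = {"🖕"}  # hypothetical keyword list, checked before replacing anything


def contains_key_emoji(text: str) -> bool:
    # demoji.findall returns a dict mapping each emoji found to its description
    return any(e in KEY_EMOJIS for e in demoji.findall(text))


def replace_emojis(text: str) -> str:
    # Replace each emoji with its text description, e.g. 🖕 -> "middle finger"
    return demoji.replace_with_desc(text, sep=" ")


text = "go away 🖕"
if contains_key_emoji(text):
    print("contains a key emoji")
print(replace_emojis(text))  # roughly: "go away  middle finger "
```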
To dos:
- investigate how well the replacement method works on a dataset like Hatemoji
- fine-tune Detoxify on the Hatemoji train set and compare
Detoxify results on the Hatemoji test set.
Random-chance baseline accuracy: 0.561473
Majority-class baseline accuracy (always predicting 1 in this case): 0.675318
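Both baselines follow from the test-set label distribution; a hypothetical sketch of the arithmetic (the label loading is assumed, not shown in this thread):

```python
import numpy as np

# Placeholder: replace with the actual Hatemoji test labels (1 = hateful, 0 = not).
labels = np.array([1, 1, 0, 1, 0, 1])

p = labels.mean()                   # positive-class rate (~0.675 on the real test set)
majority_class = max(p, 1 - p)      # accuracy of always predicting the majority label
random_chance = p**2 + (1 - p)**2   # accuracy of guessing labels at their base rates

print(f"random chance: {random_chance:.6f}")
print(f"majority class: {majority_class:.6f}")
```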
without emoji replacement

| metric | original | unbiased | multilingual |
| --- | --- | --- | --- |
| f1 | 0.365546 | 0.391728 | 0.643069 |
| accuracy | 0.462087 | 0.476081 | 0.60229 |
| precision | 0.89823 | 0.906977 | 0.816232 |
| recall | 0.229465 | 0.249812 | 0.53052 |
| average_precision | 0.726469 | 0.733189 | 0.750076 |
with emoji replacement

| metric | original | unbiased | multilingual |
| --- | --- | --- | --- |
| f1 | 0.432323 | 0.48832 | 0.727654 |
| accuracy | 0.499491 | 0.531807 | 0.66972 |
| precision | 0.923551 | 0.932059 | 0.821023 |
| recall | 0.282216 | 0.330821 | 0.653353 |
| average_precision | 0.745373 | 0.760254 | 0.770515 |
*Used the identity_hate scores for original and unbiased, and the toxicity scores for multilingual, since toxicity is the only label the multilingual model was trained on.
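For context, a rough sketch of how such an evaluation could be run, assuming the public Detoxify predict API, demoji for the replacement variant, scikit-learn metrics, a CSV layout for the Hatemoji test set, and a 0.5 decision threshold (none of which are necessarily what produced the numbers above):

```python
# Hypothetical evaluation sketch, not the exact script behind the numbers above.
import demoji
import pandas as pd
from detoxify import Detoxify
from sklearn.metrics import (accuracy_score, average_precision_score,
                             f1_score, precision_score, recall_score)

# Assumed file/column layout for the Hatemoji test set.
df = pd.read_csv("hatemoji_test.csv")
texts, labels = df["text"].tolist(), df["label"].tolist()

# Replacement variant: swap each emoji for its text description.
texts = [demoji.replace_with_desc(t, sep=" ") for t in texts]

model = Detoxify("original")  # or "unbiased" / "multilingual"
all_scores = model.predict(texts)
# Identity-hate style label for original/unbiased (the key name can vary by model/version);
# use "toxicity" for the multilingual model.
label_key = "identity_hate" if "identity_hate" in all_scores else "identity_attack"
scores = all_scores[label_key]
preds = [int(s >= 0.5) for s in scores]  # assumed 0.5 threshold

print("f1:", f1_score(labels, preds))
print("accuracy:", accuracy_score(labels, preds))
print("precision:", precision_score(labels, preds))
print("recall:", recall_score(labels, preds))
print("average_precision:", average_precision_score(labels, scores))
```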