Detoxify doesn't work well on Emojis #27

Open · 1 of 2 tasks
laurahanu opened this issue Aug 23, 2021 · 1 comment
laurahanu commented Aug 23, 2021

Currently, all Detoxify models seem not to recognise emojis that are meant to be toxic/hateful, whether in context or on their own (#26). While the BERT tokenizer returns the same output for different emojis, RoBERTa-based tokenizers seem to differentiate between different emoji inputs.
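A quick way to see this tokenizer difference, as a minimal sketch with Hugging Face `transformers` (the checkpoint names below are illustrative assumptions, not necessarily the exact backbones Detoxify uses):

```python
from transformers import AutoTokenizer

# Assumed checkpoints for illustration; the actual Detoxify backbones may differ.
bert = AutoTokenizer.from_pretrained("bert-base-uncased")
roberta = AutoTokenizer.from_pretrained("roberta-base")

for emoji_char in ["🖕", "😊", "💊"]:
    # BERT's WordPiece vocabulary has no entries for most emojis, so they all
    # collapse to the same [UNK] id; RoBERTa's byte-level BPE encodes the raw
    # UTF-8 bytes, so different emojis produce different token ids.
    print(emoji_char,
          bert.encode(emoji_char, add_special_tokens=False),
          roberta.encode(emoji_char, add_special_tokens=False))
```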

Some potential solutions:

  • replacement method (fast): use an emoji library (e.g. demoji) and replace emojis with their text description (e.g. 🖕 -> 'middle finger'). This would work when emojis are used with their literal meaning, but in some cases the description wouldn't make the intended meaning any clearer, e.g. drug- or sexually-related emojis. We would also need to be careful about how/when emojis are used as keywords (we could check for key emojis first and then replace); a minimal sketch of this approach follows after this list.
  • training method (slow): train models to recognise various emojis under different contexts; this might also emerge naturally from training on lots of data containing emojis. It would likely work for the common use cases, but less well for lesser-used emojis. It would not work with the BERT tokenizer.
  • hybrid method: train with emoji descriptions directly and replace them at inference time.
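A minimal sketch of the replacement method, assuming the demoji package (older demoji versions may require `demoji.download_codes()` before use):

```python
import demoji
from detoxify import Detoxify

def replace_emojis(text: str) -> str:
    # demoji.findall returns {emoji: description}, e.g. {"🖕": "middle finger"}.
    for emoji_char, description in demoji.findall(text).items():
        text = text.replace(emoji_char, f" {description} ")
    return text

model = Detoxify("original")
raw = "you deserve this 🖕"
print(model.predict(raw))                  # emoji is effectively invisible to BERT
print(model.predict(replace_emojis(raw)))  # description is now visible to the model
```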

To dos:

  • investigate how well the replacement method works on a dataset like Hatemoji
  • fine-tune Detoxify on the Hatemoji train set and compare
laurahanu (Collaborator, Author) commented:
Detoxify results on the Hatemoji test set.
Random-chance accuracy (guessing according to the class distribution): 0.561473
Majority-class accuracy (always predicting 1 in this case): 0.675318

  • without emoji replacement

|                   | original | unbiased | multilingual |
|-------------------|----------|----------|--------------|
| f1                | 0.365546 | 0.391728 | 0.643069     |
| accuracy          | 0.462087 | 0.476081 | 0.60229      |
| precision         | 0.89823  | 0.906977 | 0.816232     |
| recall            | 0.229465 | 0.249812 | 0.53052      |
| average_precision | 0.726469 | 0.733189 | 0.750076     |

  • with emoji replacement

|                   | original | unbiased | multilingual |
|-------------------|----------|----------|--------------|
| f1                | 0.432323 | 0.48832  | 0.727654     |
| accuracy          | 0.499491 | 0.531807 | 0.66972      |
| precision         | 0.923551 | 0.932059 | 0.821023     |
| recall            | 0.282216 | 0.330821 | 0.653353     |
| average_precision | 0.745373 | 0.760254 | 0.770515     |

*Took the identity_hate scores for original and unbiased, and the toxicity scores for multilingual, since that is the only label the multilingual model was trained on.
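For reference, a sketch of how metrics like these could be computed with scikit-learn (the file name, column names, and 0.5 threshold below are assumptions, not the exact evaluation setup used here):

```python
import pandas as pd
from sklearn.metrics import (accuracy_score, average_precision_score,
                             f1_score, precision_score, recall_score)
from detoxify import Detoxify

# Hypothetical file and column names for the Hatemoji test split.
df = pd.read_csv("hatemoji_test.csv")
texts, labels = df["text"].tolist(), df["label_gold"].values

# Use the identity_hate score for the original/unbiased models
# (toxicity for multilingual), as in the tables above.
scores = Detoxify("original").predict(texts)["identity_hate"]
preds = [int(s >= 0.5) for s in scores]

print("f1", f1_score(labels, preds))
print("accuracy", accuracy_score(labels, preds))
print("precision", precision_score(labels, preds))
print("recall", recall_score(labels, preds))
print("average_precision", average_precision_score(labels, scores))
```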
