This is the official repository for the DeTexD paper. Here you can find the scripts used in the paper to evaluate models.
See also: DeTexD dataset, detexd-roberta-base model.
pip install -r requirements.txt
Run evaluate_detexd_roberta.py
to get the published model's (grammarly/detexd-roberta-base) results on the published dataset (grammarly/detexd-benchmark).
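For a quick sanity check outside the script, both artifacts can also be loaded directly from the Hugging Face Hub. The snippet below is a minimal sketch, not the evaluation script itself; the `text` column name and the choice of split are assumptions, so check the dataset card before relying on them.

```python
# Minimal sketch (not a substitute for evaluate_detexd_roberta.py): load the
# published model and benchmark from the Hugging Face Hub and score a few texts.
from datasets import load_dataset
from transformers import pipeline

classifier = pipeline("text-classification", model="grammarly/detexd-roberta-base")
benchmark = load_dataset("grammarly/detexd-benchmark")

split = next(iter(benchmark))   # take whichever split is available
column = "text"                 # assumed column name; check the dataset card

for example in benchmark[split].select(range(3)):
    # top_k=None returns the score for every label, not just the top prediction
    print(example[column][:80], classifier(example[column], top_k=None))
```

The script itself handles the full evaluation over the benchmark; the snippet above is only meant to confirm that the model and dataset download and run correctly.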
Run founta_basile_comparison.ipynb
to reproduce the model comparison results from the paper. Note that you need to acquire the datasets yourself because they are distributed under separate licenses.
Run country_bias.ipynb
to reproduce the country bias analysis.
Run compare_hatebert.ipynb
to reproduce the HateBERT models comparison.
@inproceedings{chernodub-etal-2023-detexd,
title = "{D}e{T}ex{D}: A Benchmark Dataset for Delicate Text Detection",
author = "Yavnyi, Serhii and Sliusarenko, Oleksii and Razzaghi, Jade and Mo, Yichen and Hovakimyan, Knar and Chernodub, Artem",
booktitle = "The 7th Workshop on Online Abuse and Harms (WOAH)",
month = jul,
year = "2023",
address = "Toronto, Canada",
publisher = "Association for Computational Linguistics",
url = "https://aclanthology.org/2023.woah-1.2",
pages = "14--28",
abstract = "Over the past few years, much research has been conducted to identify and regulate toxic language. However, few studies have addressed a broader range of sensitive texts that are not necessarily overtly toxic. In this paper, we introduce and define a new category of sensitive text called {``}delicate text.{''} We provide the taxonomy of delicate text and present a detailed annotation scheme. We annotate DeTexD, the first benchmark dataset for delicate text detection. The significance of the difference in the definitions is highlighted by the relative performance deltas between models trained on each definition and corpus and evaluated on the other. We make publicly available the DeTexD Benchmark dataset, annotation guidelines, and baseline model for delicate text detection.",
}