Commit 01517e8

Update README.md
Parent commit: 5c9ed6e

1 file changed, 10 insertions(+), 4 deletions(-)

README.md

````diff
@@ -1,4 +1,9 @@
 # [ToxiGen](http://arxiv.org/abs/2203.09509): A Large-Scale Machine-Generated Dataset for Adversarial and Implicit Hate Speech Detection ![Github_Picture](https://user-images.githubusercontent.com/13631873/159418812-98ccfe19-1a63-4bc9-9692-92f096f443b6.png)
+
+## [June 17, 2024] Update: Releasing 27,450 human annotations.
+You can now download the raw human annotations via `load_dataset("toxigen/toxigen-data", "annotations")`. These data include 27,450 responses from all mechanical turk annotations. All WorkerIDs have been hashed to further anonymize the annotators.
+
+## Overview
 This repository includes all necessary components that we used to generate ToxiGen dataset which contains implicitly toxic and benign sentences mentioning 13 minority groups. It includes a tool referred to as ALICE to stress test a given off-the-shelf content moderation system and iteratively improve it across these minority groups.
 
 With release of the source codes and prompt seeds for this work we hope to encourage and engage community to contribute to it by for example adding prompt seeds and generating data for minority groups that are not covered in our dataset or even scenarios we have not covered to continuously iterate and improve it (e.g., by submitting PR to this repository).
@@ -13,14 +18,15 @@ This repository includes two methods for generating new sentences given a large
 
 ## Downloading ToxiGen
 
-You can download ToxiGen using HuggingFace 🤗 from [this webpage](https://huggingface.co/datasets/skg/toxigen-data) or through python:
+ToxiGen is available on [HuggingFace](https://huggingface.co/datasets/toxigen/toxigen-data).
 
-To run these commands you'll need to create a Hugging Face auth_token by following [these](https://huggingface.co/docs/hub/security-tokens) steps. As discussed below, you can manually use `use_auth_token={auth_token}` or register your token with your transformers installation via huggingface-cli.
+To download with python, you'll need to create a Hugging Face auth_token by following [these instructions](https://huggingface.co/docs/hub/security-tokens). As discussed below, you can manually use `use_auth_token={auth_token}` or register your token with your transformers installation via huggingface-cli.
 
 ```
 from datasets import load_dataset
-TG_data = load_dataset("skg/toxigen-data", name="train", use_auth_token=True) # 250k training examples
-TG_annotations = load_dataset("skg/toxigen-data", name="annotated", use_auth_token=True) # Human study
+train_data = load_dataset("toxigen/toxigen-data", name="train", use_auth_token=True) # 250k training examples
+annotated_data = load_dataset("toxigen/toxigen-data", name="annotated", use_auth_token=True) # Human study
+raw_annotations = load_dataset("toxigen/toxigen-data", name="annotations", use_auth_token=True) # Raw Human study
 ```
 
 **Optional, but helpful**: Please fill out [this form](https://forms.office.com/r/r6VXX8f8vh) so we can track how the community uses ToxiGen.
````
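The updated README still passes `use_auth_token=True` to `load_dataset`. In recent releases of the Hugging Face `datasets` library (2.14 and later) that argument is deprecated in favor of `token`. The following is a minimal sketch, assuming a recent `datasets` release and a token already registered via `huggingface-cli login`, that loads any of the three configurations named in the README (`train`, `annotated`, `annotations`) from `toxigen/toxigen-data`. The helper name `load_toxigen` and the constants are hypothetical and not part of the ToxiGen repository.

```python
from typing import Union

# Repository id and configuration names taken from the README diff.
TOXIGEN_REPO = "toxigen/toxigen-data"
TOXIGEN_CONFIGS = ("train", "annotated", "annotations")


def load_toxigen(config: str, token: Union[bool, str] = True):
    """Load one ToxiGen configuration from the Hugging Face Hub.

    Hypothetical helper. `token=True` reuses the token saved by
    `huggingface-cli login`; a token string can be passed instead.
    """
    # Reject unknown configuration names before touching the network.
    if config not in TOXIGEN_CONFIGS:
        raise ValueError(
            f"unknown ToxiGen config {config!r}; expected one of {TOXIGEN_CONFIGS}"
        )
    # Imported lazily so this module can be loaded without `datasets` installed.
    from datasets import load_dataset

    # `token` replaces the deprecated `use_auth_token` argument.
    return load_dataset(TOXIGEN_REPO, name=config, token=token)
```

For example, `load_toxigen("annotations")` would fetch the 27,450 raw annotations described in the June 17, 2024 update. Validating the configuration name first gives a clear error for typos such as `"annotation"` instead of a failed Hub request.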