ConConCor

The Contentious Contexts Corpus dataset. This project was carried out in the context of the EuropeanaTech Challenge for Europeana Artificial Intelligence and Machine Learning datasets.

The dataset is made available under the CC-BY license.

The dataset is supported with the [Project Documentation](Dataset/Project Documentation.pdf) and the Datasheet.

The dataset is split into four subsets to reduce repetition in the data (and therefore storage size) and to make the data easier to inspect.

Extracts.csv: 2720 extracts from Dutch newspaper articles, obtained from OCR'd versions of the Europeana Newspaper collection, as provided by the KB, National Library of the Netherlands

extract_id: H – expert annotators, c – control samples
target: a target word that was used in a query
target_compound: a target word found in an extract
target_compound_bolded: a bolded target word found in an extract (mathematical sans-serif bold italic small Unicode characters are used)
text: extract text of 5 sentences, centred around a bolded target word
url: a url to Delpher to view the newspaper scan and the OCR'd text
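
Because the bolded target word is written with special Unicode characters rather than plain letters, an ordinary substring search for the target will not match it. The following is a minimal sketch of converting between the two forms. It assumes that only lowercase a-z are bolded (as "small" in the field description suggests) and that the characters come from the Unicode range starting at U+1D656; the names `bold` and `unbold` are illustrative and not part of the repository.

```python
# Mathematical sans-serif bold italic small letters occupy U+1D656..U+1D66F.
BOLD_A = 0x1D656

# Translation table: bolded code point -> plain ASCII letter.
UNBOLD_TABLE = {BOLD_A + i: ord("a") + i for i in range(26)}


def unbold(text: str) -> str:
    """Replace bolded small letters with plain a-z; leave everything else untouched."""
    return text.translate(UNBOLD_TABLE)


def bold(word: str) -> str:
    """Inverse mapping, e.g. to locate a target word inside an extract."""
    return "".join(
        chr(BOLD_A + ord(ch) - ord("a")) if "a" <= ch <= "z" else ch
        for ch in word
    )
```

Applying `unbold` to the `text` column should yield plain text suitable for search or tokenisation, while `bold(target_compound)` gives a string that can be located inside `text`.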

Annotations.csv: anonymised multiple-choice responses from participants, who were asked to judge whether the target word in the given textual context is contentious (to even the slightest degree) according to present-day sensibilities

anonymised_participant_id: 'unknown_' prefix – expert annotators, 0–398 – Prolific annotators
extract_id
response: one of the multiple-choice options for each extract: “Omstreden naar huidige maatstaven” (“Contentious according to current standards”), “Niet omstreden” (“Not contentious”), “Weet ik niet” (“I don’t know”), “Onleesbare OCR” (“Illegible OCR”)
suggestion: a suggested word that an annotator found contentious in the given extract (can be empty)
is_control: boolean, True if an extract was used as a control one
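
As an illustration of how these columns fit together, the sketch below counts the responses per extract using only the Python standard library. It assumes that `is_control` is serialised as the string `True` or `False` and runs on a small made-up sample rather than the real file; `is_expert` and `tally_responses` are hypothetical helper names.

```python
import csv
import io
from collections import Counter, defaultdict


def is_expert(participant_id: str) -> bool:
    """Expert annotators carry the 'unknown_' prefix; Prolific annotators are numbered."""
    return participant_id.startswith("unknown_")


def tally_responses(rows, include_controls=False):
    """Count how often each response option was chosen, per extract_id."""
    counts = defaultdict(Counter)
    for row in rows:
        # Assumption: is_control is stored as the text "True" / "False".
        if not include_controls and row["is_control"] == "True":
            continue
        counts[row["extract_id"]][row["response"]] += 1
    return counts


# Tiny made-up sample in the documented column layout (not real data).
sample = io.StringIO(
    "anonymised_participant_id,extract_id,response,suggestion,is_control\n"
    "unknown_1,x1,Niet omstreden,,False\n"
    "12,x1,Omstreden naar huidige maatstaven,,False\n"
    "13,x1,Omstreden naar huidige maatstaven,,False\n"
    "12,x2,Niet omstreden,,True\n"
)
rows = list(csv.DictReader(sample))
counts = tally_responses(rows)
```

To run this on the actual data, replace `sample` with an opened file handle for Annotations.csv, using `newline=""` and `encoding="utf-8"` as the `csv` module recommends.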

Demographics.csv: anonymised demographic data for the Prolific annotators; no demographic data was collected from the expert annotators

anonymised_participant_id
time_taken: in seconds
age
Country of Birth
Current Country of Residence
Employment Status
First Language
Fluent languages
Nationality
Sex
Student Status

Metadata.csv: metadata corresponding to the extracts in Extracts.csv. This metadata was extracted from the KB via the OAI-PMH protocol it provides

url: same as in Extracts.csv
europeana_issue_id
datestamp
date
publisher
spatial_distribution
spatial_origin
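
Since `url` appears in both Extracts.csv and Metadata.csv, it can serve as a join key. The sketch below shows one way to attach metadata to extract rows. It assumes at most one metadata row per `url` and uses made-up rows; `join_on_url` is a hypothetical helper, not a script from this repository.

```python
def join_on_url(extracts, metadata):
    """Attach metadata fields to each extract row that shares the same url."""
    by_url = {row["url"]: row for row in metadata}
    joined = []
    for extract in extracts:
        meta = by_url.get(extract["url"], {})
        # Extract fields win if a column name ever appears in both files.
        joined.append({**meta, **extract})
    return joined


# Made-up rows in the documented column layout (not real data).
extracts = [
    {"extract_id": "x1", "target": "word", "url": "https://example.org/a"},
    {"extract_id": "x2", "target": "word", "url": "https://example.org/b"},
]
metadata = [
    {"url": "https://example.org/a", "date": "1900-01-01", "publisher": "Example"},
]
joined = join_on_url(extracts, metadata)
```

Extracts without a matching metadata row are kept unchanged, which makes missing metadata easy to spot afterwards.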

Additional files:

alpha_per_group.csv

group: groups of annotators; group_1 – group_57 are Prolific groups, group_58 – group_60 are expert groups
alpha: Krippendorff's alpha scores (inter-annotator agreement)
num_annotators: number of annotators in a group
annotators_id: a list (str) of annotators' IDs in a group
extracts_id: a list (str) of extract IDs in a batch (or per group)
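
For readers unfamiliar with Krippendorff's alpha, the sketch below implements the standard formula for nominal data (alpha = 1 - observed disagreement / expected disagreement, computed from a coincidence matrix). It is not necessarily how the scores in alpha_per_group.csv were produced: in particular, it treats every response option, including “Weet ik niet” and “Onleesbare OCR”, as an ordinary category, which is an assumption. The function name is illustrative.

```python
from collections import Counter
from itertools import permutations


def krippendorff_alpha_nominal(units):
    """units: one list of labels per extract, one label per annotator who answered."""
    coincidences = Counter()
    for labels in units:
        m = len(labels)
        if m < 2:
            continue  # a single label cannot be paired with anything
        # Every ordered pair of labels from different annotators adds 1/(m-1).
        for a, b in permutations(labels, 2):
            coincidences[(a, b)] += 1 / (m - 1)

    # Marginal totals per category and overall number of pairable labels.
    totals = Counter()
    for (a, _), weight in coincidences.items():
        totals[a] += weight
    n = sum(totals.values())

    observed = sum(w for (a, b), w in coincidences.items() if a != b)
    expected = sum(totals[a] * totals[b] for a in totals for b in totals if a != b)
    if expected == 0:
        return float("nan")  # only one category used: alpha is undefined
    return 1 - (n - 1) * observed / expected
```

A value of 1 indicates perfect agreement, 0 indicates agreement at chance level, and negative values indicate systematic disagreement.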

percentage_agreement.csv

extract_id: same as in Extracts.csv
target: same as in Extracts.csv
omstreden: number of annotators in a group who selected the option “Omstreden naar huidige maatstaven”
niet_omstreden: number of annotators in a group who selected the option “Niet omstreden”
weet_ik_niet: number of annotators in a group who selected the option “Weet ik niet”
bad_ocr: number of annotators in a group who selected the option “Onleesbare OCR”
num_annotators: number of annotators in a group
percentage_agreement: percentage agreement between annotators in a group per extract
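
The README does not spell out how `percentage_agreement` is derived from the four counts. One plausible reading, shown here purely as an assumption, is the share of annotators who chose the most frequent option, on a 0 to 100 scale; the actual column may be defined differently (for example, as pairwise agreement).

```python
def percentage_agreement(omstreden, niet_omstreden, weet_ik_niet, bad_ocr):
    """Share (0-100) of annotators who picked the most frequent option.

    Assumption: this definition is a guess for illustration, not taken
    from the repository's own scripts.
    """
    counts = [omstreden, niet_omstreden, weet_ik_niet, bad_ocr]
    num_annotators = sum(counts)
    if num_annotators == 0:
        raise ValueError("at least one annotator response is required")
    return 100 * max(counts) / num_annotators
```

Comparing the output of such a function with the published column on a few rows would be a quick way to check whether this reading matches the data.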

Building the dataset

See here for instructions for recreating the dataset components, i.e., sampling extracts, auto-assembling the Google Forms, and creating the dataset files.

K-cap 2021 paper analyses

See here for instructions for performing the analyses and creating the figures presented in the K-Cap 2021 paper.

