
Commit 1cc98ab

Merge commit 'f79072d8b1bdb0e5d298889387158cfd24d5955e' as 'MUStARD'
2 parents: 3fe6331 + f79072d

32 files changed: +81644 -0 lines

MUStARD/.gitignore

+1
@@ -0,0 +1 @@
/output

MUStARD/LICENSE

+21
@@ -0,0 +1,21 @@
MIT License

Copyright (c) 2019 Multimodal Language Understanding Group (MLUG)

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.

MUStARD/README.md

+119
@@ -0,0 +1,119 @@
# MUStARD: Multimodal Sarcasm Detection Dataset

This repository contains the dataset and code for our ACL 2019 paper:

[Towards Multimodal Sarcasm Detection (An _Obviously_ Perfect Paper)](https://www.aclweb.org/anthology/P19-1455/)

We release MUStARD, a multimodal video corpus for research in automated sarcasm discovery. The dataset is compiled from popular TV shows, including *Friends*, *The Golden Girls*, *The Big Bang Theory*, and *Sarcasmaholics Anonymous*. MUStARD consists of audiovisual utterances annotated with sarcasm labels. Each utterance is accompanied by its context, which provides additional information about the scenario in which it occurs.

## Example Instance

![Example instance](images/utterance_example.jpg)

<p align="center"> Example sarcastic utterance from the dataset along with its context and transcript. </p>

## Raw Videos

We provide a [Google Drive folder with the raw video clips](https://drive.google.com/file/d/1i9ixalVcXskA5_BkNnbR60sqJqvGyi6E/view?usp=sharing), including both the utterances and their respective context.

## Data Format

The annotations and transcripts of the audiovisual clips are available at [`data/sarcasm_data.json`](data/sarcasm_data.json). Each instance in the JSON file is assigned an identifier (e.g. "1\_60") whose value is a dictionary with the following keys:
| Key                | Value                                                                        |
| ------------------ |:----------------------------------------------------------------------------:|
| `utterance`        | The text of the target utterance to classify.                                |
| `speaker`          | Speaker of the target utterance.                                             |
| `context`          | List of utterances (in chronological order) preceding the target utterance. |
| `context_speakers` | Respective speakers of the context utterances.                               |
| `sarcasm`          | Binary label for the sarcasm tag.                                            |

Example format in JSON:

```json
{
    "1_60": {
        "utterance": "It's just a privilege to watch your mind at work.",
        "speaker": "SHELDON",
        "context": [
            "I never would have identified the fingerprints of string theory in the aftermath of the Big Bang.",
            "My apologies. What's your plan?"
        ],
        "context_speakers": [
            "LEONARD",
            "SHELDON"
        ],
        "sarcasm": true
    }
}
```
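
For illustration, the annotations can be read with a few lines of Python. This is a minimal sketch based on the path and keys described above, not a script from the repository:

```python
import json

# Load the annotation file shipped with the dataset.
with open("data/sarcasm_data.json", encoding="utf-8") as file:
    data = json.load(file)

for instance_id, instance in data.items():
    print(instance_id, instance["speaker"], repr(instance["utterance"]), "sarcasm:", instance["sarcasm"])
    # Context utterances and their speakers are parallel lists in chronological order.
    for speaker, utterance in zip(instance["context_speakers"], instance["context"]):
        print("  context:", speaker, repr(utterance))
```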

## Citation

Please cite the following paper if you find this dataset useful in your research:

```bibtex
@inproceedings{mustard,
    title = "Towards Multimodal Sarcasm Detection (An \_Obviously\_ Perfect Paper)",
    author = "Castro, Santiago and
        Hazarika, Devamanyu and
        P{\'e}rez-Rosas, Ver{\'o}nica and
        Zimmermann, Roger and
        Mihalcea, Rada and
        Poria, Soujanya",
    booktitle = "Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)",
    month = "7",
    year = "2019",
    address = "Florence, Italy",
    publisher = "Association for Computational Linguistics",
}
```

## Run the code

1. Set up an environment with Conda:

    ```bash
    conda env create -f environment.yml
    conda activate mustard
    python -c "import nltk; nltk.download('punkt')"
    ```

2. Download the [Common Crawl pretrained GloVe word vectors of size 300d, 840B tokens](http://nlp.stanford.edu/data/glove.840B.300d.zip) somewhere.

3. [Download the pre-extracted visual features](https://drive.google.com/open?id=1Ff1WDObGKqpfbvy7-H1mD8YWvBS-Kf26) to the `data/` folder (so `data/features/` contains the folders `context_final/` and `utterances_final/` with the features), or [extract the visual features](visual) yourself.

4. [Download the pre-extracted BERT features](https://drive.google.com/file/d/1GYv74vN80iX_IkEmkJhkjDRGxLvraWuZ/view?usp=sharing) and place the two files directly under the folder `data/` (so they are `data/bert-output.jsonl` and `data/bert-output-context.jsonl`), or extract the BERT features yourself in another environment with Python 2 and TensorFlow 1.11.0, following ["Using BERT to extract fixed feature vectors (like ELMo)" from BERT's repo](https://github.com/google-research/bert/tree/d66a146741588fb208450bde15aa7db143baaa69#using-bert-to-extract-fixed-feature-vectors-like-elmo) and running:

    ```bash
    # Download BERT-base uncased in some dir:
    wget https://storage.googleapis.com/bert_models/2018_10_18/uncased_L-12_H-768_A-12.zip
    # Then put the location in this var:
    BERT_BASE_DIR=...

    python extract_features.py \
      --input_file=data/bert-input.txt \
      --output_file=data/bert-output.jsonl \
      --vocab_file=${BERT_BASE_DIR}/vocab.txt \
      --bert_config_file=${BERT_BASE_DIR}/bert_config.json \
      --init_checkpoint=${BERT_BASE_DIR}/bert_model.ckpt \
      --layers=-1,-2,-3,-4 \
      --max_seq_length=128 \
      --batch_size=8
    ```

5. Check the options in `python train_svm.py -h` to select a run configuration (or modify [`config.py`](config.py)), and then run it:

    ```bash
    python train_svm.py  # add the flags you want
    ```

6. Evaluation: we evaluate using the weighted F-score metric in a 5-fold cross-validation scheme. The fold indices are available at `data/split_indices.p`. Refer to our baseline scripts for more details; a minimal sketch of this evaluation is shown after this list.
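
The snippet below is a minimal sketch of that 5-fold weighted F-score evaluation using scikit-learn. The assumed pickle format (a list of train/test index pairs) and the `predict_fold` callable are illustrative assumptions, not the repository's exact API; see the baseline scripts for the authoritative version.

```python
import pickle

import numpy as np
from sklearn.metrics import f1_score


def evaluate_5_fold(y_true: np.ndarray, predict_fold, split_path: str = "data/split_indices.p") -> float:
    """Mean weighted F-score across folds. `predict_fold` is a hypothetical callable that
    trains a model on the train indices and returns predictions for the test indices."""
    with open(split_path, "rb") as file:
        folds = pickle.load(file)  # Assumed format: list of (train_idx, test_idx) pairs.

    scores = []
    for train_idx, test_idx in folds:
        y_pred = predict_fold(train_idx, test_idx)  # Placeholder for training + prediction.
        scores.append(f1_score(y_true[test_idx], y_pred, average="weighted"))

    # Report the mean weighted F-score over the 5 folds.
    return float(np.mean(scores))
```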

MUStARD/config.py

+189
@@ -0,0 +1,189 @@
class Config:

    model = "SVM"
    runs = 1  # No. of runs of experiments

    # Training modes
    use_context = False  # whether to use context information or not (default false)
    use_author = False  # add author one-hot encoding in the input

    use_bert = True  # if False, uses glove pooling

    use_target_text = False
    use_target_audio = False  # adds audio target utterance features.
    use_target_video = False  # adds video target utterance features.

    speaker_independent = False  # speaker independent experiments

    embedding_dim = 300  # GloVe embedding size
    word_embedding_path = "/home/sacastro/glove.840B.300d.txt"
    max_sent_length = 20
    max_context_length = 4  # Maximum sentences to take in context
    num_classes = 2  # Binary classification of sarcasm
    epochs = 15
    batch_size = 16
    val_split = 0.1  # Percentage of data in validation set from training data

    svm_c = 10.0
    svm_scale = True


class SpeakerDependentTConfig(Config):
    use_target_text = True
    svm_c = 1.0


class SpeakerDependentAConfig(Config):
    use_target_audio = True
    svm_c = 1.0


class SpeakerDependentVConfig(Config):
    use_target_video = True
    svm_c = 1.0


class SpeakerDependentTAConfig(Config):
    use_target_text = True
    use_target_audio = True
    svm_c = 1.0


class SpeakerDependentTVConfig(Config):
    use_target_text = True
    use_target_video = True
    svm_c = 10.0


class SpeakerDependentAVConfig(Config):
    use_target_audio = True
    use_target_video = True
    svm_c = 30.0


class SpeakerDependentTAVConfig(Config):
    use_target_text = True
    use_target_audio = True
    use_target_video = True
    svm_c = 10.0


class SpeakerDependentTPlusContext(SpeakerDependentTConfig):
    use_context = True
    svm_c = 1.0


class SpeakerDependentTPlusAuthor(SpeakerDependentTConfig):
    use_author = True
    svm_c = 10.0


class SpeakerDependentTVPlusContext(SpeakerDependentTVConfig):
    use_context = True
    svm_c = 10.0


class SpeakerDependentTVPlusAuthor(SpeakerDependentTVConfig):
    use_author = True
    svm_c = 10.0


class SpeakerIndependentTConfig(Config):
    svm_scale = False
    use_target_text = True
    svm_c = 10.0
    speaker_independent = True


class SpeakerIndependentAConfig(Config):
    svm_scale = False
    use_target_audio = True
    svm_c = 1000.0
    speaker_independent = True


class SpeakerIndependentVConfig(Config):
    svm_scale = False
    use_target_video = True
    svm_c = 30.0
    speaker_independent = True


class SpeakerIndependentTAConfig(Config):
    svm_scale = False
    use_target_text = True
    use_target_audio = True
    svm_c = 500.0
    speaker_independent = True


class SpeakerIndependentTVConfig(Config):
    svm_scale = False
    use_target_text = True
    use_target_video = True
    svm_c = 10.0
    speaker_independent = True


class SpeakerIndependentAVConfig(Config):
    svm_scale = False
    use_target_audio = True
    use_target_video = True
    svm_c = 500.0
    speaker_independent = True


class SpeakerIndependentTAVConfig(Config):
    svm_scale = False
    use_target_text = True
    use_target_audio = True
    use_target_video = True
    svm_c = 1000.0
    speaker_independent = True


class SpeakerIndependentTPlusContext(SpeakerIndependentTConfig):
    use_context = True
    svm_c = 10.0


class SpeakerIndependentTPlusAuthor(SpeakerIndependentTConfig):
    use_author = True
    svm_c = 10.0


class SpeakerIndependentTAPlusContext(SpeakerIndependentTAConfig):
    use_context = True
    svm_c = 1000.0


class SpeakerIndependentTAPlusAuthor(SpeakerIndependentTAConfig):
    use_author = True
    svm_c = 1000.0


CONFIG_BY_KEY = {
    '': Config(),
    't': SpeakerDependentTConfig(),
    'a': SpeakerDependentAConfig(),
    'v': SpeakerDependentVConfig(),
    'ta': SpeakerDependentTAConfig(),
    'tv': SpeakerDependentTVConfig(),
    'av': SpeakerDependentAVConfig(),
    'tav': SpeakerDependentTAVConfig(),
    't-c': SpeakerDependentTPlusContext(),
    't-author': SpeakerDependentTPlusAuthor(),
    'tv-c': SpeakerDependentTVPlusContext(),
    'tv-author': SpeakerDependentTVPlusAuthor(),
    'i-t': SpeakerIndependentTConfig(),
    'i-a': SpeakerIndependentAConfig(),
    'i-v': SpeakerIndependentVConfig(),
    'i-ta': SpeakerIndependentTAConfig(),
    'i-tv': SpeakerIndependentTVConfig(),
    'i-av': SpeakerIndependentAVConfig(),
    'i-tav': SpeakerIndependentTAVConfig(),
    'i-t-c': SpeakerIndependentTPlusContext(),
    'i-t-author': SpeakerIndependentTPlusAuthor(),
    'i-ta-c': SpeakerIndependentTAPlusContext(),
    'i-ta-author': SpeakerIndependentTAPlusAuthor(),
}
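
For illustration, `CONFIG_BY_KEY` maps short keys (modality letters, `i-` prefix for speaker-independent, `-c`/`-author` suffixes for context/author) to pre-tuned configurations. The sketch below shows one way such a key could be selected from the command line; the `--config-key` flag and the commented-out `run_experiment` call are hypothetical, not the repository's actual interface.

```python
import argparse

from config import CONFIG_BY_KEY


def main() -> None:
    parser = argparse.ArgumentParser()
    # Hypothetical flag for picking a run configuration by its key.
    parser.add_argument("--config-key", default="t", choices=sorted(CONFIG_BY_KEY),
                        help="e.g. 't' (text only) or 'i-tav' (speaker-independent text+audio+video)")
    args = parser.parse_args()

    config = CONFIG_BY_KEY[args.config_key]
    print(f"use_bert={config.use_bert}, svm_c={config.svm_c}, "
          f"speaker_independent={config.speaker_independent}")
    # run_experiment(config)  # Placeholder for the actual training entry point.


if __name__ == "__main__":
    main()
```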

MUStARD/data/.gitignore

+4
@@ -0,0 +1,4 @@
/features
/frames
/videos
/bert-output*.jsonl

MUStARD/data/audio_features.p

17.7 MB
Binary file not shown.
