Two Methods for Domain Adaptation of Bilingual Tasks: Delightfully Simple and Broadly Applicable, Hangya et al., 2018
Paper, Tags: #nlp
Contributions:
- We test a simple method for domain adaptation of bilingual word embeddings and evaluate the adapted embeddings on 2 bilingual tasks from 2 different domains:
- cross-lingual Twitter sentiment classification
- medical bilingual lexicon induction
- We tailor a broadly applicable semi-supervised classification method from computer vision to these tasks.
We study 2 bilingual tasks that depend on bilingual word embeddings (BWEs). BWE adaptation:
- We adapt monolingual word embeddings (MWEs) to the target domain for both the source and target languages by training them on a mix of general and target-domain unlabeled data.
- We use post-hoc mapping: a seed lexicon is used to transform the word embeddings of the 2 languages into the same vector space (see the mapping sketch below).
Our approach is simple and task-independent.
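To make the post-hoc mapping step concrete, here is a minimal sketch of one standard implementation (a least-squares linear map in the spirit of Mikolov et al., 2013, not necessarily the exact variant used in the paper); the function names and the dict-based embedding layout are illustrative:

```python
import numpy as np

def learn_mapping(src_vecs, tgt_vecs, seed_lexicon):
    """Learn a linear map W such that src_vecs[s] @ W ~ tgt_vecs[t]
    for each (s, t) translation pair in the seed lexicon."""
    pairs = [(s, t) for s, t in seed_lexicon
             if s in src_vecs and t in tgt_vecs]
    X = np.stack([src_vecs[s] for s, _ in pairs])  # shape (n, d_src)
    Y = np.stack([tgt_vecs[t] for _, t in pairs])  # shape (n, d_tgt)
    # Least-squares solution of min_W ||X W - Y||_F
    W, *_ = np.linalg.lstsq(X, Y, rcond=None)
    return W

def map_to_target_space(src_vecs, W):
    """Project every source-language vector into the target space,
    so both languages share one vector space."""
    return {w: v @ W for w, v in src_vecs.items()}
```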
BWEs trained on general-domain texts usually perform worse when used in a system for a specific domain, for two reasons:
- Vocabularies of specific domains contain words that aren't used in general text, e.g., names of medicines or diseases.
- The meaning of a word varies across domains.
Our method adapts general-domain BWEs in a way that preserves the semantic knowledge from general-domain data while leveraging monolingual domain-specific data to create domain-specific BWEs. The approach is applicable to any language pair for which monolingual data is available.
We first train MWEs for each language on the concatenation of monolingual out-of-domain and in-domain data:
- out-of-domain data allows us to create accurate distributed representations of common vocabulary
- in-domain data embeds domain-specific words
We then map the two MWEs using a small seed lexicon to create the adapted BWEs.
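As a sketch of the full pipeline, assuming gensim (≥ 4.x) for word2vec training; the corpus file names, language pair, and hyperparameters below are illustrative, not taken from the paper:

```python
from itertools import chain
from gensim.models import Word2Vec
from gensim.models.word2vec import LineSentence

def train_adapted_mwe(general_corpus, domain_corpus, dim=300):
    """Train monolingual embeddings on the concatenation of
    out-of-domain and in-domain text (one tokenized sentence per line)."""
    sentences = list(chain(LineSentence(general_corpus),
                           LineSentence(domain_corpus)))
    model = Word2Vec(sentences, vector_size=dim, window=5,
                     min_count=5, sg=1, workers=4)
    return {w: model.wv[w] for w in model.wv.index_to_key}

# Hypothetical corpora for the medical domain.
src_vecs = train_adapted_mwe("en_general.txt", "en_medical.txt")
tgt_vecs = train_adapted_mwe("de_general.txt", "de_medical.txt")

# Map the source MWEs into the target space using the seed lexicon
# (see learn_mapping / map_to_target_space above). A real seed lexicon
# would contain thousands of translation pairs.
seed_lexicon = [("disease", "Krankheit"), ("doctor", "Arzt")]
W = learn_mapping(src_vecs, tgt_vecs, seed_lexicon)
src_in_tgt_space = map_to_target_space(src_vecs, W)
```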
Our delightfully simple, task-independent method to adapt BWEs to a specific domain uses only unlabeled monolingual data.