Paper, Tags: #nlp, #embeddings
This paper demonstrates a general semi-supervised method for adding pre-trained context embeddings from bidirectional language models to augment token representations in sequence tagging models.
Sequence tagging: assign a categorical label to each member of a sequence of observed values. Current (2017) state-of-the-art sequence tagging models include a bidirectional RNN that encodes the token sequence into a context-sensitive representation; the parameters of this biRNN are learned only on labeled data.
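For concreteness, a toy instance of the task (the sentence and BIO-style NER labels below are illustrative, not from the paper):

```python
# One categorical label per observed token (BIO-style NER tags in this toy example).
tokens = ["Barack", "Obama", "visited", "Paris", "yesterday"]
labels = ["B-PER", "I-PER", "O", "B-LOC", "O"]
for token, label in zip(tokens, labels):
    print(f"{token}\t{label}")
```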
Our approach doesn't require additional labeled data. A neural language model (LM), pre-trained on a large unlabeled corpus, computes an encoding of the context at each position in the sequence (the LM embedding), which is then used in the supervised sequence tagging model.
- The context-sensitive representation captured in the LM embeddings is useful in the supervised sequence tagging setting.
- Using both forward and backward LM embeddings boosts performance over a forward-only LM.
- Domain-specific pre-training isn't necessary: an LM trained on the news domain still helps when applied to scientific papers.
- Pre-train word embeddings and a language model on large, unlabeled corpora
- Compute a word embedding and an LM embedding for each token in the input sequence
- Use both word embeddings and LM embeddings in the sequence tagging model (the model receives two inputs per token)
- Word embedding model
- The character representation captures morphological information and is computed with either a CNN or an RNN
- Token embeddings, obtained via a lookup table initialized with pre-trained word embeddings (see the sketch below)
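A minimal sketch of such a token representation: a character-level CNN concatenated with a token-embedding lookup initialized from pre-trained vectors. All names and dimensions are hypothetical, not the paper's exact architecture:

```python
import torch
import torch.nn as nn

class TokenRepresentation(nn.Module):
    """Concatenate a character-CNN feature with a pre-trained token-embedding lookup."""

    def __init__(self, num_chars, char_dim, char_filters, vocab_size, word_dim,
                 pretrained_vectors=None):
        super().__init__()
        self.char_emb = nn.Embedding(num_chars, char_dim, padding_idx=0)
        # 1-D convolution over characters, max-pooled into one vector per token.
        self.char_cnn = nn.Conv1d(char_dim, char_filters, kernel_size=3, padding=1)
        self.word_emb = nn.Embedding(vocab_size, word_dim)
        if pretrained_vectors is not None:
            # Initialize the lookup table from pre-trained word embeddings.
            self.word_emb.weight.data.copy_(pretrained_vectors)

    def forward(self, word_ids, char_ids):
        # word_ids: (batch, seq_len); char_ids: (batch, seq_len, max_word_len)
        b, t, c = char_ids.shape
        chars = self.char_emb(char_ids.reshape(b * t, c)).transpose(1, 2)  # (b*t, char_dim, c)
        char_feat = self.char_cnn(chars).max(dim=-1).values.reshape(b, t, -1)
        # Final per-token representation: [token embedding ; character feature]
        return torch.cat([self.word_emb(word_ids), char_feat], dim=-1)
```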
- Recurrent language model / encoding of context: multiple layers of biRNNs
The LM computes the probability of a token sequence. A token representation (from a CNN over characters, or from token embeddings) is passed through multiple layers of LSTMs to embed the history (t_1, ..., t_k) into a fixed-dimensional vector, the forward LM embedding, which is used to predict the probability of the next token t_{k+1}. A backward LM embedding analogously predicts the previous token given the future context.
Both embeddings are pre-trained separately and then shallowly concatenated to form the bidirectional LM embeddings.
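A minimal sketch of forming the bidirectional LM embedding, assuming two separately pre-trained, frozen LMs that return per-position top-layer states (`forward_lm` and `backward_lm` are hypothetical callables):

```python
import torch

def bidirectional_lm_embeddings(forward_lm, backward_lm, token_ids):
    """Shallowly concatenate the states of a forward and a backward LM.

    The forward LM's state at position k summarizes tokens t_1..t_k (it is
    trained to predict t_{k+1}); the backward LM's state at position k
    summarizes t_k..t_N (trained to predict t_{k-1}). Both were pre-trained
    separately and are kept frozen here.
    """
    with torch.no_grad():                        # the LM parameters are not updated
        h_fwd = forward_lm(token_ids)            # (batch, seq_len, lm_dim)
        h_bwd = backward_lm(token_ids)           # (batch, seq_len, lm_dim)
    return torch.cat([h_fwd, h_bwd], dim=-1)     # (batch, seq_len, 2 * lm_dim)
```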
TagLM uses the LM embeddings as additional inputs to the sequence tagging model.
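A PyTorch-flavored sketch of this two-input tagger: word-level representations and frozen LM embeddings are concatenated and fed to a biRNN encoder. This is a simplification, not the paper's exact model, which also adds a CRF output layer:

```python
import torch
import torch.nn as nn

class TagLMSketch(nn.Module):
    """Tagger that consumes word-level representations plus frozen LM embeddings."""

    def __init__(self, word_dim, lm_dim, hidden_dim, num_tags):
        super().__init__()
        # biRNN encoder over the concatenated [word representation ; LM embedding] input.
        self.encoder = nn.LSTM(word_dim + lm_dim, hidden_dim,
                               num_layers=2, batch_first=True, bidirectional=True)
        self.proj = nn.Linear(2 * hidden_dim, num_tags)

    def forward(self, word_reprs, lm_embs):
        # word_reprs: (batch, seq_len, word_dim), trained jointly with the tagger
        # lm_embs:    (batch, seq_len, lm_dim), pre-computed by the biLM and kept fixed
        x = torch.cat([word_reprs, lm_embs], dim=-1)
        h, _ = self.encoder(x)
        return self.proj(h)  # per-token tag scores (the paper adds a CRF layer on top)
```

Only the tagger's parameters are trained on the labeled data; the LM embeddings stay fixed, which is why no additional labeled data is needed.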
- The LM embeddings yield an average absolute improvement of 1.06 F1 (CoNLL 2003 NER) and 1.37 F1 (CoNLL 2000 chunking)
- The improvements obtained by adding LM embeddings are larger than the improvements previously obtained by adding other forms of transfer or joint learning
- Adding backward LM embeddings consistently outperforms forward-only LM embeddings
- LM size matters: replacing the LM with a larger one improves F1 by about as much as adding the backward LM
- LM embeddings can improve the performance of a sequence tagger even when the task data comes from a different domain than the LM's training data