
Conditional Masking in BERT #80

Open
vdpappu opened this issue Jul 3, 2019 · 0 comments
vdpappu commented Jul 3, 2019

BERT, by construction, randomly masks 15% of the tokens in a sentence during LM pre-training. Making this conditional, by masking only tokens with a selected set of POS tags (key-phrase candidate tokens), could speed up convergence and help retain the general language understanding of the pre-trained BERT model. In theory, this should help us with the following (see the sketch after the list):

  • Getting better representations for domain vocabulary
  • Being able to make distinctions like the following:
    • "Deploying word embedding model to production" is more closely related to "We are using RNNs for better word representations" than to "Deploying Janus Gateway to production"
  • Keeping the language model stable, since more emphasis is placed on key-phrase tokens
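
A minimal sketch of what the selection step could look like, assuming spaCy for POS tagging; the tagger choice, the `KEYPHRASE_POS` tag set, and the `conditional_mask` helper are illustrative assumptions, not anything specified in this issue:

```python
import random

import spacy  # any POS tagger would do; spaCy is assumed here

nlp = spacy.load("en_core_web_sm")

# POS tags treated as key-phrase candidates (an assumption; tune per domain).
KEYPHRASE_POS = {"NOUN", "PROPN", "ADJ"}
MASK_TOKEN = "[MASK]"
MASK_PROB = 0.15  # BERT's standard masking rate


def conditional_mask(sentence: str) -> list:
    """Mask ~15% of tokens, restricted to key-phrase-candidate POS tags."""
    doc = nlp(sentence)
    tokens = [t.text for t in doc]
    candidates = [i for i, t in enumerate(doc) if t.pos_ in KEYPHRASE_POS]
    if not candidates:
        # No candidates in this sentence: fall back to unconditional masking.
        candidates = list(range(len(tokens)))
    n_mask = max(1, round(MASK_PROB * len(tokens)))
    for i in random.sample(candidates, min(n_mask, len(candidates))):
        tokens[i] = MASK_TOKEN
    return tokens


print(conditional_mask("Deploying word embedding model to production"))
# e.g. ['Deploying', 'word', 'embedding', '[MASK]', 'to', 'production']
```

In an actual pre-training run, this selection step would operate over WordPiece tokens and feed BERT's usual 80/10/10 replacement scheme (80% [MASK], 10% random token, 10% unchanged) rather than replacing whole words directly.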