GPT training for Paragraph embeddings #49

Open
reaganrewop opened this issue May 22, 2019 · 3 comments
Labels
experiment, poc (Prototyping new approach), priority

Comments

@reaganrewop
Contributor

Test GPT-2 feature representation linearity and its scalability for paragraph vectors.

@reaganrewop reaganrewop self-assigned this May 22, 2019
@reaganrewop reaganrewop added the experiment and poc (Prototyping new approach) labels May 22, 2019
@vdpappu vdpappu changed the title from "GPT2 training on ether data." to "GPT training on ether data." Jun 17, 2019
@vdpappu vdpappu assigned master10 and unassigned reaganrewop Jun 17, 2019
@ArjunKini
Contributor

Will be extending GPT to get paragraph embeddings using an LSTM-based "head" trained offline on sentence features extracted from GPT.
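
For reference, a minimal sketch of the sentence-feature extraction step such a head would consume, written against the current Hugging Face `transformers` API. Mean-pooling the final hidden states is an assumption here; the exact pooling used in the experiments isn't specified in this thread.

```python
import torch
from transformers import GPT2Model, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2Model.from_pretrained("gpt2").eval()

def gpt_sentence_feature(sentence: str) -> torch.Tensor:
    """Return a fixed-size feature vector for one sentence.

    Assumption: mean-pool GPT-2's final hidden states over tokens.
    An offline-trained LSTM head would consume vectors like these.
    """
    input_ids = tokenizer.encode(sentence, return_tensors="pt")
    with torch.no_grad():
        outputs = model(input_ids)
    return outputs.last_hidden_state.mean(dim=1).squeeze(0)  # shape: (768,)
```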

@vdpappu vdpappu changed the title from "GPT training on ether data." to "GPT training for Paragraph embeddings" Jul 29, 2019
@ArjunKini
Contributor

I tested GPT and BERT for possible paragraph embedding applications. BERT gave a narrow range of similarity scores, 0.7-0.99 across out-of-domain and in-domain topics, as opposed to 0.2-0.9 for GPT, possibly due to the way tokens are aggregated to get the pooled feature representation of a sentence.
Moreover, BERT paragraph embeddings formed by aggregating sentence-level features are sensitive to noise (appending an out-of-domain sentence to the end of an in-domain paragraph reduces the score drastically). GPT, on the other hand, is more resilient to the added noise.
Adding an LSTM head to BERT did not alleviate these problems.

Conclusion: GPT-based paragraph embeddings are more stable than BERT-based ones.
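
For illustration, a minimal sketch of the kind of noise-sensitivity check described above, assuming mean-pooled GPT-2 sentence features and mean aggregation into a paragraph vector. The sentences and the reference text below are made up; the score ranges quoted above come from the actual experiments, not from this snippet.

```python
import torch
import torch.nn.functional as F
from transformers import GPT2Model, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2Model.from_pretrained("gpt2").eval()

def sent_vec(sentence):
    # Mean-pooled GPT-2 final hidden states (assumed pooling).
    input_ids = tokenizer.encode(sentence, return_tensors="pt")
    with torch.no_grad():
        return model(input_ids).last_hidden_state.mean(dim=1).squeeze(0)

def para_vec(sentences):
    # Baseline aggregation: mean of sentence vectors (what a learned head would replace).
    return torch.stack([sent_vec(s) for s in sentences]).mean(dim=0)

in_domain = [
    "The deployment failed because the config was missing a key.",
    "Rolling back to the previous release restored the service.",
]
out_of_domain = "My cat likes to sleep on the windowsill."
reference = para_vec(["Deployment failures are usually fixed by a rollback."])

clean = F.cosine_similarity(para_vec(in_domain), reference, dim=0).item()
noisy = F.cosine_similarity(para_vec(in_domain + [out_of_domain]), reference, dim=0).item()
print(f"clean: {clean:.3f}  with appended noise: {noisy:.3f}")
```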

@ArjunKini
Contributor

Conclusion 2: GPT paragraph embeddings show good topic separation and can be used to separate segments based on context. Instead of summing up the sentence-level feature vectors, a Bi-LSTM head was used to aggregate them, which resulted in better context capture across a paragraph.
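
A minimal sketch of what such a Bi-LSTM aggregation head could look like; the hidden size and the use of the final forward/backward hidden states are assumptions, not the exact configuration used here.

```python
import torch
import torch.nn as nn

class BiLSTMParagraphHead(nn.Module):
    """Aggregates a sequence of per-sentence GPT feature vectors into a
    single paragraph embedding (hypothetical sketch, dimensions assumed)."""

    def __init__(self, feat_dim=768, hidden_dim=256):
        super().__init__()
        self.rnn = nn.LSTM(feat_dim, hidden_dim, batch_first=True,
                           bidirectional=True)

    def forward(self, sent_feats):
        # sent_feats: (batch, num_sentences, feat_dim)
        _, (h_n, _) = self.rnn(sent_feats)
        # Concatenate the final forward and backward hidden states.
        return torch.cat([h_n[0], h_n[1]], dim=-1)  # (batch, 2 * hidden_dim)

# Usage: 4 sentences of GPT-2 features (hidden size 768) -> one 512-d paragraph vector.
head = BiLSTMParagraphHead()
para_emb = head(torch.randn(1, 4, 768))
print(para_emb.shape)  # torch.Size([1, 512])
```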
