General Sentence Embeddings

Extract Sentence Embeddings from Hugging Face pre-trained models.

This repo contains code for both TensorFlow and PyTorch. You can extract sentence embeddings for your dataset using any pre-trained Hugging Face model. Out-of-the-box embeddings sometimes work well and sometimes do not. If you want to train or fine-tune on your own dataset, check out sentence-transformers.

These embeddings can be used for semantic similarity search, clustering, and similar tasks.
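As an illustration of the similarity-search use case, here is a minimal, dependency-free sketch that ranks stored sentence embeddings against a query embedding by cosine similarity. The function names `cosine_similarity` and `most_similar` are hypothetical and not part of this repo, and plain Python lists stand in for the embedding vectors a model would produce.

```python
import math
from typing import List, Tuple


def cosine_similarity(a: List[float], b: List[float]) -> float:
    """Cosine of the angle between two vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    # A zero vector has no direction; treat it as dissimilar to everything.
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def most_similar(
    query: List[float], corpus: List[List[float]], top_k: int = 3
) -> List[Tuple[int, float]]:
    """Return (corpus index, score) pairs for the top_k closest vectors."""
    scores = [
        (index, cosine_similarity(query, vector))
        for index, vector in enumerate(corpus)
    ]
    # Highest similarity first.
    return sorted(scores, key=lambda pair: pair[1], reverse=True)[:top_k]
```

For larger collections, the same ranking is usually done with vectorized tensor operations or an approximate nearest-neighbour index rather than a Python loop.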

Dependencies

tensorflow 2.0.0
pytorch 1.6.0
transformers 3.0.2

Working

The code works as follows:

Load the model and its tokenizer.
Tokenize the sentences.
Get the token embeddings.
Convert the token embeddings into a single sentence embedding [1].

[1] There are many techniques for converting token embeddings into a sentence embedding; mean pooling is currently considered the state of the art (SOTA).
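The four steps above can be sketched roughly as follows. This is an illustrative sketch, not the code from this repo: the helper names `mean_pool` and `embed_sentences` are hypothetical, and `bert-base-uncased` is an arbitrary model choice assumed for the example. The pooling step is written in plain Python to make the arithmetic explicit; real code would more likely do it with tensor operations.

```python
from typing import List


def mean_pool(token_embeddings: List[List[float]], attention_mask: List[int]) -> List[float]:
    """Average the token vectors of one sentence, skipping padding (mask == 0)."""
    dim = len(token_embeddings[0])
    total = [0.0] * dim
    count = 0
    for vector, keep in zip(token_embeddings, attention_mask):
        if keep:
            count += 1
            for i, value in enumerate(vector):
                total[i] += value
    # Guard against an all-padding row to avoid division by zero.
    count = max(count, 1)
    return [value / count for value in total]


def embed_sentences(sentences: List[str], model_name: str = "bert-base-uncased") -> List[List[float]]:
    """Return one mean-pooled embedding per sentence (PyTorch variant)."""
    # Heavy dependencies are imported lazily so mean_pool stays usable on its own.
    import torch
    from transformers import AutoModel, AutoTokenizer

    # 1. Load the model and its tokenizer.
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModel.from_pretrained(model_name)
    model.eval()

    # 2. Tokenize the sentences, padding to the longest one in the batch.
    encoded = tokenizer(sentences, padding=True, truncation=True, return_tensors="pt")

    # 3. Get token embeddings: the first output is the last hidden state,
    #    shaped (batch, tokens, hidden).
    with torch.no_grad():
        token_embeddings = model(**encoded)[0]

    # 4. Pool the token embeddings into one vector per sentence.
    masks = encoded["attention_mask"].tolist()
    return [mean_pool(tokens, mask) for tokens, mask in zip(token_embeddings.tolist(), masks)]
```

The attention mask matters because padded positions still receive hidden states from the model; averaging over them would skew the embeddings of shorter sentences in a batch.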

Benchmarks

Benchmarks using SentEval are coming soon.

Other repos for Sentence Embeddings

Gesen
sentence-transformers
InferSent
Skip-Thought
SBert
Universal Sentence Encoder
Flair
AdaptNLP

Note

This repo is inspired by sentence-transformers. The PyTorch code is taken from that repo.
