General Sentence Embeddings

Extract Sentence Embeddings from Hugging Face pre-trained models.

This repo contains code for both TensorFlow and PyTorch. You can extract sentence embeddings for your dataset using any pre-trained Hugging Face model. Out-of-the-box embeddings sometimes work well and sometimes do not. If you want to train or fine-tune on your own dataset, check out sentence-transformers.

These embeddings can be used for semantic similarity search, clustering, and similar tasks.
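As a rough illustration of the similarity-search use case, the sketch below ranks a few sentences against a query by cosine similarity. It is not part of this repo: the function name `rank_by_cosine_similarity` is hypothetical, and the toy 3-dimensional vectors only stand in for real sentence embeddings.

```python
import torch
import torch.nn.functional as F


def rank_by_cosine_similarity(query_emb, corpus_embs):
    """Return corpus indices (and scores) sorted from most to least similar.

    Hypothetical helper for illustration; not a function from this repo.
    """
    # L2-normalise so that a plain dot product equals cosine similarity.
    query = F.normalize(query_emb, p=2, dim=-1)
    corpus = F.normalize(corpus_embs, p=2, dim=-1)
    scores = corpus @ query  # shape: (num_sentences,)
    order = torch.argsort(scores, descending=True)
    return order.tolist(), scores[order].tolist()


# Toy vectors standing in for real sentence embeddings (assumption).
corpus_embs = torch.tensor([[1.0, 0.0, 0.0],
                            [0.0, 1.0, 0.0],
                            [0.7, 0.7, 0.0]])
query_emb = torch.tensor([1.0, 0.1, 0.0])

indices, scores = rank_by_cosine_similarity(query_emb, corpus_embs)
print(indices)
```

With real embeddings, `corpus_embs` would be the matrix of sentence vectors for your dataset and `query_emb` the vector for the query sentence.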

Dependencies

tensorflow 2.0.0
pytorch 1.6.0
transformers 3.0.2

Working

The code works in the following way:

Load the model and its respective tokenizer.
Tokenize the sentences.
Get the token embeddings.
Convert the token embeddings to a single sentence embedding^[1].

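[1]. There are many techniques for converting token embeddings into a sentence embedding (for example, taking the [CLS] token or max pooling), but mean pooling over the tokens is widely considered the best-performing choice.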
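The four steps above can be sketched in PyTorch roughly as follows. This is a minimal sketch and not the code shipped in `hf_emb_pytorch.py`: the function names `mean_pooling` and `embed_sentences` are chosen here for illustration, and it assumes the model returns token embeddings as the first element of its output, which holds for the standard Hugging Face encoder models.

```python
import torch


def mean_pooling(token_embeddings, attention_mask):
    """Average token embeddings per sentence, ignoring padding positions."""
    # (batch, seq_len) -> (batch, seq_len, 1) so the mask broadcasts
    # over the hidden dimension.
    mask = attention_mask.unsqueeze(-1).to(token_embeddings.dtype)
    summed = (token_embeddings * mask).sum(dim=1)
    # Clamp to avoid division by zero for an all-padding row.
    counts = mask.sum(dim=1).clamp(min=1e-9)
    return summed / counts


def embed_sentences(sentences, model_name):
    """Illustrative pipeline: load, tokenize, encode, pool (assumed names)."""
    # Imported lazily so mean_pooling can be used without transformers.
    from transformers import AutoModel, AutoTokenizer

    # Step 1: load the model and its tokenizer.
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModel.from_pretrained(model_name)
    model.eval()

    # Step 2: tokenize, padding to the longest sentence in the batch.
    encoded = tokenizer(sentences, padding=True, truncation=True,
                        return_tensors="pt")

    # Step 3: get token embeddings (first element of the model output).
    with torch.no_grad():
        token_embeddings = model(**encoded)[0]

    # Step 4: pool token embeddings into one vector per sentence.
    return mean_pooling(token_embeddings, encoded["attention_mask"])
```

For example, `embed_sentences(["A first sentence.", "Another one."], "bert-base-uncased")` would download that checkpoint and return one vector per sentence; the model name is just an example choice. The attention mask matters here because padded positions would otherwise drag the average toward the padding embedding.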

Benchmarks

Benchmarks using SentEval are coming soon.

Other repos for Sentence Embeddings

Note

This repo is inspired by sentence-transformers. The PyTorch code is from their repo.

