expo-sentence-embeddings

A native Expo module for generating sentence embeddings on-device using MiniLM. It currently supports Android only; iOS support is not yet available.

The Android implementation is working and has been tested on Android API 35. I'm still working on the Swift implementation for iOS.

Installation

# Clone the repository into your project's modules directory
git clone https://github.com/mgwilt/expo-sentence-embeddings.git modules/expo-sentence-embeddings

# Install the module's dependencies
cd modules/expo-sentence-embeddings
npm install

Then add the module to the plugins array in your app.json or app.config.js:

{
  "expo": {
    "plugins": [
      ["./modules/expo-sentence-embeddings"]
    ]
  }
}

Setup

Download Model

Before using the module, you need to download the required model files. The process involves setting up a Python virtual environment and running the provided script.

Windows

cd modules/expo-sentence-embeddings/scripts
python -m venv venv
.\venv\Scripts\Activate.ps1
pip install -r requirements.txt
python download_model.py

macOS/Linux

cd modules/expo-sentence-embeddings/scripts
python3 -m venv venv
source venv/bin/activate
pip install -r requirements.txt
python download_model.py

This downloads the model files and tokenizer to the appropriate locations in the module's Android directory. iOS is not supported yet.

Usage

import * as SentenceEmbeddings from 'expo-sentence-embeddings';

// Optional: Configure the module
await SentenceEmbeddings.configure({
  maxLength: 256, // Maximum sequence length (default: 256)
  normalize: true // Whether to L2-normalize embeddings (default: true)
});

// Generate embeddings for a single sentence
const text = "Your input text here";
const embeddings = await SentenceEmbeddings.encode(text);

// Generate embeddings for multiple sentences
const texts = ["First sentence", "Second sentence"];
const batchEmbeddings = await SentenceEmbeddings.encode(texts);
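
A common next step is comparing two embeddings. The sketch below is not part of the module: `dotProduct` and `cosineSimilarity` are hypothetical helpers, and the short vectors are made-up stand-ins for the output of `encode`. With `normalize: true` (the default), the vectors should already be unit length, so the dot product alone would equal the cosine similarity.

```typescript
// Hypothetical helpers for comparing embeddings; not part of the module's API.

/** Dot product of two equal-length vectors. */
function dotProduct(a: number[], b: number[]): number {
  if (a.length !== b.length) {
    throw new Error(`Dimension mismatch: ${a.length} vs ${b.length}`);
  }
  return a.reduce((sum, value, i) => sum + value * (b[i] ?? 0), 0);
}

/** Cosine similarity; works whether or not the inputs are normalized. */
function cosineSimilarity(a: number[], b: number[]): number {
  const norm = Math.sqrt(dotProduct(a, a) * dotProduct(b, b));
  // A zero vector has no direction, so report 0 rather than dividing by zero.
  return norm === 0 ? 0 : dotProduct(a, b) / norm;
}

// Made-up 3-dimensional vectors standing in for real embeddings.
const sameDirection = cosineSimilarity([1, 2, 3], [2, 4, 6]);
const orthogonal = cosineSimilarity([1, 0, 0], [0, 1, 0]);
```

Values close to 1 indicate texts with similar meaning, and values near 0 indicate unrelated texts.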

API Reference

`configure(options: ConfigureOptions): Promise<void>`

Configures the module settings.

Parameters:
- options: Configuration object with the following properties:
  - maxLength: Maximum sequence length (default: 256)
  - normalize: Whether to L2-normalize embeddings (default: true)
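
To illustrate what the `normalize` option means, here is a minimal sketch of L2 normalization in plain TypeScript. `l2Normalize` is a hypothetical helper written for this explanation; the module's native implementation may differ in detail.

```typescript
// Hypothetical illustration of L2 normalization; not the module's native code.
function l2Normalize(vector: number[]): number[] {
  // Euclidean (L2) length of the vector.
  const magnitude = Math.sqrt(vector.reduce((sum, v) => sum + v * v, 0));
  // Return an all-zero vector unchanged to avoid dividing by zero.
  if (magnitude === 0) return vector.slice();
  return vector.map((v) => v / magnitude);
}

// The input has length 5, so each component is divided by 5.
const unitVector = l2Normalize([3, 4]);
```

After normalization every non-zero vector has length 1, which lets a plain dot product serve as cosine similarity.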

`encode(input: string | string[]): Promise<number[] | number[][]>`

Generates embeddings for text input(s).

Parameters:
- input: A single text string or an array of text strings

Returns: A promise that resolves to:
- For a single text: An array of numbers representing the embedding vector
- For multiple texts: An array of embedding vectors
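
Because the declared return type is a union, TypeScript callers generally need to narrow the result before using it. One way to do that is sketched here with a hypothetical `isBatch` type guard and `toBatch` wrapper; neither is exported by the module.

```typescript
// Hypothetical helpers for narrowing the union returned by encode().
type Embedding = number[];
type EncodeResult = Embedding | Embedding[];

/** True when the result holds one vector per input text. */
function isBatch(result: EncodeResult): result is Embedding[] {
  // Note: an empty array is treated as a single (empty) vector here.
  return Array.isArray(result[0]);
}

/** Always return a list of vectors, wrapping a single vector if needed. */
function toBatch(result: EncodeResult): Embedding[] {
  return isBatch(result) ? result : [result];
}

// Made-up values standing in for encode() results.
const wrappedSingle = toBatch([0.1, 0.2]); // one vector, wrapped in a list
const unchangedBatch = toBatch([[0.1, 0.2], [0.3, 0.4]]); // already a list of two
```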

Technical Details

This module uses the all-MiniLM-L6-v2 model to generate sentence embeddings. The embeddings are 384-dimensional vectors that capture semantic meaning of the input text. The model runs entirely on-device, ensuring privacy and offline functionality.
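
Since each text maps to a fixed-size vector, a typical use is ranking candidate texts against a query. The following is a minimal sketch, assuming the embeddings were produced with `normalize: true` so that the dot product equals cosine similarity. `rankBySimilarity` and the tiny 2-dimensional vectors are illustrative only; real embeddings would have 384 dimensions.

```typescript
// Hypothetical ranking helper; assumes all vectors are L2-normalized.
interface ScoredText {
  text: string;
  score: number;
}

function rankBySimilarity(
  query: number[],
  texts: string[],
  embeddings: number[][],
): ScoredText[] {
  if (texts.length !== embeddings.length) {
    throw new Error("texts and embeddings must have the same length");
  }
  return embeddings
    .map((vector, i) => ({
      text: texts[i] ?? "",
      // Dot product of unit vectors equals their cosine similarity.
      score: vector.reduce((sum, v, j) => sum + v * (query[j] ?? 0), 0),
    }))
    .sort((a, b) => b.score - a.score); // highest similarity first
}

// Made-up unit vectors standing in for encode() output.
const ranked = rankBySimilarity(
  [1, 0],
  ["unrelated", "close match"],
  [[0, 1], [0.8, 0.6]],
);
```

Here `ranked` lists "close match" first, because its vector points most nearly in the same direction as the query.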

The module currently uses basic word-level tokenization with the special tokens [CLS], [SEP], [UNK], [PAD], and [MASK]. Words that are not in the vocabulary are mapped to the [UNK] token. Future versions may implement proper WordPiece tokenization for better handling of out-of-vocabulary words.
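
As a rough illustration of the tokenization described above, the following sketch splits on whitespace, looks each word up in a vocabulary, falls back to [UNK], and pads to a fixed length. The toy vocabulary, the token IDs, the lowercasing step, the truncation behavior, and the `tokenize` function itself are all assumptions made for this example; they do not reflect the module's actual native code.

```typescript
// Toy vocabulary with made-up IDs; the real ones come from the model's vocab file.
const vocab = new Map<string, number>([
  ["[PAD]", 0], ["[UNK]", 1], ["[CLS]", 2], ["[SEP]", 3],
  ["hello", 4], ["world", 5],
]);

/** Look up a token that must exist in the toy vocabulary. */
function idOf(token: string): number {
  const id = vocab.get(token);
  if (id === undefined) throw new Error(`Missing token: ${token}`);
  return id;
}

/** Whitespace tokenization with [UNK] fallback, truncation, and padding. */
function tokenize(text: string, maxLength: number): number[] {
  if (maxLength < 2) {
    throw new Error("maxLength must leave room for [CLS] and [SEP]");
  }
  const words = text.toLowerCase().split(/\s+/).filter((w) => w.length > 0);
  // Reserve two slots for [CLS] and [SEP], then truncate the rest.
  const wordIds = words
    .slice(0, maxLength - 2)
    .map((w) => vocab.get(w) ?? idOf("[UNK]"));
  const ids = [idOf("[CLS]"), ...wordIds, idOf("[SEP]")];
  // Pad on the right up to maxLength.
  while (ids.length < maxLength) ids.push(idOf("[PAD]"));
  return ids;
}

const tokenIds = tokenize("Hello brave world", 6);
// → [2, 4, 1, 5, 3, 0]  ([CLS] hello [UNK] world [SEP] [PAD])
```

The word "brave" is missing from the toy vocabulary, so it becomes [UNK]. A WordPiece tokenizer would instead try to split such a word into known subword pieces.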

License

MIT

