System Info
Windows 11 22631.5335
transformers.js 3.5.2
React 19
python 3.12
transformers 4.52.4
optimum 1.17.1
onnxruntime 1.22.0
All dependencies are pinned in the linked sample project.
Environment/Platform
- Website/web-app
- Browser extension
- Server-side (e.g., Node.js, Deno, Bun)
- Desktop app (e.g., Electron)
- Other (e.g., VSCode extension)
Description
Set 1 embeddings: generated in the browser with transformers.js (`dtype: 'fp16'`, `device: 'webgpu'`)
Set 2 embeddings: generated in Python with `ORTModelForFeatureExtraction`, `file_name="model_fp16.onnx"`
When using the model https://huggingface.co/Xenova/all-MiniLM-L6-v2, everything works as expected: the cosine similarity between the embeddings generated in JS and Python is >0.99.
However, when using the model https://huggingface.co/Xenova/e5-base-v2, the embeddings differ significantly: the cosine similarity is only ~0.75.
Since ONNX models are used in both environments and the setup works for the more popular model, this looks like a bug rather than a quantization issue.
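For reference, the similarity numbers above are the standard cosine similarity between the two embedding vectors. A minimal sketch of that comparison (the helper name and the sample vectors are illustrative, not taken from the sample project):

```javascript
// Cosine similarity between two embedding vectors of equal length:
// dot(a, b) / (||a|| * ||b||). Identical directions give 1, orthogonal give 0.
function cosineSimilarity(a, b) {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Toy vectors for illustration; real inputs are the fp16 model outputs.
console.log(cosineSimilarity([1, 2, 3], [2, 4, 6])); // parallel vectors -> 1
```

With matching embeddings from the two runtimes this value should be close to 1, which is what all-MiniLM-L6-v2 produces but e5-base-v2 does not.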
Reproduction
I have set up a sample project to demonstrate this: https://github.com/ashikns/embedding-compare.
The project's README describes the steps to run it (just two bare-bones folders). It uses Xenova/e5-base-v2, but it also works with Xenova/all-MiniLM-L6-v2.