docs: fix semantic similarity description (cross-encoder -> bi-encoder) #1910

Ayaka-mogumogu · 2025-02-09T23:20:31Z

This PR updates the documentation so that it correctly describes how the semantic similarity metric is computed.

Issue

The documentation previously stated that a cross-encoder was used for computing the semantic similarity score. However, after reviewing the implementation, it is clear that the current approach follows a bi-encoder strategy:

The ground truth and response are encoded independently
Their embeddings are then compared using cosine similarity

A cross-encoder would typically process both texts together in a single forward pass (e.g., concatenating them before encoding), which is not the case in the current implementation.
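To make the distinction concrete, here is a minimal sketch of the two call shapes. Everything in it is a hypothetical stand-in: `toy_embed`, `bi_encoder_score`, and `toy_cross_encoder_score` are not part of this repository or of any model library, and the hashed bag-of-words vector and token-overlap score only take the place of real models. The sketch assumes non-empty inputs.

```python
import numpy as np


def toy_embed(text: str, dim: int = 16) -> np.ndarray:
    """Hypothetical stand-in for an embedding model: a hashed bag of words."""
    vec = np.zeros(dim)
    for token in text.lower().split():
        # Deterministic bucket per token (avoids Python's randomized str hash).
        vec[sum(ord(c) for c in token) % dim] += 1.0
    return vec


def bi_encoder_score(text_a: str, text_b: str) -> float:
    """Bi-encoder shape: each text is encoded on its own, then the vectors are compared."""
    a = toy_embed(text_a)
    b = toy_embed(text_b)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


def toy_cross_encoder_score(text_a: str, text_b: str) -> float:
    """Cross-encoder shape: the pair forms one joint input and is scored together.

    Token overlap (Jaccard) is only a placeholder for a real joint model.
    """
    joint_input = f"{text_a} [SEP] {text_b}"  # a single combined input
    left, right = joint_input.split(" [SEP] ")
    tokens_left = set(left.lower().split())
    tokens_right = set(right.lower().split())
    return len(tokens_left & tokens_right) / len(tokens_left | tokens_right)


print(bi_encoder_score("the cat sat", "the cat sat"))
print(toy_cross_encoder_score("the cat", "the dog"))
```

The structural difference is that the bi-encoder produces a reusable vector per text, while the cross-encoder only ever yields a score for a specific pair.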

Current Implementation

For example, in the current implementation:

embedding_1 = np.array(await self.embeddings.embed_text(ground_truth))
embedding_2 = np.array(await self.embeddings.embed_text(answer))
# Normalization factors of the above embeddings
norms_1 = np.linalg.norm(embedding_1, keepdims=True)
norms_2 = np.linalg.norm(embedding_2, keepdims=True)
embedding_1_normalized = embedding_1 / norms_1
embedding_2_normalized = embedding_2 / norms_2
similarity = embedding_1_normalized @ embedding_2_normalized.T
score = similarity.flatten()

This code shows that the ground truth and response are encoded separately, and their similarity is computed using cosine similarity, which is characteristic of a bi-encoder approach.
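For anyone who wants to run the computation without an embedding backend, the following is a minimal sketch that applies the same normalize-then-dot-product steps to hand-written vectors. The function name `cosine_similarity` and the sample vectors are illustrative assumptions, not code from the repository.

```python
import numpy as np


def cosine_similarity(embedding_1, embedding_2) -> float:
    """Normalize each vector to unit length, then take the dot product."""
    e1 = np.asarray(embedding_1, dtype=float)
    e2 = np.asarray(embedding_2, dtype=float)
    e1_normalized = e1 / np.linalg.norm(e1)
    e2_normalized = e2 / np.linalg.norm(e2)
    return float(e1_normalized @ e2_normalized)


# Illustrative vectors standing in for the two embed_text() results.
ground_truth_vec = [1.0, 2.0, 3.0]
answer_vec = [2.0, 4.0, 6.0]       # same direction, different length
orthogonal_vec = [3.0, 0.0, -1.0]  # dot product with ground_truth_vec is 0

print(cosine_similarity(ground_truth_vec, answer_vec))      # close to 1.0
print(cosine_similarity(ground_truth_vec, orthogonal_vec))  # close to 0.0
```

Because both vectors have unit length after normalization, their dot product equals the cosine of the angle between them, so the score depends only on direction and not on vector magnitude.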

Fix

The term "cross-encoder" has been corrected to "bi-encoder" in the documentation to ensure consistency with the actual implementation.

shahules786

Nice catch, thank you. We switched from a cross-encoder to a bi-encoder but forgot to update the docs!

docs: fix semantic similarity description (cross-encoder -> bi-encoder)

1714f33

dosubot bot added the size:XS label (this PR changes 0-9 lines, ignoring generated files) Feb 9, 2025

sahusiddharth requested a review from shahules786 February 10, 2025 04:06

shahules786 approved these changes Feb 14, 2025

shahules786 merged commit dcfd58b into explodinggradients:main Feb 14, 2025
6 checks passed
