docs: fix semantic similarity description (cross-encoder -> bi-encoder) #1910

Open

Ayaka-mogumogu

This PR updates the documentation so that it correctly describes how the semantic similarity score is computed.

Issue

The documentation previously stated that a cross-encoder was used for computing the semantic similarity score. However, after reviewing the implementation, it is clear that the current approach follows a bi-encoder strategy:

  • The ground truth and response are encoded independently
  • Their embeddings are then compared using cosine similarity

A cross-encoder would typically process both texts together in a single forward pass (e.g., concatenating them before encoding), which is not the case in the current implementation.

Current Implementation

The relevant part of the current implementation is:

embedding_1 = np.array(await self.embeddings.embed_text(ground_truth))
embedding_2 = np.array(await self.embeddings.embed_text(answer))
# Normalization factors of the above embeddings
norms_1 = np.linalg.norm(embedding_1, keepdims=True)
norms_2 = np.linalg.norm(embedding_2, keepdims=True)
embedding_1_normalized = embedding_1 / norms_1
embedding_2_normalized = embedding_2 / norms_2
similarity = embedding_1_normalized @ embedding_2_normalized.T
score = similarity.flatten()

This code shows that the ground truth and response are encoded separately, and their similarity is computed using cosine similarity, which is characteristic of a bi-encoder approach.
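
To make the bi-encoder pattern concrete, here is a minimal, self-contained sketch. It is not the project's code: `toy_embed` is a hypothetical stand-in for a real embedding model (a hashed bag of words), and `bi_encoder_score` is an illustrative name. The only point is that each text is embedded on its own and the two vectors meet only in the cosine similarity step.

```python
import zlib

import numpy as np


def toy_embed(text: str, dim: int = 64) -> np.ndarray:
    """Hypothetical embedding: a hashed bag-of-words vector (illustration only)."""
    vec = np.zeros(dim)
    for token in text.lower().split():
        # crc32 is deterministic across runs, unlike Python's built-in hash()
        vec[zlib.crc32(token.encode("utf-8")) % dim] += 1.0
    return vec


def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine of the angle between two non-zero vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


def bi_encoder_score(ground_truth: str, answer: str) -> float:
    """Bi-encoder scoring: encode each text independently, then compare."""
    # Assumes both texts are non-empty; an empty text would give a zero vector.
    embedding_1 = toy_embed(ground_truth)  # never sees `answer`
    embedding_2 = toy_embed(answer)        # never sees `ground_truth`
    return cosine_similarity(embedding_1, embedding_2)


if __name__ == "__main__":
    print(bi_encoder_score("Paris is the capital of France",
                           "The capital of France is Paris"))
```

A cross-encoder, by contrast, would take both texts as a single joint input and output a score directly, so there would be no per-text embedding that could be computed or cached separately.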

Fix

The term "cross-encoder" has been corrected to "bi-encoder" in the documentation to ensure consistency with the actual implementation.

dosubot (bot) added the size:XS label (this PR changes 0-9 lines, ignoring generated files) on Feb 9, 2025.