[docs] Use modality-neutral terms (input, document) in loss docs & docstrings #3772
Merged
tomaarsen merged 3 commits on May 12, 2026
Pull request overview
Updates documentation across SentenceTransformer, CrossEncoder, and SparseEncoder loss docs/docstrings to use modality-neutral terminology now that models can be multimodal.
Changes:
- Replaces text-centric terms (e.g., “sentence(s)”, “passage(s)”, “Texts”) with modality-neutral terms (“input(s)”, “document(s)”, “Inputs”) in loss overviews and loss docstrings.
- Updates example input-format tables (Markdown + RST grid tables) to reflect the new terminology.
- Aligns the `mine_hard_negatives` docstring output-format terminology with “document”/“inputs”.
Reviewed changes
Copilot reviewed 54 out of 54 changed files in this pull request and generated 6 comments.
| File | Description |
|---|---|
| sentence_transformers/util/hard_negatives.py | Updates hard-negative mining docstring to use “inputs”/“documents” terminology. |
| sentence_transformers/sparse_encoder/losses/sparse_triplet.py | Renames docstring table column “Texts” → “Inputs”. |
| sentence_transformers/sparse_encoder/losses/sparse_multiple_negatives_ranking.py | Makes docstring wording modality-neutral (“embeddings”, “Inputs”). |
| sentence_transformers/sparse_encoder/losses/sparse_mse.py | Updates distillation wording/table from “sentence” → “input” and “sentence embeddings” → “embeddings”. |
| sentence_transformers/sparse_encoder/losses/sparse_margin_mse.py | Updates “passage” → “document” and “Texts” → “Inputs” in docstring. |
| sentence_transformers/sparse_encoder/losses/sparse_distill_kl_div.py | Renames docstring table column “Texts” → “Inputs”. |
| sentence_transformers/sparse_encoder/losses/sparse_cosine_similarity.py | Updates docstring examples to use input_A/input_B + “Inputs” table header. |
| sentence_transformers/sparse_encoder/losses/sparse_cosent.py | Updates docstring to “Input pairs” + “Inputs” table header. |
| sentence_transformers/sparse_encoder/losses/sparse_angle.py | Updates docstring to “Input pairs” + “Inputs” table header. |
| sentence_transformers/sparse_encoder/losses/csr.py | Updates docstrings to refer to “embeddings”/“inputs” instead of “sentence embeddings”/“sentences”. |
| sentence_transformers/sparse_encoder/losses/cached_splade.py | Updates helper docstrings from “sentences” → “inputs”. |
| sentence_transformers/sentence_transformer/losses/triplet.py | Renames docstring table column “Texts” → “Inputs”. |
| sentence_transformers/sentence_transformer/losses/softmax.py | Updates requirements + table wording from sentence pairs → input pairs. |
| sentence_transformers/sentence_transformer/losses/online_contrastive.py | Renames docstring table column “Texts” → “Inputs”. |
| sentence_transformers/sentence_transformer/losses/multiple_negatives_symmetric_ranking.py | Updates docstring to “embeddings” + “Inputs” table header. |
| sentence_transformers/sentence_transformer/losses/multiple_negatives_ranking.py | Updates docstring to “embeddings” + “Inputs” table header. |
| sentence_transformers/sentence_transformer/losses/mse.py | Updates distillation wording/table from “sentence embedding(s)” → “embedding(s)” and “Texts” → “Inputs”. |
| sentence_transformers/sentence_transformer/losses/mega_batch_margin.py | Renames docstring table column “Texts” → “Inputs”. |
| sentence_transformers/sentence_transformer/losses/matryoshka.py | Renames docstring table column “Texts” → “Inputs”. |
| sentence_transformers/sentence_transformer/losses/matryoshka_2d.py | Renames docstring table column “Texts” → “Inputs”. |
| sentence_transformers/sentence_transformer/losses/margin_mse.py | Updates “passage” → “document”, “Texts” → “Inputs”, and “sentences” → “inputs” in docstring. |
| sentence_transformers/sentence_transformer/losses/global_orthogonal_regularization.py | Updates docstring tables and examples (“passages” → “documents”, “Texts” → “Inputs”). |
| sentence_transformers/sentence_transformer/losses/gist_embed.py | Renames docstring table column “Texts” → “Inputs”. |
| sentence_transformers/sentence_transformer/losses/embed_distill.py | Updates distillation table wording from “sentence” → “input”. |
| sentence_transformers/sentence_transformer/losses/distill_kl_div.py | Renames docstring table column “Texts” → “Inputs”. |
| sentence_transformers/sentence_transformer/losses/denoising_auto_encoder.py | Updates TSDAE loss docstring wording from sentences → inputs + updates input table examples. |
| sentence_transformers/sentence_transformer/losses/cosine_similarity.py | Updates docstring examples to use input_A/input_B + “Inputs” table header. |
| sentence_transformers/sentence_transformer/losses/cosent.py | Updates docstring to “Input pairs” + “Inputs” table header. |
| sentence_transformers/sentence_transformer/losses/contrastive.py | Renames docstring table column “Texts” → “Inputs”. |
| sentence_transformers/sentence_transformer/losses/contrastive_tension.py | Updates docstrings/examples/tables from sentences → inputs and “sentence embeddings” → “embeddings”. |
| sentence_transformers/sentence_transformer/losses/cached_multiple_negatives_symmetric_ranking.py | Updates docstring to “embeddings” + “Inputs” table header. |
| sentence_transformers/sentence_transformer/losses/cached_multiple_negatives_ranking.py | Updates docstring to “embeddings” + “Inputs” table header and helper docstring (“sentences” → “inputs”). |
| sentence_transformers/sentence_transformer/losses/cached_gist_embed.py | Renames docstring table column “Texts” → “Inputs”. |
| sentence_transformers/sentence_transformer/losses/batch_semi_hard_triplet.py | Updates docstring wording/tables from sentences → inputs. |
| sentence_transformers/sentence_transformer/losses/batch_hard_triplet.py | Updates docstring wording/tables from sentences → inputs. |
| sentence_transformers/sentence_transformer/losses/batch_hard_soft_margin_triplet.py | Updates docstring wording/tables from sentences → inputs. |
| sentence_transformers/sentence_transformer/losses/batch_all_triplet.py | Updates docstring wording/tables from sentences → inputs. |
| sentence_transformers/sentence_transformer/losses/angle.py | Updates docstring to “Input pairs” + “Inputs” table header. |
| sentence_transformers/sentence_transformer/losses/adaptive_layer.py | Renames docstring table column “Texts” → “Inputs”. |
| sentence_transformers/cross_encoder/losses/rank_net.py | Renames docstring table column “Texts” → “Inputs”. |
| sentence_transformers/cross_encoder/losses/plist_mle.py | Renames docstring table column “Texts” → “Inputs”. |
| sentence_transformers/cross_encoder/losses/multiple_negatives_ranking.py | Renames docstring table column “Texts” → “Inputs”. |
| sentence_transformers/cross_encoder/losses/mse.py | Updates wording from query-passage → query-document and “Texts” → “Inputs”. |
| sentence_transformers/cross_encoder/losses/margin_mse.py | Updates wording from passages → documents and “Texts” → “Inputs”. |
| sentence_transformers/cross_encoder/losses/list_net.py | Renames docstring table column “Texts” → “Inputs”. |
| sentence_transformers/cross_encoder/losses/list_mle.py | Renames docstring table column “Texts” → “Inputs”. |
| sentence_transformers/cross_encoder/losses/lambda_loss.py | Renames docstring table column “Texts” → “Inputs”. |
| sentence_transformers/cross_encoder/losses/cross_entropy.py | Updates wording from sentence pairs → input pairs and “Texts” → “Inputs”. |
| sentence_transformers/cross_encoder/losses/cached_multiple_negatives_ranking.py | Renames docstring table column “Texts” → “Inputs”. |
| sentence_transformers/cross_encoder/losses/binary_cross_entropy.py | Updates table examples from sentence_A/sentence_B → input_A/input_B and “Texts” → “Inputs”. |
| sentence_transformers/cross_encoder/losses/adr_mse.py | Renames docstring table column “Texts” → “Inputs”. |
| docs/sparse_encoder/loss_overview.md | Updates sparse loss overview tables/prose to modality-neutral terminology. |
| docs/sentence_transformer/loss_overview.md | Updates sentence-transformer loss overview tables/prose to modality-neutral terminology. |
| docs/cross_encoder/loss_overview.md | Updates cross-encoder loss overview tables/prose (incl. mine_hard_negatives references) to modality-neutral terminology. |
Hello!
Pull Request overview
- `sentence`/`sentences` -> `input`/`inputs`, `passage`/`passages` -> `document`/`documents`
- `Texts` column to `Inputs` in the loss input-format tables (both the `loss_overview.md` pages and the docstring `Inputs:` grid tables)

Details

Now that `SentenceTransformer`, `CrossEncoder`, and `SparseEncoder` models are multimodal, a lot of the documentation framing reads as too text-only. Tables that describe the accepted training data formats call the input column `Texts` and spell out formats like `(sentence_A, sentence_B) pairs` or `single sentences`, and a fair amount of prose talks about "sentences" and "passages" where it really just means "inputs" and "documents".

This PR sweeps that terminology to be modality-neutral:

- `docs/sentence_transformer/loss_overview.md`, `docs/cross_encoder/loss_overview.md`, `docs/sparse_encoder/loss_overview.md`: `Texts` -> `Inputs` headers, `(sentence_A, sentence_B) pairs` -> `(input_A, input_B) pairs`, `single sentences` -> `single inputs`, `(damaged_sentence, original_sentence) pairs` -> `(damaged_input, original_input) pairs`, `(query, passage_one, passage_two)` -> `(query, document_one, document_two)`, `teacher/model sentence embeddings` -> `teacher/model embeddings`, plus the matching prose.
- All loss docstrings under `sentence_transformers/{sentence_transformer,cross_encoder,sparse_encoder}/losses/`: the `Inputs:` grid tables and the `Requirements:`/description prose, with the RST grid borders re-rendered to fit.
- `sentence_transformers/util/hard_negatives.py`: the `mine_hard_negatives` docstring, so its `(anchor, passage, label)` / `(anchor, [passages], …)` output-format descriptions line up with the rest.

Tom Aarsen
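
For readers who want to check their own notes or downstream docs against the new wording, the format renames listed above can be written out as a small lookup. This is only a rough sketch: the helper name `neutralize` and the `RENAMES` table are hypothetical, not part of sentence-transformers, and nothing here reflects how the PR itself was produced. The phrase pairs are copied from the description above.

```python
# Illustrative only: phrase pairs copied from the PR description.
# `RENAMES` and `neutralize` are hypothetical names, not library API.
RENAMES = {
    "(sentence_A, sentence_B) pairs": "(input_A, input_B) pairs",
    "single sentences": "single inputs",
    "(damaged_sentence, original_sentence) pairs": "(damaged_input, original_input) pairs",
    "(query, passage_one, passage_two)": "(query, document_one, document_two)",
}


def neutralize(text: str) -> str:
    """Apply each exact-phrase rename in turn; unmatched text is left as-is."""
    for old, new in RENAMES.items():
        text = text.replace(old, new)
    return text


if __name__ == "__main__":
    # Print the old and new form of each input-format label.
    for old in RENAMES:
        print(f"{old!r} -> {neutralize(old)!r}")
```

Exact-phrase replacement is used here instead of a blanket `sentence` -> `input` substitution, since a blanket swap would also rewrite identifiers such as `sentence_transformers` that this PR leaves alone.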