[`docs`] Use modality-neutral terms (input, document) in loss docs & docstrings by tomaarsen · Pull Request #3772 · huggingface/sentence-transformers

tomaarsen · 2026-05-12T09:16:44Z

Hello!

Pull Request overview

Replace text-centric wording in the loss overview docs and loss docstrings: sentence/sentences -> input/inputs, passage/passages -> document/documents
Rename the Texts column to Inputs in the loss input-format tables (both the loss_overview.md pages and the docstring Inputs: grid tables)
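
The replacements listed above can be sketched as whole-phrase substitutions. The `RENAMES` table and `neutralize` helper are hypothetical illustrations built from the phrases in this description, not a script in the repository.

```python
# Hypothetical helper illustrating the phrase-level renames in this PR.
# Whole phrases are replaced rather than bare words, so identifiers such as
# `sentence_transformers` or `SentenceTransformer` are never rewritten.
RENAMES = {
    "(sentence_A, sentence_B) pairs": "(input_A, input_B) pairs",
    "(damaged_sentence, original_sentence) pairs": "(damaged_input, original_input) pairs",
    "(query, passage_one, passage_two)": "(query, document_one, document_two)",
    "teacher sentence embeddings": "teacher embeddings",
    "model sentence embeddings": "model embeddings",
    "single sentences": "single inputs",
    "| Texts ": "| Inputs ",  # table header cell only, not every occurrence of "Texts"
}


def neutralize(text: str) -> str:
    """Apply every rename in RENAMES to text, longest phrase first."""
    # Longest-first keeps a shorter phrase from rewriting part of a longer one.
    for old in sorted(RENAMES, key=len, reverse=True):
        text = text.replace(old, RENAMES[old])
    return text


print(neutralize("| Texts | Labels |"))
print(neutralize("from sentence_transformers import SentenceTransformer"))
```

Replacing a header cell in place shifts that row's `|` separators by one character, so the surrounding grid borders have to be re-rendered to fit.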

Details

Now that SentenceTransformer, CrossEncoder, and SparseEncoder models are multimodal, a lot of the documentation's framing reads as too text-centric. Tables that describe the accepted training data formats call the input column Texts and spell out formats like (sentence_A, sentence_B) pairs or single sentences, and a fair amount of prose talks about "sentences" and "passages" where it really just means "inputs" and "documents".
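
To make the Inputs/Labels split in those tables concrete: as I understand the training docs, a dataset column named `label` or `score` is used as the label and every other column, in order, is an input, whatever its modality. The sketch below encodes that assumption; `split_columns` and `check_input_format` are hypothetical helpers, not library functions.

```python
from typing import List, Optional, Tuple

# Assumption: mirrors the documented convention that a column named "label" or
# "score" holds the label and all remaining columns are inputs, in order.
LABEL_COLUMNS = {"label", "score"}


def split_columns(column_names: List[str]) -> Tuple[List[str], Optional[str]]:
    """Split dataset column names into (input columns, label column or None)."""
    labels = [name for name in column_names if name in LABEL_COLUMNS]
    if len(labels) > 1:
        raise ValueError(f"Ambiguous label columns: {labels}")
    inputs = [name for name in column_names if name not in LABEL_COLUMNS]
    return inputs, (labels[0] if labels else None)


def check_input_format(
    column_names: List[str], num_inputs: int, needs_label: bool
) -> Tuple[List[str], Optional[str]]:
    """Hypothetical check of a dataset's columns against one row of an Inputs/Labels table."""
    inputs, label = split_columns(column_names)
    if len(inputs) != num_inputs:
        raise ValueError(f"Expected {num_inputs} input columns, got {len(inputs)}: {inputs}")
    if needs_label and label is None:
        raise ValueError("Expected a 'label' or 'score' column")
    return inputs, label


# e.g. "(input_A, input_B) pairs" with a float similarity score
print(check_input_format(["input_A", "input_B", "score"], num_inputs=2, needs_label=True))
```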

This PR sweeps that terminology to be modality-neutral:

docs/sentence_transformer/loss_overview.md, docs/cross_encoder/loss_overview.md, docs/sparse_encoder/loss_overview.md: Texts -> Inputs headers, (sentence_A, sentence_B) pairs -> (input_A, input_B) pairs, single sentences -> single inputs, (damaged_sentence, original_sentence) pairs -> (damaged_input, original_input) pairs, (query, passage_one, passage_two) -> (query, document_one, document_two), teacher/model sentence embeddings -> teacher/model embeddings, plus the matching prose.
All loss docstrings under sentence_transformers/{sentence_transformer,cross_encoder,sparse_encoder}/losses/: the Inputs: grid tables and the Requirements:/description prose, with the RST grid borders re-rendered to fit.
sentence_transformers/util/hard_negatives.py: the mine_hard_negatives docstring, so its (anchor, passage, label) / (anchor, [passages], …) output-format descriptions line up with the rest.
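
The border re-rendering can be sketched as below, assuming the docstring tables follow standard reStructuredText grid-table syntax (`+---+` borders, with a `+===+` line under the header row). `render_grid_table` is a hypothetical helper: it sizes each column to its widest cell, so a longer header such as `Inputs` widens the borders automatically instead of requiring hand edits.

```python
from typing import List


def render_grid_table(header: List[str], rows: List[List[str]]) -> str:
    """Render an RST grid table whose borders fit the widest cell in each column."""
    # Column width = longest cell in that column, header included.
    widths = [max(len(cell) for cell in column) for column in zip(header, *rows)]
    border = "+" + "+".join("-" * (w + 2) for w in widths) + "+"
    header_border = border.replace("-", "=")  # RST marks the end of the header with '='

    def fmt(row: List[str]) -> str:
        # Pad every cell to its column width, with one space on each side.
        return "| " + " | ".join(cell.ljust(w) for cell, w in zip(row, widths)) + " |"

    lines = [border, fmt(header), header_border]
    for row in rows:
        lines += [fmt(row), border]
    return "\n".join(lines)


print(render_grid_table(["Inputs", "Labels"], [["(anchor, positive) pairs", "none"]]))
```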
Tom Aarsen

Copilot

Pull request overview

Updates documentation across SentenceTransformer, CrossEncoder, and SparseEncoder loss docs/docstrings to use modality-neutral terminology now that models can be multimodal.

Changes:

Replaces text-centric terms (e.g., “sentence(s)”, “passage(s)”, “Texts”) with modality-neutral terms (“input(s)”, “document(s)”, “Inputs”) in loss overviews and loss docstrings.
Updates example input-format tables (Markdown + RST grid tables) to reflect the new terminology.
Aligns mine_hard_negatives docstring output-format terminology with “document”/“inputs”.
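
As an illustration of the `(anchor, passage, label)` output format, which this PR now describes in terms of documents, here is a sketch that flattens one mined example into labeled rows. `to_labeled_pairs` is hypothetical, and the 1/0 labeling (1 for the positive, 0 for each mined negative) is an assumption about that format, not a quote of the `mine_hard_negatives` implementation. Nothing in it assumes the inputs are strings, in line with the modality-neutral wording.

```python
from typing import Any, List, Tuple


def to_labeled_pairs(anchor: Any, positive: Any, negatives: List[Any]) -> List[Tuple[Any, Any, int]]:
    """Flatten one mined example into (anchor, document, label) rows.

    Assumption: label 1 marks the positive document and 0 marks each mined negative.
    """
    rows = [(anchor, positive, 1)]
    rows.extend((anchor, negative, 0) for negative in negatives)
    return rows


for row in to_labeled_pairs("what is a cat?", "A cat is a small feline.", ["Dogs bark.", "Paris is in France."]):
    print(row)
```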

Reviewed changes

Copilot reviewed 54 out of 54 changed files in this pull request and generated 6 comments.

File	Description
sentence_transformers/util/hard_negatives.py	Updates hard-negative mining docstring to use “inputs”/“documents” terminology.
sentence_transformers/sparse_encoder/losses/sparse_triplet.py	Renames docstring table column “Texts” → “Inputs”.
sentence_transformers/sparse_encoder/losses/sparse_multiple_negatives_ranking.py	Makes docstring wording modality-neutral (“embeddings”, “Inputs”).
sentence_transformers/sparse_encoder/losses/sparse_mse.py	Updates distillation wording/table from “sentence” → “input” and “sentence embeddings” → “embeddings”.
sentence_transformers/sparse_encoder/losses/sparse_margin_mse.py	Updates “passage” → “document” and “Texts” → “Inputs” in docstring.
sentence_transformers/sparse_encoder/losses/sparse_distill_kl_div.py	Renames docstring table column “Texts” → “Inputs”.
sentence_transformers/sparse_encoder/losses/sparse_cosine_similarity.py	Updates docstring examples to use input_A/input_B + “Inputs” table header.
sentence_transformers/sparse_encoder/losses/sparse_cosent.py	Updates docstring to “Input pairs” + “Inputs” table header.
sentence_transformers/sparse_encoder/losses/sparse_angle.py	Updates docstring to “Input pairs” + “Inputs” table header.
sentence_transformers/sparse_encoder/losses/csr.py	Updates docstrings to refer to “embeddings”/“inputs” instead of “sentence embeddings”/“sentences”.
sentence_transformers/sparse_encoder/losses/cached_splade.py	Updates helper docstrings from “sentences” → “inputs”.
sentence_transformers/sentence_transformer/losses/triplet.py	Renames docstring table column “Texts” → “Inputs”.
sentence_transformers/sentence_transformer/losses/softmax.py	Updates requirements + table wording from sentence pairs → input pairs.
sentence_transformers/sentence_transformer/losses/online_contrastive.py	Renames docstring table column “Texts” → “Inputs”.
sentence_transformers/sentence_transformer/losses/multiple_negatives_symmetric_ranking.py	Updates docstring to “embeddings” + “Inputs” table header.
sentence_transformers/sentence_transformer/losses/multiple_negatives_ranking.py	Updates docstring to “embeddings” + “Inputs” table header.
sentence_transformers/sentence_transformer/losses/mse.py	Updates distillation wording/table from “sentence embedding(s)” → “embedding(s)” and “Texts” → “Inputs”.
sentence_transformers/sentence_transformer/losses/mega_batch_margin.py	Renames docstring table column “Texts” → “Inputs”.
sentence_transformers/sentence_transformer/losses/matryoshka.py	Renames docstring table column “Texts” → “Inputs”.
sentence_transformers/sentence_transformer/losses/matryoshka_2d.py	Renames docstring table column “Texts” → “Inputs”.
sentence_transformers/sentence_transformer/losses/margin_mse.py	Updates “passage” → “document”, “Texts” → “Inputs”, and “sentences” → “inputs” in docstring.
sentence_transformers/sentence_transformer/losses/global_orthogonal_regularization.py	Updates docstring tables and examples (“passages” → “documents”, “Texts” → “Inputs”).
sentence_transformers/sentence_transformer/losses/gist_embed.py	Renames docstring table column “Texts” → “Inputs”.
sentence_transformers/sentence_transformer/losses/embed_distill.py	Updates distillation table wording from “sentence” → “input”.
sentence_transformers/sentence_transformer/losses/distill_kl_div.py	Renames docstring table column “Texts” → “Inputs”.
sentence_transformers/sentence_transformer/losses/denoising_auto_encoder.py	Updates TSDAE loss docstring wording from sentences → inputs + updates input table examples.
sentence_transformers/sentence_transformer/losses/cosine_similarity.py	Updates docstring examples to use input_A/input_B + “Inputs” table header.
sentence_transformers/sentence_transformer/losses/cosent.py	Updates docstring to “Input pairs” + “Inputs” table header.
sentence_transformers/sentence_transformer/losses/contrastive.py	Renames docstring table column “Texts” → “Inputs”.
sentence_transformers/sentence_transformer/losses/contrastive_tension.py	Updates docstrings/examples/tables from sentences → inputs and “sentence embeddings” → “embeddings”.
sentence_transformers/sentence_transformer/losses/cached_multiple_negatives_symmetric_ranking.py	Updates docstring to “embeddings” + “Inputs” table header.
sentence_transformers/sentence_transformer/losses/cached_multiple_negatives_ranking.py	Updates docstring to “embeddings” + “Inputs” table header and helper docstring (“sentences” → “inputs”).
sentence_transformers/sentence_transformer/losses/cached_gist_embed.py	Renames docstring table column “Texts” → “Inputs”.
sentence_transformers/sentence_transformer/losses/batch_semi_hard_triplet.py	Updates docstring wording/tables from sentences → inputs.
sentence_transformers/sentence_transformer/losses/batch_hard_triplet.py	Updates docstring wording/tables from sentences → inputs.
sentence_transformers/sentence_transformer/losses/batch_hard_soft_margin_triplet.py	Updates docstring wording/tables from sentences → inputs.
sentence_transformers/sentence_transformer/losses/batch_all_triplet.py	Updates docstring wording/tables from sentences → inputs.
sentence_transformers/sentence_transformer/losses/angle.py	Updates docstring to “Input pairs” + “Inputs” table header.
sentence_transformers/sentence_transformer/losses/adaptive_layer.py	Renames docstring table column “Texts” → “Inputs”.
sentence_transformers/cross_encoder/losses/rank_net.py	Renames docstring table column “Texts” → “Inputs”.
sentence_transformers/cross_encoder/losses/plist_mle.py	Renames docstring table column “Texts” → “Inputs”.
sentence_transformers/cross_encoder/losses/multiple_negatives_ranking.py	Renames docstring table column “Texts” → “Inputs”.
sentence_transformers/cross_encoder/losses/mse.py	Updates wording from query-passage → query-document and “Texts” → “Inputs”.
sentence_transformers/cross_encoder/losses/margin_mse.py	Updates wording from passages → documents and “Texts” → “Inputs”.
sentence_transformers/cross_encoder/losses/list_net.py	Renames docstring table column “Texts” → “Inputs”.
sentence_transformers/cross_encoder/losses/list_mle.py	Renames docstring table column “Texts” → “Inputs”.
sentence_transformers/cross_encoder/losses/lambda_loss.py	Renames docstring table column “Texts” → “Inputs”.
sentence_transformers/cross_encoder/losses/cross_entropy.py	Updates wording from sentence pairs → input pairs and “Texts” → “Inputs”.
sentence_transformers/cross_encoder/losses/cached_multiple_negatives_ranking.py	Renames docstring table column “Texts” → “Inputs”.
sentence_transformers/cross_encoder/losses/binary_cross_entropy.py	Updates table examples from sentence_A/sentence_B → input_A/input_B and “Texts” → “Inputs”.
sentence_transformers/cross_encoder/losses/adr_mse.py	Renames docstring table column “Texts” → “Inputs”.
docs/sparse_encoder/loss_overview.md	Updates sparse loss overview tables/prose to modality-neutral terminology.
docs/sentence_transformer/loss_overview.md	Updates sentence-transformer loss overview tables/prose to modality-neutral terminology.
docs/cross_encoder/loss_overview.md	Updates cross-encoder loss overview tables/prose (incl. `mine_hard_negatives` references) to modality-neutral terminology.

Copilot

Pull request overview

Copilot reviewed 54 out of 54 changed files in this pull request and generated 1 comment.

Copilot

Pull request overview

Copilot reviewed 54 out of 54 changed files in this pull request and generated 5 comments.

tomaarsen added 2 commits May 12, 2026 11:13

[docs] Use modality-neutral terms (input, document) in loss docs & docstrings · fc2cc7a

Fix table markdown formatting · c70f429

tomaarsen requested a review from Copilot May 12, 2026 09:16

Copilot started reviewing on behalf of tomaarsen May 12, 2026 09:17

Copilot AI reviewed May 12, 2026

Apply suggestions, remove InputExample in docstrings · 5127605

tomaarsen requested a review from Copilot May 12, 2026 09:39

Copilot started reviewing on behalf of tomaarsen May 12, 2026 09:40

Copilot AI reviewed May 12, 2026

Comment thread on sentence_transformers/sparse_encoder/losses/sparse_cosine_similarity.py

tomaarsen requested a review from Copilot May 12, 2026 09:49

Copilot started reviewing on behalf of tomaarsen May 12, 2026 09:50

Copilot AI reviewed May 12, 2026

tomaarsen force-pushed the worktree-loss-overview-no-sentences branch from b3a4b22 to 5127605 May 12, 2026 09:55

tomaarsen merged commit d8ee041 into huggingface:main May 12, 2026
40 checks passed

tomaarsen merged 3 commits into huggingface:main from tomaarsen:worktree-loss-overview-no-sentences

2 participants
