[docs] Use modality-neutral terms (input, document) in loss docs & docstrings #3772

Merged: tomaarsen merged 3 commits into huggingface:main from tomaarsen:worktree-loss-overview-no-sentences on May 12, 2026

Conversation

@tomaarsen
Member

Hello!

Pull Request overview

  • Replace text-centric wording in the loss overview docs and loss docstrings: sentence/sentences -> input/inputs, passage/passages -> document/documents
  • Rename the Texts column to Inputs in the loss input-format tables (both the loss_overview.md pages and the docstring Inputs: grid tables)
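As a rough illustration of the replacements listed above, the sweep can be sketched as a small regex substitution. This is a hypothetical helper, not the script used for this PR: the `REPLACEMENTS` mapping and the `neutralize` function are assumed names, and the guard for `sentence_transformers` is an assumption about what such a sweep would need to leave untouched.

```python
import re

# Hypothetical mapping, built from the replacements named in the PR overview.
REPLACEMENTS = {
    "sentences": "inputs",
    "sentence": "input",
    "passages": "documents",
    "passage": "document",
    "Texts": "Inputs",
}

# Longest alternatives first, so "sentences" is tried before "sentence".
_ALTERNATIVES = "|".join(sorted(REPLACEMENTS, key=len, reverse=True))

# Letter-based lookarounds (rather than \b) so that identifiers such as
# "sentence_A" or "damaged_sentence" are rewritten too. The extra
# "_transformer" guard keeps the package name "sentence_transformers" intact.
_PATTERN = re.compile(
    r"(?<![A-Za-z])(" + _ALTERNATIVES + r")(?![A-Za-z]|_transformer)"
)


def neutralize(text: str) -> str:
    """Replace text-centric terms with modality-neutral ones (case-sensitive)."""
    return _PATTERN.sub(lambda match: REPLACEMENTS[match.group(1)], text)


if __name__ == "__main__":
    print(neutralize("(sentence_A, sentence_B) pairs"))
    print(neutralize("(query, passage_one, passage_two)"))
```

A purely mechanical sweep like this still needs manual review: articles ("a sentence" becomes "a input") and places where "sentence" is genuinely meant would have to be fixed by hand.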

Details

Now that SentenceTransformer, CrossEncoder, and SparseEncoder models are multimodal, much of the documentation framing reads as text-only. Tables that describe the accepted training data formats call the input column Texts and spell out formats like (sentence_A, sentence_B) pairs or single sentences, and a fair amount of prose talks about "sentences" and "passages" where it really just means "inputs" and "documents".

This PR sweeps that terminology to be modality-neutral:

  • docs/sentence_transformer/loss_overview.md, docs/cross_encoder/loss_overview.md, docs/sparse_encoder/loss_overview.md: Texts -> Inputs headers, (sentence_A, sentence_B) pairs -> (input_A, input_B) pairs, single sentences -> single inputs, (damaged_sentence, original_sentence) pairs -> (damaged_input, original_input) pairs, (query, passage_one, passage_two) -> (query, document_one, document_two), teacher/model sentence embeddings -> teacher/model embeddings, plus the matching prose.

  • All loss docstrings under sentence_transformers/{sentence_transformer,cross_encoder,sparse_encoder}/losses/: the Inputs: grid tables and the Requirements:/description prose, with the RST grid borders re-rendered to fit.

  • sentence_transformers/util/hard_negatives.py: the mine_hard_negatives docstring, so its (anchor, passage, label) / (anchor, [passages], …) output-format descriptions line up with the rest.
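Because the renamed cells differ in length from the old ones, the RST grid borders have to be redrawn to match. A minimal sketch of how such a table can be re-rendered so the borders fit the widest cell in each column follows. The `render_grid_table` helper is hypothetical and not part of the library, and the example row is illustrative only.

```python
def render_grid_table(header, rows):
    """Render a simple RST grid table whose column widths fit the longest cell.

    Hypothetical helper for illustration; handles single-line cells only.
    """
    table = [header, *rows]
    # Each column is as wide as its longest cell.
    widths = [max(len(row[i]) for row in table) for i in range(len(header))]

    def border(fill):
        # Borders span the cell width plus one space of padding on each side.
        return "+" + "+".join(fill * (width + 2) for width in widths) + "+"

    def line(cells):
        padded = (f" {cell.ljust(width)} " for cell, width in zip(cells, widths))
        return "|" + "|".join(padded) + "|"

    # RST grid tables separate the header from the body with "=" borders.
    out = [border("-"), line(header), border("=")]
    for row in rows:
        out.append(line(row))
        out.append(border("-"))
    return "\n".join(out)


if __name__ == "__main__":
    print(
        render_grid_table(
            ["Inputs", "Labels"],
            [["(input_A, input_B) pairs", "float similarity score"]],
        )
    )
```

Regenerating the whole table, rather than editing cells in place, avoids the misaligned borders that make RST grid tables fail to parse.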

- Tom Aarsen

Contributor

Copilot AI left a comment

Pull request overview

Updates documentation across SentenceTransformer, CrossEncoder, and SparseEncoder loss docs/docstrings to use modality-neutral terminology now that models can be multimodal.

Changes:

  • Replaces text-centric terms (e.g., “sentence(s)”, “passage(s)”, “Texts”) with modality-neutral terms (“input(s)”, “document(s)”, “Inputs”) in loss overviews and loss docstrings.
  • Updates example input-format tables (Markdown + RST grid tables) to reflect the new terminology.
  • Aligns mine_hard_negatives docstring output-format terminology with “document”/“inputs”.

Reviewed changes

Copilot reviewed 54 out of 54 changed files in this pull request and generated 6 comments.

| File | Description |
| --- | --- |
| `sentence_transformers/util/hard_negatives.py` | Updates hard-negative mining docstring to use “inputs”/“documents” terminology. |
| `sentence_transformers/sparse_encoder/losses/sparse_triplet.py` | Renames docstring table column “Texts” → “Inputs”. |
| `sentence_transformers/sparse_encoder/losses/sparse_multiple_negatives_ranking.py` | Makes docstring wording modality-neutral (“embeddings”, “Inputs”). |
| `sentence_transformers/sparse_encoder/losses/sparse_mse.py` | Updates distillation wording/table from “sentence” → “input” and “sentence embeddings” → “embeddings”. |
| `sentence_transformers/sparse_encoder/losses/sparse_margin_mse.py` | Updates “passage” → “document” and “Texts” → “Inputs” in docstring. |
| `sentence_transformers/sparse_encoder/losses/sparse_distill_kl_div.py` | Renames docstring table column “Texts” → “Inputs”. |
| `sentence_transformers/sparse_encoder/losses/sparse_cosine_similarity.py` | Updates docstring examples to use input_A/input_B + “Inputs” table header. |
| `sentence_transformers/sparse_encoder/losses/sparse_cosent.py` | Updates docstring to “Input pairs” + “Inputs” table header. |
| `sentence_transformers/sparse_encoder/losses/sparse_angle.py` | Updates docstring to “Input pairs” + “Inputs” table header. |
| `sentence_transformers/sparse_encoder/losses/csr.py` | Updates docstrings to refer to “embeddings”/“inputs” instead of “sentence embeddings”/“sentences”. |
| `sentence_transformers/sparse_encoder/losses/cached_splade.py` | Updates helper docstrings from “sentences” → “inputs”. |
| `sentence_transformers/sentence_transformer/losses/triplet.py` | Renames docstring table column “Texts” → “Inputs”. |
| `sentence_transformers/sentence_transformer/losses/softmax.py` | Updates requirements + table wording from sentence pairs → input pairs. |
| `sentence_transformers/sentence_transformer/losses/online_contrastive.py` | Renames docstring table column “Texts” → “Inputs”. |
| `sentence_transformers/sentence_transformer/losses/multiple_negatives_symmetric_ranking.py` | Updates docstring to “embeddings” + “Inputs” table header. |
| `sentence_transformers/sentence_transformer/losses/multiple_negatives_ranking.py` | Updates docstring to “embeddings” + “Inputs” table header. |
| `sentence_transformers/sentence_transformer/losses/mse.py` | Updates distillation wording/table from “sentence embedding(s)” → “embedding(s)” and “Texts” → “Inputs”. |
| `sentence_transformers/sentence_transformer/losses/mega_batch_margin.py` | Renames docstring table column “Texts” → “Inputs”. |
| `sentence_transformers/sentence_transformer/losses/matryoshka.py` | Renames docstring table column “Texts” → “Inputs”. |
| `sentence_transformers/sentence_transformer/losses/matryoshka_2d.py` | Renames docstring table column “Texts” → “Inputs”. |
| `sentence_transformers/sentence_transformer/losses/margin_mse.py` | Updates “passage” → “document”, “Texts” → “Inputs”, and “sentences” → “inputs” in docstring. |
| `sentence_transformers/sentence_transformer/losses/global_orthogonal_regularization.py` | Updates docstring tables and examples (“passages” → “documents”, “Texts” → “Inputs”). |
| `sentence_transformers/sentence_transformer/losses/gist_embed.py` | Renames docstring table column “Texts” → “Inputs”. |
| `sentence_transformers/sentence_transformer/losses/embed_distill.py` | Updates distillation table wording from “sentence” → “input”. |
| `sentence_transformers/sentence_transformer/losses/distill_kl_div.py` | Renames docstring table column “Texts” → “Inputs”. |
| `sentence_transformers/sentence_transformer/losses/denoising_auto_encoder.py` | Updates TSDAE loss docstring wording from sentences → inputs + updates input table examples. |
| `sentence_transformers/sentence_transformer/losses/cosine_similarity.py` | Updates docstring examples to use input_A/input_B + “Inputs” table header. |
| `sentence_transformers/sentence_transformer/losses/cosent.py` | Updates docstring to “Input pairs” + “Inputs” table header. |
| `sentence_transformers/sentence_transformer/losses/contrastive.py` | Renames docstring table column “Texts” → “Inputs”. |
| `sentence_transformers/sentence_transformer/losses/contrastive_tension.py` | Updates docstrings/examples/tables from sentences → inputs and “sentence embeddings” → “embeddings”. |
| `sentence_transformers/sentence_transformer/losses/cached_multiple_negatives_symmetric_ranking.py` | Updates docstring to “embeddings” + “Inputs” table header. |
| `sentence_transformers/sentence_transformer/losses/cached_multiple_negatives_ranking.py` | Updates docstring to “embeddings” + “Inputs” table header and helper docstring (“sentences” → “inputs”). |
| `sentence_transformers/sentence_transformer/losses/cached_gist_embed.py` | Renames docstring table column “Texts” → “Inputs”. |
| `sentence_transformers/sentence_transformer/losses/batch_semi_hard_triplet.py` | Updates docstring wording/tables from sentences → inputs. |
| `sentence_transformers/sentence_transformer/losses/batch_hard_triplet.py` | Updates docstring wording/tables from sentences → inputs. |
| `sentence_transformers/sentence_transformer/losses/batch_hard_soft_margin_triplet.py` | Updates docstring wording/tables from sentences → inputs. |
| `sentence_transformers/sentence_transformer/losses/batch_all_triplet.py` | Updates docstring wording/tables from sentences → inputs. |
| `sentence_transformers/sentence_transformer/losses/angle.py` | Updates docstring to “Input pairs” + “Inputs” table header. |
| `sentence_transformers/sentence_transformer/losses/adaptive_layer.py` | Renames docstring table column “Texts” → “Inputs”. |
| `sentence_transformers/cross_encoder/losses/rank_net.py` | Renames docstring table column “Texts” → “Inputs”. |
| `sentence_transformers/cross_encoder/losses/plist_mle.py` | Renames docstring table column “Texts” → “Inputs”. |
| `sentence_transformers/cross_encoder/losses/multiple_negatives_ranking.py` | Renames docstring table column “Texts” → “Inputs”. |
| `sentence_transformers/cross_encoder/losses/mse.py` | Updates wording from query-passage → query-document and “Texts” → “Inputs”. |
| `sentence_transformers/cross_encoder/losses/margin_mse.py` | Updates wording from passages → documents and “Texts” → “Inputs”. |
| `sentence_transformers/cross_encoder/losses/list_net.py` | Renames docstring table column “Texts” → “Inputs”. |
| `sentence_transformers/cross_encoder/losses/list_mle.py` | Renames docstring table column “Texts” → “Inputs”. |
| `sentence_transformers/cross_encoder/losses/lambda_loss.py` | Renames docstring table column “Texts” → “Inputs”. |
| `sentence_transformers/cross_encoder/losses/cross_entropy.py` | Updates wording from sentence pairs → input pairs and “Texts” → “Inputs”. |
| `sentence_transformers/cross_encoder/losses/cached_multiple_negatives_ranking.py` | Renames docstring table column “Texts” → “Inputs”. |
| `sentence_transformers/cross_encoder/losses/binary_cross_entropy.py` | Updates table examples from sentence_A/sentence_B → input_A/input_B and “Texts” → “Inputs”. |
| `sentence_transformers/cross_encoder/losses/adr_mse.py` | Renames docstring table column “Texts” → “Inputs”. |
| `docs/sparse_encoder/loss_overview.md` | Updates sparse loss overview tables/prose to modality-neutral terminology. |
| `docs/sentence_transformer/loss_overview.md` | Updates sentence-transformer loss overview tables/prose to modality-neutral terminology. |
| `docs/cross_encoder/loss_overview.md` | Updates cross-encoder loss overview tables/prose (incl. mine_hard_negatives references) to modality-neutral terminology. |

Comment thread sentence_transformers/util/hard_negatives.py
Comment thread sentence_transformers/sentence_transformer/losses/denoising_auto_encoder.py Outdated
Comment thread sentence_transformers/sentence_transformer/losses/cosine_similarity.py Outdated
Comment thread sentence_transformers/sparse_encoder/losses/sparse_cosine_similarity.py Outdated
Comment thread sentence_transformers/sentence_transformer/losses/margin_mse.py
Comment thread sentence_transformers/sparse_encoder/losses/sparse_margin_mse.py
Contributor

Copilot AI left a comment

Pull request overview

Copilot reviewed 54 out of 54 changed files in this pull request and generated 1 comment.

Contributor

Copilot AI left a comment

Pull request overview

Copilot reviewed 54 out of 54 changed files in this pull request and generated 5 comments.

Comment thread sentence_transformers/sentence_transformer/losses/angle.py Outdated
Comment thread sentence_transformers/sentence_transformer/losses/cosent.py Outdated
Comment thread sentence_transformers/sparse_encoder/losses/sparse_cosent.py
Comment thread sentence_transformers/sparse_encoder/losses/sparse_angle.py
@tomaarsen force-pushed the worktree-loss-overview-no-sentences branch from b3a4b22 to 5127605 on May 12, 2026 09:55
@tomaarsen merged commit d8ee041 into huggingface:main on May 12, 2026
40 checks passed