Interfaces in v2 #2739

KennethEnevoldsen · 2025-05-29T15:04:34Z

KennethEnevoldsen
May 29, 2025
Maintainer

Hi @Samoed, @isaac-chung, I had a look at the Encoder interface in v2. This is one of the big things to get right.

There are a few things that I currently want to examine:

Cross encoders don't seem to be compatible with the interface (e.g., calling .encode on a wrapped Sentence TRF cross encoder results in a call to the model's .encode method, which the cross encoder does not have). I would suggest splitting them into two interfaces and also splitting the wrapper into two wrappers. WDYT?
We can no longer do naive tests:

E.g., before a simple test of embedding sizes would look like:

import mteb
meta = mteb.get_model_meta("BAAI/bge-m3")
model = meta.load_model()

emb = model.encode("test", task_name = "we have to put this here") # ideally task_name should be default None
emb.shape #(1024,)

I am not sure what the correct approach is here, but it seems like the current setup requires way too much setup to do a simple test. Any suggestions?

An option here is to define a utility function on the Protocol like:

def text_encode()

Which you can then inherit. I am not sure what the best option is here though.
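A minimal sketch of that idea, not mteb's actual Protocol: the Encoder class below is a hypothetical stand-in, a plain iterable of batch dicts stands in for DataLoader[BatchedInput], and ToyEncoder is a made-up model. It only shows that a default method defined on a Protocol is inherited when a model subclasses the Protocol explicitly.

```python
from __future__ import annotations

from typing import Iterable, Protocol


class Encoder(Protocol):
    """Hypothetical stand-in for the v2 Encoder protocol (an assumption, not mteb's code)."""

    def encode(
        self, inputs: Iterable[dict[str, list[str]]], *, task_name: str | None = None
    ) -> list[list[float]]:
        ...

    def text_encode(self, texts: str | list[str]) -> list[list[float]]:
        """Utility: wrap plain strings into a single batch and delegate to encode."""
        if isinstance(texts, str):
            texts = [texts]
        return self.encode([{"text": texts}], task_name=None)


class ToyEncoder(Encoder):
    """Toy model that subclasses the protocol and so inherits text_encode."""

    def encode(self, inputs, *, task_name=None):
        # Embed each text as [character count, word count]; a real model runs a network here.
        return [
            [float(len(text)), float(len(text.split()))]
            for batch in inputs
            for text in batch["text"]
        ]


emb = ToyEncoder().text_encode("some text")
```

Note that the helper is only inherited by classes that list the Protocol as a base class. A model that merely matches the Protocol structurally would not get text_encode.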

To be clear here, I am quite happy with the .encode in its flexibility and the fact that it standardizes so much of the codebase. I just want to give an extra thought to usability.

Lack of documentation of what methods are required and when

e.g. I am unsure when combine_query_and_instruction is required (I suspect it is not).

isaac-chung · 2025-05-29T16:05:27Z

isaac-chung
May 29, 2025
Collaborator

I'm not sure I follow entirely, as I struggle to see the painpoint here. What if we have the following condition for cross encoders in model.encode?

def encode(self, ...):
    if isinstance(self.model, CrossEncoder):
        return self.model.predict(...)

Would help a lot as well if you can share some links or code examples. I'm reading this file at the moment.

2 replies

Samoed May 29, 2025
Collaborator

That won't solve it, because encode will receive only queries or passages, but not both.

isaac-chung May 29, 2025
Collaborator

Ahh yes, good catch.

Samoed · 2025-05-29T16:15:15Z

Samoed
May 29, 2025
Collaborator

It seems that cross-encoders aren’t fully compatible with the current setup. For example, calling .encode on a wrapped Sentence-TRF cross-encoder ends up invoking a method that doesn’t actually exist on that model. I’d suggest separating the encoder types into two distinct interfaces and also splitting the wrapper into two dedicated classes. What do you think?

I agree. I think we should support cross-encoders only for reranking tasks. This could be included as part of #2728. I think encoders could be available only through the search interface.

I'm not sure what the ideal solution is, but right now, the amount of setup required just to run a simple test feels excessive. Any suggestions?

I think it's even more complex. It should look more like this:

import mteb
meta = mteb.get_model_meta("BAAI/bge-m3")
model = meta.load_model()

emb = model.encode(
    DataLoader({"text": ["some text"]}),  # we now require DataLoader[BatchedInput], but I think this example is still invalid
    task_name="we have to put this here"
)
emb.shape  # (1024,)

We might consider renaming encode to something like encode_batch or encode_dataloader, since the main goal of this interface is to standardize model APIs — and this current approach actually works well for our internal use. We could then redefine encode itself, but making it both simple and flexible is tricky. Should we support just plain strings? Or also accept BatchedInput like {"text": ["some text"]}? What exactly should be supported?

We could offer helper functions to simplify testing and user-facing use cases. But within the core implementation, we should stick with the current encode (or its new version with all parameters explicitly required).

For example, I’m not sure when combine_query_and_instruction is supposed to be used (my guess is that it’s not always needed).

That method is used specifically during Instruction Retrieval / Reranking. It’s defined in AbsEncoder, and model implementations can override it as needed.
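A rough sketch of what "defined in the base class, overridden by models" could look like. The class names, the signature, and the default behaviour (appending the instruction to the query) are all assumptions made for illustration and are not taken from mteb's AbsEncoder.

```python
class AbsEncoderSketch:
    """Hypothetical base class; not mteb's AbsEncoder."""

    def combine_query_and_instruction(self, query: str, instruction: str) -> str:
        # Assumed default: append the instruction to the stripped query.
        return f"{query.strip()} {instruction}".strip()


class InstructionFirstModel(AbsEncoderSketch):
    """Made-up model that expects the instruction before the query."""

    def combine_query_and_instruction(self, query: str, instruction: str) -> str:
        # Override with a model-specific prompt template.
        return f"Instruct: {instruction}\nQuery: {query}"


default_text = AbsEncoderSketch().combine_query_and_instruction(
    " what is mteb? ", "Answer briefly."
)
custom_text = InstructionFirstModel().combine_query_and_instruction(
    "what is mteb?", "Answer briefly."
)
```

With this shape, a model that does not care about instruction formatting never has to touch the method, which matches the guess that it is not always required.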

5 replies

KennethEnevoldsen May 31, 2025
Maintainer Author

I agree. I think we should support cross-encoders only for reranking tasks. This could be included as part of #2728. I think encoders could be available only through the search interface.

I agree with this. Let us work towards that. So it is not a CrossEncoder API, but a Retriever (so we have a Retriever and an Encoder interface).
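To make the split concrete, here is a small sketch of two separate protocols. The method names (search, predict), their signatures, and the toy models are assumptions for illustration, not the interface from #2728. The point is only that a cross encoder scores (query, passage) pairs jointly, so it fits a search-style interface but not encode.

```python
from __future__ import annotations

from typing import Protocol, runtime_checkable


@runtime_checkable
class Encoder(Protocol):
    """Hypothetical: embeds one side (queries or passages) at a time."""

    def encode(self, texts: list[str]) -> list[list[float]]: ...


@runtime_checkable
class Retriever(Protocol):
    """Hypothetical: sees queries and passages together and returns ranked passage indices."""

    def search(
        self, queries: list[str], passages: list[str], top_k: int
    ) -> list[list[int]]: ...


class ToyBiEncoder:
    def encode(self, texts):
        # Toy embedding: text length only.
        return [[float(len(text))] for text in texts]


class ToyCrossEncoder:
    """Scores pairs jointly, so it has no encode method at all."""

    def predict(self, pairs):
        # Toy relevance score: number of shared lowercase words.
        return [
            float(len(set(q.lower().split()) & set(p.lower().split())))
            for q, p in pairs
        ]

    def search(self, queries, passages, top_k=1):
        ranked = []
        for query in queries:
            scores = self.predict([(query, passage) for passage in passages])
            order = sorted(range(len(passages)), key=lambda i: scores[i], reverse=True)
            ranked.append(order[:top_k])
        return ranked


top = ToyCrossEncoder().search(["red apple"], ["a green pear", "a red apple pie"], top_k=1)
```

Here isinstance(ToyCrossEncoder(), Encoder) is False while isinstance(ToyCrossEncoder(), Retriever) is True, so a wrapper can check for the right interface up front instead of failing on a missing .encode.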

We might consider renaming encode to something like encode_batch or encode_dataloader, since the main goal of this interface is to standardize model APIs — and this current approach actually works well for our internal use.

Agree here, what we have now does a great job for developer ergonomics.

We could then redefine encode itself, but making it both simple and flexible is tricky. Should we support just plain strings? Or also accept BatchedInput like {"text": ["some text"]}? What exactly should be supported?

Yeah, I think we will have a hard time making encode work for everything, without future additions from other libraries.

I think encode_batch is a good rename, and then adding encode_{modality} for utility only. We can also then add an encode which is equivalent to encode_text?

That method is used specifically during Instruction Retrieval / Reranking. It’s defined in AbsEncoder, and model implementations can override it as needed.

Thanks. It was more that I couldn't read it from the docstring so we should document it better. However I am not sure if it will actually be needed if we add the search interface (let us see, not the first problem at least).

Samoed May 31, 2025
Collaborator

I think encode_batch is a good rename, and then adding encode_{modality} for utility only. We can also then add an encode which is equivalent to encode_text?

I think we can add some modality-specific methods.

encode would take {"text": ["some text"], "image": ... } (BatchedInput)
encode_text ["some text"]
encode_image ...
Then encode_batch could call encode, so models would only need to implement encode.
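One way this layering might look, as a sketch under assumptions: AbsEncoderSketch and ToyModel are hypothetical names, a plain iterable of dicts stands in for a DataLoader of BatchedInput, and embeddings are plain lists rather than arrays.

```python
from __future__ import annotations

from typing import Iterable


class AbsEncoderSketch:
    """Hypothetical base class: a model only has to implement encode."""

    def encode(self, batch: dict[str, list]) -> list[list[float]]:
        """Embed one BatchedInput-style dict, e.g. {"text": ["some text"]}."""
        raise NotImplementedError

    def encode_text(self, texts: list[str]) -> list[list[float]]:
        # Modality-specific utility: wrap plain strings into a batch dict.
        return self.encode({"text": texts})

    def encode_batch(
        self, batches: Iterable[dict[str, list]], *, task_name: str | None = None
    ) -> list[list[float]]:
        # Dataloader-style entry point: run encode per batch and concatenate the results.
        embeddings: list[list[float]] = []
        for batch in batches:
            embeddings.extend(self.encode(batch))
        return embeddings


class ToyModel(AbsEncoderSketch):
    def encode(self, batch):
        # Toy embedding: text length only.
        return [[float(len(text))] for text in batch["text"]]


single = ToyModel().encode_text(["ab", "abc"])
many = ToyModel().encode_batch([{"text": ["a"]}, {"text": ["bb", "ccc"]}])
```

In this sketch task_name is accepted by encode_batch but not forwarded; whether encode should also receive it is one of the open questions in this thread.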

KennethEnevoldsen May 31, 2025
Maintainer Author

Hmm right. I see, yeah, that might ease the implementation burden as well. Though I could also imagine encode calling encode_batch. We would probably have to attempt an implementation to see.

isaac-chung May 31, 2025
Collaborator

Btw we had discussed this topic before in detail, but I'm not sure why the conclusion is changed again?
#1606

KennethEnevoldsen Jun 2, 2025
Maintainer Author

You are completely correct @isaac-chung, I know I kinda bring up an old skeleton here. I am generally happy with our previous decision from an implementation POV, but I just want to make sure that it is also user-friendly (which I think we can achieve with a few changes).
