
Explain how to implement select "best match" with custom functions #271

Closed
mihnita opened this issue May 18, 2022 · 21 comments
Labels
blocker-candidate The submitter thinks this might be a block for the next release resolve-candidate This issue appears to have been answered or resolved, and may be closed soon.

Comments

@mihnita
Collaborator

mihnita commented May 18, 2022

One of the contention points between the 3 proposals was the selection process: best match vs first match.

Implementing MessageFormat v1 behavior compatibility for plural requires best match functionality
(so that something like {foo, plural, one {...} =1 {...} other {...}} matches =1 if foo is 1)

Eemeli claimed (more than once) that this can be implemented by the selection function, without involvement from the MF2 spec.

This is true for one selector, but I don't see how it can be done for multiple selectors.

This is very typical when several dimensions are compared.
Imagine a list with 10 algorithms.
It is easy to select the fastest one.
It is easy to select the one that requires the least amount of data / memory.
It is easy to select the one with fewest lines of code (proxy for complexity).
But it is not clear how to select the one with the best combination.

Here is a MF2 example:

{$itemCount :plural} {$itemGender :gender}
     =1   fem  { ... the message 1 F ...}
     =1    _   { ... the message 1 O ...}
    one   masc { ... the message One M ...}
     _    masc { ... the message O M ...}
     _     _   { ... the message O O ...}

Let the arguments to format be { 'itemCount' : 1, 'itemGender' : male }

The best matches for the :plural function are:

     =1   fem  { ... the message 1 F ...}
     =1    _   { ... the message 1 O ...}

The best matches for the :gender function are:

    one   masc { ... the message One M ...}
     _    masc { ... the message O M ...}

Selections that match both functions are:

     =1    _   { ... the message 1 O ...}
    one   masc { ... the message One M ...}
     _    masc { ... the message O M ...}
     _     _   { ... the message O O ...}

But there is no combination of the best :plural and the best :gender matches (which would be =1 masc {...})

Who can select the best combination (arguably one masc)?

The functions can't do it, because they have no visibility.
They can only see their own "column" of keys ( :plural => [ =1, =1, one, _, _ ] and :gender => [ fem, _, masc, masc, _ ])
Even with more visibility, they are not "equipped" to judge how well the other functions (columns) match.

And the MF2 proper (the "core" implementation that only invokes the functions) cannot do it, because the EZ and Stas proposals argued that it is not needed, and can be done with custom functions. So the TC made that decision.

NOTE: I am not trying to reverse the TC decision, which said to do "first match", listening to the 2 proposals mentioned.

I only invite the proponents of the "first match" algorithm to explain how to implement this kind of best match with these constraints.
Because they claimed it is possible to do by delegating the work to the functions.
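To make the problem above concrete, here is a minimal Python sketch in which each selector only ranks its own column of keys. The ranking functions `plural_rank` and `gender_rank` are hypothetical stand-ins (an English-like plural rule, and a gender value already normalized to `masc`); they are not part of any MF2 implementation.

```python
# Hypothetical sketch: each selector function sees only its own column of keys.
VARIANTS = [
    (("=1", "fem"), "1 F"),
    (("=1", "_"), "1 O"),
    (("one", "masc"), "One M"),
    (("_", "masc"), "O M"),
    (("_", "_"), "O O"),
]


def plural_rank(key, value):
    """Rank a plural key (lower is better); None means the key does not match.

    Assumes an English-like locale where only the value 1 is in category "one".
    """
    if key == f"={value}":
        return 0
    if key == "one" and value == 1:
        return 1
    if key == "_":
        return 2
    return None


def gender_rank(key, value):
    """Rank a gender key: exact key first, then the wildcard."""
    if key == value:
        return 0
    if key == "_":
        return 1
    return None


def best_rows(column, rank, value):
    """Indices of the rows whose key in `column` has the best available rank."""
    ranks = {i: rank(keys[column], value) for i, (keys, _) in enumerate(VARIANTS)}
    ranks = {i: r for i, r in ranks.items() if r is not None}
    best = min(ranks.values())
    return {i for i, r in ranks.items() if r == best}


best_plural = best_rows(0, plural_rank, 1)
best_gender = best_rows(1, gender_rank, "masc")
# Rows where both columns match at all, regardless of quality.
matching_both = [
    i for i, (keys, _) in enumerate(VARIANTS)
    if plural_rank(keys[0], 1) is not None
    and gender_rank(keys[1], "masc") is not None
]
print(best_plural, best_gender, best_plural & best_gender, matching_both)
```

The two "best" sets do not intersect, so some component above the individual functions has to decide between the remaining rows.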

@mihnita mihnita added the blocker-candidate The submitter thinks this might be a block for the next release label May 18, 2022
@eemeli
Collaborator

eemeli commented May 18, 2022

Implementing MessageFormat v1 behavior compatibility for plural requires best match functionality
(so that something like {foo, plural, one {...} =1 {...} other {...}} matches =1 if foo is 1)

Eemeli claimed (more than once) that this can be implemented by the selection function, without involvement from the MF2 spec.

To specify, the claim that I have made is that the matching algorithm used by MF1 can be fully supported in MF2. This may be done by controlling the order of variants in the data model representation of an MF1 message using e.g. this sort method.

Who can select the best combination (arguably one masc)?

Within a first-match selection framework, the list of variants is iterated through starting from the top, until a match succeeds. This means that the "who" in this case is the human that defines the order of those variants, and in this case has put =1 _ before one masc, thereby defining it to be the better match.

NOTE: I am not trying to reverse the TC decision, which said to do "first match", listening to the 2 proposals mentioned.

If this is the case, could you clarify why this should be considered a blocker-candidate? What is the issue that needs to be resolved here for the technical preview?

@mihnita
Collaborator Author

mihnita commented May 18, 2022

"who" in this case is the human that defines the order of those variants

This means it does not support the MF 1 matching algorithm.
With MF 1 the order is not for the human to decide, it is really "best match".
One can list [other, one, =1] and =1 will be selected.
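As an illustration of the order-independence described here, a rough Python sketch of MF1-style plural selection follows. The function names and the simplified English category rule are assumptions made for this example; real plural rules come from CLDR and are locale-specific.

```python
def english_plural_category(n):
    """Simplified English rule for integers (an assumption for this sketch)."""
    return "one" if n == 1 else "other"


def mf1_plural_select(options, n):
    """Pick a message the way MF1 plural does: exact value, then category, then other.

    `options` maps keys such as "=1", "one", "other" to messages. The order in
    which the keys were listed plays no role in the lookup.
    """
    exact = f"={n}"
    if exact in options:
        return options[exact]
    category = english_plural_category(n)
    if category in options:
        return options[category]
    return options["other"]


listed = {"other": "# items", "one": "one item", "=1": "exactly one item"}
reordered = {"=1": "exactly one item", "one": "one item", "other": "# items"}
print(mf1_plural_select(listed, 1))     # exactly one item
print(mf1_plural_select(reordered, 1))  # exactly one item
print(mf1_plural_select(listed, 2))     # # items
```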

If this is the case, could you clarify why this should be considered a blocker-candidate?
What is the issue that needs to be resolved here for the technical preview?

I've explained the blocker-candidate in my email to the group.
It means "an issue that the one who opened thinks is a blocker".
But it is up to the WG to decide if it really is (and convert it to blocker),
or it is not (and remove the label)

@aphillips
Member

aphillips commented May 18, 2022

MFv1 selection is "best match", but also is necessarily nested: you can only ever evaluate one item at a time. MFv2 is using a matrix and so might work differently.

Looking at your example, @mihnita, if each selector is "greedy", then the evaluation of (restating in a different order for clarity) this:

{$itemCount :plural} {$itemGender :gender}
    one   masc { ... the message One M ...}
     _    masc { ... the message O M ...}
     _     _   { ... the message O O ...}
     =1   fem  { ... the message 1 F ...}
     =1    _   { ... the message 1 O ...}

is =1 _ because the first selector finds a set of winners before the second one fires. If I wrote this nested (we do this today in Amazon), it would be clear that the selection is nested (and that some options are potentially "missing"):

{$itemCount :plural}
    =1  {{$itemGender :gender}
         fem {... 1 F...}
         _   {... O M...}}
    one {{$itemGender :gender}
        masc  {... one M... }
        _     {... one O...}}
    _ {{$itemGender :gender}
        masc {... O M...}
        _    {...O O...}}
}

If we rely on the order it could mean that some messages are potentially unreachable.

I don't understand @eemeli the part where you say until a match succeeds, since necessarily one needs to get more than one candidate for the second selector to consider. I think you mean to say that it gets all equivalent-quality matches?

Also, in the case of plural, the "quality" of the match is predetermined by the selector function (=1 matches the value 1 better than one does, but in the en locale one matches better than other). Since we need to consider more than one matching item anyway, the order doesn't come into it, except perhaps at the end of the process (if we ended up with more than one "equivalent quality" match)
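A small Python sketch of the "greedy" column-by-column evaluation described in this comment may help. It assumes hypothetical ranking functions supplied by each selector (lower rank means better match, `None` means no match) and an English-like plural rule; variant order is only used to break ties at the very end.

```python
def plural_rank(key, value):
    """Hypothetical English-like plural ranking: =N, then "one", then wildcard."""
    if key == f"={value}":
        return 0
    if key == "one" and value == 1:
        return 1
    if key == "_":
        return 2
    return None


def gender_rank(key, value):
    """Hypothetical gender ranking: exact key, then wildcard."""
    if key == value:
        return 0
    if key == "_":
        return 1
    return None


def greedy_select(variants, rankers, values):
    """Filter candidates one column at a time, keeping only the best-ranked rows."""
    candidates = list(variants)
    for column, (rank, value) in enumerate(zip(rankers, values)):
        scored = [(rank(keys[column], value), keys, pattern)
                  for keys, pattern in candidates]
        scored = [s for s in scored if s[0] is not None]
        if not scored:
            return None  # nothing matches this column
        best = min(s[0] for s in scored)
        candidates = [(keys, pattern) for r, keys, pattern in scored if r == best]
    return candidates[0][1]  # ties fall back to listing order


reordered = [
    (("one", "masc"), "One M"),
    (("_", "masc"), "O M"),
    (("_", "_"), "O O"),
    (("=1", "fem"), "1 F"),
    (("=1", "_"), "1 O"),
]
original = [reordered[3], reordered[4], reordered[0], reordered[1], reordered[2]]

print(greedy_select(reordered, [plural_rank, gender_rank], [1, "masc"]))  # 1 O
print(greedy_select(original, [plural_rank, gender_rank], [1, "masc"]))   # 1 O
```

Both listings yield the `=1 _` variant, because the first selector narrows the candidates before the second one is consulted.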

@eemeli
Collaborator

eemeli commented May 18, 2022

I don't understand @eemeli the part where you say until a match succeeds, since necessarily one needs to get more than one candidate for the second selector to consider. I think you mean to say that it gets all equivalent-quality matches?

With a first-match algorithm, there is no concept of "quality" within the algorithm. There are more detailed explanations of this in the earlier ez-spec and stasm proposals.

Taking for instance your example:

{$itemCount :plural} {$itemGender :gender}
    one   masc { ... the message One M ...}
     _    masc { ... the message O M ...}
     _     _   { ... the message O O ...}
     =1   fem  { ... the message 1 F ...}
     =1    _   { ... the message 1 O ...}

Here, the =1 fem and =1 _ variants would never be selected, because the "quality" of the matches is solely expressed in the order of variants, and both of those are below _ _, which would always be selected when encountered.

So the order of tests done for this case is:

  1. Test one against $itemCount :plural
    1. On success, test masc against $itemGender :gender
    2. On success, select the first variant -> END.
  2. Test _ against $itemCount :plural, and always succeed.
    1. Test masc against $itemGender :gender
    2. On success, select the second variant -> END.
  3. Test _ against $itemCount :plural, and always succeed.
    1. Test _ against $itemGender :gender, and always succeed.
    2. Select the third variant -> END.
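The sequence of tests above can be sketched in Python as follows. The matcher functions are hypothetical (English-like plural rule, `_` as the catch-all key), and the trace format is only an illustration, not something defined by the proposals.

```python
def plural_matches(key, value):
    """Hypothetical plural test: wildcard, exact =N, or English-like "one"."""
    return key == "_" or key == f"={value}" or (key == "one" and value == 1)


def gender_matches(key, value):
    """Hypothetical gender test: wildcard or exact key."""
    return key in ("_", value)


def first_match_trace(variants, matchers, values):
    """Walk the variants top to bottom and record every key test performed."""
    trace = []
    for index, (keys, pattern) in enumerate(variants, start=1):
        for key, matches, value in zip(keys, matchers, values):
            ok = matches(key, value)
            trace.append((index, key, ok))
            if not ok:
                break  # this variant fails; move on to the next one
        else:
            return pattern, trace  # every key matched: select this variant
    return None, trace


variants = [
    (("one", "masc"), "One M"),
    (("_", "masc"), "O M"),
    (("_", "_"), "O O"),
    (("=1", "fem"), "1 F"),
    (("=1", "_"), "1 O"),
]

pattern, trace = first_match_trace(
    variants, [plural_matches, gender_matches], [1, "fem"])
print(pattern)
for step in trace:
    print(step)
```

With the arguments `1` and `fem`, the walk stops at the third variant, so the `=1 fem` row is never tested even though it matches exactly.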

@aphillips
Member

Thanks @eemeli for the explanation, which makes sense.

I don't necessarily agree with it, since it requires the developer to arrange the matrix vs. letting the algorithm do it. My discussion of "quality of match" is probably misleading, since what I'm really doing is letting the algorithm order the if tests for the user. In the case of the value 1 and the plural function, the value can match all of =1, one, and *, but it's perverse to jump to * or even to one before checking the numeric value. The gender/select case is easier since it's either matched or using the * entry (the order of fem/masc doesn't matter so long as you check the explicit value(s) before *).

An argument can be made that letting the algorithm reorder the matrix will lead to subtle bugs due to bad developer behavior. Consider the difference between ordered and unordered matrices:

{$itemCount :plural} {$savedCount :plural}
[ =0   =0  ] { 0 0 }
[ =0   *   ] { 0 * }
[ one =0   ] { 1 0 }
[ one one  ] { 1 1 }
[ one *    ] { 1 * }
[ *   =0   ] { * 0 }
[ *   one  ] { * 1 }
[ *   *    ] { * * }

# vs.

[ one =0   ] { 1 0 }
[ =0   *   ] { 0 * }
[ *   =0   ] { * 0 }
[ one one  ] { 1 1 }
[ one *    ] { 1 * }
[ =0   =0  ] { 0 0 }
[ *   *    ] { * * }
[ *   one  ] { * 1 }

In "my" world, both of these evaluate the same, but I defy anyone to check completeness of the bottom one at a glance :-). OTOH, I can just add the case [ =5 =7 ] to the end of the matrix and it'll just work.......
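To illustrate the difference between the two listings, here is a hedged Python sketch that evaluates both matrices with a lexicographic best match (the smallest tuple of per-column ranks wins) and with a plain first match. The `rank` function is an assumption: an English-like plural rule where `=N` beats `one`, which beats `*`.

```python
ORDERED = [
    (("=0", "=0"), "0 0"), (("=0", "*"), "0 *"), (("one", "=0"), "1 0"),
    (("one", "one"), "1 1"), (("one", "*"), "1 *"), (("*", "=0"), "* 0"),
    (("*", "one"), "* 1"), (("*", "*"), "* *"),
]
SHUFFLED = [
    (("one", "=0"), "1 0"), (("=0", "*"), "0 *"), (("*", "=0"), "* 0"),
    (("one", "one"), "1 1"), (("one", "*"), "1 *"), (("=0", "=0"), "0 0"),
    (("*", "*"), "* *"), (("*", "one"), "* 1"),
]


def rank(key, n):
    """Hypothetical plural ranking; None means the key does not match n."""
    if key == f"={n}":
        return 0
    if key == "one" and n == 1:
        return 1
    if key == "*":
        return 2
    return None


def rank_tuple(keys, values):
    """Per-column ranks for a variant, or None if any column fails to match."""
    ranks = tuple(rank(k, v) for k, v in zip(keys, values))
    return None if None in ranks else ranks


def best_match(variants, values):
    """Pick the variant with the lexicographically smallest rank tuple."""
    scored = [(rank_tuple(keys, values), pattern) for keys, pattern in variants]
    scored = [(r, p) for r, p in scored if r is not None]
    return min(scored, key=lambda s: s[0])[1]


def first_match(variants, values):
    """Pick the first listed variant whose keys all match."""
    for keys, pattern in variants:
        if rank_tuple(keys, values) is not None:
            return pattern
    return None


inputs = [(a, b) for a in (0, 1, 2) for b in (0, 1, 2)]
same_best = all(best_match(ORDERED, v) == best_match(SHUFFLED, v) for v in inputs)
differs_first = [v for v in inputs
                 if first_match(ORDERED, v) != first_match(SHUFFLED, v)]
print(same_best, differs_first)
```

Under these assumptions the best-match results agree for every input, while first match gives different answers for `(0, 0)` and `(2, 1)` depending on the listing.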

@mihnita
Collaborator Author

mihnita commented May 20, 2022

+100 to Addison's comment.
Manual reordering is a non-starter.

You don't want developers to order things.
And you don't want translators to reorder things either.

Also, if we require human ordering, we can't claim that MF2 can reproduce the MF1 behavior, which ignores the ordering.


No matter what, even if we decided to use the exact MF1 selection (which is subpar, and can be improved), there are some implications for the "engine" (whatever implements the syntax + eval, in this case ICU, for a September Tech Preview)

Implications:

  1. the spec should say that all combinations MUST have a * component (the way MF1 does)
    That would mean my initial example is invalid, because there is no one * combination

  2. The spec should say that the algorithm for multiple keys is "lexicographic" (match first key, then second, and so on)
    So it is an algorithm, it is engine level (can't be delegated to a selection function level) and it is decided in the standard.
    Might not give the "best" results, but it is specified.

This is what this bug is about: specify that algorithm. Best or not, but an algorithm, not a human reordering things.

Or human reordering, if that is what WG members vote (and / or CLDR/ICU TC?).
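Implication 1 could be checked mechanically. The following Python sketch is one possible reading of the rule, modeled on nested MF1 selects: for every key prefix that appears in the matrix, the next column must offer a `*` option. The function name and the exact rule are assumptions for illustration, not spec text.

```python
def missing_fallbacks(variants, wildcard="*"):
    """Return the key combinations that would need a wildcard to be complete.

    `variants` is a list of key tuples, all of the same length. For each prefix
    of keys seen in the matrix, the set of keys in the following column must
    contain the wildcard, mirroring the mandatory "other" of nested MF1 selects.
    """
    missing = []
    width = len(variants[0])
    for depth in range(width):
        next_keys = {}
        for keys in variants:
            next_keys.setdefault(keys[:depth], set()).add(keys[depth])
        for prefix, keys_at_depth in next_keys.items():
            if wildcard not in keys_at_depth:
                missing.append(prefix + (wildcard,))
    return missing


initial_example = [
    ("=1", "fem"),
    ("=1", "*"),
    ("one", "masc"),
    ("*", "masc"),
    ("*", "*"),
]
print(missing_fallbacks(initial_example))  # [('one', '*')]
```

For the initial example it reports the missing `one *` combination.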

@zbraniecki
Member

How strongly do we want the "MF1 compatibility" argument to inform our decision here?

Separately - would it help if the plural selection strategy could be defined in message metadata, so that MF2 uses the optimal strategy by default and a separate @selectionStrategy: "mf1-compat" meta triggers the MF1-compatible one?

@aphillips
Member

How strongly do we want the "MF1 compatibility" argument to inform our decision here?

The syntax obviously is completely incompatible. Maybe "operational familiarity" is important, though? When we break with MFv1, we should do it consciously because it adds value.

@eemeli
Collaborator

eemeli commented May 20, 2022

I would consider a minimum level of compatibility being reached by being able to parse MF1 syntax into the MF2 data model, and to be able to then have it be formatted the same way as in MF1. This is currently possible, with two caveats:

  1. Due to MF2 selection being top-level-only and following a first-match method, the MF1 syntax -> MF2 data model parser needs to lift up all selectors to the top level and sort the variants accordingly.
  2. The formatting and selection functions required by MF1 may differ from whatever an MF2 implementation considers its default set of functions.

When leaving out message references from the spec, we lost full two-way compatibility with MF1, i.e. it's not possible to transform a message MF1 syntax -> MF2 data model -> MF1 syntax and be sure to get the same structural representation back, though a message thus transformed will still format the same.
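The first caveat can be sketched in Python. The nested dictionary below is a hypothetical stand-in for a parsed MF1 message (with `*` in place of `other`), and the specificity ordering is an assumption for this example; it is not the sort method used by any particular implementation.

```python
def lift(node, prefix=()):
    """Flatten nested selects into top-level (keys, pattern) variants."""
    if isinstance(node, str):
        return [(prefix, node)]
    rows = []
    for key, child in node.items():
        rows.extend(lift(child, prefix + (key,)))
    return rows


def specificity(key):
    """Assumed ordering: exact =N keys first, named keys next, wildcard last."""
    if key == "*":
        return 2
    return 0 if key.startswith("=") else 1


def to_first_match_order(nested):
    """Lift all selectors to the top level and sort for first-match evaluation."""
    return sorted(lift(nested), key=lambda row: tuple(specificity(k) for k in row[0]))


# Hypothetical nested message, deliberately written with the wildcard first.
nested = {
    "*": {"masc": "O M", "*": "O O"},
    "one": {"masc": "one M", "*": "one O"},
    "=1": {"fem": "1 F", "*": "1 O"},
}

for keys, pattern in to_first_match_order(nested):
    print(keys, pattern)
```

After lifting and sorting, a first-match walk reaches the `=1` rows before `one` and the wildcard rows, regardless of how the nested source was written.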

@mihnita
Collaborator Author

mihnita commented May 23, 2022

Then maybe we should first decide what we mean by compatibility.

I doubt anyone wants to move back and forth between the MF1 and MF2 syntax.
The biggest benefits of MF2 are in what it does, not in what it looks like.

So the way I think of compatibility is that MF2 can do the (good) things that MF1 is able to.
It's OK to drop "bad features" (and we did that already when we decided that selectors can only be top level).
It does not mean MF2 must work 100% the same (or better) out of the box, but it should at least be possible to make it work the same (with custom functions etc.)

I suppose that you expect the same thing when we are talking about Fluent.
Do you / Mozilla have any interest in a MF2 that can't do what Fluent does today?
And there is no way to make it do the same things?

@romulocintra
Collaborator

Consensus: It's not a blocker for implementing a formatting library

@romulocintra romulocintra removed the blocker-candidate The submitter thinks this might be a block for the next release label May 23, 2022
@markusicu
Member

My memory from the March discussions with Mark & the TCs is that the formatting library should do first-match, and that other tooling (linters, localization tools) should complain about bad ordering, or should reorder. I think it's a good idea to say in the spec what's a good or bad order, but I don't think that this is a blocker for implementing a formatting library.
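As a rough idea of what such a lint could look like, the Python sketch below flags variants that can never be selected under first match because an earlier variant already covers them. It is a hypothetical, key-only check: it treats a key as covered only by an identical key or by the wildcard, so it cannot know function-specific facts (for instance that `one` also covers `=1` in English).

```python
def shadowed_variants(variants, wildcard="_"):
    """Return (later_index, earlier_index) pairs for unreachable variants.

    A later variant is reported when some earlier variant has, in every
    column, either the wildcard or the same key as the later variant.
    """
    findings = []
    for later_index, later in enumerate(variants):
        for earlier_index in range(later_index):
            earlier = variants[earlier_index]
            if all(e == wildcard or e == l for e, l in zip(earlier, later)):
                findings.append((later_index, earlier_index))
                break  # one shadowing variant is enough to report
    return findings


keys_in_listed_order = [
    ("one", "masc"),
    ("_", "masc"),
    ("_", "_"),
    ("=1", "fem"),
    ("=1", "_"),
]
print(shadowed_variants(keys_in_listed_order))  # [(3, 2), (4, 2)]
```

On the reordered example from earlier in the thread, it reports that both `=1` rows sit below the `_ _` catch-all.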

@mihnita mihnita added the blocker-candidate The submitter thinks this might be a block for the next release label Nov 3, 2022
@aphillips
Member

aphillips commented Feb 14, 2023

I started writing a comparison document in my fork here (fixed):

https://github.com/aphillips/message-format-wg/blob/issue-351/exploration/selection-matching-options.md

It is not complete and I will probably create a new issue to track the choice between first and best match when it is ready (since this is off-topic to deciding the best match algorithm should we choose to adopt it).

@eemeli
Collaborator

eemeli commented Feb 15, 2023

@aphillips It's outside the nominal scope of this issue and not yet covered by your comparison document, but you may want to look into prior art outside of MF1 to include in the discussion, such as the operation of switch and match statements in various programming languages, and the precedence of CSS selectors.

@aphillips
Member

@eemeli Good point. I do cover this to some degree inside the discussion of MF1, but will expand on it.

@mihnita
Collaborator Author

mihnita commented Mar 3, 2023

I think that taking inspiration from programming languages takes us in the wrong direction, as we are not trying to solve the same problem.

A programmer writing a switch statement has a pretty good idea that they want this option to be more important than that other option, so they put it first.

But for what we do the order is potentially unknowable when one writes the expression.

It might be locale dependent (for example a language might want a genitive form before an accusative one, and another language might not want that).

It might be runtime dependent, for example a long and a short form of a message.
The long form is best, as it is clear and does not use abbreviations.
A short form is less clear, might use abbreviations, but it is the only one possible on a small device (think watch).

So you can't sort something like this in advance:

match {$deviceSize}
  when nano {+}
  when tiny {Hinz.}
  when long {Einen Alarm hinzufügen}
  when *    {Hinzufügen}

Or because you run on a different OS:

match {$os}
  when macos {...}
  when win  {...}
  when *    {...}

You also can't sort or lint things at build time using a function.
For example because the build system is written in a different language than the custom function.
Or because the custom function is not even available (the localization company will not have access to my custom functions).
Or because in order to compile the custom function one would have to drag in all kinds of dependencies, or even full toolchains.
(dragging in a full rust toolchain or Android SDK just to lint the sort order)
Or you lint on Linux (a CI in the cloud), but you have 3 versions of the custom function, one for MacOS, so you can't even build it and run it on Linux.

TLDR: if I can sort / lint the order with a function at build time, then I can do that 10 times easier and with fewer complications at runtime, making the whole thing "best match".

@mihnita
Collaborator Author

mihnita commented Mar 3, 2023

How strongly do we want the "MF1 compatibility" argument to inform our decision here?

In my mind it is pretty strong.

But we need to agree what we mean by that.

I think that it applies (and can help) to more than this issue (for example issue #350 "Allow names to start with a digit")

@aphillips
Member

@mihnita Consider making comments on https://github.com/aphillips/message-format-wg/blob/issue-351/exploration/selection-matching-options.md as the discussion vehicle for this issue Monday.

@eemeli
Collaborator

eemeli commented Mar 3, 2023

It might be runtime dependent, for example a long and a short form of a message. The long form is best, as it is clear and does not use abbreviations. A short form is less clear, might use abbreviations, but it is the only one possible on a small device (think watch).

So you can't sort something like this in advance:

match {$deviceSize}
  when nano {+}
  when tiny {Hinz.}
  when long {Einen Alarm hinzufügen}
  when *    {Hinzufügen}

@mihnita Could you explain a bit more how the order of the variants in this example could affect the results? Wouldn't $deviceSize only ever match one of these, so the order between nano/tiny/long would not matter for either a best-match or first-match algorithm?

@eemeli eemeli added the resolve-candidate This issue appears to have been answered or resolved, and may be closed soon. label Jul 3, 2023
@eemeli
Collaborator

eemeli commented Jul 3, 2023

I think we've resolved this by selecting a variant selection method?

@aphillips
Member

Closing resolve-candidates per discussion in 2023-07-24 call

6 participants