Support variable info not in message patterns #98
Comments
This issue came up as a question from @mihnita during today's meeting. Feel free to correct anything that I didn't capture quite right. |
A common enough example of this in the current MessageFormat is the SelectFormat: an enumerated data value is used to select between messages (gender is a common example, but it can be anything really--which is its own problem). In our (Amazon's) proprietary format we force these to use "complete thought" strings (which can be nested, including interleaved with plural). Something like:
|
Yes, my example was for gender, but can be anything. To me this can be modeled best using a construct similar to the
For example:
I think this is instantly familiar to any programmer. If we adopt it because of this, then it is clear what goes where:
It is also easy to algorithmically check what the missing cases are and add them in languages that need them (by copying from the [other...] case). Taking a language with no gender and no numbers as source (let's say Chinese), one can create this minimal message:
Because we know the selection types (GENDER, PLURAL), it is easy to determine that the cases to add will be all the combinations of plural cases (language dependent) + gender cases (also language dependent). Using the parameter + value as keys ({MEDIUM=SPEECH, COUNT=MANY}) means we can mix and match, and the selectors are not "consistent":
What are the missing cases in the message above? In "traditional programming languages" this is a collection of
Most linters can check switch constructs and report missing cases (for enums), or missing default. |
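As a rough illustration of the missing-case check described above (not taken from the original comment): the category lists, the comma-joined key format, and the function names below are all assumptions made for the sketch. Real category sets would come from per-locale data.

```javascript
// Hypothetical per-locale category lists; a real tool would load these per language.
const CATEGORIES = {
  GENDER: ["masculine", "feminine", "other"],
  PLURAL: ["one", "few", "many", "other"],
};

// Build every combination of the selector categories (cartesian product).
function allCombinations(selectorTypes) {
  return selectorTypes.reduce(
    (combos, type) =>
      combos.flatMap((combo) => CATEGORIES[type].map((cat) => [...combo, cat])),
    [[]]
  );
}

// Return a copy of `cases` in which every missing combination is filled
// by copying the all-"other" case, as suggested in the comment above.
function fillMissingCases(selectorTypes, cases) {
  const fallbackKey = selectorTypes.map(() => "other").join(",");
  if (!(fallbackKey in cases)) throw new Error("missing default case");
  const filled = { ...cases };
  for (const combo of allCombinations(selectorTypes)) {
    const key = combo.join(",");
    if (!(key in filled)) filled[key] = cases[fallbackKey];
  }
  return filled;
}

// Minimal source message with only the default case.
const minimal = { "other,other": "{host} invited {count} guests." };
const expanded = fillMissingCases(["GENDER", "PLURAL"], minimal);
console.log(Object.keys(expanded).length); // 12 (3 gender x 4 plural)
```

A linter-style check would use the same product of categories, but report the missing keys instead of filling them in.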
In the example of the "switch" the array of conditions is in the opposite order of gender/count as the "array of values". Oversight? Intentional? |
I think the default-case handling that @mihnita mentions is a different discussion from the rest of this thread? As I understand it, what we're talking about here is being able to determine a message based not only on its own input parameters, but also on other values. From a developer's point of view, they have some process by which they acquire a function that returns a message:
message({ name: 'Mikko' }) // 'Hello Mikko'
With MF1, all such parameters need to be directly given to the function. But could we add a stage at which common parameters may be defined, that are also available as parameters? Using the example of @aphillips:
const messages = getMessages({ name: 'Mikko' })
const msg = messages['someRandomMessageId']
msg({ messageType: 'email' }) // 'Hello Mikko, you have new emails in your inbox.'
I think this would be a good idea. There are of course questions about scope that need to be addressed; does the identifier used in the message need to make it clear whether the variable is coming from the immediate parameters, or a wider scope? Can a message function be called with a parameter that masks a scope parameter? I also think that this provides a decent argument for the AST's root not to be a single message, but some form of resource object that can contain not only one or more messages, but also identifiers for expected scope parameters. |
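To make the scope and masking questions concrete, here is a minimal sketch of one possible answer. It is not an existing API: the `catalog` object, the `allowMasking` option, and the merge order are all assumptions chosen for illustration.

```javascript
// Hypothetical message catalog: each entry formats a resolved parameter object.
const catalog = {
  greeting: (p) => `Hello ${p.name}, you have new ${p.messageType}s in your inbox.`,
};

// Bind scope-level parameters once; each returned message function
// merges them with its call-time parameters.
function getMessages(scopeParams, { allowMasking = true } = {}) {
  const messages = {};
  for (const [id, format] of Object.entries(catalog)) {
    messages[id] = (callParams = {}) => {
      if (!allowMasking) {
        for (const key of Object.keys(callParams)) {
          if (key in scopeParams) {
            throw new Error(`"${key}" would mask a scope parameter`);
          }
        }
      }
      // When masking is allowed, call-time values win over scope values.
      return format({ ...scopeParams, ...callParams });
    };
  }
  return messages;
}

const messages = getMessages({ name: "Mikko" });
console.log(messages.greeting({ messageType: "email" }));
// Hello Mikko, you have new emails in your inbox.
```

Whether masking should be allowed at all is exactly the open question; the flag only shows that either policy is easy to enforce at this layer.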
In Fluent we had a concept of context data for quite a while, it was meant to work very similarly to what Eemeli is describing here: let bundle = new FluentBundle("en", {
ctxData: { name: "Mikko" }
});
bundle.addResource(res1);
bundle.addResource(res2);
let msg = bundle.getMessage("key1");
bundle.formatPattern(msg.value, {
type: "email"
}); // "Hello Mikko, you have new emails in your inbox."
It never got traction, and during one of the API remodels we removed it with the intention of adding it back if users asked for it. |
I like the idea of contextual parameters. In practice, your code wouldn't look like the above examples. The customer's name ("Mikko") would be in a context variable or injected. Otherwise you'd just pass it explicitly in the format call (to ensure it is present). Maybe one would need some sort of guardrail to ensure that all of the contextual parameters get loaded with something. To @eemeli's comment, I think it is hard to separate the resource format from the formatter. ICU's current message format produces an untranslatable mess when you use |
Sorry, oversight. |
The trouble with context / binding is scope (in the programming language), and in general the fact that there is no easy access to those variables. In Java you can use reflection, but it is clunky. In C/C++ it is even worse: the variable names are lost. Many systems (Windows (Win32), Java, Android, macOS & iOS, Qt, others) store all strings in one single resource "bundle". And you can of course put in parameters everything that is useful for rendering the message, not only the visible part.
It is the developer's job to store in parameters everything needed (host_gender, host_name), visible or not. In fact, there is a benefit in that. If you do this:
you can refactor (rename) the If you have some "magic binding" or access to the variables of the programming language then this If there is some "magic environment bucket of variables" then you can have it in a Map<...> context; Or you can have all kinds of helper methods (have a Context class with createParams that gives you a map where you add some extras):
you put all the "global" stuff that you might use in messages in context, and then you do
TLDR: I can't see a mechanism that works across languages to access variables. From the Fluent example I don't think there is conflict, and it is very similar to what I described. But I don't think that changes in any way the data model. loadString => load the string + parses it into some kind of (immutable) data model |
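For readers trying to picture the helper mentioned above, a possible shape is sketched here. Only the names `Context` and `createParams` come from the comment; the constructor, the `format` stand-in, and the parameter names in the usage lines are assumptions.

```javascript
// Hypothetical helper: holds "global" values and merges them into per-message parameters.
class Context {
  constructor(globals) {
    this.globals = Object.freeze({ ...globals });
  }

  // Returns a fresh Map seeded with the globals, plus any extras for this message.
  createParams(extras = {}) {
    return new Map(Object.entries({ ...this.globals, ...extras }));
  }
}

// Stand-in formatter: replaces {name} tokens with values from the params map.
// Unknown tokens are left untouched.
function format(pattern, params) {
  return pattern.replace(/\{(\w+)\}/g, (match, name) =>
    params.has(name) ? String(params.get(name)) : match
  );
}

const context = new Context({ host_name: "Maria", host_gender: "female" });
const params = context.createParams({ guest_count: 3 });
console.log(format("{host_name} invited {guest_count} guests.", params));
// Maria invited 3 guests.
```

Note that `host_gender` travels in the map even though this pattern never prints it, which is the point made earlier: the formatter only ever sees explicit parameters, visible or not.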
I'm not sure if I agree. I can imagine a fairly complex UI (say, Facebook, Gmail, Firefox UI) that could have contextual information about user's gender and all l10n contexts could use that information to select the variant of any message to work with the information about user (name, gender, age, etc.) |
@zbraniecki Probably I didn't express it that well. We actually have a |
I think @mihnita is right here. This is going back a bit on my earlier comment, but from the point of view of a single message, how does a reference to a variable passed in directly differ from a reference to a context variable? Not necessarily at all. @echeran's original example was a selector choosing a case if |
I can provide some context about this. We removed it (back when Fluent was L20n) because the implementation we had was opinionated wrt. the reactivity to the mutations of the context data. It would set two-way bindings between the data and the callsites, and then re-translate the callsites when the context data was mutated. The way it worked meant it was challenging to integrate it into codebases which already had their system for managing variable bindings (e.g. MVC frameworks). As long as the context data is immutable, the examples tend to look great :) It's important to consider the entire lifecycle, however, in particular what changes when the data is mutated. I agree with @mihnita and @eemeli that this probably shouldn't impact the data model. My own preference would be to only allow variables passed directly to Another avenue is the one we went in modern Fluent, which supports an open list of selectors. Implementations can then define their custom selectors returning the context variables as needed. E.g. in Firefox, the |
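The custom-selector avenue can be pictured roughly like this. This is a generic sketch, not Fluent's actual API; the `USER_GENDER` and `ARG` selector names, the message shape, and the `environment` object are all hypothetical.

```javascript
// Hypothetical registry of selector functions supplied by the implementation.
function makeSelectors(environment) {
  return {
    // Reads from the host environment rather than from call-time arguments.
    USER_GENDER: () => environment.userGender,
    // Reads an ordinary call-time argument by name.
    ARG: (args, name) => args[name],
  };
}

// Pick a variant by running the message's selector; fall back to "other".
function formatWithSelector(message, args, selectors) {
  const select = selectors[message.selector.fn];
  if (!select) throw new Error(`unknown selector ${message.selector.fn}`);
  const key = String(select(args, message.selector.arg));
  return key in message.variants ? message.variants[key] : message.variants.other;
}

const selectors = makeSelectors({ userGender: "feminine" });
const welcome = {
  selector: { fn: "USER_GENDER" },
  variants: { feminine: "Bienvenida", masculine: "Bienvenido", other: "Te damos la bienvenida" },
};
console.log(formatWithSelector(welcome, {}, selectors)); // Bienvenida
```

The message only names a selector; where that selector gets its value is an implementation detail, which is why the data model does not need a separate notion of context variables.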
A lot has been discussed here. So I'll add a few thoughts.
There are times that we want to change the words chosen given the state of a device. If you're looking away from a device, we may want to be more descriptive with the choice of words. If the screen is showing, we may want to be a little more terse. If the voice is muted, we may want to be verbose in the print form. If we have really small screen space, we may want to print a little text and speak a little more. These states are not a part of how we annotate the message. We will use selectors based on the device state to choose the response. I consider this to be a design choice that does not have to be a part of the framework. The application and the message author can choose what states are appropriate in a message. There is a question about localizability. If you allow complex conditions that involve AND, OR, NOT and parentheses, that can make it hard for translators to adapt. Some conditions need to be localized. So it's helpful to expose them. Though if you give too much flexibility, developers will put too much selector logic into the message, requiring the same logic to be copied over and over into many languages. Some of that logic should have been left out of the message in the first place. So it's hard to find the right balance between flexibility and excessive complexity. |
@grhoten Thanks for that summary.
I think successful designs do not expose the translators to the selection logic. The selection logic is forced to be outside the messages (it may be communicated as context to the translator, but the translators don't have to interact with or manage it). There is the possibility that this produces a large number of nearly identical strings for translation.
Can you give a concrete example of this one? I think I understand, but I'm not sure that I do... |
I think it may be possible to provide at least a subset of this functionality without using programming syntax. Nesting like you see in the ICU MessageFormat example is like an "and". If you allow selection of multiple values at once for the same span of text, that can be an "or" operation. For example, it could be a comma separated list of possible values. I'm not sure how a "not" operation would be done. I guess a "not" would be the default in a switch statement or the last span without conditions of a first span.
I think Fluent's web site has a similar example. The example using -sync-brand-name is similar in concept. |
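The "or" as a comma-separated list and the "not" as the unconditioned last span could look something like the sketch below. The variant representation (an array of `[keys, text]` pairs with an empty key as the catch-all) and the medium values are assumptions for illustration only.

```javascript
// Each variant lists the values it accepts as a comma-separated key ("or").
// A variant with an empty key is the catch-all, which plays the role of "not".
function selectVariant(variants, value) {
  for (const [keys, text] of variants) {
    const accepted = keys.split(",").map((k) => k.trim()).filter(Boolean);
    if (accepted.includes(value)) return text;
  }
  const fallback = variants.find(([keys]) => keys.trim() === "");
  if (!fallback) throw new Error("no default variant");
  return fallback[1];
}

const variants = [
  ["speech,phone", "You have three new messages waiting."],
  ["", "3 new messages"],
];

console.log(selectVariant(variants, "phone")); // You have three new messages waiting.
console.log(selectVariant(variants, "print")); // 3 new messages
```

Nesting one such selection inside another would give the "and", so the three operations are covered without exposing boolean syntax to translators.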
+100 to that :-) And maybe the "span" lingo is throwing me off. One step up, I think the decision here is about:
It does not really matter how complex the condition is. In general option 1 (part of message) is really bad for i18n. |
The above discussion is rich with useful examples. I suspect that it can be closed because we have adopted a selection model that can consume any external values to do message selection and we don't specify whether the values are "contextual" or explicitly passed. A reason to keep this issue open might be if we need to define standardized contextual variables that all messages are guaranteed access to. But we might be better off with a specific issue about that rather than reusing this issue. |
Closing per 2023-06-19 telecon discussion. Foregoing comment still applies. |
Sometimes, a variable piece of information that affects the translation or formatting of a message pattern may not be naturally represented as a regular "printable" placeholder -- a placeholder that occupies a position within the message pattern. This information may be known at the time the message pattern is created, so we should represent it as part of the message somehow. I think that issue #33 about denoting whether a message is for print or for speech might be a good example of this.
I think we can store this type of information at the level of the message, but outside of the message pattern in which the "printable placeholders" occur. More specifically, I think we can re-use the concept of placeholders and treat these placeholders as "non-printing". Doing so could help during the selection phase of a multi-select message, esp if the message already has printing placeholders (ex: when {MEDIUM=SPEECH, COUNT=MANY}, the multi-select might return ["Hey y'all, " {COUNT} " is a boatload."]). This decision would help inform the shape of the data model.
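One way to picture a "non-printing" placeholder in a data model is sketched below, purely as an illustration. The object shape, the `when` field, and the `countCategory` stand-in (which is not a real plural rule) are assumptions.

```javascript
// Hypothetical message: MEDIUM is used only for selection and never printed,
// while COUNT is used for selection and also appears in the pattern.
const message = {
  selectors: ["MEDIUM", "COUNT"],
  variants: [
    {
      when: { MEDIUM: "SPEECH", COUNT: "MANY" },
      pattern: ["Hey y'all, ", { arg: "COUNT" }, " is a boatload."],
    },
    // Empty `when` matches anything, so this is the default variant.
    { when: {}, pattern: [{ arg: "COUNT" }, " items"] },
  ],
};

// Stand-in for real plural rules: a purely illustrative threshold.
function countCategory(n) {
  return n >= 100 ? "MANY" : "OTHER";
}

function format(msg, inputs) {
  // Selection keys: COUNT is mapped to a category, other inputs are used as-is.
  const keys = { ...inputs, COUNT: countCategory(inputs.COUNT) };
  const variant = msg.variants.find((v) =>
    Object.entries(v.when).every(([name, value]) => keys[name] === value)
  );
  // Printing uses the raw input values, not the selection keys.
  return variant.pattern
    .map((part) => (typeof part === "string" ? part : String(inputs[part.arg])))
    .join("");
}

console.log(format(message, { MEDIUM: "SPEECH", COUNT: 5000 }));
// Hey y'all, 5000 is a boatload.
console.log(format(message, { MEDIUM: "PRINT", COUNT: 5000 }));
// 5000 items
```

The formatter treats `MEDIUM` and `COUNT` identically during selection; the only difference is that no pattern part refers to `MEDIUM`.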
More context on the text/speech problem is in issue #33 filed by @grhoten.
I think @mihnita may have had ideas of other examples.