Add first draft of default attribute definitions #1098
base: main
Good start. Lots of nit-picky comments.
Maybe a good question is: should these be directly incorporated? Or should all of these XLIFFy things be namespaced? Some of what XLIFF does doesn't apply to UMF messages and some of it would be much better on a message resource level (instead of cluttering up the message itself).
|  | ||
| #### @translate | ||
|  | ||
| _Value:_ `yes` or `no`. | 
Indicate that yes is default?
Is there a reason attributes don't follow a similar structure to functions and their options here?
I don't think we've agreement that yes is the default. In fact, for expressions, I would think that the general default might in fact be no to indicate that a translator is not expected to make any changes to the expression.
Considering this a bit more, maybe something like translate=input or translate=|input,minimumFractionDigits| would be better? That would indicate which parts are expected to be translatable.
The default value is no when the attribute is not present, but yes when the attribute is present and has no value, right?
I don't like the values yes/no, but they are inherited from XLIFF (and its friends, such as ITS) and we should probably remain consistent with them (for portability at least)
Ah, that's a slightly different understanding of "default" than I'd had -- as in, the value that's applied if the attribute is not present at all.
I don't hate the yes/no as they're relatively legible and are perhaps easier to extend with other enum values than e.g. true/false would be. But as they're already in use by XLIFF, we should use the same values.
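To make the two senses of "default" in this thread concrete, here is a minimal sketch of one possible resolution rule. It assumes the reading proposed above (absent means `no`, present without a value means `yes`), which has not been agreed. The `resolve_translate` function and the shape of its `attributes` mapping are hypothetical, not part of any MessageFormat implementation.

```python
from typing import Mapping, Optional

VALID_VALUES = ("yes", "no")


def resolve_translate(attributes: Mapping[str, Optional[str]]) -> str:
    """Return the effective @translate value for one expression.

    `attributes` is a hypothetical mapping from attribute name to value,
    with None standing for an attribute written without a value.
    """
    if "translate" not in attributes:
        # Assumption from the thread, not an agreed rule: absent means "no".
        return "no"
    value = attributes["translate"]
    if value is None:
        # A bare `@translate` is read here as `@translate=yes`.
        return "yes"
    if value not in VALID_VALUES:
        raise ValueError(f"unsupported @translate value: {value!r}")
    return value
```

If explicit values end up being required, the `value is None` branch would become an error instead.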
I think that requiring explicit values is cleaner.
How hard is it to type =no (3 characters)?
> translate=|input,minimumFractionDigits| would be better? That would indicate which parts are expected to be translatable.
I think that such info does not belong here, it belongs in the function registry.
A while ago I even provided a list of l10n attributes to use for each function option (something like hide, read-only, enum, free-form). I can even think of more options.
|  | ||
| Indicates whether or not the _markup_ and its contents can be re-ordered. | ||
|  | ||
| #### @comment | 
Why not just permit the "global" attributes on markup?
I don't understand what this means.
You're repeating attributes defined above. Why not make those like @comment global to both expressions and markup?
That seems like an editorial fix we could apply later, if it does hold that the annotations continue to match on expressions and markup.
It would be a bad idea for identically-named attributes to diverge. The sets aren't identical, of course.
|  | ||
| #### @max-length | ||
|  | ||
| _Value:_ A strictly positive integer, followed by a space, followed by one of the following: | 
digit size option?
That's limited to max 99, and we need to allow for limits greater than that.
| _Value:_ A strictly positive integer, followed by a space, followed by one of the following: | ||
| - `chars` | ||
| - `bytes` | ||
| - `lines` | 
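As an illustration of the value format quoted above, a validator could look roughly like the sketch below. The names `parse_max_length` and `MaxLength` are hypothetical. Rejecting leading zeros and non-ASCII digits is an assumption about what "strictly positive integer" means, and the single separating space is taken literally.

```python
import re
from typing import NamedTuple


class MaxLength(NamedTuple):
    limit: int
    unit: str


# Assumption: ASCII digits only, no leading zeros, exactly one space.
_MAX_LENGTH_RE = re.compile(r"([1-9][0-9]*) (chars|bytes|lines)")


def parse_max_length(value: str) -> MaxLength:
    """Parse a draft @max-length value such as "120 chars"."""
    match = _MAX_LENGTH_RE.fullmatch(value)
    if match is None:
        raise ValueError(f"invalid @max-length value: {value!r}")
    return MaxLength(int(match.group(1)), match.group(2))
```

Because the integer is parsed from the literal rather than through a two-digit option type, limits above 99 are accepted.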
Good luck with this one.
As in, we should not include it?
Measuring bytes will depend on some character encoding somewhere. Without an indication of the encoding (which this doesn't provide), there is no way to perform the measurement.
(FWIW, you're missing graphemes, which is another measurement (approximately "screen positions", but only approximately so).)
Lines depend on... font, font size, pixel width, line-breaking, hyphenation (insert more here) and are even harder to define than bytes.
Length limitations are a "fact of life" in localization, but badly defined mechanisms for them are not that helpful.
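The dependence on encoding described above can be seen with a small sketch that measures one string several ways. The `measure` helper is hypothetical. Its grapheme figure is only a rough approximation that skips combining marks; it does not implement real grapheme cluster segmentation, so emoji sequences and some scripts will be miscounted.

```python
import unicodedata


def measure(text: str) -> dict:
    """Measure `text` in several candidate units for a length limit."""
    return {
        "code_points": len(text),
        "utf8_bytes": len(text.encode("utf-8")),
        "utf16_bytes": len(text.encode("utf-16-le")),
        # Rough approximation only: count code points that are not
        # combining marks. Not a substitute for proper segmentation.
        "approx_graphemes": sum(
            1 for ch in text if unicodedata.combining(ch) == 0
        ),
    }
```

For example, `measure("e\u0301")` (an "e" followed by a combining acute accent) reports 2 code points, 3 UTF-8 bytes, 4 UTF-16 bytes, and 1 approximate grapheme, so a bare `bytes` limit is ambiguous until an encoding is named.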
One option would be to leave out the units, and to let the implementation figure out what the limit means, something in the overlap of characters/code points/graphemes.
Can't leave it out, because that is exactly the point.
As a developer I know what I need the unit to be.
So I need a way to tell the l10n tool (the linter/checker in that tool) what I need.
But I might have no control on that tool, because it is a 3rd party tool, often hosted by a vendor.
Even more, 2 devs in the same company and even the same project might need different units.
So if we don't specify the units we might as well not document this at all, because it is useless.
Co-authored-by: Addison Phillips <[email protected]>
During yesterday's call, @mihnita also expressed concern regarding cluttering up a message with multiple attributes. His thought was that it would often be preferable to attach a  To me, this speaks of a need to have that capability also be well defined, so that it can be ergonomically done across resource formats. In other words, I think we need a JavaDoc-y syntax for message-level attributes.
|  | ||
| Empty _messages_ SHOULD be accompanied by an explanatory `@comment`. | ||
|  | ||
| #### @max-length | 
This is a can of worms :-)
One might want two kinds of length limitations:
- storage
  For example, if you put the strings in a "traditional" database and you have a max size for the translations, then you need the encoding of the string.
  So you would say "max 120 bytes as utf-8".
- visual (for example using em)
  That is a can of worms.
  Because "m" is not the same width as "l" :-)
  And "AAAAVVVVV" is not the same width as "AVAVAVAV" (because of kerning).
  And ligatures, and complex scripts.
  To accurately measure anything you need the exact font, whether it is monospaced or not, with the kerning table, ligatures, combining chars, etc.
  Even the font version might affect you.
  Then in some systems you can enable/disable OpenType features.
  To measure multiple lines you need the max length of one line, whether hyphenation is available, the exact hyphenation data + engine, whether justification is set or not :-)
TLDR: I would leave it out for now
|  | ||
| Identify the _functions_ and _markup_ supported by the _message_ formatter. | ||
|  | ||
| #### @source | 
It really does not belong here!
|  | ||
| Indicates whether the _message_ is translatable or not. | ||
|  | ||
| Some _messages_ may be required to have the same value in all locales. | 
Then they are not messages that should be stored in resource bundles. They can very well be hard-coded.
A better use case is probably to encode info about locale-sensitive behavior. For example, the fact that the default order for a Contacts app should be first name, except that Japanese and a few others should be last name.
But that would not be MF2.
TLDR: I am not sure I see a good use case.
|  | ||
| Some _messages_ may be required to have the same value in all locales. | ||
|  | ||
| #### @version | 
I've been in long debates about mechanisms like this one.
It is controversial, so I would leave it out for now.
| @@ -0,0 +1,233 @@ | |||
| ## Expression, Markup, and Message Attributes | |||
In general I am not happy with the idea of storing all of this in the message proper.
This belongs in the storage, outside the message.
I have a couple of questions and would like to share my understanding of the matter. In addition to the message resource standard, I’m considering how it could integrate with an in-context editing or translation tool.
Expression Attributes
These attributes may vary by locale, so it makes sense for them to be included within the message:
 For 
Markup Attributes
What exactly does the  The same with 
Message Attributes
I understand the flexibility of having messages with  I haven't had time to think about the other message-related attributes yet, so that's all for now.
Adds an initial set of expression, markup, and message attribute definitions.
The proposed attributes are drawn from:
As noted in the text, this is not intended as a final list, but as a starting point. The text is not currently being proposed as normative, but we could change that later.