233 changes: 233 additions & 0 deletions spec/attributes/README.md
@@ -0,0 +1,233 @@
## Expression, Markup, and Message Attributes
Collaborator

In general I am not happy with the idea of storing all of this in the message proper.
This belongs in the storage, outside the message.


> [!IMPORTANT]
> This part of the specification is under active development,
> and is non-normative.

The Unicode MessageFormat syntax and data model allow for _attributes_
to be defined on _expressions_ and _markup_.
These MUST NOT have any impact on the formatting of a message,
and are intended to inform users, such as translators, and tools
about the specific _expressions_ or _markup_ to which they are attached.
_Attributes_ MAY be stripped from _expressions_ and _markup_
with no effect on the message's formatting.
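
As a rough illustration of stripping, the sketch below removes _attributes_ from a simplified, hypothetical stand-in for the data model (a list of text strings and dicts). The `strip_attributes` helper and the dict shapes are assumptions made for this example, not part of the specification.

```python
from copy import deepcopy


def strip_attributes(pattern):
    """Return a copy of the pattern with every `attributes` entry removed."""
    stripped = []
    for part in pattern:
        if isinstance(part, dict):
            # Keep everything that can affect formatting; drop only attributes.
            part = {
                key: deepcopy(value)
                for key, value in part.items()
                if key != "attributes"
            }
        stripped.append(part)
    return stripped


# Hypothetical model of: He saw his {|doppelgänger| @translate=no}.
pattern = [
    "He saw his ",
    {
        "type": "expression",
        "arg": {"type": "literal", "value": "doppelgänger"},
        "attributes": {"translate": "no"},
    },
    ".",
]
stripped_pattern = strip_attributes(pattern)
```

Because the operand and any function annotation are left untouched, formatting `pattern` and `stripped_pattern` should give the same result.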

While the specification does not define how an _attribute_ could be attached
to the _message_ as a whole,
this SHOULD be provided for by a resource container for Unicode MessageFormat messages.

As all _attributes_ with _reserved identifiers_ are reserved,
definitions are provided here for common _attribute_ use cases.
Use a _custom identifier_ for other (custom) _attributes_,
preferably one with an appropriate _namespace_.

### Attribute Values

_Attributes_ are not required to have a value.
For _attributes_ defined here that explicitly support `yes` as a value,
an _attribute_ with no value is considered synonymous
with the same _attribute_ with the value `yes`.
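
A minimal sketch of this rule, assuming a hypothetical representation in which _attributes_ are a dict mapping names to a string value, or to `None` when no value was given. What an absent _attribute_ means is not defined here, so the helper returns `None` in that case.

```python
def yes_no_value(attributes, name):
    """Return "yes" or "no" for a yes/no attribute, or None when it is absent."""
    if name not in attributes:
        return None  # Absent: no default is defined in this section.
    value = attributes[name]
    if value is None:
        return "yes"  # Present without a value: synonymous with `yes`.
    if value in ("yes", "no"):
        return value
    raise ValueError(f"@{name} expects yes or no, got {value!r}")
```

For example, `yes_no_value({"can-copy": None}, "can-copy")` and `yes_no_value({"can-copy": "yes"}, "can-copy")` both return `"yes"`.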

### Expression Attributes

#### @comment

_Value_: A non-empty string.

Associates a freeform comment with the _expression_.

> For example:
>
> ```
> The {$device @comment=|Possible values: Printer or Stacker|} has been enabled.
> ```

#### @example

_Value_: A non-empty string.

An example of the value the _expression_ might take.

> For example:
>
> ```
> Error: {$details @example=|Failed to fetch RSS feed.|}
> ```

#### @term

_Value_: A non-empty string, or a URI.

Identifies a well-defined term.
The value may be a short definition of the term,
or a URI pointing to such a definition.

> For example:
>
> ```
> He saw his {|doppelgänger| @term=|https://en.wikipedia.org/wiki/Doppelg%C3%A4nger|}.
> ```

#### @translate

_Value:_ `yes` or `no`.
Member

Indicate that yes is default?

Is there a reason attributes don't follow a similar structure to functions and their options here?

Collaborator Author

I don't think we've agreement that yes is the default. In fact, for expressions, I would think that the general default might in fact be no to indicate that a translator is not expected to make any changes to the expression.

Considering this a bit more, maybe something like translate=input or translate=|input,minimumFractionDigits| would be better? That would indicate which parts are expected to be translatable.

Member (@aphillips, Sep 9, 2025)

The default value is no when the attribute is not present, but yes when the attribute is present and has no value, right?

I don't like the values yes/no, but they are inherited from XLIFF (and its friends, such as ITS) and we should probably remain consistent with them (for portability at least)

Collaborator Author

Ah, that's a slightly different understanding of "default" than I'd had -- as in, the value that's applied if the attribute is not present at all.

I don't hate the yes/no as they're relatively legible and are perhaps easier to extend with other enum values than e.g. true/false would be. But as they're already in use by XLIFF, we should use the same values.

Collaborator

I think that requiring explicit values is cleaner.
How hard is it to type =no (3 characters)?


> translate=|input,minimumFractionDigits| would be better? That would indicate which parts are expected to be translatable.

I think that such info does not belong here, it belongs in the function registry.

A while ago I even provided a list of l10n attributes to use for each function option (something like hide, read-only, enum, free-form). I can even think of more options.


Indicates whether the _expression_ is translatable or not.

> For example:
>
> ```
> He saw his {|doppelgänger| @translate=no}.
> ```

### Markup Attributes

#### @can-copy

_Value:_ `yes` or `no`.

Indicates whether or not the _markup_ and its contents can be copied.

> For example:
>
> ```
> Have a {#span @can-copy}great and wonderful{/span @can-copy} birthday!
> ```

#### @can-delete

_Value:_ `yes` or `no`.

Indicates whether or not the _markup_ and its contents can be deleted.

#### @can-overlap

_Value:_ `yes` or `no`.

Indicates whether or not the _markup_ and its contents where this _attribute_ is used
can enclose partial _markup_
(i.e. a _markup-open_ without its corresponding _markup-close_,
or a _markup-close_ without its corresponding _markup-open_).

#### @can-reorder

_Value:_ `yes` or `no`.

Indicates whether or not the _markup_ and its contents can be re-ordered.

#### @comment
Member

Why not just permit the "global" attributes on markup?

Collaborator Author

I don't understand what this means.

Member

You're repeating attributes defined above. Why not make those like @comment global to both expressions and markup?

Collaborator Author

That seems like an editorial fix we could apply later, if it does hold that the annotations continue to match on expressions and markup.

Member

It would be a bad idea for identically-named attributes to diverge. The sets aren't identical, of course.


_Value_: A non-empty string.

Associates a freeform comment with the _markup_.

> For example:
>
> ```
> Click {#link @comment=|Rendered as a button|}here{/link} to continue.
> ```

#### @term

_Value_: A non-empty string, or a URI.

Identifies a well-defined term.
The value may be a short definition of the term,
or a URI pointing to such a definition.

> For example:
>
> ```
> He saw his {#span @term=|https://en.wikipedia.org/wiki/Doppelg%C3%A4nger|}doppelgänger{/span}.
> ```

#### @translate

_Value:_ `yes` or `no`.

Indicates whether the _markup_ and its contents are translatable or not.

> For example:
>
> ```
> He saw his {#span @translate=no}doppelgänger{/span}.
> ```

### Message Attributes

#### @allow-empty

_Value:_ `yes` or `no`.

Explicitly mark a message with an empty _pattern_ as valid.

Most empty messages are mistakes,
so being able to mark ones that can be empty is useful.

Empty _messages_ SHOULD be accompanied by an explanatory `@comment`.
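
A lint check along these lines could be sketched as follows. The `check_empty_message` helper, the finding strings, and the dict of message-level _attributes_ (name to value, or `None` when valueless) are all hypothetical. The sketch also assumes that an absent `@allow-empty` means the empty pattern is not explicitly allowed, which this section does not state.

```python
def check_empty_message(pattern, attributes):
    """Return a list of lint findings for one message."""
    findings = []
    if pattern != "":
        return findings  # Only empty patterns are of interest here.
    # Assumption: a valueless @allow-empty means `yes`; an absent one means
    # the empty pattern has not been marked as valid.
    allowed = (
        "allow-empty" in attributes
        and attributes["allow-empty"] in (None, "yes")
    )
    if not allowed:
        findings.append("error: empty pattern without @allow-empty")
    elif not attributes.get("comment"):
        findings.append("warning: empty message has no explanatory @comment")
    return findings
```

An empty message carrying both `@allow-empty` and a non-empty `@comment` produces no findings.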

#### @max-length
Collaborator

This is a can of worms :-)

One might want two kinds of length limitations:

  • storage
    For example, if you put the strings in a "traditional" database, you have a max size for the translations, and then you need the encoding of the string.
    So you say "max 120 bytes as UTF-8".

  • visual (for example using em)
    That is a can of worms.
    Because "m" is not the same width as "l" :-)
    And "AAAAVVVVV" is not the same width as "AVAVAVAV" (because of kerning).
    And ligatures, and complex scripts.
    To accurately measure anything you need the exact font, if it is monospaced or not, with the kerning table, ligatures, combining chars, etc.
    Even the font version might affect you.
    Then in some systems you can enable/disable opentype features.
    To measure multi-lines you need the max length of one line, if hyphenation is available, the exact hyphenation data + engine, if justification is set or not :-)


TLDR: I would leave it out for now


_Value:_ A strictly positive integer, followed by a space, followed by one of the following:
Member

digit size option?

Collaborator Author

That's limited to max 99, and we need to allow for limits greater than that.

- `chars`
- `lines`
Member

Good luck with this one.

Collaborator Author

As in, we should not include it?

Member

Measuring bytes will depend on some character encoding somewhere. Without an indication of the encoding (which this doesn't provide), there is no way to perform the measurement.

(FWIW, you're missing graphemes, which is another measurement (approximately "screen positions", but only approximately so).)

Lines depend on... font, font size, pixel width, line-breaking, hyphenation (insert more here) and are even harder to define than bytes.

Length limitations are a "fact of life" in localization, but badly defined mechanisms for them are not that helpful.

Collaborator Author

One option would be to leave out the units, and to let the implementation figure out what the limit means, something in the overlap of characters/code points/graphemes.

Collaborator

Can't leave it out, because that is exactly the point.

As a developer I know what I need the unit to be.
So I need a way to tell the l10n tool (the linter/checker in that tool) what I need.
But I might have no control over that tool, because it is a 3rd party tool, often hosted by a vendor.
Even more, 2 devs in the same company and even the same project might need different units.

So if we don't specify the units we might as well not document this at all, because it is useless.


Limits the length of a _message_.
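
The value format described above (a strictly positive integer, a space, then `chars` or `lines`) could be parsed roughly as in this sketch. The `parse_max_length` helper is hypothetical, and rejecting leading zeros is an assumption. The sketch deliberately does not measure anything, since how `chars` and `lines` are counted is not defined here.

```python
import re

# A strictly positive integer, one space, then one of the listed units.
# Assumption: leading zeros such as "007" are rejected.
MAX_LENGTH_RE = re.compile(r"([1-9][0-9]*) (chars|lines)")


def parse_max_length(value):
    """Split an @max-length value into (limit, unit), or raise ValueError."""
    match = MAX_LENGTH_RE.fullmatch(value)
    if match is None:
        raise ValueError(f"invalid @max-length value: {value!r}")
    return int(match.group(1)), match.group(2)
```

For example, `parse_max_length("120 chars")` returns `(120, "chars")`, while `"0 chars"` or an unlisted unit raises `ValueError`.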

#### @obsolete

_Value:_ `yes` or `no`.

Explicitly mark a _message_ as obsolete.

This might be used in workflows where messages are not immediately removed
when they are no longer referenced by code,
but kept in to support patch releases for previous versions.
During translation, this can be used to de-prioritize such messages.

> [!NOTE]
> The value could include a way to note some version or timestamp when the removal happened,
> or be paired with a second `@removed-in` or similar tag.

#### @param

_Value_: **TBD**

Documents a _variable_.

> [!NOTE]
> Having a well-defined structure for this attribute is pretty important,
> at least to identify the variable its description is pertaining to.
> In addition to describing the variable in words, it could include:
> - The variable's type -- is it a string, a number, something else?
> - A default example value to use for the variable.

#### @schema

_Value:_ A valid URI.

Identify the _functions_ and _markup_ supported by the _message_ formatter.

#### @source
Collaborator

It really does not belong here!


_Value_: A string.

Provides the _message_ in its source locale.

#### @translate

_Value:_ `yes` or `no`.

Indicates whether the _message_ is translatable or not.

Some _messages_ may be required to have the same value in all locales.
Collaborator

Then they are not messages that should be stored in resource bundles. They can very well be hard-coded.

A better use case is probably to encode info about locale-sensitive behavior. For example, the fact that the default order for a Contacts app should be first-name, except that Japanese and a few others should be last-name.

But that would not be MF2.

TLDR: I am not sure I see a good use case.


#### @version
Collaborator

I've been in long debates about mechanisms like this one.
It is controversial, so I would leave it out for now.


_Value_: A string.

Explicitly versions a source string.

This allows for differentiating typo fixes from actual changes in message contents.
The (message id, version) tuple can be used by tooling instead of just the message id
to uniquely identify a message and its translations.
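
As a sketch of how tooling might use the tuple, the example below keys a hypothetical in-memory translation store by (message id, version). The store layout, the sample entry, and the `find_translation` helper are illustrative assumptions only.

```python
# Hypothetical store: (message id, version) -> {locale: translated text}.
translations = {
    ("greeting", "1"): {"fi": "Hei!"},
}


def find_translation(message_id, version, locale):
    """Return the translation for this exact (id, version), or None."""
    return translations.get((message_id, version), {}).get(locale)
```

A typo fix in the source would keep `@version=1`, so `find_translation("greeting", "1", "fi")` still finds the existing translation. A change in message contents would bump the version, and the lookup for `("greeting", "2")` returns `None` until the message is retranslated.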