Semantic skeletons design #1067

Open: wants to merge 28 commits into base: main
Changes from 8 commits
Commits (28):
5180718
Semantic skeletons design
aphillips Apr 6, 2025
239b858
Add links
aphillips Apr 6, 2025
20f623a
Explain picture strings better
aphillips Apr 21, 2025
676f6c9
Update exploration/semantic-skeletons.md
aphillips Apr 22, 2025
289739f
Replace picture output with a table
aphillips Apr 22, 2025
e70ad8a
Fix objective section
aphillips Apr 22, 2025
c5cf831
Update semantic-skeletons.md
aphillips Apr 22, 2025
902eab7
Fix requirements per comments
aphillips Apr 22, 2025
f5de049
Update semantic-skeletons.md
aphillips Apr 22, 2025
d2a16a6
Update exploration/semantic-skeletons.md
aphillips Apr 22, 2025
941405b
Update exploration/semantic-skeletons.md
aphillips Apr 22, 2025
5028972
Update exploration/semantic-skeletons.md
aphillips Apr 22, 2025
10bb70d
clarify the picture string table
aphillips Apr 22, 2025
108e5f3
Update exploration/semantic-skeletons.md
aphillips Apr 23, 2025
9d326ec
Add FAQ, improve option bag skeleton description
aphillips Apr 23, 2025
b8dabd2
An apostrophe escaped notice
aphillips Apr 23, 2025
3ea7e4d
Update exploration/semantic-skeletons.md
aphillips Apr 23, 2025
0f36d02
Adding use cases and descriptions of types
aphillips Apr 25, 2025
1fb130b
Add an alternative design
aphillips Apr 27, 2025
afd0001
Update exploration/semantic-skeletons.md
aphillips Apr 29, 2025
a9440d5
Update exploration/semantic-skeletons.md
aphillips Apr 29, 2025
ff58cc3
Update exploration/semantic-skeletons.md
aphillips Apr 29, 2025
937c04b
Make `Intl.DateTimeFormat` present tense
aphillips Apr 29, 2025
1beea81
typo
aphillips May 2, 2025
be26724
Add @eemeli's example
aphillips May 2, 2025
7cf129a
Apply suggestion manually (time zone/offset discussion)
aphillips May 5, 2025
754c53f
Update exploration/semantic-skeletons.md
aphillips May 5, 2025
9f84cba
Start work on 2025-05-12 discussion
aphillips May 13, 2025
209 changes: 209 additions & 0 deletions exploration/semantic-skeletons.md
@@ -0,0 +1,209 @@
# Semantic Skeletons Design

Status: **Proposed**

<details>
<summary>Metadata</summary>
<dl>
<dt>Contributors</dt>
<dd>@sffc</dd>
<dd>@aphillips</dd>
<dt>First proposed</dt>
<dd>2025-04-06</dd>
<dt>Pull Requests</dt>
<dd><a href="https://github.com/unicode-org/message-format-wg/pull/1067">#1067</a></dd>
</dl>
</details>

## Objective

_What is this proposal trying to achieve?_

### Provide support for formatting date/time values using semantic skeletons

"Semantic skeletons" are a method introduced in CLDR 46 for programmatically selecting a datetime pattern for formatting.
There is a fixed set of acceptable semantic skeletons.

Previously, ICU MessageFormat provided support for "classical skeletons",
using a microsyntax derived from familiar picture strings (see below)
combined with code in ICU (`DateTimePatternGenerator`) to produce the desired date/time value format.
`Intl.DateTimeFormat` uses "option bags" to provide a similar capability.
A classical skeleton allowed users to express the desired fields and field widths in a formatted date/time value.
The runtime uses locale data to determine minutiae such as
field-order,
separators,
spacing,
field-length,
etc. to produce the desired output.

Advantages of semantic skeletons over classical skeletons:

- Provides all and only those combinations that make sense
- Allows for more efficient implementation, since there is no need to support "crazy" combinations like "month-hour"
- Allows for a clearer, more ergonomic placeholder syntax, since the number of options can be limited
- Easier for user experience designers to specify, developers to implement, and translators to interpret

### Avoid 'picture strings'

The MFWG early on considered including support for "picture strings" in the formatting of date/time values.
There is a Working Group consensus **_not_** to support picture strings in Unicode MessageFormat, if possible.
Many date/time formatting regimes provide for "picture strings".
A "picture string" is a pattern using a microsyntax in which the user (developer, translator, UX designer)
exactly specifies the desired format of the date/time value.
In a picture string, separators, spaces, and other formatting are explicitly specified.
This gives the developer or user experience designer a lot of power over the exact formatting.
For example: `MMM dd, yyyy` or `yyyy-dd-MM'T'HH:mm:ss`

Picture strings require translators to interact with and "translate" the picture string
which is embedded into the _placeholder_ in order to get appropriately localized output.
For example, in MF1 you might see: `Today is {myDate,date,MMM dd, yyyy}`

Translating picture strings can result in non-functional messages.
The exotic microsyntax can be unfamiliar to translators, as it is designed for developers.
Unlike "picture strings", skeletons (classical or semantic) do not require the translator or
developer to alter them for each locale or to know about the specifics,
such as spaces or separators in each locale.

Here are some picture strings with their output vs. common skeletons:

| Picture String | Locale | Output | Skeleton yMMMd | Skeleton yMMd |
|---|---|---|---|---|
|MMM dd, yyyy| en-US | Apr 22, 2025| Apr 22, 2025| 04/22/2025|
| | fr-FR | avr. 22, 2025| 22 avr. 2025| 22/04/2025|
| | ja-JP | 4月 22, 2025| 2025年4月22日| 2025/04/22|
|dd MMM, yyyy| en-US | 22 Apr, 2025| Apr 22, 2025| 04/22/2025|
| | fr-FR | 22 avr., 2025| 22 avr. 2025| 22/04/2025|
| | ja-JP | 22 4月, 2025| 2025年4月22日| 2025/04/22|
|MM/dd/yyyy| en-US | 04/22/2025| Apr 22, 2025| 04/22/2025|
| | fr-FR | 04/22/2025| 22 avr. 2025| 22/04/2025|
| | ja-JP | 04/22/2025| 2025年4月22日| 2025/04/22|
|dd-MM-yyyy| en-US | 22-04-2025| Apr 22, 2025| 04/22/2025|
| | fr-FR | 22-04-2025| 22 avr. 2025| 22/04/2025|
| | ja-JP | 22-04-2025| 2025年4月22日| 2025/04/22|
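
To make the picture-string columns concrete, here is a minimal sketch of a picture-string formatter. The function name `formatWithPicture` and its token set (`yyyy`, `MMM`, `MM`, `dd` only) are assumptions made for illustration; this is not how ICU or `SimpleDateFormat` is implemented, and quoted literals such as `'T'` are not handled.

```typescript
// Minimal picture-string formatter (illustrative only).
// Supported tokens: yyyy, MMM, MM, dd. Everything else is copied verbatim.
function pad2(value: number): string {
  return value < 10 ? "0" + value : String(value);
}

function formatWithPicture(date: Date, picture: string, locale: string): string {
  // Only the month name comes from locale data.
  // Field order, separators, and spacing come from the picture string itself.
  const shortMonth = new Intl.DateTimeFormat(locale, {
    month: "short",
    timeZone: "UTC",
  }).format(date);

  // Longer tokens are listed first so "MMM" is not read as "MM" followed by "M".
  return picture.replace(/yyyy|MMM|MM|dd/g, (token) => {
    switch (token) {
      case "yyyy":
        return String(date.getUTCFullYear());
      case "MMM":
        return shortMonth;
      case "MM":
        return pad2(date.getUTCMonth() + 1);
      default:
        return pad2(date.getUTCDate()); // "dd"
    }
  });
}

const pictureSampleDate = new Date(Date.UTC(2025, 3, 22)); // April 22, 2025 (UTC)
for (const pictureLocale of ["en-US", "fr-FR", "ja-JP"]) {
  console.log(pictureLocale, formatWithPicture(pictureSampleDate, "MMM dd, yyyy", pictureLocale));
}
```

On a runtime with full locale data, this should print results along the lines of the "Output" column for `MMM dd, yyyy` in the table above: the month name is localized, but the English field order and the comma are carried into every locale.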

## Background

_What context is helpful to understand this proposal?_

Links:
- [Semantic Skeletons Specification](https://unicode.org/reports/tr35/tr35-dates.html#Semantic_Skeletons)
- [ICU4X Field Set Enum \(strongly typed\)](https://unicode-org.github.io/icu4x/rustdoc/icu/datetime/fieldsets/enums/enum.CompositeFieldSet.html)
- [ICU4X Field Set Builder \(more JS-like\)](https://unicode-org.github.io/icu4x/rustdoc/icu/datetime/fieldsets/builder/struct.FieldSetBuilder.html)


Semantic skeletons are not the first attempt to provide this functionality.
Previous skeleton mechanisms ("classical skeletons") used
collections of field options (as in `Intl.DateTimeFormat`)
or a microsyntax (as in ICU4J).

The `Intl.DateTimeFormat` skeletons consist of "option bags"
such as `{ year: "numeric", month: "short", day: "numeric" }`
in which the user specifies the field and its width.
Only fields appearing in the options appear in the formatted date/time value.
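
As an illustration, the option bag quoted above can be passed to `Intl.DateTimeFormat` directly. The sketch below adds `timeZone: "UTC"` (an assumption, so that the result does not depend on the machine's time zone); the helper name `formatWithOptionBag` is hypothetical.

```typescript
// The option bag from the text, plus a pinned time zone for reproducibility.
const exampleOptionBag: Intl.DateTimeFormatOptions = {
  year: "numeric",
  month: "short",
  day: "numeric",
  timeZone: "UTC",
};

function formatWithOptionBag(date: Date, locale: string): string {
  return new Intl.DateTimeFormat(locale, exampleOptionBag).format(date);
}

const optionBagSampleDate = new Date(Date.UTC(2025, 3, 22)); // April 22, 2025 (UTC)
for (const optionBagLocale of ["en-US", "fr-FR", "ja-JP"]) {
  // Field order and separators are chosen from locale data, not by the caller.
  console.log(optionBagLocale, formatWithOptionBag(optionBagSampleDate, optionBagLocale));
}
```

The same option bag is reused unchanged for every locale, which is the property that distinguishes skeleton-style formatting from picture strings.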

The ICU MessageFormat "classical skeleton" microsyntax uses strings supplied by the developers.
These strings specify the fields and field lengths that should appear in the formatted value.
See the [date field symbol table](https://unicode-org.github.io/icu/userguide/format_parse/datetime/#date-field-symbol-table) in the ICU User Guide.
The system then uses the string to perform date/time pattern generation,
arranging the specified fields in the correct order,
selecting locale-appropriate separators,
and producing a "picture string" that can be consumed by date/time formatters
such as `java.text.SimpleDateFormat`.
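
The relationship between the two classical mechanisms can be sketched by translating a skeleton string into an option bag. The helper `skeletonToOptions` below is hypothetical (it is not an ICU or ECMA-402 API), handles only the letters `y`, `M`, and `d`, and delegates the actual pattern selection to `Intl.DateTimeFormat` rather than reimplementing pattern generation.

```typescript
// Hypothetical helper: converts a classical skeleton restricted to the
// letters y, M, and d into an Intl.DateTimeFormat option bag.
function skeletonToOptions(skeleton: string): Intl.DateTimeFormatOptions {
  const runs: string[] = skeleton.match(/y+|M+|d+/g) || [];
  if (runs.length === 0 || runs.join("") !== skeleton) {
    throw new RangeError("Unsupported skeleton: " + skeleton);
  }
  // timeZone is pinned so the example does not depend on the local zone.
  const options: Intl.DateTimeFormatOptions = { timeZone: "UTC" };
  for (const run of runs) {
    const letter = run.charAt(0);
    const count = run.length; // the number of letters encodes the field width
    if (letter === "y") {
      options.year = count === 2 ? "2-digit" : "numeric";
    } else if (letter === "d") {
      options.day = count === 2 ? "2-digit" : "numeric";
    } else if (count === 1) {
      options.month = "numeric";
    } else if (count === 2) {
      options.month = "2-digit";
    } else if (count === 3) {
      options.month = "short";
    } else if (count === 4) {
      options.month = "long";
    } else {
      throw new RangeError("Unsupported month width in: " + skeleton);
    }
  }
  return options;
}

function formatWithSkeleton(date: Date, skeleton: string, locale: string): string {
  return new Intl.DateTimeFormat(locale, skeletonToOptions(skeleton)).format(date);
}

const skeletonSampleDate = new Date(Date.UTC(2025, 3, 6)); // April 6, 2025 (UTC)
for (const sampleSkeleton of ["yMd", "yMMMd", "yMMMMd"]) {
  console.log(sampleSkeleton, formatWithSkeleton(skeletonSampleDate, sampleSkeleton, "en-US"));
}
```

In both forms the caller states only which fields appear and how wide they are; the runtime picks the order and separators for the locale.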

## Use-Cases

_What use-cases do we see? Ideally, quote concrete examples._

As a developer, I want to format date, time, or date/time values to show specific fields
with specific appearance without having to learn a complex microsyntax
or modify my code or formatting directions for each locale.

As a translator, I want to understand what output a given date/time placeholder will produce
in my language.

As a translator, I don't want to have to "translate" or modify a date/time placeholder to suit
my language's needs.
I should trust that the placeholder will produce appropriate results for my language.

## Requirements

_What properties does the solution have to manifest to enable the use-cases above?_

1. It should be possible to format operands consisting of locally-relevant date/time types, including:
   - Temporal values such as `java.time` or JS `Temporal` values,
   - incremental time types ("timestamps")
     (e.g. milliseconds-since-epoch values such as `java.util.Date`, `time_t`, JS `Date`, etc.),
   - field-based time types
     (e.g. those that contain separate values per field type in a date/time, such as a year-month),
   - [floating time](https://www.w3.org/TR/timezone/#dfn-floating-time) values
     (e.g. those that are not tied to a specific time zone, variously called local/plain/civil times),
   - or other local exotica (Java `Calendar`, C `tm` struct, etc.)
1. Date/time formatters should not permit users to format fields that don't exist in the value
   (e.g. the "month" of a time, the "hour" of a date)
1. Date/time formatters should not permit users to format bad combinations of fields
   (e.g. `MMMMmm` (month-minute), `yyyyjm` (year-hour-minute), etc.)
1. Date/time formatters should permit users to specify the desired width of individual fields
   in a manner similar to classical skeletons,
   while relying on locale data to prevent undesirable results.
Review comment (Member) on lines +344 to +346:

I want to be clearer about what is "required" versus "wanted".

I think what is "required" is that users should specify an overall length, and what is "wanted" is a way to hint at the width of an individual field independently of the overall length.

For example:
| Classical Skeleton | `en-US` Output |
|---|---|
| `yMd` | 4/6/2025 |
| `yMMMd` | Apr 6, 2025 |
| `yMMMMd` | April 6, 2025 |
1. Developers, translators, and UI designers should not have to learn
multiple new microsyntaxes or multiple different sets of options for date/time value formatting.
1. Any microsyntax or option set specified should be understandable from the placeholder alone.
1. Any microsyntax or option set specified should not _require_ translators to alter the values in most or all locales.
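
The two "should not permit" requirements above can be illustrated with a small validation sketch. Everything here is hypothetical: the type names, the field list, and in particular the contiguity rule, which is a simplifying assumption made for this example and not the rule that CLDR uses to define valid semantic skeletons.

```typescript
type ValueKind = "date" | "time" | "datetime";
type FieldName = "year" | "month" | "day" | "hour" | "minute" | "second";

const DATE_FIELD_NAMES: FieldName[] = ["year", "month", "day"];
const TIME_FIELD_NAMES: FieldName[] = ["hour", "minute", "second"];
// Fields ordered from most to least significant.
const FIELD_ORDER: FieldName[] = DATE_FIELD_NAMES.concat(TIME_FIELD_NAMES);

function availableFields(kind: ValueKind): FieldName[] {
  if (kind === "date") return DATE_FIELD_NAMES;
  if (kind === "time") return TIME_FIELD_NAMES;
  return FIELD_ORDER;
}

// Returns the requested fields that a value of this kind cannot supply,
// e.g. the "month" of a time.
function missingFields(kind: ValueKind, requested: FieldName[]): FieldName[] {
  const available = availableFields(kind);
  return requested.filter((field) => available.indexOf(field) === -1);
}

// Assumed rule: a combination is acceptable only if its fields are distinct
// and adjacent in significance order, which rejects month-minute and
// year-hour-minute.
function isContiguousFieldSet(requested: FieldName[]): boolean {
  if (requested.length === 0) return false;
  const positions = requested.map((field) => FIELD_ORDER.indexOf(field));
  const span = Math.max(...positions) - Math.min(...positions) + 1;
  return span === requested.length;
}

console.log(missingFields("time", ["month", "hour"]));
console.log(isContiguousFieldSet(["month", "minute"]));
console.log(isContiguousFieldSet(["year", "month", "day"]));
```

A design based on a fixed set of named field sets would make checks like these unnecessary, because invalid combinations cannot be expressed in the first place.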

## Constraints

_What prior decisions and existing conditions limit the possible design?_

## Proposed Design

_Describe the proposed solution. Consider syntax, formatting, errors, registry, tooling, interchange._

## Alternatives Considered

_What other solutions are available?_
_How do they compare against the requirements?_
_What other properties they have?_


### Design: Use Option Naming

In this section, we use a scheme similar to `FieldSetBuilder` linked earlier.

#### DateTime fields

Options (alternative names under consideration for the same option):

```
{$date :datetime dateFields="YMD"}
{$date :datetime date="YMD"}
{$date :datetime fields="YMD"}
```

Review comment (Collaborator):

Are `dateFields`, `date` and `fields` just different possible names for the same option? Or do they mean different things? (Same question about `timePrecision` vs. `time` below.)

Reply (Member):

Different possible names for the same option.
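
One way to picture what a `dateFields`-style option could resolve to is a mapping onto an `Intl.DateTimeFormat` option bag. This is only a sketch under stated assumptions: the list of accepted field sets is an illustrative subset, the `fieldLength` parameter ("long", "medium", "short") does not appear in the examples above and is assumed here, and `semanticDateOptions` is a hypothetical name.

```typescript
type SkeletonLength = "long" | "medium" | "short";

// Illustrative subset of field sets, modeled on the "YMD" value used above.
const ACCEPTED_DATE_FIELD_SETS = ["Y", "YM", "YMD", "MD", "M", "D"];

// Assumed mapping from an overall length to a month width.
const MONTH_WIDTH_BY_LENGTH: Record<SkeletonLength, "long" | "short" | "numeric"> = {
  long: "long",
  medium: "short",
  short: "numeric",
};

function semanticDateOptions(
  dateFields: string,
  fieldLength: SkeletonLength
): Intl.DateTimeFormatOptions {
  // Unlike an option bag, only whole field sets from the fixed list are accepted.
  if (ACCEPTED_DATE_FIELD_SETS.indexOf(dateFields) === -1) {
    throw new RangeError("Unsupported dateFields value: " + dateFields);
  }
  // timeZone is pinned so the example does not depend on the local zone.
  const options: Intl.DateTimeFormatOptions = { timeZone: "UTC" };
  if (dateFields.indexOf("Y") !== -1) options.year = "numeric";
  if (dateFields.indexOf("M") !== -1) options.month = MONTH_WIDTH_BY_LENGTH[fieldLength];
  if (dateFields.indexOf("D") !== -1) options.day = "numeric";
  return options;
}

const semanticSampleDate = new Date(Date.UTC(2025, 3, 22)); // April 22, 2025 (UTC)
for (const sampleLength of ["long", "medium", "short"] as SkeletonLength[]) {
  const resolved = semanticDateOptions("YMD", sampleLength);
  console.log(sampleLength, new Intl.DateTimeFormat("en-US", resolved).format(semanticSampleDate));
}
```

Whatever the option ends up being called, the value names a complete field set, so a combination outside the fixed list is rejected up front instead of producing an odd pattern.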

#### TimePrecision

Review comment (Collaborator):

I think this needs a little more explanation (it should be possible to follow this doc without reading the other linked-to docs).

Reply (Member Author):

This part of the spec is basically empty and in need of work. I started to noodle around with it today. Each one of the options will need quasi-complete descriptions in order to see how they'll work and the relative usability of each.

Options (alternative names for the same option):
```
{$date :datetime timePrecision="minute"}
{$date :datetime time="minute"}
```
(TODO: Add others)
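
As a rough illustration of what a time precision option might mean, the sketch below maps three assumed precision values onto an `Intl.DateTimeFormat` option bag. The value set ("hour", "minute", "second") is an assumption that covers only part of what a full design would need, and `timePrecisionToOptions` is a hypothetical name.

```typescript
type TimePrecision = "hour" | "minute" | "second";

// Each precision includes every field above it: "minute" implies the hour,
// and "second" implies both the hour and the minute.
function timePrecisionToOptions(precision: TimePrecision): Intl.DateTimeFormatOptions {
  // timeZone is pinned so the example does not depend on the local zone.
  const options: Intl.DateTimeFormatOptions = { hour: "numeric", timeZone: "UTC" };
  if (precision === "minute" || precision === "second") {
    options.minute = "2-digit";
  }
  if (precision === "second") {
    options.second = "2-digit";
  }
  return options;
}

const precisionSampleTime = new Date(Date.UTC(2025, 3, 22, 14, 5, 9)); // 14:05:09 UTC
for (const samplePrecision of ["hour", "minute", "second"] as TimePrecision[]) {
  const resolved = timePrecisionToOptions(samplePrecision);
  console.log(samplePrecision, new Intl.DateTimeFormat("en-US", resolved).format(precisionSampleTime));
}
```

Because the option names a precision rather than individual fields, a request such as "minutes without hours" cannot be written.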

### Design: Use Separate Functions

Some choices:

1. A single :datetime function
   1. Pro: All in one place
   2. Con: More combinations of options that form invalid skeletons
2. :date, :time, and :datetime
   1. Pro: More tailored and type-safe
   2. Con: Not fully type-safe
3. :date, :time, :datetime, :zoneddatetime, *maybe* :zoneddate, :zonedtime, :timezone
   1. Pro: Most type-safe
   2. Con: Lots of functions
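
The type-safety trade-off between the first two choices can be sketched with option types. All names below are hypothetical, including the option names `fields` and `timePrecision` (chosen from the candidate names listed earlier) and the value sets; the builders only assemble placeholder text and do not format anything.

```typescript
type DateFieldsValue = "YMD" | "YM" | "MD";
type TimePrecisionValue = "hour" | "minute" | "second";

// Choice 2: each function has its own option type, so passing
// `timePrecision` to the :date builder is a compile-time error.
interface DateFunctionOptions {
  fields: DateFieldsValue;
}
interface TimeFunctionOptions {
  timePrecision: TimePrecisionValue;
}

function datePlaceholder(variable: string, options: DateFunctionOptions): string {
  return "{$" + variable + " :date fields=" + options.fields + "}";
}

function timePlaceholder(variable: string, options: TimeFunctionOptions): string {
  return "{$" + variable + " :time timePrecision=" + options.timePrecision + "}";
}

// Choice 1: a single function where every option is optional, so an
// invalid combination (here, no options at all) is only caught at run time.
interface SingleFunctionOptions {
  fields?: DateFieldsValue;
  timePrecision?: TimePrecisionValue;
}

function datetimePlaceholder(variable: string, options: SingleFunctionOptions): string {
  if (options.fields === undefined && options.timePrecision === undefined) {
    throw new RangeError("At least one of fields or timePrecision is required");
  }
  let text = "{$" + variable + " :datetime";
  if (options.fields !== undefined) text += " fields=" + options.fields;
  if (options.timePrecision !== undefined) text += " timePrecision=" + options.timePrecision;
  return text + "}";
}

console.log(datePlaceholder("date", { fields: "YMD" }));
console.log(timePlaceholder("date", { timePrecision: "minute" }));
console.log(datetimePlaceholder("date", { fields: "YMD", timePrecision: "minute" }));
```

The third choice would extend the same idea with further option types for zoned values, at the cost of more functions to learn.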