Support list handling #36

Closed
grhoten opened this issue Feb 10, 2020 · 7 comments
Labels
functions Issue pertains to the default function set requirements Issues related with MF requirements list

Comments

@grhoten
Member

grhoten commented Feb 10, 2020

Is your feature request related to a problem? Please describe.
CLDR provides some list definitions, but they are not really complete. Conjunction ("and") lists are context sensitive in some languages. The same goes for disjunction ("or") lists and adjective lists.

Describe the solution you'd like
The following needs to be modifiable from the message format with predefined defaults for the standard lists.

  • Before first item
  • After first item (for Japanese lists)
  • Between each item by default. Usually this is a language specific comma
  • Before each item, like "the ", "of " or "in "
  • Inflect each item depending on the given context in a sentence. For example, each item could become definite or indefinite (e.g. an apple, a table, a unicorn and an umbrella).
  • After each item
  • Before last item, which is typically "and" or "or", but the value can be sensitive to the last item itself or to the item that precedes it. This is variable for Korean, Spanish, Italian and Hebrew.
  • After the last item (for Chinese and Korean languages)
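
The slots listed above could be modeled roughly as follows. This is only a sketch under assumptions: `ListPattern`, `format_list`, and every field name are hypothetical and not part of CLDR, ICU, or any message format proposal, and the defaults are merely English-like.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass
class ListPattern:
    """Hypothetical slots mirroring the bullets above (English-like defaults)."""
    before_first: str = ""
    after_first: str = ""
    between: str = ", "
    before_each: str = ""
    after_each: str = ""
    before_last: str = "and "
    after_last: str = ""
    # Per-item inflection hook (e.g. add an article); identity by default.
    inflect: Callable[[str], str] = lambda item: item


def format_list(items: Sequence[str], p: ListPattern) -> str:
    parts = []
    last = len(items) - 1
    for i, item in enumerate(items):
        text = p.before_each + p.inflect(item) + p.after_each
        if i == 0:
            text = p.before_first + text + p.after_first
        if i == last and last > 0:
            # Only lists with two or more items get the "before last" text.
            text = p.before_last + text
        if i == last:
            text = text + p.after_last
        parts.append(text)
    return p.between.join(parts)


print(format_list(["apple", "table", "unicorn"], ListPattern()))
print(format_list(["apple", "table", "unicorn"], ListPattern(before_each="of ")))
```

The sketch keeps `before_last` as a fixed string, so it does not yet cover the context-sensitive case, and it has no separate two-item pattern of the kind CLDR's list patterns provide.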

Describe why your solution should shape the standard
This is what is needed for various languages mentioned above. For example, issue CLDR-13025 highlights one of the issues with the current implementation.

Additional context or examples
Siri does it.

@Fleker

Fleker commented Feb 10, 2020

Would all of these params need to be part of the message format? It seems like some aspects like 'After first item' in the case of Japanese could not be included in the syntax and would only need to be used through the format engine.

i.e.

{{list <- apple, table, unicorn}}

Do we expect this message format will allow nesting? If so, then we could avoid some of the customization directly in list handling and rely on using indefinite articles like in #31.

{{list <- {{a/an <- apple}}, {{a/an <- table}}, {{a/an <- unicorn}}}}
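
To illustrate the nesting idea, here is a toy sketch in which an article function is applied to each item before the list function joins them. The names `indefinite` and `conjunction` are hypothetical stand-ins for the `a/an` and `list` placeholders above, and the a/an rule is a crude spelling heuristic with an assumed exception list, not a real pronunciation model.

```python
def indefinite(noun: str) -> str:
    """Toy English a/an chooser; real selection depends on pronunciation."""
    # Assumed exception prefixes: vowel letters pronounced with a consonant sound.
    consonant_sound = ("uni", "use", "eu", "one")
    lower = noun.lower()
    if lower[0] in "aeiou" and not lower.startswith(consonant_sound):
        return "an " + noun
    return "a " + noun


def conjunction(items):
    """Toy English 'and' list, standing in for the list placeholder."""
    if len(items) <= 1:
        return "".join(items)
    if len(items) == 2:
        return " and ".join(items)
    return ", ".join(items[:-1]) + " and " + items[-1]


# Inner function applied per item, outer function applied to the results.
nested = conjunction([indefinite(n) for n in ["apple", "table", "unicorn", "umbrella"]])
print(nested)
```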

@grhoten
Member Author

grhoten commented Feb 10, 2020

All options are optional and configurable in our message format, but the default implementations are usually used.

Yeah, it's expected to work in conjunction with #31 and #16. #16 is where you're changing from an unspecified definiteness state to a definite or an indefinite state. Oddly enough, the previous sentence that I just wrote highlights when someone could want to use such functionality.

You would need to be able to turn a collection of independent named variables into this list, or take a variable-length array, to get this functionality to work. While you could apply the indefinite article to each variable independently, the assumption is that most translators will apply it to the list, so that the same operation is applied to each item. In uncommon circumstances, you could apply the operation to just the first item, but I haven't seen that needed yet.

So if I had a list in Spanish that had the words ["gato", "gata", "gatos", "gatas"], and I applied the definite form to a conjunction list, then it would become "el gato, la gata, los gatos y las gatas". This is a scenario when the translator doesn't know the contents of the array before translation. If I applied the definite state to a conjunction list in Swedish, then the list ["katt", "katter"] would become "katten och katterna". At least that is what my implementation does right now.
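
A minimal sketch of the Spanish case, assuming a tiny hand-written lexicon: `LEXICON`, `ARTICLES`, `definite_es`, and `and_list_es` are hypothetical names, not the implementation described in this comment. It also shows one context-sensitive conjunction rule, where "y" becomes "e" before a word starting with an /i/ sound, in simplified form.

```python
# Hypothetical mini-lexicon: noun -> (gender, number).
# A real system would need a full dictionary or morphological analyzer.
LEXICON = {
    "gato": ("m", "sg"),
    "gata": ("f", "sg"),
    "gatos": ("m", "pl"),
    "gatas": ("f", "pl"),
}
ARTICLES = {
    ("m", "sg"): "el",
    ("f", "sg"): "la",
    ("m", "pl"): "los",
    ("f", "pl"): "las",
}


def definite_es(noun: str) -> str:
    """Prefix the definite article that agrees with the noun."""
    return f"{ARTICLES[LEXICON[noun]]} {noun}"


def and_list_es(items):
    """Spanish conjunction list; the conjunction depends on the last item."""
    if len(items) <= 1:
        return "".join(items)
    last = items[-1].lower()
    # Simplified rule: "e" before i-/hi-, but not before the diphthong hie-.
    i_sound = last.startswith(("i", "í", "hi", "hí")) and not last.startswith("hie")
    conj = "e" if i_sound else "y"
    return ", ".join(items[:-1]) + f" {conj} " + items[-1]


print(and_list_es([definite_es(n) for n in ["gato", "gata", "gatos", "gatas"]]))
print(and_list_es(["padres", "hijos"]))
```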

@mihnita
Collaborator

mihnita commented Feb 14, 2020

We already have list formatters.

But I highly doubt that determining the "proper conjunction" is doable...
There might be a way to enhance a list formatter with this?

So the message format can be something like ...{foo, list, { type:or, style: definite_article} } ...

It moves the "smartness" from the message format to the list formatter.
The message format only tells the list formatter HOW to do the job
(similar to "the date formatter does the work, the message format can ask for a long date")

@grhoten
Member Author

grhoten commented Feb 16, 2020

> We already have list formatters.

Does "we" mean CLDR in your context?

> But I highly doubt that determining the "proper conjunction" is doable...
> There might be a way to enhance a list formatter with this?

This request is the combination of several requests for several languages that are supported by Siri. This proposal is the latest revision that Siri already supports. The conjunction support for Spanish was added to Siri back in 2012.

I think this Siri functionality may predate anything added to CLDR or ICU, and I don’t think they are usable for several languages. I think it’s perfectly doable to implement a more accurate conjunction list.

> So the message format can be something like ...{foo, list, { type:or, style: definite_article} } ...
>
> It moves the "smartness" from the message format to the list formatter.
> The message format only tells the list formatter HOW to do the job
> (similar to "the date formatter does the work, the message format can ask for a long date")

Yeah sort of. Though I’d prefer to be consistent with the inflection support for other variables. It’s not really a style. It’s an inflection. A translator may want to apply the definite state to each item or the definite state and genitive case at the same time. It’s a list of grammemes (grammatical category values) to apply to each item in the list.
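
As a rough illustration of passing a list of grammemes instead of a single style, the sketch below applies every requested grammeme to each item and then joins the results. `FORMS`, `inflect`, and `format_list_sv` are hypothetical, the lookup table covers only the two Swedish words from the earlier example, and a real implementation would derive the forms from a lexicon.

```python
# Hypothetical inflection table keyed by (word, set of grammemes).
FORMS = {
    ("katt", frozenset({"definite"})): "katten",
    ("katter", frozenset({"definite"})): "katterna",
    ("katt", frozenset({"definite", "genitive"})): "kattens",
    ("katter", frozenset({"definite", "genitive"})): "katternas",
}


def inflect(word, grammemes):
    """Apply all grammemes at once; fall back to the word unchanged."""
    return FORMS.get((word, frozenset(grammemes)), word)


def format_list_sv(items, list_type="and", grammemes=()):
    """Swedish list: the message supplies the knobs, the formatter does the work."""
    conj = {"and": "och", "or": "eller"}[list_type]
    inflected = [inflect(item, grammemes) for item in items]
    if len(inflected) <= 1:
        return "".join(inflected)
    return ", ".join(inflected[:-1]) + f" {conj} " + inflected[-1]


print(format_list_sv(["katt", "katter"], grammemes=["definite"]))
print(format_list_sv(["katt", "katter"], list_type="or", grammemes=["definite", "genitive"]))
```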

@stasm stasm mentioned this issue Feb 17, 2020
@mihnita
Collaborator

mihnita commented Feb 17, 2020

> Does "we" mean CLDR in your context?

It means CLDR and ICU
https://unicode-org.github.io/icu-docs/apidoc/released/icu4j//com/ibm/icu/text/ListFormatter.html

The API is not friendly enough.
The default formatter does the "...and..." form (STANDARD).
The ListFormatter.Style also supports OR, UNIT, UNIT_NARROW, and UNIT_SHORT
These are badly documented and marked "ICU internal only" (they are not deprecated, but that marking is the only way to tell an IDE and Javadoc that you have to be careful when using an API).

The tag used in the sources in the javadoc is @internal


I think that where this belongs can be TBD.

My thinking was to put the "smartness" in the list formatter, and "knobs" that control things in MessageFormat. Kind of similar to DateFormat + skeletons.

Might not be technically doable. I don't know enough.

@mihnita mihnita added the requirements Issues related with MF requirements list label Sep 24, 2020
@aphillips aphillips added the functions Issue pertains to the default function set label Jul 19, 2023
@aphillips
Member

Need to consider whether list formatters will be in the default registry.

@aphillips
Member

As mentioned in today's telecon (2023-09-18), closing old requirements issues.


4 participants