Separation of print and spoken forms #33

Closed
grhoten opened this issue Feb 9, 2020 · 7 comments
Labels
requirements Issues related with MF requirements list

Comments

@grhoten
Member

grhoten commented Feb 9, 2020

Is your feature request related to a problem? Please describe.
There are times that you need to provide guidance on pronunciation of a given word.

Describe the solution you'd like
It's important for all text to be printed and spoken by default, but sometimes you need to substitute the printed form because it's too abbreviated. So you need markup to only print a span of text or to only speak a span of text. Instead of repeating the entire phrase, it should be possible to mark up individual spans. For example, in "Turn right onto highway 101", the span "101" is printed as "101" but spoken as "one oh one". Another example is "January 1", where "1" is printed as "1" but spoken as "first". That date example actually requires CLDR's rule-based number format (RBNF).
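
A minimal sketch of that span idea, assuming a hypothetical `Span` type with separate print and spoken forms. None of these names come from MessageFormat, CLDR, or SSML; they only illustrate marking up one span instead of duplicating the whole message.

```typescript
// Hypothetical span model for illustration only.
type Span =
  | { kind: "both"; text: string }                   // printed and spoken as written
  | { kind: "split"; print: string; speak: string }; // the two forms differ

// Join the printed form of every span.
function toPrint(spans: Span[]): string {
  return spans.map((s) => (s.kind === "both" ? s.text : s.print)).join("");
}

// Join the spoken form of every span.
function toSpeech(spans: Span[]): string {
  return spans.map((s) => (s.kind === "both" ? s.text : s.speak)).join("");
}

const directions: Span[] = [
  { kind: "both", text: "Turn right onto highway " },
  { kind: "split", print: "101", speak: "one oh one" },
];

const newYear: Span[] = [
  { kind: "both", text: "January " },
  { kind: "split", print: "1", speak: "first" },
];
```

A print-only or speak-only span is the same `split` span with an empty string for the other form. In this sketch the spoken strings are written by hand; producing "first" from 1 automatically is the part that would need RBNF.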

Describe why your solution should shape the standard
Without this ability, the pronunciation of numbers, dates, abbreviations, heteronyms (same spelling but different pronunciation) and so forth becomes inaccurate.

Additional context or examples
SSML can help in this area, though sometimes you just need to provide more context. For example, a message may be abbreviated in its printed form because of limited UI space, but the spoken form needs much more context when the listener can't see the screen around the message.
Let's Come To An Agreement About Our Words

@Fleker

Fleker commented Feb 10, 2020

Should this be part of message format directly, or through some sort of extension based on the key of a message?

For example:

'direction_turn': 'Turn {{direction}} onto {{road_name}}'
'direction_turn_vui': 'Turn {{direction}} onto {{road_name_numbers}}'

If you just mark up a small portion of the message, it may not work in the many voice cases where you want a simple voice output and a more verbose printed message, or vice versa.

In a recipe use-case when using voice accompanied by a screen, you may want a simple voice output as the user can read the screen for additional context. In a news use-case, you may want a long voice output while the text is simpler, perhaps showing the headline or a generic statement.
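
The key-based extension could be sketched roughly as follows. The `_vui` suffix and the two message keys come from the example above; `lookup`, `fill`, and the `Modality` type are hypothetical helpers written for this illustration, not part of any message format library.

```typescript
type Modality = "gui" | "vui";

const messages: Record<string, string> = {
  direction_turn: "Turn {{direction}} onto {{road_name}}",
  direction_turn_vui: "Turn {{direction}} onto {{road_name_numbers}}",
};

// Prefer the "_vui" variant for voice output, falling back to the shared key.
function lookup(
  catalog: Record<string, string>,
  key: string,
  modality: Modality,
): string | undefined {
  if (modality === "vui") {
    const spoken = catalog[`${key}_vui`];
    if (spoken !== undefined) return spoken;
  }
  return catalog[key];
}

// Naive {{name}} substitution; unknown placeholders are left untouched.
function fill(pattern: string, args: Record<string, string>): string {
  return pattern.replace(/\{\{(\w+)\}\}/g, (whole, name: string) => args[name] ?? whole);
}

const args = {
  direction: "right",
  road_name: "highway 101",
  road_name_numbers: "highway one oh one",
};
```

The fallback keeps messages without a voice variant working, but every message that does need one is stored twice, which is the duplication the span-level proposal tries to avoid.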

@grhoten
Member Author

grhoten commented Feb 10, 2020

I would prefer it to be a part of the message format directly, and its usage should be optional (not explicit). Formatting of variables needs to take this into account. You frequently need to use all of the same text and then just want a small span to be spoken a certain way. This avoids copypasta, where you have to copy the same thing over and over only to modify one small span of a sentence.

Sometimes the pronunciation is dynamically chosen at runtime. This is a common issue for #34. The pronunciation of the number needs to be in grammatical agreement with the noun. A translator may know the context of the sentence better than a rule-based or machine-learned text to speech system that is trying to pronounce digits instead of words.

If you have the date 1/2/03, how should it be pronounced? SSML may be able to annotate it as a date so that it doesn't sound like a math equation or an address with slashes. Still, it fairly commonly suffers from not knowing whether 1/2 is January 2nd or February 1st. If your text to speech (TTS) system is 100% in sync with the regional differences of the printed form, then it "should" work, but my experience is that the supported date formats in CLDR do not match the set of regional dialects of a TTS system.

@Fleker

Fleker commented Feb 10, 2020

What would happen in the case where the VUI and display text differ significantly?

@mihnita
Collaborator

mihnita commented Feb 14, 2020

Some of the use cases might be helped by something like formatToParts
https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/DateTimeFormat/formatToParts

If the placeholders have a clear type associated (...expires on {exp,date,::yMd}...), then the fact that "2/1/20" is a date is not lost, and we even know which one is the month and which one is the day.
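
As a rough illustration, `Intl.DateTimeFormat.prototype.formatToParts` already labels each field of a formatted date, so the month and day stay identifiable even when locales order them differently. The helper names `dateParts`, `fieldOrder`, and `partValue` are made up for this sketch, and it assumes a runtime with locale data for both `en-US` and `en-GB`.

```typescript
// Format a date to typed parts; UTC keeps the example independent of the host time zone.
function dateParts(date: Date, locale: string): Intl.DateTimeFormatPart[] {
  const formatter = new Intl.DateTimeFormat(locale, {
    year: "numeric",
    month: "numeric",
    day: "numeric",
    timeZone: "UTC",
  });
  return formatter.formatToParts(date);
}

// The order of the date fields, ignoring separators.
function fieldOrder(parts: Intl.DateTimeFormatPart[]): string[] {
  return parts.filter((p) => p.type !== "literal").map((p) => p.type);
}

// The formatted text of one field, e.g. the month.
function partValue(
  parts: Intl.DateTimeFormatPart[],
  type: Intl.DateTimeFormatPart["type"],
): string | undefined {
  return parts.find((p) => p.type === type)?.value;
}

const expires = new Date(Date.UTC(2020, 1, 1)); // 1 February 2020
const usParts = dateParts(expires, "en-US");
const gbParts = dateParts(expires, "en-GB");
```

A speech layer that receives these parts can read the `month` part as a month regardless of where it sits in the printed string, instead of guessing from the position of the slashes.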

For non-formattable content we can still "remember" that a text area comes from a parameter, and we can keep the parameter name.

So Hello {user}! formatted to parts would still know that "John" in the result comes from the parameter "user".
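
A small sketch of what such a result could look like for a plain string parameter. `formatMessageToParts` and the `MessagePart` type are hypothetical and use a simplified `{name}` placeholder syntax; they are not the API of any existing MessageFormat implementation.

```typescript
// Hypothetical part types: literals, and arguments that remember their parameter name.
type MessagePart =
  | { type: "literal"; value: string }
  | { type: "argument"; name: string; value: string };

function formatMessageToParts(
  pattern: string,
  args: Record<string, string>,
): MessagePart[] {
  const parts: MessagePart[] = [];
  const placeholder = /\{(\w+)\}/g;
  let last = 0;
  let match: RegExpExecArray | null;
  while ((match = placeholder.exec(pattern)) !== null) {
    // Text between the previous placeholder and this one is a literal.
    if (match.index > last) {
      parts.push({ type: "literal", value: pattern.slice(last, match.index) });
    }
    const name = match[1] ?? "";
    // Unknown parameters keep their placeholder text.
    parts.push({ type: "argument", name, value: args[name] ?? match[0] });
    last = match.index + match[0].length;
  }
  if (last < pattern.length) {
    parts.push({ type: "literal", value: pattern.slice(last) });
  }
  return parts;
}

const helloParts = formatMessageToParts("Hello {user}!", { user: "John" });
```

Joining the `value` fields gives the printed string, while a voice layer could look at `name` to decide, for instance, that a user name needs a pronunciation hint.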

@mihnita mihnita added the requirements Issues related with MF requirements list label Sep 24, 2020
@aphillips
Member

@grhoten There doesn't seem to be a proposal attached to this. I could see users creating a selector to choose between pattern strings. Or I could see using expression attributes to control the expansion of placeholders (based on modality). Or perhaps some other mechanism.

I'm closing this issue as part of general cleanup. If you (or others) feel that a mechanism is still needed and that it should be part of the MF2 specification, please open specific issues or (better) write a design doc using our template. Thanks!

@grhoten
Member Author

grhoten commented Jan 20, 2024

It's fine to close this. MF2 is meant for GUI only and not easy to adapt for VUI as far as I've seen it. People will be limited to markup like SSML for the time being.

This proposal wasn't trying to use selectors in the way that MF2 has it. Think of SSML as CSS but for a VUI. Though you really want to have the message to be printed and spoken the same way most of the time. So you don't want to copy and paste the entire response. You need to annotate and modify specific segments of a message. For example, you may want to pronounce a number a specific way to agree with the unit that you are quantifying.

My presentation referenced in this issue covers this topic in more detail.

@aphillips
Member

@grhoten thanks for the update.

I disagree that MF2 is meant for GUI only. I think it would be possible to adapt MF2 for VUI. A (MF2-syntax) markup implementation of SSML coupled with a formatToParts implementation could go a long way to addressing what you mention here.

// not going to look up the SSML this morning
// note that Highway 101 is "one oh one" but Highway 71 is probably "seventy-one"
// either way, I can get the SSML instructions for how to read the number
// into the GUI formatted string ;-)
{{Turn right onto highway {#ssml:someTag speakAs=number}{$highwayNumber :integer}{/ssml:someTag}}}

//  prints as "January 1st", but formatToParts lets you speak number as "first":
{{Today is {$date :date month=long day=ordinal}.}} 

MF2 is not a perfect solution to the issues you've called out in various places. But it's not so imperfect as to make it unusable either.
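
To make the idea of sharing one pattern a bit more concrete, here is a hedged sketch of a consumer that takes formatted parts and emits either display text or an SSML fragment. It uses SSML's `sub` element with its `alias` attribute as one possible target; the `SpeechPart` type and both functions are assumptions for illustration, not an MF2 or SSML library API, and the spoken alias is supplied by hand rather than derived from the number.

```typescript
// Hypothetical part shape: `alias` is the spoken form when it differs from the printed one.
type SpeechPart = { value: string; alias?: string };

// Escape text for use in XML content and attribute values.
function escapeXml(text: string): string {
  return text
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

// The GUI sees only the printed values.
function toDisplayText(parts: SpeechPart[]): string {
  return parts.map((p) => p.value).join("");
}

// The VUI gets the same text, with aliased parts wrapped in <sub alias="...">.
function toSsmlFragment(parts: SpeechPart[]): string {
  return parts
    .map((p) =>
      p.alias === undefined
        ? escapeXml(p.value)
        : `<sub alias="${escapeXml(p.alias)}">${escapeXml(p.value)}</sub>`,
    )
    .join("");
}

const highway: SpeechPart[] = [
  { value: "Turn right onto highway " },
  { value: "101", alias: "one oh one" },
];
```

The fragment would still need to be wrapped in a `speak` element before being handed to a TTS engine.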

I think what this particular issue is calling for might be a selector so that one can vary the pattern altogether (rather than sharing patterns, as I demoed above). E.g:

.match {@mode}
speech {{Hear me!}}
* {{Read me.}}

I think that would be out of scope for LDML45, but a custom implementation would be entirely feasible. Think about it for the beta period. More specifically: think about what specific feature additions would be needed to enable VUI support in MF2 and make design proposals for them in the post-45 period.
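
One way a custom implementation might pick between whole patterns by modality, sketched outside of MF2 itself. `ModalVariants` and `selectPattern` are hypothetical names, and the `fallback` field plays the role of the `*` variant in the example above.

```typescript
// Hypothetical container for per-modality patterns.
interface ModalVariants {
  byMode: Map<string, string>; // keyed by modality, e.g. "speech"
  fallback: string;            // used when no variant matches, like `*`
}

// Return the pattern for the requested mode, or the fallback.
function selectPattern(variants: ModalVariants, mode: string): string {
  return variants.byMode.get(mode) ?? variants.fallback;
}

const greeting: ModalVariants = {
  byMode: new Map([["speech", "Hear me!"]]),
  fallback: "Read me.",
};
```

Here `selectPattern(greeting, "speech")` picks the spoken variant and any other mode falls through to the default, mirroring how the catch-all variant behaves in a `.match` statement.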

Further, I suspect that, if these are VUI specific, they wouldn't belong in core MF 2.0's default registry. They want to be in a standardized VUI add-on. What form would that take?
