[strings-and-bidi] Would language+script information suffice for ambiguous cases? #103

r12a · 2017-08-07T18:51:11Z

Just thinking off the top of my head.

When the article inserts ALM at the start of a numeric string, it does so because it knows that the language is Arabic (and, importantly, not Persian, which uses the same script). Of course the application also needs to know that this is a range, expression or numeric date.
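As a rough sketch of that condition (the function name and the `is_numeric_sequence` flag are made up for illustration; how the application decides that a string is a range, expression or numeric date is assumed to happen elsewhere):

```python
ALM = "\u061C"  # U+061C ARABIC LETTER MARK


def prefix_alm(text: str, lang: str, is_numeric_sequence: bool) -> str:
    """Prepend ALM only when the language is Arabic and the caller has
    already identified the string as a range, expression or numeric date."""
    primary = lang.lower().split("-")[0]  # "ar-EG" -> "ar"
    if primary == "ar" and is_numeric_sequence:
        return ALM + text
    # Persian ("fa") shares the script but is deliberately left alone.
    return text
```

Note that the language tag alone is not enough here: the second condition has to come from the application.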

For the other difficulty that remains after isolation has been applied (i.e. the titles that start with 'CSS' but should have a base direction of RTL), could we also solve the problem if we know that these strings are Arabic, Persian or Hebrew, and that therefore the expected base direction is RTL?

Note that sometimes the script subtag may be important, e.g. az-Cyrl, other times not.
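To make the idea concrete, here is a minimal sketch of deriving a base direction from a language tag, with an explicit script subtag taking priority over the language's usual script. Both tables are tiny, hand-written and incomplete (they are assumptions for illustration, not data from any registry), and the tag parsing is simplified:

```python
# Illustrative only: a real table would need to cover many more scripts.
RTL_SCRIPTS = {"arab", "hebr", "syrc", "thaa", "nkoo"}

# Assumed usual script for a few languages when no script subtag is given.
DEFAULT_SCRIPT = {
    "ar": "arab", "fa": "arab", "he": "hebr",
    "az": "latn", "en": "latn", "ja": "jpan",
}


def base_direction(lang_tag: str):
    """Return "rtl", "ltr", or None when the tables cannot decide."""
    subtags = lang_tag.lower().split("-")
    primary = subtags[0]
    # Simplification: treat the first 4-letter alphabetic subtag as the script.
    script = next((s for s in subtags[1:] if len(s) == 4 and s.isalpha()), None)
    if script is None:
        script = DEFAULT_SCRIPT.get(primary)
    if script is None:
        return None
    return "rtl" if script in RTL_SCRIPTS else "ltr"
```

With these tables, 'az-Arab' comes out as RTL while 'az-Cyrl' and plain 'az' come out as LTR, which is the case where the script subtag matters.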

Also, for ISBN numbers and such the language may be irrelevant, but we don't want an RTL direction applied. It seems a bit messy to have to parse the string to determine that it is completely numeric. This seems to be a sticky problem...
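For what it's worth, the parsing step could be as small as the sketch below. It treats "completely numeric" as "contains no strongly directional characters", which is my own approximation rather than anything the article defines, and the function name is made up:

```python
import unicodedata

STRONG_TYPES = {"L", "R", "AL"}  # strong bidi classes


def has_no_strong_characters(text: str) -> bool:
    """True when the string holds only digits, separators, spaces and
    other characters without a strong direction."""
    return not any(unicodedata.bidirectional(ch) in STRONG_TYPES for ch in text)
```

A string such as '978-3-16-148410-0' passes this check, whereas anything with a letter in it (including a leading 'ISBN' label) does not, so the check is only part of the story.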

This approach may lend itself well to situations like the card, where the language is determined in the target destination at a remove from the actual strings (e.g. this is an Arabic/Hebrew/Persian card), and only overridden when a different language is specified for a given string.

r12a · 2017-08-08T11:09:35Z

Ok, so here's where it fails:

Effectively, what I was thinking of can be realised by surrounding the inserted strings with not just <bdi>, but (based on knowledge that the language of the card is Arabic) <bdi dir="rtl">.

Although this would fix the problems with the translated titles in RTL scripts, which isolation alone doesn't, it breaks the Japanese and English titles by pushing the exclamation mark to the left.

Note that the exclamation mark at the end sounds trivial, but it also represents what happens with examples such as "The book مغامرة جديدة is good.", where 'the book' and 'is good' get swapped over (not good).

r12a · 2017-08-08T11:15:57Z

If it worked, however, by looking at the language of each individual string, and setting the direction of the <bdi> element (or equivalent) on that basis, maybe that would be better. (However, how often can you expect to have that information available?)

As long as the English title is labelled as 'en' and the Arabic title as 'ar', that should produce the right result.

The problem with this approach is that it requires the application to consult a lookup table to determine which direction to apply for each string as it is inserted.
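A minimal sketch of that per-string lookup, assuming each string arrives with an optional language tag. The table and function name are hypothetical, and a real table would need far more entries (plus script handling for tags like az-Cyrl):

```python
from html import escape

# Hypothetical lookup table keyed on the primary language subtag.
DIRECTION_BY_LANGUAGE = {
    "ar": "rtl", "fa": "rtl", "he": "rtl",
    "en": "ltr", "ja": "ltr",
}


def wrap_bdi(text: str, lang=None) -> str:
    """Wrap an inserted string in <bdi>, adding dir only when the
    language is known to the table."""
    primary = lang.lower().split("-")[0] if lang else None
    direction = DIRECTION_BY_LANGUAGE.get(primary)
    if direction is None:
        # No usable language: plain <bdi>, i.e. isolation with
        # first-strong detection, as before.
        return f"<bdi>{escape(text)}</bdi>"
    return f'<bdi dir="{direction}">{escape(text)}</bdi>'
```

When no language is available for a string, this falls back to plain isolation, so it is no worse than the <bdi>-only approach, but it only helps the 'CSS' titles when the metadata is actually there.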

r12a · 2017-08-08T16:48:30Z

Closing this and moving to w3c/string-meta#9, which is more appropriate.

r12a closed this as completed Aug 8, 2017
