[strings-and-bidi] Would language+script information suffice for ambiguous cases? #103

Closed
r12a opened this issue Aug 7, 2017 · 3 comments


r12a commented Aug 7, 2017

source [en]

Just thinking off the top of my head.

When the article inserts ALM at the start of a numeric string, it does so because it knows that the language is Arabic (and, importantly, not Persian, which uses the same script). Of course the application also needs to know that this is a range, expression or numeric date.
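As a rough sketch of that decision (not the article's actual code): prepend ALM (U+061C) only when the language is Arabic and the string looks like a range, expression or numeric date. The pattern and the function names `primary_language` and `prefix_alm` are assumptions made up for illustration; a real application would more likely know the string's type from its data model than guess it from the text.

```python
import re

ALM = "\u061C"  # ARABIC LETTER MARK

# Hypothetical pattern: runs of ASCII or Arabic-Indic digits joined by at
# least one separator or operator, e.g. "10-20", "12/05/2017", "3 + 4".
_NUMERIC_EXPR = re.compile(
    r"[0-9\u0660-\u0669]+(?:\s*[-\u2013/+.:]\s*[0-9\u0660-\u0669]+)+"
)


def primary_language(tag: str) -> str:
    """Return the lowercased primary language subtag of a language tag."""
    return tag.replace("_", "-").split("-")[0].lower()


def prefix_alm(text: str, lang: str) -> str:
    """Prepend ALM to numeric ranges/expressions/dates in Arabic only."""
    if primary_language(lang) == "ar" and _NUMERIC_EXPR.fullmatch(text):
        return ALM + text
    # Persian ("fa") and everything else is left untouched.
    return text
```

For example, `prefix_alm("10-20", "ar")` returns the string with ALM prepended, while `prefix_alm("10-20", "fa")` returns it unchanged.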

For the other remaining difficulty after isolation has been applied (i.e. the titles that start with 'CSS' but should have a base direction of RTL), could we also solve the problem if we know that these strings are Arabic, Persian or Hebrew, and that the expected base direction is therefore RTL?

Note that the script subtag is sometimes important (e.g. az-Cyrl) and sometimes not.
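To make the script-subtag point concrete, here is a minimal sketch of deriving a base direction from a language tag. An explicit script subtag wins, otherwise a default script per language is assumed. The two tables are tiny illustrative assumptions, not a complete registry, and the tag parsing is deliberately naive (it does not handle private-use or extension subtags).

```python
# Illustrative subsets only; a real table would be far larger.
RTL_SCRIPTS = {"Arab", "Hebr", "Syrc", "Thaa", "Nkoo"}
DEFAULT_SCRIPT = {
    "ar": "Arab", "fa": "Arab", "ur": "Arab", "he": "Hebr",
    "az": "Latn", "en": "Latn", "ja": "Jpan",
}


def base_direction(tag):
    """Return "rtl", "ltr", or None when the tag gives no usable hint."""
    subtags = tag.replace("_", "-").split("-")
    lang = subtags[0].lower()
    script = None
    for subtag in subtags[1:]:
        # A script subtag is four letters, e.g. "Cyrl" or "Arab".
        if len(subtag) == 4 and subtag.isalpha():
            script = subtag.title()
            break
    if script is None:
        script = DEFAULT_SCRIPT.get(lang)
    if script is None:
        return None
    return "rtl" if script in RTL_SCRIPTS else "ltr"
```

With these tables, `az-Cyrl` resolves to `"ltr"` and `az-Arab` to `"rtl"`, whereas for `ar` or `fa-IR` the language alone is enough.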

Also, for ISBNs and the like the language may be irrelevant, but we don't want an RTL direction applied. It seems a bit messy to have to parse the string to determine that it is completely numeric. This seems to be a sticky problem...
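For what it's worth, the "parse the string" step could be sketched as a check for strongly directional characters, using the Unicode bidi classes exposed by Python's `unicodedata`. The function names and the rule "no strong character means LTR" are assumptions for illustration only; note that a string such as "ISBN 978-..." contains Latin letters and so would still take the language-derived direction.

```python
import unicodedata

# Bidi classes of strongly directional characters:
# L (left-to-right), R (right-to-left), AL (Arabic letter).
STRONG_CLASSES = {"L", "R", "AL"}


def has_strong_character(text):
    """True if any character in text has a strong bidi class."""
    return any(unicodedata.bidirectional(ch) in STRONG_CLASSES for ch in text)


def direction_for(text, language_direction):
    """Keep purely numeric/punctuation strings LTR; else trust the language."""
    if not has_strong_character(text):
        return "ltr"
    return language_direction
```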

This approach may lend itself well to situations like the card, where the language is determined in the target destination at a remove from the actual strings (e.g. this is an Arabic/Hebrew/Persian card), and only overridden when a different language is specified for a given string.


r12a commented Aug 8, 2017

Ok, so here's where it fails:

Effectively, what I was thinking of can be realised by surrounding the inserted strings with not just <bdi>, but (based on knowledge that the language of the card is Arabic) <bdi dir="rtl">.

Although this would fix the problems with the translated titles in RTL scripts, which isolation alone doesn't, it breaks the Japanese and English titles by pushing the exclamation mark to the left.

Note that the exclamation mark at the end sounds trivial, but it also represents what happens with examples such as "The book مغامرة جديدة is good.", where 'the book' and 'is good' get swapped over (not good).


r12a commented Aug 8, 2017

It might work better, however, to look at the language of each individual string and set the direction of the <bdi> element (or equivalent) on that basis. (However, how often can you expect to have that information available?)

As long as the English title is labelled as 'en' and the Arabic title as 'ar', that should produce the right result.

The problem with this approach is that it requires the application to consult a lookup table to determine which direction to apply for each string as it is inserted.
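A minimal sketch of that per-string lookup, assuming each string may carry its own language tag and otherwise falls back to the card's language. The table and the `wrap_bdi` helper are hypothetical; when no language is known, the string is wrapped in a plain <bdi>, whose direction then defaults to auto.

```python
from html import escape

# Hypothetical lookup table keyed on the primary language subtag.
DIRECTION_BY_LANGUAGE = {
    "ar": "rtl", "fa": "rtl", "he": "rtl",
    "en": "ltr", "ja": "ltr",
}


def wrap_bdi(text, lang=None, card_lang=None):
    """Wrap text in <bdi>, adding dir when the language is known."""
    effective = lang or card_lang  # the per-string language overrides the card's
    direction = None
    if effective:
        primary = effective.split("-")[0].lower()
        direction = DIRECTION_BY_LANGUAGE.get(primary)
    if direction is None:
        # Unknown language: isolate only, with no explicit direction.
        return f"<bdi>{escape(text)}</bdi>"
    return f'<bdi dir="{direction}">{escape(text)}</bdi>'
```

So an English title on an Arabic card, `wrap_bdi("HTML & CSS!", lang="en", card_lang="ar")`, gets `dir="ltr"` and keeps its exclamation mark on the right, while an unlabelled title on the same card inherits `dir="rtl"`.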


r12a commented Aug 8, 2017

Closing this and moving to w3c/string-meta#9, which is more appropriate.

r12a closed this as completed Aug 8, 2017