fix typos, plural logic proposal #23

Open: wants to merge 1 commit into base: dev


axelmm

axelmm commented Mar 25, 2016

Nice job!

What about incomplete translations? A fallback?

Permutations don't scale well.
What about an option for 'simple' plural rules? IT'S VERY IMPORTANT, and the 'simple complex' example below would be hard to implement this way.

<PL> is a special marker/tag for PLural; if it is the last one in the string, it can be left unclosed (the </PL> is optional).

// simple ranges
$FEW_MANY_PLURAL, one<PL,0>no<PL,2>two<PL,3>a few<PL,10>many</PL> // default (1): one; 0: no; 2: two; 3-9: a few; 10 and more: many

$CAT, cat<PL,0,2>cats</PL> // default: cat; 0 and >=2: cats

// Polish: more complex, but... still quite simple?
$CAT, kot<PL,0,5+>kotów<PL,2,22,23,24,?2,?3,?4>koty</PL>
// 2,3,4 koty; 5,6,7..11,12...21 kotów; 22,23,24.. koty; 25..101 kotów; 103 koty; [111,212... kotów] would be broken in this case; 222 koty;
// each number (>=2) OPENS A RANGE (e.g. 2-4), but a number followed by "+" (e.g. 5+) defines the DEFAULT FOR ALL GREATER values; subsequent rules with greater values are ONLY SINGLE-VALUE exceptions (a further "+" rule only changes greater_default). This may be harder to implement (two passes?)
// to start with (if this looks too complex), it could work without ranges: only defaults and exceptions (order matters)
// these rules would then be:
$CAT, kot<PL,0,5+,?11,?12,?13,?14>kotów<PL,2,3,4,?2,?3,?4>koty</PL> // ? means an optional prefix: any digits or none
$CAT, kot<PL,0,5+,?11,?12,?13,?14>kotów<PL,?2,?3,?4>koty</PL> // shorter, because ?2, ?3, ?4 already match 2,3,4 and 22,23,24, while ?11..?14 catch 11,12,...,111,112 first; order matters. The rules are pre-interpreted and stored as a 'firewall chain' (array): first match returns? E.g. 6 falls in the 5+ range default: it takes that value but does not return, it tries the next rules, and if none matches it returns this default. 12 matches the 5+ range but flows on into the next rules, where ?12 is a SINGLE-VALUE exception that returns immediately (without trying the also-matching ?2).
Usage, with an OPTIONAL count parameter for get():
var count = 12;
var plur_string = ft.get("$FEW_MANY_PLURAL", count);
var cat_string = ft.get("$CAT", count);
// then use them with replace
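For illustration, here is a minimal Python sketch of the simplified, no-ranges variant described above (only defaults and exceptions, order matters). It is one possible reading of the proposal, not firetongue code; `Rule`, `parse_pl` and `choose` are hypothetical names. A plain number is a single-value exception that returns immediately, `N+` sets the fallback for every count >= N and lets later rules keep matching, and `?N` matches any count whose decimal digits end in N. It parses with `str.split` only and needs Python 3.9+.

```python
from dataclasses import dataclass


@dataclass
class Rule:
    kind: str   # "exact" (e.g. 0), "at_least" (e.g. 5+) or "suffix" (e.g. ?2)
    value: str  # the number as written, without the "+" or "?"
    form: str   # text returned when this rule applies


def parse_pl(entry: str) -> tuple[str, list[Rule]]:
    """Split 'default<PL,c1,c2>form<PL,...>form</PL>' into the default form and
    an ordered rule chain, using only str.split (no regular expressions)."""
    entry = entry.strip().removesuffix("</PL>")  # the final </PL> is optional
    pieces = entry.split("<PL,")
    rules = []
    for piece in pieces[1:]:
        conditions, form = piece.split(">", 1)
        for cond in (c.strip() for c in conditions.split(",")):
            if cond.startswith("?"):
                rules.append(Rule("suffix", cond[1:], form.strip()))
            elif cond.endswith("+"):
                rules.append(Rule("at_least", cond[:-1], form.strip()))
            else:
                rules.append(Rule("exact", cond, form.strip()))
    return pieces[0].strip(), rules


def choose(entry: str, count: int) -> str:
    """Walk the chain like a firewall: exact and ?N rules return immediately,
    N+ rules only replace the fallback and let later rules keep matching."""
    default, rules = parse_pl(entry)
    result = default
    digits = str(count)
    for rule in rules:
        if rule.kind == "exact" and count == int(rule.value):
            return rule.form
        if rule.kind == "suffix" and digits.endswith(rule.value):
            return rule.form
        if rule.kind == "at_least" and count >= int(rule.value):
            result = rule.form
    return result


POLISH_CAT = "kot<PL,0,5+,?11,?12,?13,?14>kotów<PL,?2,?3,?4>koty</PL>"
print([choose(POLISH_CAT, n) for n in (1, 2, 5, 12, 22, 112, 222)])
# ['kot', 'koty', 'kotów', 'kotów', 'koty', 'kotów', 'koty']
```

Note that in this no-ranges reading a bare `2` only matches exactly 2, so the English entry would have to be written `cat<PL,0,2+>cats`.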

@And-0

And-0 commented Aug 1, 2016

Hey @axelmm , I've tried implementing your request in my fork here. The explanation is in the Readme. If you could check it out and let me know whether it covers your needs, that would be great! If you like it I can make a pull request here.

@axelmm
Author

axelmm commented Aug 1, 2016

Hi,
I found a much better option, based on the PHP Zend/Symfony solution:
https://github.com/symfony/translation/blob/master/PluralizationRules.php

Quick POC in Haxe: http://try.haxe.org/#1E141
It's safer: the code for rules/ranges is separated from the translations, the translator only has to know how many options to provide between the curly brackets, and the compiler can check for missing ones...
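A rough Python sketch of that "number to position index" idea: the plural rule lives in code, and the translator only supplies the right number of forms. The `en` and `pl` formulas below follow Symfony's `PluralizationRules`; the names `plural_index`, `choose_form` and the `FORMS_PER_LOCALE` table are hypothetical, and the form-count check stands in for the "compiler can check for missing ones" point.

```python
def plural_index(locale: str, n: int) -> int:
    """Return the position of the plural form to use for count n.
    The en/pl formulas follow Symfony's PluralizationRules."""
    if locale == "en":
        return 0 if n == 1 else 1
    if locale == "pl":
        if n == 1:
            return 0
        if 2 <= n % 10 <= 4 and (n % 100 < 10 or n % 100 >= 20):
            return 1
        return 2
    raise ValueError(f"no plural rule for locale {locale!r}")


# Hypothetical table: how many forms a translator must provide per locale.
FORMS_PER_LOCALE = {"en": 2, "pl": 3}


def choose_form(locale: str, entry: str, n: int) -> str:
    """Pick one option from '{form0|form1|...}' for count n."""
    options = [o.strip() for o in entry.strip().strip("{}").split("|")]
    expected = FORMS_PER_LOCALE[locale]
    if len(options) != expected:  # could equally be checked once, when loading the file
        raise ValueError(f"{locale}: expected {expected} forms, got {len(options)}")
    return options[plural_index(locale, n)]


print([choose_form("pl", "{kot|koty|kotów}", n) for n in (1, 3, 5, 12, 22)])
# ['kot', 'koty', 'kotów', 'kotów', 'koty']
```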

@larsiusprime
Owner

Hey guys, thanks for these contributions, sorry I never responded to the original comment. When I have some time I'll see if I can reconcile/merge everything.

@larsiusprime
Owner

@axelmm, @And-0, do either of the languages you've been considering also have to deal with gender and case as well as number (plurality)? I wonder whether the solutions you've offered above could handle that with some tweaking, or whether a completely different approach is needed.

@And-0

And-0 commented Aug 2, 2016

@axelmm Thanks for the links, I'll check them out!
@larsiusprime Yes, gender and case are a factor there in many languages. However, my solution (I haven't looked at @axelmm's yet) basically allows the translator to input whatever form would be correct in the given context; the programmer (and library) don't have to think about any of it.

However, there might be languages where several parts of the sentence change depending on the number, not just the word after the value; I don't really know. That's not the case in any of the languages I'm familiar with, and I don't think I've seen any features catering to that in other localization libraries, but you know... there are a lot of languages out there.

@larsiusprime
Owner

larsiusprime commented Aug 2, 2016

Okay, let's start with a simple language where plurals depend on two variables as a general case. Spanish -- numerical adjectives have to agree with the nouns they modify in both NUMBER and GENDER.

So "gato" is "male cat" and "gata" is "female cat."

English:

One cat. Two cats. Three cats.

Spanish:

Un gato. Dos gatos. Tres gatos.
Una gata. Dos gatas. Tres gatas.

So we could take your bracket-syntax and think of it like this:
$X_CATS <X> {1/m:gato, n/m:gatos, 1/f:gata, n/f:gatas}

The {1/m:gato, n/m:gatos, 1/f:gata, n/f:gatas} would be split apart first by commas, then by colon, then by slashes:

(1,m)-->gato Number = 1, Gender = male
(n,m)-->gatos Number = n, Gender = male
(1,f)-->gata Number = 1, Gender = female
(n,f)-->gatas Number = n, Gender = female
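The split order just described (commas, then the colon, then slashes) can be sketched in Python without any regular expressions. This is only an illustration: `parse_forms` and `pick` are hypothetical names, and treating `n` as "every count other than 1" is an assumption about the syntax.

```python
def parse_forms(block: str) -> dict[tuple[str, ...], str]:
    """'{1/m:gato, n/m:gatos}' -> {('1', 'm'): 'gato', ('n', 'm'): 'gatos'}:
    split by commas, then by the first colon, then by slashes."""
    forms = {}
    for entry in block.strip().strip("{}").split(","):
        key, text = entry.split(":", 1)
        forms[tuple(part.strip() for part in key.split("/"))] = text.strip()
    return forms


def pick(forms: dict[tuple[str, ...], str], number: int, gender: str) -> str:
    # Assumption: "n" stands for every count other than 1.
    number_key = "1" if number == 1 else "n"
    return forms[(number_key, gender)]


SPANISH_CATS = parse_forms("{1/m:gato, n/m:gatos, 1/f:gata, n/f:gatas}")
for count, gender in ((1, "m"), (3, "m"), (1, "f"), (2, "f")):
    print(count, pick(SPANISH_CATS, count, gender))
# prints 1 gato, 3 gatos, 1 gata, 2 gatas (one per line)
```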

In English it would simply be:
$X_CATS <X> cats

For some extremely inflected languages with billions of cases and special exceptions, you could thus encode highly complex rules. For instance, here's a fictional example where the declension of the simple word "cat" depends not only on the number of cats, but also on the gender of the cats, the formality of the speech, and the relative difference in social status between the speakers. And for fun's sake, let's say that these interact in such complex ways that the entire spelling of the word could depend on a specific permutation of all those variables.

$X_CATS <X> {1/m/i/2:schmeerp, n/n/f/0:schmeerpszku, 3/f/s/D:schmeerkfasku}

(1,m,i,2)-->schmeerp Number = 1, Gender = male, Informal case, Addressed has social status + 2 compared to speaker
(n,n,f,0)-->schmeerpszku Number = n, Gender = neuter, Formal case, Addressed has equal social status to speaker
(3,f,s,D)-->schmeerkfasku Number = 3, Gender = female, Semi-formal case, Addressed is literally a divine being -- a (or The) God.

So this syntax does seem to give us some way of handling this kind of exploding complexity and offload it to the translator. The tricky bit is that both the programmer and the translator would have to be aware of the relevant metadata and set it up somehow.

So, purposefully not using any existing function in the firetongue API, it'd be something like:

var x:Int = 1;
var metadata:Array<Dynamic> =
[
    {number:x},
    {gender:"m"},
    {formality:"i"},
    {socialStatusDelta:2}
];
var text = GetAndReplaceTextSomehow("$X_CATS",["<X>"],metadata);   //"1 schmeerp"

I think this would cover just about all the possible bases one could have and make it generically extensible, but it's probably overkill for most usage.

That said, to hit Spanish you'd at least need something like this:

var x:Int = 1;
var metadata:Array<Dynamic> =
[
    {number:x},
    {gender:"m"}
];
var text = GetAndReplaceTextSomehow("$X_CATS",["<X>"],metadata);  //"1 gato"

And in English you'd only need this:

var x:Int = 1;
var text = GetAndReplaceTextSomehow("$X_CATS",["<X>"],[{number:x}]);  //"1 cat"

The more complex insane case collapses transparently in the other languages that don't need as many different variables to consider, and the translator can still provide the full range of possibilities without having to add tons of different lines.
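To make the "collapses transparently" idea concrete, here is a hedged Python sketch of a stand-in for the deliberately fictional `GetAndReplaceTextSomehow`. Assumptions: metadata is passed as an ordered list (number, gender, formality, status delta) instead of an array of objects; the first slash field is the number, where `n` means any count and the first matching option wins; and any metadata beyond the fields a translator wrote is simply ignored. For English to yield "1 cat", the entry is assumed to carry a number-only block such as `{1:cat, n:cats}`.

```python
def field_matches(fields: list[str], metadata: list) -> bool:
    """First field is the number ("n" = any count); the rest compare as text.
    Extra metadata beyond the translator's fields is ignored, which is what
    lets a richer call collapse onto a simpler language."""
    number, *others = fields
    if number != "n" and int(number) != metadata[0]:
        return False
    return all(f == str(m) for f, m in zip(others, metadata[1:]))


def get_and_replace(entry: str, replacements: dict[str, str], metadata: list) -> str:
    """Hypothetical stand-in for GetAndReplaceTextSomehow."""
    text = entry
    if "{" in entry:
        head, rest = entry.split("{", 1)
        body, tail = rest.split("}", 1)
        for option in body.split(","):
            key, form = option.split(":", 1)
            if field_matches([f.strip() for f in key.split("/")], metadata):
                text = head + form.strip() + tail
                break
        else:
            raise KeyError(f"no form for {metadata!r} in {entry!r}")
    for placeholder, value in replacements.items():
        text = text.replace(placeholder, value)
    return text


SCHMEERP = "<X> {1/m/i/2:schmeerp, n/n/f/0:schmeerpszku, 3/f/s/D:schmeerkfasku}"
SPANISH = "<X> {1/m:gato, n/m:gatos, 1/f:gata, n/f:gatas}"
ENGLISH = "<X> {1:cat, n:cats}"

full = [1, "m", "i", 2]  # number, gender, formality, social-status delta
for entry in (SCHMEERP, SPANISH, ENGLISH):
    print(get_and_replace(entry, {"<X>": "1"}, full))
# prints 1 schmeerp, 1 gato, 1 cat (one per line)
```

The same full metadata list works for all three entries; each language only looks at as many fields as its translator wrote.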

However, we're starting to engineer a private programming language here in the x/y/z syntax, and this solution DOES require the programmer to be kind of intimately involved in the localization process in passing metadata in with the localization fetch function. Not sure if that's inevitable or not.

In any case, what do you think? Am I barking up the wrong tree?

@FelipeMercader

Hi,
I will try to offer my point of view as translator. My understanding of programming languages is limited but I have certain notions of how Firetongue works and I will try to explain the challenges I often encounter.

Concerning this part you mention here (However, there might potentially be languages where several parts of the sentence change depending on the number and not just the word after the value): yes, you are right. You have to adapt the gender and the number not only of adjectives and nouns, but also of other words such as articles and even verbs.

Most common problems I find:
String_1_action: You found a
String_1_item_colour: red
String_1_item_1: shield
String_1_item_2: sword

You found a red shield
You found a red sword

When it comes to translation, a very simple string like this one may cause many problems we have to face. Let’s see:

If the game takes these split parts of the same sentence and joins them into a single sentence, it will work nicely in English. But once we have translated them into Spanish, it becomes this mess:

String_1_action: Has encontrado un
String_1_item_colour: rojo
String_1_item_1: escudo
String_1_item_2: espada

Has encontrado un rojo escudo (WRONG)
Has encontrado un rojo espada (WRONG)

-First problem: the word “a” is embedded in the verb string. Since I can translate it only once (String_1_action is a fixed string), I cannot match the gender of the item that follows. The article would be correct for the shield line but not for the sword line.

-Second problem: word order. The adjective must go after the noun here (most of the time the adjective follows the noun/item in Spanish; it could go before it for stylistic purposes in certain texts, but in this case putting the adjective first is a grammatical error).

-Third problem: gender of the adjective. As in the first problem, I cannot set the gender of the word “red” depending on the item, so I am stuck with a fixed gendered adjective that only works for the first case.

Corrected sentences would be:

Has encontrado un escudo rojo (Word order changed)
Has encontrado una espada roja (Word order changed, adjective gender changed, article gender changed)

We can add more variations to these examples just changing the gender and the number:
a is translated as “un”, “una”, “unos” and “unas” (nm, nf, nmpl, nfpl)
red is translated as “rojo”, “roja”, “rojos” and “rojas” (nm, nf, nmpl, nfpl)
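A small Python sketch of the data a translator would need to get these lines right: each item carries its own gender, the article and adjective forms are keyed by (gender, plural), and a single whole-sentence template lets the translator control word order. All names here (`ITEMS_ES`, `ARTICLE_ES`, `RED_ES`, `TEMPLATE_ES`, `found_item`) are hypothetical and not part of firetongue.

```python
# Hypothetical Spanish data a translator would fill in: each item carries its
# own gender, and article/adjective forms are keyed by (gender, plural).
ITEMS_ES = {
    "shield": ("escudo", "escudos", "m"),
    "sword": ("espada", "espadas", "f"),
}
ARTICLE_ES = {("m", False): "un", ("f", False): "una", ("m", True): "unos", ("f", True): "unas"}
RED_ES = {("m", False): "rojo", ("f", False): "roja", ("m", True): "rojos", ("f", True): "rojas"}

# One template for the whole sentence, so the translator controls word order.
TEMPLATE_ES = "Has encontrado <ART> <NOUN> <RED>"


def found_item(item: str, plural: bool = False) -> str:
    singular, plural_noun, gender = ITEMS_ES[item]
    key = (gender, plural)
    return (TEMPLATE_ES.replace("<ART>", ARTICLE_ES[key])
            .replace("<NOUN>", plural_noun if plural else singular)
            .replace("<RED>", RED_ES[key]))


print(found_item("shield"))       # Has encontrado un escudo rojo
print(found_item("sword"))        # Has encontrado una espada roja
print(found_item("sword", True))  # Has encontrado unas espadas rojas
```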

Other issues: verbs
In some translations, I have also found variables in verbs, used to insert a placeholder verb depending on the action. In several languages, verbs also change depending on the subject (not only for the third person as in English, but for every single pronoun: I, you, he/she/it, we, you, they).

String 1: The merchant (insert_verb_for_buy) the helmet
String 2: You (insert_verb_for_buy) the helmet

In both cases, English can display the same verb:

The merchant purchased* the helmet
You purchased* the helmet

BUT the Spanish verb differs depending on the subject:
El mercader compró* el yelmo
Tú compraste* el yelmo

So, it is clear that I cannot have a fixed single string when it comes to translate “purchased”.

More examples
The merchant sold* the helmet
You sold* the helmet

El mercader ha vendido* el yelmo
Tú has vendido* el yelmo
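One way to avoid a fixed verb string, sketched in Python under the assumption that the programmer passes who performs the action and the translator supplies one verb form per grammatical person. `SUBJECTS_ES`, `VERBS_ES` and `action` are hypothetical names used only for illustration.

```python
# Hypothetical: the programmer says WHO acts; the translator supplies the
# subject text and one verb form per grammatical person.
SUBJECTS_ES = {"merchant": ("El mercader", "3sg"), "player": ("Tú", "2sg")}
VERBS_ES = {
    "buy": {"1sg": "compré", "2sg": "compraste", "3sg": "compró"},
    "sell": {"1sg": "he vendido", "2sg": "has vendido", "3sg": "ha vendido"},
}


def action(subject: str, verb: str, obj: str = "el yelmo") -> str:
    name, person = SUBJECTS_ES[subject]
    return f"{name} {VERBS_ES[verb][person]} {obj}"


print(action("merchant", "buy"))  # El mercader compró el yelmo
print(action("player", "buy"))    # Tú compraste el yelmo
print(action("player", "sell"))   # Tú has vendido el yelmo
```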

Hope it helps a bit!

@And-0

And-0 commented Aug 2, 2016

@larsiusprime That's a very powerful system that would probably be too much for the average translator (and annoy programmers). The thing is, I'm not really convinced we actually need to encode all this data. Metadata like the gender of the thing being discussed and the social relation between the speakers is usually known and constant for that particular string. I mean, I guess you could create a generic "<GREETING>, <PERSON>!" string and then use it to dynamically create everything from "Sup, brah!" to "Greetings, my Liege!", but it's much more sensible to simply have two separate strings for that.

However, @FelipeMercader brought up a good point. If several parts of the string are variables, things get ugly rather quickly. If you have "You found a <COLOR> <OBJECT>", then you do need metadata to account for all possible combinations. Then again, games usually circumvent that by structuring strings like "You found: Red Sword x2", which eliminates the need to match gender, number, case etc. (and I feel it looks cleaner too).

So I guess the question is: Would all the effort of implementing such a system even be worth it? I'm not sure.

@axelmm
Author

axelmm commented Aug 2, 2016

This domain is much more complex than you think ;)
I'm a contributor and proofreader for the Polish translations of ORO CRM (and ORO Platform) on the Crowdin Localization Management Platform (free for open-source projects). In short, it manages lines with a universal syntax: "to_translate: translated_option_1|translated_option_2".

Working with this I learnt many things.

Gender IS NO PROBLEM (but it CAN BE A PROBLEM)

  • as the programmer, you should know whether you have female cat actors on the scene ;)
  • in this case, if it matters for CONTEXT you just choose $X_FEMALE_CATS (or $X_MALE_CATS) over $X_CATS
  • and provide {kot, koty, kotów} and {kotka, kotki, kotek}

As @FelipeMercader wrote, the PROBLEM IS when the adjective's or the article's gender changes.
I think defining the gender of an object is not the programmer's problem; the translation system should react when the gender changes.
"Sword" (masculine) can be either miecz {m} or szpada {f}, depending on context, usual usage, etc.
http://en.bab.la/dictionary/english-polish/sword
Should the source carry gender marks, or should only the targets carry gender-change tags? If the article's gender changes, should the adjective's gender be adjusted too (some kind of priority)?

CONTEXT (and style) is a problem

There is one apple.
There are 5 apples.
Nie było tu żadnego kota. // there were no cats here
Był tu 1 kot. // 1 cat was here
Były tu 2 koty. // 2 cats were here
Było tu 6 kotów, w tym 5 kotek (female). // 6 cats were here, including 5 female cats

Symfony manages this with template syntax:
http://symfony.com/doc/current/translation.html#pluralization
http://symfony.com/doc/current/components/translation/usage.html#component-translation-pluralization

In ORO, translations are EXTRACTED from templates using a CLI, creating files with "TAGS",
which can later look like this in Crowdin:
"{0} No entities were deleted|{1} One entity was deleted|]1,Inf[ %count% entities were deleted: {0} Nie usunięto żadnych elementów|{1} Jeden element został usunięty|] 1, Inf [Usunięto %count% elementy/-ów"
oro.grid.mass_action.delete.success_message: '{0} Nie usunięto żadnych elementów|{1} Jeden element został usunięty|] 1, Inf [Usunięto %count% elementy/-ów'
// source text: '{0} No entities were deleted|{1} One entity was deleted|]1,Inf[ %count% entities were deleted'
// was/is displayed in Crowdin as the source, taken from the English file under the same key (or, more accurately, the same tree-structured path)

  • a colon ":" separates to_translate from target_translated;
  • to_translate contains the option rules, and so does the target;
  • to_translate can contain more than one variable (such as %count%).
    As you can see, this is not an ideal solution: it is "unfinished" and contains "elementy/-ów" for counts above 1.
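For reference, selecting one part of such a message can be done with plain splitting. This Python sketch is modeled on the Symfony-style syntax shown above: `{0}` lists exact values and `]1,Inf[` is an ISO 31-11 interval. It is not Symfony's actual implementation; `in_interval` and `choose_message` are hypothetical names, and messages without a leading interval are not handled.

```python
def in_interval(spec: str, n: int) -> bool:
    """'{0}' / '{1,2}' list exact values; '[a,b]', ']a,b[' etc. are ISO 31-11
    intervals where ']' on the left or '[' on the right excludes the bound."""
    spec = spec.strip()
    if spec.startswith("{"):
        return n in {int(v) for v in spec[1:-1].split(",")}
    low, high = (float(b) for b in spec[1:-1].split(","))  # float() accepts "Inf"
    low_ok = n >= low if spec[0] == "[" else n > low
    high_ok = n <= high if spec[-1] == "]" else n < high
    return low_ok and high_ok


def choose_message(message: str, n: int) -> str:
    """Pick the '|'-separated part whose leading interval contains n."""
    for part in (p.strip() for p in message.split("|")):
        if part.startswith("{"):
            end = part.index("}")
        else:  # the closing bracket is the first '[' or ']' after the opening one
            end = min(i for i in (part.find("[", 1), part.find("]", 1)) if i != -1)
        if in_interval(part[:end + 1], n):
            return part[end + 1:].strip().replace("%count%", str(n))
    raise ValueError(f"no interval matches {n}")


EN = "{0} No entities were deleted|{1} One entity was deleted|]1,Inf[ %count% entities were deleted"
print(choose_message(EN, 7))  # 7 entities were deleted
```

The same function also accepts the spaced-out Polish target, since `float()` tolerates the blanks in `] 1, Inf [`.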

I would like to see a mixed/"recursive" solution here:

"$X_CATS {kot, koty, kotów}" // w/o rules - indexed for algorithm rule
"$X_WAS {był, były, było}" // w/o rules - indexed for algorithm rule
"$X_WAS_CAP {Był, Były, Było}" // w/o rules - indexed for algorithm rule - added FOR STYLE (this translation/lang only) as beginning of sentence (if no filters applied later)
"${0:There were no cats, 1:Only one cat was here, 1/n:There was <X> <RE>[$X_CATS]}: {0:Nie było tu żadnych kotów, 1:Był tu tylko jeden kot, 2:Były tu tylko dwa koty, n:<RE>[$X_WAS] tu <X> <RE>[$X_CATS]}"
"$X_CATS_WAS_HERE: {0:Nie było tu żadnych kotów, 1:Był tu tylko jeden kot, 2:Były tu tylko dwa koty, n:<RE>[$X_WAS_CAP] tu <X> <RE>[$X_CATS]}"

The first and second simply use the number_parameter => position_index algorithm, which covers ALL possibilities (0, Inf).
The third is more complex but allows for context and style (e.g. for dialogs):

0: Nie było tu żadnych kotów // there were no cats here
1: Był tu tylko jeden kot // only one cat was here
2: Były tu tylko dwa koty // this could be handled by the n: rule, BUT the intent is to write the number out in words
3: Były tu 3 koty
4: Były tu 4 koty
5: Było tu 5 kotów

  1. I think options should be separated by a pipe ("|"), because commas (",") can appear in sentences.
  2. Range definitions should use the standardized (ISO 31-11) notation with inclusive/exclusive delimiters ([ and ]).
  3. It should be possible to pass more variables (see below).

$X_CATS { kot | koty | kotów } // trim whitespaces
$X_FEM_CATS { kotka | kotki | kotek } // trim whitespaces
$X_WAS_CAP {Był|Były|Było}
${0:There were no cats | 1:Only one cat was here | ]1,Inf[:There was <X> <RE>[$X_CATS], <Y> <RE>[$X_FEM_CATS,<Y>] among them }: {0:Nie było tu żadnych kotów | 1:Był tu tylko jeden kot | 2:Były tu tylko dwa koty | ]2,Inf[:<RE>[$X_WAS_CAP] tu <X> <RE>[$X_CATS], w tym <Y> <RE>[$X_FEM_CATS,<Y>] }
$X_CATS_WAS_HERE: {0:Nie było tu żadnych kotów | 1:Był tu tylko jeden kot | 2:Były tu tylko dwa koty | ]2,Inf[:[$X_WAS_CAP] tu [$X_CATS,], w tym [$X_FEM_CATS,] }

  1. As you probably noticed, to_translate (everything between $ and :) is only an "indexing string" (for lookups in arrays). A trailing ":" isn't needed when "{}" are used.
  2. Ranges MAY DIFFER in source and target.
  3. [$KEY,<number_parameter>]: if an additional parameter exists, JUST use the CHOICE method instead of GET.
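A hedged Python sketch of how such "recursive" entries might be resolved, modeled on the `$X_CATS_WAS_HERE` target above. Assumptions, not part of the proposal's text: `<X>` is the main count that selects the option; `<RE>[$KEY]` without a parameter reuses that count, while `<RE>[$KEY,<Y>]` uses `<Y>`; referenced keys hold pipe-separated forms picked by Symfony's Polish plural rule. `TRANSLATIONS`, `resolve` and the helper names are hypothetical, and parsing sticks to `split`/`partition`.

```python
# Hypothetical translation table modeled on the $X_CATS_WAS_HERE proposal.
TRANSLATIONS = {
    "$X_CATS": "kot | koty | kotów",
    "$X_FEM_CATS": "kotka | kotki | kotek",
    "$X_WAS_CAP": "Był | Były | Było",
    "$X_CATS_WAS_HERE": (
        "0:Nie było tu żadnych kotów | 1:Był tu tylko jeden kot"
        " | 2:Były tu tylko dwa koty"
        " | ]2,Inf[:<RE>[$X_WAS_CAP] tu <X> <RE>[$X_CATS], w tym <Y> <RE>[$X_FEM_CATS,<Y>]"
    ),
}


def polish_index(n: int) -> int:
    """0 = one, 1 = few (2-4, 22-24, ...), 2 = many; Symfony's rule for 'pl'."""
    if n == 1:
        return 0
    return 1 if 2 <= n % 10 <= 4 and (n % 100 < 10 or n % 100 >= 20) else 2


def label_matches(label: str, n: int) -> bool:
    label = label.strip()
    if label[0] in "[]":  # ISO 31-11 interval such as ]2,Inf[
        low, high = (float(b) for b in label[1:-1].split(","))
        return (n >= low if label[0] == "[" else n > low) and (
            n <= high if label[-1] == "]" else n < high)
    return n == int(label)


def expand_refs(text: str, variables: dict[str, int], main: int) -> str:
    """Replace <RE>[$KEY] (uses the main count) or <RE>[$KEY,<VAR>] (uses VAR)."""
    pieces = text.split("<RE>[")
    out = [pieces[0]]
    for piece in pieces[1:]:
        ref, rest = piece.split("]", 1)
        key, _, var = ref.partition(",")
        n = variables[var.strip()] if var.strip() else main
        forms = [f.strip() for f in TRANSLATIONS[key.strip()].split("|")]
        out.append(forms[polish_index(n)] + rest)
    return "".join(out)


def resolve(key: str, variables: dict[str, int], main_var: str = "<X>") -> str:
    n = variables[main_var]
    for option in TRANSLATIONS[key].split("|"):
        label, text = option.split(":", 1)
        if label_matches(label, n):
            text = expand_refs(text.strip(), variables, n)
            for name, value in variables.items():
                text = text.replace(name, str(value))
            return text
    raise KeyError(f"{key}: no option for {n}")


print(resolve("$X_CATS_WAS_HERE", {"<X>": 5, "<Y>": 3}))  # Było tu 5 kotów, w tym 3 kotki
```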

I hope you can see its universality now ;)

A similar problem:
http://www.if-not-true-then-false.com/2010/php-1st-2nd-3rd-4th-5th-6th-php-add-ordinal-number-suffix/
It can be done in a similar (array/index) way:
$X_ORD_SUFFIX {"st"|"nd"|"rd"|"th"} // plus algorithm rules
but rules may be more complex ;)
Polish suffixes:
$X_ORD_SUFFIX {"szy" | "gi" | "ci" | "ty" | "my" | "ny" | "wy"} // 7 options :D
https://github.com/arius86/number-formatter/blob/master/src/Lang/Pl/SpelloutOrdinalMasculine.php
Can you imagine range rules inside translations? Plus gender differences (masculine/feminine/neuter).
Implementing rules as number-to-index helpers is much simpler, safer, more universal, and easier to extend.
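As a small illustration of such a number-to-index helper, here is a Python sketch for the English ordinal suffixes listed above: the translator supplies `st|nd|rd|th` and the code only picks the position (11th to 13th are the exceptions to 1st/2nd/3rd). `english_ordinal_index` and `ordinal` are hypothetical names; the Polish ordinals would need their own, more complex rule.

```python
def english_ordinal_index(n: int) -> int:
    """Index into ["st", "nd", "rd", "th"]; 11-13 always take "th"."""
    if 11 <= n % 100 <= 13:
        return 3
    return {1: 0, 2: 1, 3: 2}.get(n % 10, 3)


SUFFIXES_EN = "st|nd|rd|th".split("|")  # as a translator would supply them


def ordinal(n: int) -> str:
    return f"{n}{SUFFIXES_EN[english_ordinal_index(n)]}"


print([ordinal(n) for n in (1, 2, 3, 4, 11, 12, 13, 21, 22, 101, 111, 112)])
# ['1st', '2nd', '3rd', '4th', '11th', '12th', '13th', '21st', '22nd', '101st', '111th', '112th']
```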

@larsiusprime
Owner

larsiusprime commented Aug 2, 2016

Okay, wow lots of stuff.

I think one thing that's clear from this is that for most games a best practice is to avoid dynamically generated sentences as much as possible, because that's when you suddenly need to account for all this crazy stuff. If you just have a sentence like, "You found a sword" -- that's static content, and the translator just translates it, with whatever grammar applies. "You found <X> <SOMETHINGS>" is when the trouble starts.

For instance, in the original version of Defender's Quest I had all these big fancy sentence descriptions for skills, stuff like,

"Super smash deals 235 damage to up to 6 targets within range of 3.5 tiles and poisons them."

Sooo much implicit grammar is baked into that. Now we just do something like this instead:

Skill: Super smash
Damage: 235
Max targets: 6
Range: 3.5 tiles
Effect: Poison

Not only is that waaaaay easier to translate, it's easier for the player to read and understand.

But nevertheless sometimes you have to deal with dynamically constructed content.

So the question is how to properly split the difference between having the program generate flags à la:

$YOU_FOUND_1_SWORD
$YOU_FOUND_2_SWORDS
$YOU_FOUND_X_SWORDS
$YOU_FOUND_14_FEMALE_SWORDS_FORMAL_CASE_SOCIAL_STATUS_PLUS_2

...and doing something like the crazy metadata solutions above. I do like the general idea to treat number as a special case and perhaps rely on permutations for other stuff. I'll have to think about this some more :)

For the metadata solution, what's important is that it be easily parsable with minimal ambiguity, straightforward logic, and preferably zero reliance on regular expressions -- if we can do it with just successive String.split() commands, that's ideal.

@axelmm
Author

axelmm commented Aug 3, 2016

I thought this lib was intended to be universal, not for games only ;)
We should take a look at different solutions just to not reinvent the wheel.

Looking at Symfony solution:

  • the plural rules were adopted from Zend; they were good enough for them and are still in use;
  • it's used within Twig templates;
    Of course it can be used from code, BUT notice that in both cases there are only two main methods: trans and transchoice.
    WHY? Because a system used from templates MUST HAVE all the data provided in ONE STEP/TOKEN to work.
    Looking at the sources, both have variable REPLACEMENT implemented INSIDE the system (just before returning), just like a redirect token. There is an internal GET method, but it shouldn't be used directly. As a general rule, we should 'feed' the system all the data it needs to return a whole, ready-to-use/print string.

tongue.trans("$HELLO_WORLD",["context"=>"informal"]) // formality/social context - passing metadata
$HELLO_NAME Witaj, <name>!
tongue.trans("$HELLO_NAME", null, ["name"=>"John"]) // default context, passing variables

$CATS_PL kot|koty|kotów
tongue.transchoice("$CATS_PL",["context"=>"informal"], 5) // should return "kotów" as plural (Zend's) rule returns position index 2 for Polish (see try.haxe POC link above)

$CATS_FEMALE_PL kotka|kotki|kotek
$WAS_PL był|były|było // was
$WAS_PL_CAP Był|Były|Było // 'was' at the beginning of a sentence
$X_CATS_Y_F_CATS_WAS_HERE {0}Nie było tu żadnych kotów|{1}Był tu tylko jeden kot|]2,Inf[<RE>[$WAS_PL_CAP,0] tu <N_CATS> <RE>[$CATS_PL,0], w tym <N_F_CATS> <RE>[$CATS_FEMALE_PL,1]. Widziałeś je <name>? // the 0 or 1 in a replace rule is an index into the numerical parameters passed as an array:
tongue.transchoice("$X_CATS_Y_F_CATS_WAS_HERE",null, [5,3], [ "name"=>"John", "N_CATS" => Std.string(num_cats), "N_F_CATS" => Std.string(num_fem_cats) ]) // more than one numerical parameter passed as an array (the first is the main choice selector; both are used for 'redirected' replacing), plus a map of named variables for simple replacing
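A rough Python sketch of the two-method facade proposed here, where every call carries all the data needed to return a finished string. It is an illustration under assumptions, not firetongue's or Symfony's API: `Tongue`, its `(key, context)` entry table with a "default" fallback, and the `<name>` placeholder convention are hypothetical, and multiple numerical parameters plus redirected tokens are left out.

```python
from typing import Callable, Optional


def polish_index(n: int) -> int:
    """Symfony's plural rule for 'pl': 0 = one, 1 = few, 2 = many."""
    if n == 1:
        return 0
    return 1 if 2 <= n % 10 <= 4 and (n % 100 < 10 or n % 100 >= 20) else 2


class Tongue:
    """Hypothetical facade: trans() for plain strings, transchoice() for plurals."""

    def __init__(self, entries: dict[tuple[str, str], str], plural_index: Callable[[int], int]):
        self.entries = entries  # (key, context) -> text
        self.plural_index = plural_index

    def _lookup(self, key: str, metadata: Optional[dict[str, str]]) -> str:
        context = (metadata or {}).get("context", "default")
        if (key, context) in self.entries:
            return self.entries[(key, context)]
        return self.entries[(key, "default")]  # fall back to the default context

    @staticmethod
    def _replace(text: str, variables: Optional[dict[str, str]]) -> str:
        for name, value in (variables or {}).items():
            text = text.replace(f"<{name}>", value)
        return text

    def trans(self, key: str, metadata=None, variables=None) -> str:
        return self._replace(self._lookup(key, metadata), variables)

    def transchoice(self, key: str, metadata, number: int, variables=None) -> str:
        forms = [f.strip() for f in self._lookup(key, metadata).split("|")]
        return self._replace(forms[self.plural_index(number)], variables)


tongue = Tongue(
    {
        ("$HELLO_WORLD", "default"): "Witaj, świecie!",
        ("$HELLO_WORLD", "informal"): "Cześć, świecie!",
        ("$HELLO_NAME", "default"): "Witaj, <name>!",
        ("$CATS_PL", "default"): "kot|koty|kotów",
    },
    polish_index,
)
print(tongue.trans("$HELLO_NAME", None, {"name": "John"}))        # Witaj, John!
print(tongue.transchoice("$CATS_PL", {"context": "informal"}, 5))  # kotów
```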

These rules aren't hard to implement, and we could have a very powerful system, even better than Symfony's (redirected tokens, multiple numerical parameters). We can extend the behaviour with metadata (the second parameter) for formal/informal/social status etc.

As a next step, we could think about setting a priority order for redirected tokens, so that a gender change affects how the following redirected tokens are replaced, or about additional rules IN THE TRANSLATION SYSTEM, hidden from the programmer...
... and add methods (based on other choice rules) for general or ordinal number spell-outs, etc.

The translator only needs to know how the plural selector works (the rules for his language, which means knowing how many options he has to provide) and how redirected replacing works; the exact syntax for parameters and variables comes with the original/source (English) file. He can even extend a translation using SYSTEM LOGIC if needed, as above, where I added $WAS_PL_CAP (the source could have only $WAS_PL was|were) because it was meant to be the first word of the sentence.

4 participants