fix typos, plural logic proposal #23
base: dev
Nice job.
What about incomplete translations: is there a fallback?
Permutations don't scale well.
An option for 'simple' plural rules? IT'S VERY IMPORTANT, and the 'simple complex' example below is hard to implement this way.
<PL> - special marker/tag for PLural; if it is last in the string, it can be left unclosed </PL>
// simple ranges
$FEW_MANY_PLURAL, one<PL,0>no<PL,2>two<PL,3>a few<PL,10>many</PL> // default=1; range 3-9: a few; more than 9: many
$CAT, cat<PL,0,2>cats </PL> // default (1): cat; 0: no cats; 2 or more: cats
// Polish - more complex but ... quite simple?
$CAT, kot<PL,0,5+>kotów<PL,2,22,23,24,?2,?3,?4>koty</PL>
// 2,3,4 koty; 5,6,7..11,12...21 kotów; 22,23,24.. koty; 25..101 kotów; 103 koty; [111,212... kotów] would be broken in this case; 222 koty;
// each number (>=2) OPENS A RANGE (2-4), but a number followed by "+" (5+) defines the DEFAULT FOR ALL GREATER values; following rules with a greater value are ONLY SINGLE-VALUE exceptions (the next "+" rule only changes greater_default) - can be harder to implement (2 passes?)
// for a start (if it looks too complex) it can be without ranges, only defaults and exceptions (order matters)
// these rules would then be:
$CAT, kot<PL,0,5+,?11,?12,?13,?14>kotów<PL,2,3,4,?2,?3,?4>koty</PL>
// ? means optional: any or no leading digits
$CAT, kot<PL,0,5+,?11,?12,?13,?14>kotów<PL,?2,?3,?4>koty</PL>
// shorter, because ? matches 2,3,4, 22,23,24 except 11,12,..111,112 - order matters -- rules are preinterpreted and stored as a 'firewall chain'/array - first match returns? E.g. 6 falls into the range_default rule, so it gets that value but does not return; the next rules are tried, and if none matches, this default is returned. 12 matches the 5+ range but flows into the next rules .. ?12 is a SINGLE exception which returns immediately (without trying the next matching rule, ?2)
// usage with an OPTIONAL parameter for get()
count = 12; plur_string = ft.get("$FEW_MANY_PLURAL", count); cat_string = ft.get("$CAT", count);
// then use them with replace
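To make the 'firewall chain' idea concrete, here is a minimal Python sketch of the simplified variant only (no ranges, just "+" defaults and exceptions, order matters). It is one possible reading of the rules described above, not firetongue code: the function names `parse_plural` and `choose` are made up for illustration, plain numbers are treated as exact matches, and "?" tokens are treated as "the count ends with these digits".

```python
def parse_plural(entry):
    """Split 'kot<PL,0,5+>kotów<PL,?2>koty</PL>' into (default, chain).

    Uses only successive str.split() calls, no regular expressions.
    """
    entry = entry.replace("</PL>", "")          # the closing tag is optional
    parts = entry.split("<PL,")
    default = parts[0].strip()                  # text used when no rule matches
    chain = []                                  # ordered list of (token, text)
    for part in parts[1:]:
        selectors, text = part.split(">", 1)
        for token in selectors.split(","):
            # Assumption: surrounding whitespace in tokens and texts is not significant.
            chain.append((token.strip(), text.strip()))
    return default, chain


def choose(entry, count):
    """Walk the chain in order, like a firewall rule list."""
    default, chain = parse_plural(entry)
    digits = str(count)
    result = default
    for token, text in chain:
        if token.endswith("+"):
            # "N+" only sets the default for all counts >= N and keeps scanning.
            if count >= int(token[:-1]):
                result = text
        elif token.startswith("?"):
            # "?NN" is an exception: any count ending in NN returns immediately.
            if digits.endswith(token[1:]):
                return text
        elif count == int(token):
            # A plain number is an exact-match exception in this variant.
            return text
    return result


POLISH_CAT = "kot<PL,0,5+,?11,?12,?13,?14>kotów<PL,?2,?3,?4>koty</PL>"
for n in (0, 1, 2, 6, 12, 22, 103, 111, 222):
    print(n, choose(POLISH_CAT, n))
```

With the shorter Polish rule this gives kot for 1, koty for 2, 22, 103 and 222, and kotów for 0, 6, 12 and 111. The range-opening behaviour of plain numbers in the first examples would need extra handling that this sketch leaves out.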
Hi, quick POC in Haxe: http://try.haxe.org/#1E141
Hey guys, thanks for these contributions, sorry I never responded to the original comment. When I have some time I'll see if I can reconcile/merge everything.
@axelmm Thanks for the links, I'll check them out! However, there might potentially be languages where several parts of the sentence change depending on the number and not just the word after the value; I don't really know. That's not the case in any of the languages I'm familiar with and I don't think I've seen any features catering to that in other localization libraries, but you know...there are a lot of languages out there.
Okay, let's start with a simple language where plurals depend on two variables as a general case. Spanish -- numerical adjectives have to agree with the nouns they modify in both NUMBER and GENDER. So "gato" is "male cat" and "gata" is "female cat." English:
Spanish:
So we could take your bracket-syntax and think of it like this: The
In English it would simply be:
For some extremely declined languages with billions of cases and special exceptions you could thus encode highly complex rules. For instance, here's a fictional example where the declension of the simple word "cat" depends not only on the number of cats, but also the gender of the cats, the formality of the speech, and the relative difference in social status between the speakers. And for fun's sake let's say that these interact in such complex ways that the way the word is spelled could depend entirely on a specific permutation of all those variables.
So this syntax does seem to give us some way of handling this kind of exploding complexity and offload it to the translator. The tricky bit is that both the programmer and the translator would have to be aware of the relevant metadata and set it up somehow. So, purposefully not using any existing function in the firetongue API, it'd be something like:
I think this would cover just about all the possible bases one could have and make it generically extensible, but it's probably overkill for most usage. That said, to hit Spanish you'd at least need something like this:
And in English you'd only need this:
The more complex insane case collapses transparently in the other languages that don't need as many different variables to consider, and the translator can still provide the full range of possibilities without having to add tons of different lines. However, we're starting to engineer a private programming language here in the In any case, what do you think? Am I barking up the wrong tree?
Hi,
Concerning the part you mention here (However, there might potentially be languages where several parts of the sentence change depending on the number and not just the word after the value): yes, you are right. You have to adapt the gender and the number not only for the adjectives or the nouns, but also for other words like articles and even verbs.
Most common problems I find:
You found a red shield
When it comes to translation, a very simple string like this one may cause many problems we have to face. Let’s see:
If the game takes these split parts of the same sentence and joins them into a single sentence, it will work nicely in English. Once we have translated the parts into Spanish, it becomes this mess:
String_1_action: Has encontrado un
Has encontrado un rojo escudo (WRONG)
- First problem: the word “a” is integrated in the verb string. If I can translate it only once (since String_1_action would be a fixed string), I cannot set the gender of the item that follows. The article would be correct for the shield line but not for the sword line.
- Second problem: word order. The adjective must go after the noun in this case. Most of the time it will be like that, with the adjective after the noun/item. In Spanish it can go before the noun for stylistic purposes in certain texts, but here, having the adjective first is a grammatical error.
- Third problem: gender of the adjective. As in the first problem, I cannot set the gender of the word "red" depending on the item, so I am stuck with a fixed gendered adjective which will only work for the first case.
Corrected sentences would be:
Has encontrado un escudo rojo (Word order changed)
We can add more variations to these examples just by changing the gender and the number:
Other issues, in verbs:
String 1: The merchant (insert_verb_for_buy) the helmet
In both cases, English can display the same verb:
The merchant purchased* the helmet
BUT the Spanish verb will be different depending on the pronoun.
So it is clear that I cannot have a single fixed string when it comes to translating “purchased”.
More examples:
El mercader *(ha vendido) el yelmo
Hope it helps a bit!
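As a small illustration of the three problems above, the following Python sketch contrasts gluing separately translated fragments with letting the translator write each noun phrase whole. All dictionary keys and function names are hypothetical, and the sword phrase ("una espada roja") is added here as a standard Spanish example rather than taken from the comment.

```python
# Fragment-per-word approach: each piece is translated once and glued
# together in English word order (keys are hypothetical).
FRAGMENTS_ES = {
    "found_a": "Has encontrado un",
    "red": "rojo",
    "shield": "escudo",
    "sword": "espada",
}


def glue_fragments(item):
    """Join the fixed verb+article string, the adjective and the noun."""
    return " ".join(
        [FRAGMENTS_ES["found_a"], FRAGMENTS_ES["red"], FRAGMENTS_ES[item]]
    )


# Phrase-per-permutation approach: the translator writes the whole noun
# phrase, so article, word order and adjective gender can all agree.
NOUN_PHRASES_ES = {
    ("shield", "red"): "un escudo rojo",
    ("sword", "red"): "una espada roja",
}
FOUND_ES = "Has encontrado {thing}"


def whole_phrase(item, colour):
    """Insert a fully translated noun phrase into the sentence template."""
    return FOUND_ES.format(thing=NOUN_PHRASES_ES[(item, colour)])


print(glue_fragments("shield"))       # Has encontrado un rojo escudo (wrong order)
print(whole_phrase("shield", "red"))  # Has encontrado un escudo rojo
print(whole_phrase("sword", "red"))   # Has encontrado una espada roja
```

The second approach is the permutation strategy discussed in this thread: it is correct, but the number of phrases grows with every item and adjective combination.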
@larsiusprime That's a very powerful system that would probably be too much for the average translator (and annoy programmers). The thing is, I'm not really convinced we actually need to encode all this data. Metadata like the gender of the thing being discussed and the social relation between the speakers is usually known and constant for that particular string. I mean, I guess you could create a generic However, @FelipeMercader brought up a good point. If several parts of the string are variables, things become ugly rather quickly. If you have So I guess the question is: Would all the effort of implementing such a system even be worth it? I'm not sure.
This domain is much more complex than you think ;) Working with this I learnt many things. Gender
As @FelipeMercader wrote, the PROBLEM is when the adjective gender or the article gender changes. CONTEXT (and style) is a problem
Symfony manages this with template syntax: In ORO translations are EXTRACTED from templates using CLI - creating files with "TAGS"
I would like to see a mixed / "recursive" solution there:
The first and second simply use a number_parameter => position_index algorithm, which covers ALL possibilities (0,Inf).
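A number_parameter => position_index mapping can be sketched in a few lines of Python. The Polish rule below is the commonly published three-form rule (as used in gettext-style plural definitions); the pipe-separated message format and the names `polish_index` and `pluralize` are assumptions made for this illustration, not Symfony's or firetongue's actual API.

```python
def polish_index(n):
    """Map a count to the position of the matching Polish plural form."""
    if n == 1:
        return 0                      # singular: kot
    if 2 <= n % 10 <= 4 and not 12 <= n % 100 <= 14:
        return 1                      # 2-4, 22-24, 102-104 ...: koty
    return 2                          # everything else, including 0: kotów


def pluralize(forms, n, index_fn):
    """Pick one form from a pipe-separated list using the index function."""
    return forms.split("|")[index_fn(n)]


for n in (0, 1, 3, 5, 12, 22, 101, 103, 112, 222):
    print(n, pluralize("kot|koty|kotów", n, polish_index))
```

Because every non-negative integer maps to exactly one index, the translator only has to supply one string per position, and each language contributes its own index function.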
I hope you can see its universality now ;) Similar problem:
Okay, wow lots of stuff. I think one thing that's clear from this is that for most games a best practice is to avoid dynamically generated sentences as much as possible, because that's when you suddenly need to account for all this crazy stuff. If you just have a sentence like, "You found a sword" -- that's static content, and the translator just translates it, with whatever grammar applies. For instance, in the original version of Defender's Quest I had all these big fancy sentence descriptions for skills, stuff like,
Sooo much implicit grammar is baked into that. Now we just do something like this instead:
Not only is that waaaaay easier to translate, it's easier for the player to read and understand. But nevertheless sometimes you have to deal with dynamically constructed content. So the question is how to properly split the difference between having the program generate flags ala:
...and doing something like the crazy metadata solutions above. I do like the general idea to treat number as a special case and perhaps rely on permutations for other stuff. I'll have to think about this some more :) For the metadata solution, what's important is that it be easily parsable with minimal ambiguity, straightforward logic, and preferably zero reliance on regular expressions -- if we can do it with just successive String.split() commands, that's ideal.
I thought this lib was intended to be universal, not for games only ;) Looking at the Symfony solution:
These rules aren't hard to implement, and we can have a very powerful system, possibly better than Symfony's (redirected tokens, multiple numerical parameters). We can extend the behaviour with metadata (a second parameter array) for formal/informal/social status etc. In the next step we can think about setting the priority order of redirected tokens, for a gender change affecting the following redirected behaviour, by replacing or adding rules IN THE TRANSLATING SYSTEM, hidden from the programmer ... The translator only needs to know how the plural selector works (the rules for his language, which means knowing how many options he has to provide) and how redirected replacing works: the exact syntax for the parameters and variables he will get with the original/source (English) file. He can even extend the translation using SYSTEM LOGIC if needed, as above when I added $WAS_PL_CAP (in source could be only