Accent marks in Russian files #1355

Open
javnik36 opened this issue Sep 3, 2024 · 0 comments

javnik36 commented Sep 3, 2024

Continuation of #1354

I ran a script that looks for characters from the Combining Diacritical Marks Unicode block (U+0300 to U+036F) in the i18n files. My main goal was to find cases where a single precomposed Unicode character should be used instead of a sequence of two code points [letter + combining accent].
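The script itself isn't included in the issue. For reference, a minimal sketch of this kind of scan, assuming Python 3 and UTF-8 input, might look like the following. `find_combining_marks` and `scan_file` are hypothetical names, not the original script. It reports each mark together with the character before it, the surrounding word and the 1-based physical line number.

```python
import sys
from pathlib import Path

# Assumption: only the Combining Diacritical Marks block (U+0300..U+036F) is checked.
COMBINING_BLOCK = range(0x0300, 0x0370)


def find_combining_marks(text):
    """Yield (line_no, base_char, mark, word) for every combining mark in text.

    line_no is the 1-based physical line number, not a msgid index.
    """
    for line_no, line in enumerate(text.splitlines(), start=1):
        for i, ch in enumerate(line):
            if ord(ch) in COMBINING_BLOCK:
                # The character the mark attaches to is the one right before it.
                base = line[i - 1] if i > 0 else ""
                # Take the space-delimited word containing the mark as context.
                start = line.rfind(" ", 0, i) + 1
                end = line.find(" ", i)
                word = line[start:] if end == -1 else line[start:end]
                yield line_no, base, f"U+{ord(ch):04X}", word


def scan_file(path):
    """Print one tab-separated row per combining mark found in the file."""
    text = Path(path).read_text(encoding="utf-8")
    for line_no, base, mark, word in find_combining_marks(text):
        print(f"{base}\t{mark}\t{word}\t{path}\t{line_no}")


if __name__ == "__main__":
    # Hypothetical usage: python scan_marks.py path/to/ru/*.po
    for p in sys.argv[1:]:
        scan_file(p)
```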

Example:

ę in Polish can mistakenly be written as [e + U+0328 COMBINING OGONEK]. Both forms look exactly the same > ę vs ę < but the latter is incorrect and may cause issues when editing or displaying text.
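The difference is only visible at the code point level. A small check using only Python's standard `unicodedata` module shows that the two spellings compare unequal and that NFC normalization turns the decomposed pair into the single character U+0119:

```python
import unicodedata

precomposed = "\u0119"  # ę as one code point: LATIN SMALL LETTER E WITH OGONEK
decomposed = "e\u0328"  # e followed by U+0328 COMBINING OGONEK

# Both render as ę, but they are different code point sequences.
print(precomposed == decomposed)          # False
print(len(precomposed), len(decomposed))  # 1 2

# NFC composes the pair into the precomposed character; NFD does the reverse.
print(unicodedata.normalize("NFC", decomposed) == precomposed)  # True
print(unicodedata.normalize("NFD", precomposed) == decomposed)  # True
```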

The script found several combining accent marks in the Russian files. I'm leaving its output below for native Russian speakers to review and, if needed, act on. From my brief research, the combining acute accent (https://www.compart.com/en/unicode/U+0301) can legitimately be combined with Cyrillic letters in Russian (it is used to mark stress), but the other combinations have their own precomposed characters, so they may currently be incorrect.
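One way to separate the two groups mechanically, sketched here as an assumption rather than the method used to produce the output, is to ask whether NFC composes a letter + mark pair into a single code point. The letters are written as escapes so Cyrillic characters can't be confused with Latin look-alikes; `has_precomposed_form` is a hypothetical helper.

```python
import unicodedata


def has_precomposed_form(base, mark):
    """True if base + combining mark composes to a single code point under NFC."""
    return len(unicodedata.normalize("NFC", base + mark)) == 1


# Letter + mark pairs taken from the script output in this issue.
pairs = [
    ("\u043e", "\u0301"),  # Cyrillic о + COMBINING ACUTE ACCENT
    ("\u044b", "\u0301"),  # Cyrillic ы + COMBINING ACUTE ACCENT
    ("\u0438", "\u0306"),  # Cyrillic и + COMBINING BREVE -> й (U+0439)
    ("\u0435", "\u0308"),  # Cyrillic е + COMBINING DIAERESIS -> ё (U+0451)
]

for base, mark in pairs:
    exists = has_precomposed_form(base, mark)
    verdict = "precomposed form exists" if exists else "no precomposed form"
    print(f"{base} U+{ord(mark):04X}: {verdict}")
```

This is a code point test, not a spelling rule: a few Cyrillic consonants do have acute-accented precomposed forms (for example к + U+0301 composes to ќ, U+045C), so an acute over those letters would also be flagged.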

| Char | Accent code point (follows Char) | String | File | Line number |
|---|---|---|---|---|
| о | U+0301 | по́том | ru: dwl/armitages_fate.po | 29 |
| о | U+0301 | во́роны | ru: tcu/union_and_disillusion.po | 19 |
| а | U+0301 | замка́ | ru: tfa/heart_of_the_elders_part_1.po | 47 |
| о | U+0301 | Про́кляты(...) | ru: tfa/heart_of_the_elders_part_2.po | 57 |
| , | U+0301 | (...)удно,́ | ru: tfa/the_city_of_archives.po | 130 |
| о | U+0301 | мо́чи | ru: tfa/those_held_captive.po | 100 |
| о | U+0301 | по́том | ru: tfa/those_held_captive.po | 109 |
| и | U+0306 | (...)льшой | ru: tskc/10_dancing_mad.po | 126 |
| е | U+0308 | (...)удалённые | ru: tskc/10_dancing_mad.po | 201 |
| е | U+0308 | её | ru: tskc/10_dancing_mad.po | 201 |
| е | U+0308 | «стёрты | ru: tskc/10_dancing_mad.po | 207 |
| е | U+0308 | N/A | ru: tskc/10_dancing_mad.po | 210 |
| е | U+0308 | N/A | ru: tskc/10_dancing_mad.po | 213 |
| й | U+0306 | Кай̆т-Бей | ru: tskc/15_dogs_of_war.po | 76 |
| и | U+0306 | N/A | tskc/27_congress_of_the_keys.po | 613 |
| и | U+0306 | (...)ертой | tskc/27_congress_of_the_keys.po | 613 |
| и | U+0306 | N/A | tskc/27_congress_of_the_keys.po | 619 |
| и | U+0306 | (...)ертой | tskc/27_congress_of_the_keys.po | 619 |
| и | U+0306 | (...)ругой | tskc/27_congress_of_the_keys.po | 619 |
| и | U+0306 | (...)ячейка | tskc/28_epilogue.po | 75 |
| и | U+0306 | (...)Новый | tskc/campaign.po | 873 |
| и | U+0306 | (...)хожий | tskc/campaign.po | 934 |
| и | U+0306 | (...)остей | rules\ru\rules.json | N/A |
| и | U+0306 | (...)енной | rules\ru\rules.json | N/A |
| и | U+0306 | (...)ыграйте | rules\ru\rules.json | N/A |
| и | U+0306 | свойства | rules\ru\rules.json | N/A |
| и | U+0306 | (...)аждой | rules\ru\rules.json | N/A |
| и | U+0306 | (...)остей | rules\ru\rules.json | N/A |
| ы | U+0301 | (...)менны́х | rules\ru\rules.json | N/A |

(...) marks where a string has been truncated.

Interpretation of the table:

и | U+0306 | (...)льшой | ru: tskc/10_dancing_mad.po | 126

In msgid number 126 of the Russian .po file tskc/10_dancing_mad.po, there is a string like (...)льшой in which the character и is combined with the U+0306 accent (COMBINING BREVE). This may be incorrect, because й exists as a separate precomposed character (U+0439), or the U+0301 accent may have been intended instead of U+0306.
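If the breve and diaeresis cases turn out to be unintended, one possible cleanup (an assumption, not something proposed in this issue) is to normalize the affected strings to NFC. That composes и + U+0306 into й and е + U+0308 into ё, while a stress accent such as о + U+0301, which has no precomposed form, survives. The samples below are modelled on complete (non-truncated) rows of the table; `to_nfc` and `leftover_marks` are hypothetical helpers.

```python
import unicodedata

# Assumption: only the Combining Diacritical Marks block (U+0300..U+036F) is checked.
COMBINING_BLOCK = range(0x0300, 0x0370)


def to_nfc(s):
    """Compose letter + combining mark pairs that have a precomposed character."""
    return unicodedata.normalize("NFC", s)


def leftover_marks(s):
    """List combining marks (as U+XXXX) that are still present after NFC."""
    return [f"U+{ord(c):04X}" for c in to_nfc(s) if ord(c) in COMBINING_BLOCK]


samples = [
    "свои\u0306ства",  # и + U+0306: NFC composes it to й (U+0439)
    "ее\u0308",        # е + U+0308: NFC composes it to ё (U+0451)
    "по\u0301том",     # о + U+0301: no precomposed form, stress mark is kept
    "Кай\u0306т-Бей",  # й + U+0306: nothing left to compose, extra breve remains
]

for s in samples:
    print(repr(s), "->", repr(to_nfc(s)), "remaining marks:", leftover_marks(s))
```

An entry such as Кай̆т-Бей, where the breve follows an already precomposed й, keeps a stray U+0306 after normalization and would still need a manual fix.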

Reference (all Unicode accents from the table): U+0301 COMBINING ACUTE ACCENT, U+0306 COMBINING BREVE, U+0308 COMBINING DIAERESIS.

Disclaimer: I may be completely wrong, so please treat this issue as informational only :)
