Latin small letter I without dot is treated incorrectly for UNICODE_CI #7477
-
UNICODE_CI is a generic collation, and as far as I'm aware, it should not apply locale-specific case-insensitivity rules like this one. As an aside, you created this as a discussion and not as an issue. If you really want to discuss this or ask a question, it might be better to ask on firebird-support or firebird-devel instead of here.
-
In Turkish, the upper case of Latin small letter dotless I (U+0131, ı) is the Latin capital letter I without a dot (U+0049, I). This is probably the same in Azerbaijani, Crimean Tatar, Gagauz, Kazakh, and Tatar. https://en.wikipedia.org/wiki/Dotless_I
There is also this document, which is the Unicode code chart for the Latin ASCII block: https://unicode.org/charts/PDF/U0000.pdf

It has the following part in it:
I do not know whether this lowercase to uppercase conversion would break another language, and I do not know which authority to check for these conversions. But the two sources above indicate that this conversion is the right one.
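One way to check the default, locale-independent mappings is to look at the Unicode Character Database. The sketch below uses Python's built-in string methods, which follow that database. It only shows what the generic Unicode rules say; it makes no claim about how Firebird builds UNICODE_CI comparison keys.

```python
import unicodedata

DOTLESS_I = "\u0131"  # LATIN SMALL LETTER DOTLESS I
CAPITAL_I = "\u0049"  # LATIN CAPITAL LETTER I


def describe(ch: str) -> str:
    """Return the code point and Unicode name of a single character."""
    return f"U+{ord(ch):04X} {unicodedata.name(ch)}"


# Default uppercase mapping: dotless i maps to plain capital I.
upper_of_dotless = DOTLESS_I.upper()

# Default lowercase mapping: capital I maps to dotted i, not dotless i.
lower_of_capital = CAPITAL_I.lower()

# Default case folding: dotless i has no folding, capital I folds to dotted i,
# so a comparison based on default folding keeps the two apart.
folds_equal = DOTLESS_I.casefold() == CAPITAL_I.casefold()

# A comparison based on uppercasing both sides treats them as equal.
uppers_equal = DOTLESS_I.upper() == CAPITAL_I.upper()

print(describe(DOTLESS_I), "-> upper ->", describe(upper_of_dotless))
print(describe(CAPITAL_I), "-> lower ->", describe(lower_of_capital))
print("equal under default case folding:", folds_equal)
print("equal under uppercasing:", uppers_equal)
```

So whether ı and I compare as equal under the generic rules depends on whether the comparison uppercases or case-folds. The Turkish pairing of I with ı in both directions is a locale-specific tailoring.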
If you run the script below on v4.0.2, all of the insert statements succeed. I expect the last one to fail, because the same value was already inserted with a lowercase dotless I in the second insert statement.
If that is acceptable, I would appreciate a change to UNICODE_CI so that it follows this conversion and fails on the last statement in the example script.