Database collation concerns from Spookerton #3646
MistakeNot4892
started this conversation in
General
Replies: 0 comments
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Uh oh!
There was an error while loading. Please reload this page.
-
Copied from Spook's ticket: #3018
Meta, I guess.
Database collation is not exactly a modern performance concern at ss13 scale. Because nebula aims to serve communities across languages and character sets it might still be a discussion worth having: collation is important for symbol comparison.
In #1318 back at the start of 2021 I mentioned utf8mb4_unicode_520_ci as a more up to date collation for text-centric tables, which are the most commonly used for ss13ish purposes. The PR's base utf8mb4 was still an upgrade over latin_swedish, and good. In the last couple of years things have moved on - but sadly not for the common better.
Mysql is still iterating on 8.0 and still supports the same utf8mb4_0900_ai_ci. Mariadb has recently (mid 2022) released 10.10 (the new lts, last month) 10.11. They also skipped matching mysql after all that time by releasing a unicode 14 collation set with 10.10 onward, preferring uca1400_ai_ci.
On premise: the later the unicode version referenced in the collation, in theory the more natural the sorting is and the better the grouping of those symbols is. To add - derived from mysql naming, the "ai/s" and "ci/s" suffixes stand for accent and case insensitivity/sensitivity.
As an example, in cases like "ted" and Téd", you probably want them to show up together in a search for either. Base utf8mb4 is completely unaware of both similarities, and that's just within extended latin.
So!
utf8mb4_unicode_520_ci is sadly still the last common best text collation. Given the choice of shipping least-worst, or duplicate-best, least-worst probably comes out on top for convenience? So, utf8mb4_unicode_520_ci is a good upgrade path for the collation indicated by the sql files the repo ships in sql/*.
Beta Was this translation helpful? Give feedback.
All reactions