
CJK sorting is based on Unicode code points #259

@quachpas

Description


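When a CSL style requires author-date sorting (e.g., gb-7714-2015-author-date), CJK names need to be romanized (e.g., converted to pinyin) before sorting. Otherwise they fall back to the default ordering by Unicode code point, which does not match the expected alphabetical order.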
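For reference, here is a minimal plain-Rust illustration (not hayagriva code) of what "sorting by code points" means. Rust's `Ord` for `str` compares UTF-8 bytes, which yields the same order as comparing Unicode code points. The helper name `code_point_order` is hypothetical and used only for this sketch.

```rust
/// Sort names with Rust's default string ordering, which is code-point order.
fn code_point_order(names: &[&str]) -> Vec<String> {
    let mut sorted: Vec<String> = names.iter().map(|s| s.to_string()).collect();
    sorted.sort();
    sorted
}

fn main() {
    let names = ["白", "晨", "安", "大", "二", "一"];
    // Print each name with the code point of its first character.
    for name in code_point_order(&names) {
        let c = name.chars().next().unwrap();
        println!("{} U+{:04X}", name, c as u32);
    }
}
```

This reproduces the order from the issue (一 < 二 < 大 < 安 < 晨 < 白), which has no relation to the pinyin readings.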

```rust
use std::cmp::Ordering;

// `Person` and `LongShortForm` are assumed to be in scope, as in hayagriva's own tests.
// The test name is illustrative.
#[test]
fn cjk_names_sort_by_code_point() {
    let p8 = Person::from_strings(vec!["王", "一"]).unwrap();
    let p8r = Person::from_strings(vec!["wang", "yi"]).unwrap();
    let p9 = Person::from_strings(vec!["王", "二"]).unwrap();
    let p9r = Person::from_strings(vec!["wang", "er"]).unwrap();

    // 一 < 二 by code point (U+4E00 < U+4E8C),
    // but yī > èr once romanized.
    assert_eq!(Ordering::Less, p8.csl_cmp(&p9, LongShortForm::Long, false));
    assert_eq!(Ordering::Greater, p8r.csl_cmp(&p9r, LongShortForm::Long, false));

    // By code point: 大 < 安 < 晨 < 白 (U+5927 < U+5B89 < U+6668 < U+767D).
    // In pinyin these are dà, ān, chén, bǎi, which should sort as ān < bǎi < chén < dà.
    let p1 = Person::from_strings(vec!["大", "大"]).unwrap();
    let p2 = Person::from_strings(vec!["安", "安"]).unwrap();
    let p3 = Person::from_strings(vec!["晨", "晨"]).unwrap();
    let p4 = Person::from_strings(vec!["白", "白"]).unwrap();
    // These assertions reflect the current behavior: names are ordered by code point, not by pinyin.
    assert_eq!(Ordering::Less, p1.csl_cmp(&p2, LongShortForm::Long, false));
    assert_eq!(Ordering::Less, p2.csl_cmp(&p3, LongShortForm::Long, false));
    assert_eq!(Ordering::Less, p3.csl_cmp(&p4, LongShortForm::Long, false));
}
```
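A minimal sketch of the romanize-then-sort behavior described above, assuming a hypothetical hand-written pinyin table that covers only the characters used in this issue (a real fix would need a proper romanization source, and tones are ignored here). `pinyin_table`, `romanize` and `romanized_cmp` are illustrative names, not hayagriva APIs, and names are modeled as `(family, given)` string pairs rather than `Person` values.

```rust
use std::cmp::Ordering;
use std::collections::HashMap;

/// Hypothetical pinyin table covering only the characters in this issue.
fn pinyin_table() -> HashMap<char, &'static str> {
    HashMap::from([
        ('王', "wang"),
        ('一', "yi"),
        ('二', "er"),
        ('大', "da"),
        ('安', "an"),
        ('晨', "chen"),
        ('白', "bai"),
    ])
}

/// Romanize a name character by character; characters missing from the
/// table are kept as-is and therefore still compare by code point.
fn romanize(s: &str, table: &HashMap<char, &'static str>) -> String {
    s.chars()
        .map(|c| table.get(&c).map(|p| p.to_string()).unwrap_or_else(|| c.to_string()))
        .collect()
}

/// Compare (family, given) pairs by romanized family name, then romanized given name.
fn romanized_cmp(a: (&str, &str), b: (&str, &str), table: &HashMap<char, &'static str>) -> Ordering {
    let key = |(family, given): (&str, &str)| (romanize(family, table), romanize(given, table));
    key(a).cmp(&key(b))
}

fn main() {
    let table = pinyin_table();
    let mut people = vec![("大", "大"), ("安", "安"), ("晨", "晨"), ("白", "白")];
    people.sort_by(|a, b| romanized_cmp(*a, *b, &table));
    // Print each name next to the romanized family name used as the sort key.
    for (family, given) in &people {
        println!("{family}{given} -> {}", romanize(family, &table));
    }
}
```

Concatenating syllables without separators and dropping tones is a simplification; it is enough to produce the expected ān < bǎi < chén < dà and 王二 < 王一 orders, but it is not a complete collation.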


Discord thread

EDIT: The same issue probably affects other languages written in non-Latin scripts.


Labels: i18n (Issues related to non-English languages)
