Bad SHIFT-JIS to UTF-8 conversions #14

@norikawa

Some song names (and possibly other fields such as artist names, though I've mostly seen this in song names) come out wrong when converted from SHIFT-JIS to UTF-8. For example, "Entropic EnĤαncemEnt" is converted to "Entropic En鹹αncemEnt".

For my own WIP server in Crystal, I've gone through, found most of the incorrect characters, and made a list so they can be replaced automatically after the conversion step:

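# Replaces characters that are known to come out wrong after the
# SHIFT-JIS to UTF-8 conversion with the characters that were intended.
def fix_bad_utf8(text : String)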
		replacements = {
			"齷" => "é",
			"鬮" => "¡",
			"齶" => "♡",
			"ケロH" => "ケロ⑨",
			"曦" => "à",
			"曩" => "è",
			"龕" => "€",
			"壬" => "ê",
			"驩" => "Ø",
			"=墸Σ≡=。゚:*.:+。.☆" => "=͟͟͞ Σ≡=。゚:*.:+。.☆",
			"鹹" => "Ĥ",
			"闃" => "Ā",
			"饌" => "²",
			"煢" => "ø",
			"餮" => "Ƶ",
			"蔕" => "ῦ",
			"盥" => "⚙︎",
			"頽" => "ä",
			"隍" => "Ü",
			"雋" => "Ǜ",
			"鬻" => "♃",
			"鬥" => "Ã",
			"鬆" => "Ý",
			"趁" => "Ǣ",
			"驫" => "ā",
			"騫" => "á",
			"齲" => "♥",
			"骭" => "ü"
		}
		# Apply every replacement in turn; gsub replaces all occurrences.
		replacements.each do |bad, good|
			text = text.gsub(bad, good)
		end
		text
end

I'm not familiar enough with TypeScript to know what the equivalent functions would be, but hopefully the list of replacements can be helpful at least!
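
For reference, a rough TypeScript equivalent of the replacement step could look like the sketch below. It assumes the text has already been decoded to a normal string, and it only carries a handful of entries from the list above. The names `REPLACEMENTS` and `fixBadUtf8` are hypothetical and not taken from this project's code.

```typescript
// Hypothetical table: a few pairs copied from the Crystal list above, not the full set.
const REPLACEMENTS: ReadonlyArray<readonly [string, string]> = [
  ["齷", "é"],
  ["鹹", "Ĥ"],
  ["齲", "♥"],
  ["骭", "ü"],
];

// Applies each replacement in order to text that was already converted to UTF-8.
function fixBadUtf8(text: string): string {
  let result = text;
  for (const [bad, good] of REPLACEMENTS) {
    // split/join replaces every occurrence and treats `bad` as a literal, not a regex.
    result = result.split(bad).join(good);
  }
  return result;
}

console.log(fixBadUtf8("Entropic En鹹αncemEnt")); // prints "Entropic EnĤαncemEnt"
```

`split`/`join` is used here instead of `String.prototype.replaceAll` so the sketch does not depend on an ES2021 compile target. Multi-character keys such as the "ケロH" entry would work the same way.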
