MD_DecodeCodepointFromUtf16 incorrectly calculates codepoints greater than 0xFFFF because it does not offset by 0x10000.
Adding 0x10000 to the combined surrogate bits should fix the issue. The sum has to be taken after the OR, so the OR expression is parenthesized (in C, + binds tighter than |):
    if (1 < max && 0xD800 <= out[0] && out[0] < 0xDC00 && 0xDC00 <= out[1] && out[1] < 0xE000)
    {
        result.codepoint = (((out[0] - 0xD800) << 10) | (out[1] - 0xDC00)) + 0x10000;
        result.advance = 2;
    }
Reference: Step 5 for Decoding UTF-16
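For illustration, here is a minimal standalone sketch of the surrogate-pair decode described above. The names `DecodedCodepoint` and `decode_utf16` are hypothetical stand-ins, not Metadesk's actual types or functions, and unpaired surrogates are simply passed through unchanged, which may differ from what the library does.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical result type for this sketch; not the library's own struct. */
typedef struct {
    uint32_t codepoint;
    uint32_t advance;   /* number of UTF-16 code units consumed */
} DecodedCodepoint;

/* Decodes one code point from `in`, which holds `max` UTF-16 code units.
   Assumption: an unpaired surrogate is returned as-is with advance 1. */
static DecodedCodepoint decode_utf16(const uint16_t *in, size_t max)
{
    DecodedCodepoint result = { 0, 0 };
    if (max == 0) {
        return result;
    }
    result.codepoint = in[0];
    result.advance = 1;
    if (1 < max && 0xD800 <= in[0] && in[0] < 0xDC00 &&
        0xDC00 <= in[1] && in[1] < 0xE000) {
        uint32_t high = (uint32_t)(in[0] - 0xD800);  /* top 10 bits */
        uint32_t low  = (uint32_t)(in[1] - 0xDC00);  /* bottom 10 bits */
        /* Combine the two 10-bit halves first, then add the offset. */
        result.codepoint = ((high << 10) | low) + 0x10000;
        result.advance = 2;
    }
    return result;
}
```

The pair D840 DC00 (U+20000) is a useful check: the high half alone already sets bit 16, so OR-ing in 0x10000 instead of adding it would give 0x10000.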
MD_Utf8FromCodepoint sets the first byte incorrectly when the codepoint requires four bytes because it left-bitshifts MD_bitmask4 by 3 rather than 4.
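MD_bitmask4 is the value 0x0F (binary 1111). The first byte of a four-byte UTF-8 sequence (codepoints greater than 0xFFFF) must have the form 11110xxx, where the low 3 bits hold the top bits of the codepoint. Shifting 0x0F left by 4 gives 0xF0 (11110000), the correct prefix. Shifting by 3 gives 0x78 (01111000), which has the top bit clear and so reads as an ASCII byte rather than a four-byte lead byte.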
Bitshifting by 4 instead of 3 should fix the issue:
    else if (codepoint <= 0x10FFFF)
    {
        out[0] = (MD_bitmask4 << 4) | ((codepoint >> 18) & MD_bitmask3);
        out[1] = MD_bit8 | ((codepoint >> 12) & MD_bitmask6);
        out[2] = MD_bit8 | ((codepoint >>  6) & MD_bitmask6);
        out[3] = MD_bit8 | ( codepoint        & MD_bitmask6);
        advance = 4;
    }
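As a cross-check, here is a small self-contained encoder sketch that uses literal constants in place of the MD_ macros, so the expected lead byte is visible directly. The function name `utf8_from_codepoint` is hypothetical and is not the library's API. The sketch assumes `out` has room for 4 bytes and does not reject surrogate codepoints.

```c
#include <stdint.h>

/* Writes the UTF-8 encoding of `codepoint` into `out` and returns the
   number of bytes written, or 0 if the codepoint is above 0x10FFFF.
   Assumption: `out` points to at least 4 writable bytes. */
static uint32_t utf8_from_codepoint(uint8_t *out, uint32_t codepoint)
{
    if (codepoint <= 0x7F) {
        out[0] = (uint8_t)codepoint;
        return 1;
    }
    if (codepoint <= 0x7FF) {
        out[0] = (uint8_t)(0xC0 | (codepoint >> 6));           /* 110xxxxx */
        out[1] = (uint8_t)(0x80 | (codepoint & 0x3F));         /* 10xxxxxx */
        return 2;
    }
    if (codepoint <= 0xFFFF) {
        out[0] = (uint8_t)(0xE0 | (codepoint >> 12));          /* 1110xxxx */
        out[1] = (uint8_t)(0x80 | ((codepoint >> 6) & 0x3F));
        out[2] = (uint8_t)(0x80 | (codepoint & 0x3F));
        return 3;
    }
    if (codepoint <= 0x10FFFF) {
        /* 0xF0 is 0x0F << 4: the 11110xxx lead byte of a 4-byte sequence. */
        out[0] = (uint8_t)(0xF0 | ((codepoint >> 18) & 0x07));
        out[1] = (uint8_t)(0x80 | ((codepoint >> 12) & 0x3F));
        out[2] = (uint8_t)(0x80 | ((codepoint >> 6) & 0x3F));
        out[3] = (uint8_t)(0x80 | (codepoint & 0x3F));
        return 4;
    }
    return 0;
}
```

With this sketch, U+1F600 encodes to the bytes F0 9F 98 80, whereas a lead byte built from 0x0F << 3 would come out as 0x78.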