TextSubtext() issue #3792

DesLandysh · 2024-02-10T13:17:16Z

I'm testing the text functions in the Python wrapper of raylib and found an issue I couldn't resolve:

I have a string that contains ASCII and Cyrillic letters, and I even made sure that it is encoded as UTF-8.
I use the drawTextEx() function together with textSubtext(), with the parameters from the tutorial.
It works fine until it meets a Cyrillic glyph.

I tried different fonts.
I even made sure that the codepoints contain the needed glyphs and printed a copy of them using the loadUTF-8() function, so the PyCharm terminal sees these UTF-8 glyphs, but textSubtext() does not.

I guess it's a wide-character issue in C (and locale). I also took a screenshot of what it returns and of the error (ffi goes to None, with a remark that it's not a true address of __cffi...).

(OS: Windows 10 CORP 22H2, PyCharm 2023.2.3, Python 3.12)

Does anyone have thoughts on how to solve this?

Answered by raysan5

orcmid · 2024-02-10T15:59:39Z

I can't tell from your code. However, it is important to know that codepoints are not UTF-8; they are essentially UTF-32. Put another way, codepoints are exactly the values assigned to characters in the Unicode specification (e.g., л has codepoint 0x043B). It happens, of course, that the first 128 of them (0 through 127) correspond to the single-byte UTF-8 encodings and also to the ISO 646 (internationalized ASCII) codes.

This is different from the wide-character business on a platform such as Windows. Wide characters there are typically (historically?) UTF-16 when used for Unicode. Part of the difficulty is that wchar_t is not the same size on all platforms and libraries. On some of them, a wchar_t variable can hold a full Unicode code point. This gets murky when writing portable code for multiple platforms.
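
To make the codepoint versus UTF-8 distinction concrete, here is a minimal sketch in C. `EncodeUtf8` is a hypothetical helper written for this illustration (it is not a raylib function); it turns one codepoint, held as a plain integer, into its UTF-8 byte sequence.

```c
// Hypothetical helper (not part of raylib): encode one Unicode codepoint
// as UTF-8 into out[] and return the number of bytes written (1 to 4),
// or 0 if the value is not a valid Unicode scalar value.
int EncodeUtf8(unsigned int cp, unsigned char out[4])
{
    if (cp <= 0x7F)                       // ASCII range: one byte, same value
    {
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp <= 0x7FF)                      // Two bytes: 110xxxxx 10xxxxxx
    {
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp <= 0xFFFF)                     // Three bytes: 1110xxxx 10xxxxxx 10xxxxxx
    {
        if (cp >= 0xD800 && cp <= 0xDFFF) return 0;   // UTF-16 surrogates are not characters
        out[0] = (unsigned char)(0xE0 | (cp >> 12));
        out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    }
    if (cp <= 0x10FFFF)                   // Four bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
    {
        out[0] = (unsigned char)(0xF0 | (cp >> 18));
        out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
        out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[3] = (unsigned char)(0x80 | (cp & 0x3F));
        return 4;
    }
    return 0;                             // Beyond the Unicode range
}
```

For л (codepoint 0x043B) this yields the two bytes 0xD0 0xBB, while an ASCII letter such as 'A' stays a single byte with the same value as its codepoint.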


raysan5 · 2024-02-10T17:36:49Z

Maintainer

@DesLandysh I'm afraid TextSubtext() does not treat its input as a UTF-8 string but as a plain array of bytes, so Cyrillic characters are actually processed byte by byte, producing a separate character for every individual byte (or just crashing in the process)...
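
A small sketch of why byte positions and character positions diverge for Cyrillic text. Both helpers are hypothetical names introduced for this illustration (they are not raylib functions), and they assume the input is valid UTF-8.

```c
#include <string.h>

// Hypothetical helper: count codepoints in a UTF-8 string by skipping
// continuation bytes (those of the form 10xxxxxx).
int CountCodepoints(const char *text)
{
    int count = 0;
    for (const unsigned char *p = (const unsigned char *)text; *p != '\0'; p++)
    {
        if ((*p & 0xC0) != 0x80) count++;   // Lead byte or ASCII: a new character starts here
    }
    return count;
}

// Hypothetical helper: returns 1 if byte offset `position` lands in the
// middle of a multi-byte character, 0 if it lands on a character start.
// The caller must keep `position` within the string.
int SplitsCharacter(const char *text, int position)
{
    return (((unsigned char)text[position]) & 0xC0) == 0x80;
}
```

With the three letters "При" written as the bytes "\xD0\x9F\xD1\x80\xD0\xB8", strlen() reports 6 while CountCodepoints() reports 3, and byte offset 1 falls inside the first letter. A byte-based substring that starts or ends at such an offset hands the text renderer half of a character.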

The solution would be getting the subtext character by character, keeping in mind that in UTF-8 some characters take more than one byte. raylib provides the following function for that:

int GetCodepointNext(const char *text, int *codepointSize);  // Get next codepoint in a UTF-8 encoded string

Here is a code sample that moves along a text buffer one character at a time, treating it as Unicode codepoints encoded as UTF-8:

int currentCodepoint = -1;
int nextCharacterIndex = 0;

while (currentCodepoint != 0)   // Stop at '\0', the string terminator
{
    int codepointSize = 0;
    // Assign to the outer variable: redeclaring it here would shadow it and the loop would never end
    currentCodepoint = GetCodepointNext(my_text + nextCharacterIndex, &codepointSize);
    nextCharacterIndex += codepointSize;  // It could be 1 byte or more
}
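
Building on the loop above, a UTF-8-aware replacement for TextSubtext() could look roughly like the sketch below. `Utf8Subtext` and `Utf8SequenceLength` are hypothetical names, not raylib functions. `Utf8SequenceLength` is only a stand-in so the sketch compiles without raylib; with raylib available, the `codepointSize` value reported by GetCodepointNext() could serve the same purpose. Unlike TextSubtext(), this sketch writes into a caller-provided buffer, which is a design choice made here for clarity.

```c
#include <string.h>

// Stand-in for the size reported by GetCodepointNext(): byte length of the
// UTF-8 sequence starting at text[0]. Returns 1 for ASCII and invalid lead
// bytes, and never steps past the terminator of a truncated sequence.
static int Utf8SequenceLength(const char *text)
{
    unsigned char lead = (unsigned char)text[0];
    int expected = 1;
    if ((lead & 0xF8) == 0xF0) expected = 4;
    else if ((lead & 0xF0) == 0xE0) expected = 3;
    else if ((lead & 0xE0) == 0xC0) expected = 2;

    int length = 1;
    while ((length < expected) && ((((unsigned char)text[length]) & 0xC0) == 0x80)) length++;
    return length;
}

// Hypothetical helper: copy `length` characters (codepoints), starting at
// character index `position`, into `out` (capacity `outSize` bytes including
// the terminator). Returns the number of bytes copied, never splitting a character.
int Utf8Subtext(const char *text, int position, int length, char *out, int outSize)
{
    if (outSize <= 0) return 0;
    out[0] = '\0';
    if ((position < 0) || (length <= 0)) return 0;

    int byteIndex = 0;

    // Skip `position` characters, whatever their byte size
    for (int i = 0; (i < position) && (text[byteIndex] != '\0'); i++)
    {
        byteIndex += Utf8SequenceLength(text + byteIndex);
    }

    int written = 0;
    for (int i = 0; (i < length) && (text[byteIndex] != '\0'); i++)
    {
        int size = Utf8SequenceLength(text + byteIndex);
        if (written + size >= outSize) break;   // Keep room for '\0'
        memcpy(out + written, text + byteIndex, (size_t)size);
        written += size;
        byteIndex += size;
    }

    out[written] = '\0';
    return written;
}
```

Here `position` and `length` count characters rather than bytes, so asking for 2 characters starting at character 2 of "abлиz" returns the 4 bytes that encode "ли". The result is still a UTF-8 byte string and can be passed on to the drawing call as before.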
