Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix leak, broken collation of UTF-8 string in non-UTF8 locale #22963

Open
wants to merge 10 commits into
base: blead
Choose a base branch
from

Conversation

khwilliamson
Copy link
Contributor

This supercedes #22811

It started out just trying to convert to using the new forms for converting between bytes and UTF-8, but more problems were found.

  • This set of changes does not require a perldelta entry.

We no longer need to have declarations first thing in a block.

This allows removing a block and outdenting.

And it outdents further in preparation for a later commit

Also adds/clarifies some comments
The minimal space potentially freed up by this isn't worth the extra
libc call needed to accomplish it.
This function has a successful return branch and an unsuccessful return
branch, and a macro common to both, named CLEANUP_STRXFRM.  There is
another code clause repeated in both branches.  This reorders some of the
statements in each branch so that that clause is positioned to just
after the calls to CLEANUP_STRXFRM.  The next commit will add it to the
macro, so as to remove the redundant code clause.
This refactors a few clauses so that the case of having a UTF-8 string
in a UTF-8 locale can potentially be handled as a distinct situation.
This prepares for a potential future change to deal with above-Unicode
code points, which strxfrm() vary well may not deal with properly.
This uses the new function forms for doing this translation that can
avoid unnecessary mallocs
This commit creates a new variable, sans_highs, to store newly allocated
memory that only happens sometimes.  Just before returning, that is
freed.

This is the final step in solving the leak spotted by Tony Cook in
https://github.com/Perl/perl5/pulls?q=sort:updated-desc%20is:pr%20is:closed#discussion_r1868633416

The problem was that there were potentially 0 to 3 mallocs in this
function, and it was only freeing up to two of them.  The solution is to
have a separate variable for each malloc, and to free them all before
returning.  If the corresponding malloc did not happen, the variable
will be NULL, and no free will occur.

This makes a loop rather more complicated.  The next commit will
simplify it.
This loop has gotten rather complicated.  This refactors it to simplify
it.  It uses the typical paradigm used elsewhere in the code of having a
while loop with a source pointer, and a dest pointer, and copying and
incrementing them independently as needed as we go along.
We were creating a string but failed to set its length; hence the one
from the old string was used.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant