Skip to content

locale.c: Use new utf8_to_bytes_temp_pv() #22811

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 1 commit into from
Closed
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
24 changes: 9 additions & 15 deletions locale.c
Original file line number Diff line number Diff line change
Expand Up @@ -8619,15 +8619,11 @@ S_strftime8(pTHX_ const char * fmt,
if (! is_locale_utf8(locale)) {
locale_utf8ness = LOCALE_NOT_UTF8;

bool is_utf8 = true;
Size_t fmt_len = strlen(fmt);
fmt = (char *) bytes_from_utf8((U8 *) fmt, &fmt_len, &is_utf8);
if (is_utf8) {
if (! utf8_to_bytes_temp_pv((U8 **) &fmt, &fmt_len)) {
SET_EINVAL;
return false;
}

SAVEFREEPV(fmt);
}
else {
locale_utf8ness = LOCALE_IS_UTF8;
Expand Down Expand Up @@ -9839,14 +9835,12 @@ Perl_mem_collxfrm_(pTHX_ const char *input_string,
}
else { /* locale is not UTF-8; but input is; downgrade the input */

s = (char *) bytes_from_utf8((const U8 *) s, &len, &utf8);

/* If the downgrade was successful we are done, but if the input
* contains things that require UTF-8 to represent, have to do
* damage control ... */
if (UNLIKELY(utf8)) {

/* What we do is construct a non-UTF-8 string with
if (UNLIKELY(! utf8_to_bytes_temp_pv((const U8 **) &s, &len))) {
/* If the downgrade was successful we are done, but if the
* input contains things that require UTF-8 to represent, have
* to do damage control:
*
* What we do is construct a non-UTF-8 string with
Comment on lines -9842 to +9843
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't see what originally freed the possible new buffer returned by bytes_from_utf8() here, this isn't needed for the replacement, but I don't see what would have been freeing that in the original.

I'll spend some more time looking at it, but it might be faster if you can point out what I missed :)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Took me a while:

$ PERL_DESTRUCT_LEVEL=2 gdb --args ./perl -Ilib -Mlocale -MPOSIX=setlocale,LC_ALL  -le 'setlocale(LC_ALL, "en_AU.ISO8859-1") or die; $x = "\x{100}", $y = "A\xBD"; utf8::upgrade($y); print $x cmp $y'
...
(gdb) r
Starting program: /home/tony/dev/perl/git/perl6/perl -Ilib -Mlocale -MPOSIX=setlocale,LC_ALL -le setlocale\(LC_ALL,\ \"en_AU.ISO8859-1\"\)\ or\ die\;\ \$x\ =\ \"\\x\{100\}\",\ \$y\ =\ \"A\\xBD\"\;\ utf8::upgrade\(\$y\)\;\ print\ \$x\ cmp\ \$y
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
1
free(): double free detected in tcache 2

Program received signal SIGABRT, Aborted.
__pthread_kill_implementation (threadid=<optimized out>, signo=signo@entry=6, 
    no_tid=no_tid@entry=0) at ./nptl/pthread_kill.c:44
44      ./nptl/pthread_kill.c: No such file or directory.
(gdb) bt
#0  __pthread_kill_implementation (threadid=<optimized out>, 
    signo=signo@entry=6, no_tid=no_tid@entry=0) at ./nptl/pthread_kill.c:44
#1  0x00007ffff7d2df1f in __pthread_kill_internal (signo=6, 
    threadid=<optimized out>) at ./nptl/pthread_kill.c:78
#2  0x00007ffff7cdefb2 in __GI_raise (sig=sig@entry=6)
    at ../sysdeps/posix/raise.c:26
#3  0x00007ffff7cc9472 in __GI_abort () at ./stdlib/abort.c:79
#4  0x00007ffff7d22430 in __libc_message (action=action@entry=do_abort, 
    fmt=fmt@entry=0x7ffff7e3c459 "%s\n") at ../sysdeps/posix/libc_fatal.c:155
#5  0x00007ffff7d3783a in malloc_printerr (
    str=str@entry=0x7ffff7e3f098 "free(): double free detected in tcache 2")
    at ./malloc/malloc.c:5660
#6  0x00007ffff7d39ac6 in _int_free (av=0x7ffff7e75c60 <main_arena>, 
    p=0x555555b142f0, have_lock=have_lock@entry=0) at ./malloc/malloc.c:4469
#7  0x00007ffff7d3bf1f in __GI___libc_free (mem=mem@entry=0x555555b14300)
    at ./malloc/malloc.c:3385
#8  0x00005555557d3cfc in Perl_safesysfree (where=where@entry=0x555555b14300)
    at util.c:433
#9  0x0000555555762450 in Perl_leave_scope (base=0) at scope.c:1391
#10 0x000055555568ba85 in Perl_pp_leave () at pp_ctl.c:2423
#11 0x00005555555edc10 in Perl_runops_debug () at dump.c:2993
#12 0x00005555555d173d in S_run_body (oldscope=1) at perl.c:2883
#13 perl_run (my_perl=<optimized out>) at perl.c:2798
#14 0x000055555559c0ed in main (argc=<optimized out>, argv=<optimized out>, 
    env=<optimized out>) at perlmain.c:127
(gdb) q
$ PERL_DESTRUCT_LEVEL=2 valgrind -q ./perl -Ilib -Mlocale -MPOSIX=setlocale,LC_ALL  -le 'setlocale(LC_ALL, "en_AU.ISO8859-1") or die; $x = "\x{100}", $y = "A\xBD"; utf8::upgrade($y); print $x cmp $y'
1
==2081366== Invalid free() / delete / delete[] / realloc()
==2081366==    at 0x484317B: free (vg_replace_malloc.c:872)
==2081366==    by 0x387CFB: Perl_safesysfree (util.c:433)
==2081366==    by 0x31644F: Perl_leave_scope (scope.c:1391)
==2081366==    by 0x23FA84: Perl_pp_leave (pp_ctl.c:2423)
==2081366==    by 0x1A1C0F: Perl_runops_debug (dump.c:2993)
==2081366==    by 0x18573C: S_run_body (perl.c:2883)
==2081366==    by 0x18573C: perl_run (perl.c:2798)
==2081366==    by 0x1500EC: main (perlmain.c:127)
==2081366==  Address 0x12867490 is 0 bytes inside a block of size 3 free'd
==2081366==    at 0x484317B: free (vg_replace_malloc.c:872)
==2081366==    by 0x387CFB: Perl_safesysfree (util.c:433)
==2081366==    by 0x1D0233: Perl_mem_collxfrm_ (locale.c:10158)
==2081366==    by 0x33C4E9: Perl_sv_collxfrm_flags (sv.c:8646)
==2081366==    by 0x33C665: Perl_sv_cmp_locale_flags (sv.c:8561)
==2081366==    by 0x20F53A: Perl_pp_scmp (pp.c:2489)
==2081366==    by 0x1A1C0F: Perl_runops_debug (dump.c:2993)
==2081366==    by 0x18573C: S_run_body (perl.c:2883)
==2081366==    by 0x18573C: perl_run (perl.c:2798)
==2081366==    by 0x1500EC: main (perlmain.c:127)
==2081366==  Block was alloc'd at
==2081366==    at 0x48407B4: malloc (vg_replace_malloc.c:381)
==2081366==    by 0x3899B8: Perl_safesysmalloc (util.c:176)
==2081366==    by 0x38402D: Perl_utf8_to_bytes_ (utf8.c:2857)
==2081366==    by 0x1CF681: Perl_mem_collxfrm_ (locale.c:9838)
==2081366==    by 0x33C4E9: Perl_sv_collxfrm_flags (sv.c:8646)
==2081366==    by 0x33C665: Perl_sv_cmp_locale_flags (sv.c:8561)
==2081366==    by 0x20F53A: Perl_pp_scmp (pp.c:2489)
==2081366==    by 0x1A1C0F: Perl_runops_debug (dump.c:2993)
==2081366==    by 0x18573C: S_run_body (perl.c:2883)
==2081366==    by 0x18573C: perl_run (perl.c:2798)
==2081366==    by 0x1500EC: main (perlmain.c:127)
==2081366== 

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suspect this isn't covered by tests, considering none failed.

* 1) the characters representable by a single byte converted
* to be so (if necessary);
* 2) and the rest converted to collate the same as the
Expand All @@ -9866,8 +9860,6 @@ Perl_mem_collxfrm_(pTHX_ const char *input_string,
* sort entirely equal, then the sort order for the
* above-255 code points will be in code point order. */

utf8 = FALSE;

/* If we haven't calculated the code point with the maximum
* collating order for this locale, do so now */
if (! PL_strxfrm_max_cp) {
Expand Down Expand Up @@ -9957,6 +9949,8 @@ Perl_mem_collxfrm_(pTHX_ const char *input_string,
Renew(s, d, char); /* Free up unused space */
}
}

utf8 = false;
}

/* Here, we have constructed a modified version of the input. It could
Expand Down