Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

heatshrink with -w 15 and HEATSHRINK_USE_INDEX=1 loops infinitely for >= 32768 byte file #55

Open
ecm-pushbx opened this issue Feb 22, 2020 · 5 comments
Labels
Milestone

Comments

@ecm-pushbx
Copy link

This is with revision v0.4.1-1-g7d419e1.

$ make clean
[...]
$ make OPTIMIZE="-O0"
[...]
$ ii=32768; while ((ii--)); do printf "\x90"; done > 90.bin
$ ./heatshrink -w 15 90.bin tmp
^C

Using -w 14 or lower, or building with HEATSHRINK_USE_INDEX set to zero, works around this bug.

It seems to be that in the function find_longest_match the while loop loops forever, because pos = 0, match_maxlen = 0, pospoint[match_maxlen] = pospoint[0] = 0, needlepoint[match_maxlen] = needlepoint[0] = 0x90 (presumably a byte (first one?) from the input file), hsi->index[pos] = hsi->index[0] = 0. So the first if in that while always is evaluated as true, pos gets reset to hsi->index[pos] (both are zero), and the loop continues looping.

Here's a gdb session showing the behaviour:

$ gdb --args ./heatshrink -w 15 90.bin tmp
GNU gdb (Debian 8.3.1-1) 8.3.1
Copyright (C) 2019 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Type "show copying" and "show warranty" for details.
This GDB was configured as "x86_64-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
    <http://www.gnu.org/software/gdb/documentation/>.

For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from ./heatshrink...
(gdb) break heatshrink_encoder.c:466
Breakpoint 1 at 0x2dee: file heatshrink_encoder.c, line 466.
(gdb) run
Starting program: /media/ssd-data/coding/proj/heatshrink/broken/heatshrink -w 15 90.bin tmp

Breakpoint 1, find_longest_match (hse=0x55555557c300, start=0, end=32768, 
    maxlen=16, match_length=0x7fffffffcbdc) at heatshrink_encoder.c:466
466	    while (pos - (int16_t)start >= 0) {
(gdb) step
467	        uint8_t * const pospoint = &buf[pos];
(gdb) 
468	        len = 0;
(gdb) 
473	        if (pospoint[match_maxlen] != needlepoint[match_maxlen]) {
(gdb) print/x pospoint[match_maxlen]
$1 = 0x0
(gdb) print/x needlepoint[match_maxlen]
$2 = 0x90
(gdb) step
474	            pos = hsi->index[pos];
(gdb) 
475	            continue;
(gdb) print/x pos
$3 = 0x0
(gdb) display/x pos
1: /x pos = 0x0
(gdb) step
466	    while (pos - (int16_t)start >= 0) {
1: /x pos = 0x0
(gdb) 
467	        uint8_t * const pospoint = &buf[pos];
1: /x pos = 0x0
(gdb) 
468	        len = 0;
1: /x pos = 0x0
(gdb) 
473	        if (pospoint[match_maxlen] != needlepoint[match_maxlen]) {
1: /x pos = 0x0
(gdb) print/x pospoint[match_maxlen]
$4 = 0x0
(gdb) print/x needlepoint[match_maxlen]
$5 = 0x90
(gdb) step
474	            pos = hsi->index[pos];
1: /x pos = 0x0
(gdb) display/x hsi->index[pos]
2: /x hsi->index[pos] = 0x0
(gdb) step
475	            continue;
1: /x pos = 0x0
2: /x hsi->index[pos] = 0x0
(gdb) 
466	    while (pos - (int16_t)start >= 0) {
1: /x pos = 0x0
2: /x hsi->index[pos] = 0x0
(gdb) 
467	        uint8_t * const pospoint = &buf[pos];
1: /x pos = 0x0
2: /x hsi->index[pos] = 0x0
(gdb) 
468	        len = 0;
1: /x pos = 0x0
2: /x hsi->index[pos] = 0x0
(gdb) 
473	        if (pospoint[match_maxlen] != needlepoint[match_maxlen]) {
1: /x pos = 0x0
2: /x hsi->index[pos] = 0x0
(gdb) 
474	            pos = hsi->index[pos];
1: /x pos = 0x0
2: /x hsi->index[pos] = 0x0
(gdb) 
475	            continue;
1: /x pos = 0x0
2: /x hsi->index[pos] = 0x0
(gdb) 
466	    while (pos - (int16_t)start >= 0) {
1: /x pos = 0x0
2: /x hsi->index[pos] = 0x0
(gdb) 
467	        uint8_t * const pospoint = &buf[pos];
1: /x pos = 0x0
2: /x hsi->index[pos] = 0x0
(gdb) 
468	        len = 0;
1: /x pos = 0x0
2: /x hsi->index[pos] = 0x0
(gdb) 
473	        if (pospoint[match_maxlen] != needlepoint[match_maxlen]) {
1: /x pos = 0x0
2: /x hsi->index[pos] = 0x0
(gdb) display/x match_maxlen
3: /x match_maxlen = 0x0
(gdb) 
@ecm-pushbx
Copy link
Author

This patch seems to fix it, but I'm not sure whether it is correct. It just fixes the infinite looping by checking that it won't loop with the same pos value.

$ git diff
diff --git a/heatshrink_encoder.c b/heatshrink_encoder.c
index edf4abe..be216ee 100644
--- a/heatshrink_encoder.c
+++ b/heatshrink_encoder.c
@@ -471,8 +471,10 @@ static uint16_t find_longest_match(heatshrink_encoder *hse, uint16_t start,
          * This is redundant with the index if match_maxlen is 0, but the
          * added branch overhead to check if it == 0 seems to be worse. */
         if (pospoint[match_maxlen] != needlepoint[match_maxlen]) {
-            pos = hsi->index[pos];
-            continue;
+            if (pos != hsi->index[pos]) {
+                pos = hsi->index[pos];
+                continue;
+            }
         }
 
         for (len = 1; len < maxlen; len++) {
@@ -484,6 +486,9 @@ static uint16_t find_longest_match(heatshrink_encoder *hse, uint16_t start,
             match_index = pos;
             if (len == maxlen) { break; } /* won't find better */
         }
+        if (pos == hsi->index[pos]) {
+            break;
+        }
         pos = hsi->index[pos];
     }
 #else   

@ecm-pushbx
Copy link
Author

Curiously, -w 15 almost always ends up with negative compression for me (ie, expanding the output file to be larger than the input file). So for now I just exclude -w 15 from my scripts that determine the best compression parameters.

@silentbicycle
Copy link
Collaborator

Thanks for your detailed bug report! This will be fixed in the next release.

I wouldn't expect -w 15 to compress effectively, except in very contrived circumstances.

@silentbicycle silentbicycle added this to the 0.4.2 milestone Jun 8, 2021
@ecm-pushbx
Copy link
Author

Do you still intend to create a new revision of the library? I ask because I just recommended it to someone else on the freedos-devel mailing list and I stumbled across the allowed parameter ranges being larger than my script used to try. (And a -w parameter below 10 often wins out for the lDebug online help pages, as these are all below 3 KiB each.) That led me to rediscovering the -w 15 bug on large files.

@silentbicycle
Copy link
Collaborator

Yes. It will probably be a few weeks before I get to it because I'm currently busy with moving, but I currently plan to tie up some loose ends and post another release by the end of the year.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

2 participants