Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix encoder bug when finishing #87

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

sameer
Copy link

@sameer sameer commented May 19, 2024

Via some fuzzing, I found a finish scenario that can result in incorrect encoding. This PR addresses the bug by adding a check for finishing && input size == 0 in the search step to transition to flush bits.

Explanation:

At finish time, if the input_size is 0 then there is no more data to encode and the remaining bits should be flushed. However, because the search step does input_size - 1 this results in an overflow and the search step transitions to yielding a tag bit instead. I'm not really sure what happens from this point on, but I suspect it may loop infinitely.

@BenBE
Copy link

BenBE commented May 20, 2024

This kinda sounds like a bug I encountered a few months back, that was triggered, when the following set of conditions were met:

  1. The output buffer was full
  2. The output of the last literal was mid-way (in my case first bit of last literal value written, 7 more for the literal to go that didn't fit)
  3. No further bytes in the input buffer
  4. The encoder trying to finish up

Due to some API inconsistency, the return value was EMPTY instead of OUTPUT_MORE, thus my driver code for compression ended up inadvertently dropping the last byte of the output. It didn't cause an infinite loop, but rather an early termination of the encoder. The example implementation of the CLI utility is not affected; and also was, what in the end allowed me to locate my encoder bug.

@sameer
Copy link
Author

sameer commented May 20, 2024

It didn't cause an infinite loop, but rather an early termination of the encoder.

That's the behavior I noticed in a Rust port I'm writing fix ref. But I wasn't totally convinced there was no loop in the C implementation here.

@silentbicycle
Copy link
Collaborator

Thanks. I'll make sure this is fixed in the next release.

@trueb2
Copy link

trueb2 commented Aug 11, 2024

I also encountered this during some fuzzing on a Rust port. My st_step_search impl looks like this now

fn st_step_search(&mut self) -> HSEState {
    let window_length = self.input_buffer_size;
    let lookahead_sz = self.lookahead_size;
    let msi = self.match_scan_index;

    if msi > self.input_size.saturating_sub(lookahead_sz) {
        return HSEState::SaveBacklog;
    } else if unlikely(self.is_finishing()) && msi >= self.input_size {
        return HSEState::FlushBits;
    }

    let input_offset = self.get_input_offset();
    let end = input_offset + msi;
    let start = end - window_length;

...

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants