Issue with decodeRunRepeated #127

@aizard-trackinsight

Hi,

I think there is an issue here:

value << 8;

I worked on a parquet file where decodeRunRepeated was basically supposed to convert a [18, 1] buffer into 274 as the repeated value but yielded 19 instead.

[18,1] is supposed to be interpreted as 18 * 2^(8 * 0) + 1 * 2^(8 * 1) = 18 + 256 = 274, which would lead to something like this:

value += (cursor.buffer[cursor.offset] << 8*i)

The current code yields the correct result if there is only one byte needed: [18, 0] yields 18 which is expected.

The issue is only visible if the Parquet file has repeated values above 255, since those values need more than one byte to be encoded and the current code then yields incorrect results.
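To make the arithmetic concrete, here is a minimal stand-alone sketch of the little-endian decoding described above. The helper names `decodeLittleEndian` and `decodeWithoutShift` and their parameters are hypothetical and are not taken from the library; the second function only mimics what I believe the current behaviour amounts to.

```javascript
// Hypothetical helper (not the library's function): reads `byteCount`
// bytes starting at `offset` and combines them in little-endian order.
function decodeLittleEndian(buffer, offset, byteCount) {
  let value = 0;
  for (let i = 0; i < byteCount; i++) {
    // Byte i contributes buffer[offset + i] * 2^(8 * i).
    value += buffer[offset + i] << (8 * i);
  }
  return value;
}

// Assumed equivalent of the current behaviour: the shift result is never
// stored, so the bytes are simply added together.
function decodeWithoutShift(buffer, offset, byteCount) {
  let value = 0;
  for (let i = 0; i < byteCount; i++) {
    value << 8; // no assignment, so this statement changes nothing
    value += buffer[offset + i];
  }
  return value;
}

console.log(decodeLittleEndian([18, 1], 0, 2)); // 274
console.log(decodeWithoutShift([18, 1], 0, 2)); // 19
console.log(decodeLittleEndian([18, 0], 0, 2)); // 18
```

Note that `<<` works on 32-bit signed integers in JavaScript, so a four-byte value whose top byte is 128 or more would come out negative with this sketch.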

I think value << 8 without an assignment has no effect, so the bytes are simply added together (18 + 1 = 19). There might be a similar problem in the encoding function, but I haven't used it so far:

value >> 8;
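For the encoding side, a rough sketch of what I would expect, assuming it mirrors the decoder and writes the low byte first. I have not checked this against the actual encoding function; `encodeLittleEndian` and its parameters are hypothetical names used only for illustration.

```javascript
// Hypothetical helper (not the library's function): splits `value` into
// `byteCount` bytes, least significant byte first.
function encodeLittleEndian(value, byteCount) {
  const bytes = [];
  let remaining = value;
  for (let i = 0; i < byteCount; i++) {
    bytes.push(remaining & 0xff); // keep the lowest 8 bits
    remaining = remaining >> 8;   // the shift result must be assigned back
  }
  return bytes;
}

console.log(encodeLittleEndian(274, 2)); // [ 18, 1 ]
console.log(encodeLittleEndian(18, 2));  // [ 18, 0 ]
```

If the shifted value is not assigned back, every iteration would push the same low byte, so 274 would be written as [18, 18] under these assumptions.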
