Skip to content

Conversation

@Vishwanatha-HD
Copy link
Contributor

@Vishwanatha-HD Vishwanatha-HD commented Nov 21, 2025

Rationale for this change

This PR is intended to enable Parquet DB support on Big-endian (s390x) systems. The fix in this PR fixes the Types logic.

What changes are included in this PR?

The fix includes changes to following files:
cpp/src/parquet/types.cc
cpp/src/parquet/types.h

Are these changes tested?

Yes. The changes are tested on s390x arch to make sure things are working fine. The fix is also tested on x86 arch, to make sure there is no new regression introduced.

Are there any user-facing changes?

No

GitHub main Issue link: #48151

@github-actions
Copy link

⚠️ GitHub issue #48208 has been automatically assigned in GitHub to PR creator.

@kou kou changed the title GH-48208 Fix Types logic to enable Parquet DB support on s390x GH-48208: [C++][Parquet] Fix Types logic to enable Parquet DB support on s390x Nov 22, 2025
@k8ika0s
Copy link

k8ika0s commented Nov 23, 2025

@Vishwanatha-HD

One thing I noticed in the INT96 helpers is that the write path here keeps the limbs in host order on BE. That can definitely work, though the thing I ran into on s390x is that callers downstream (decoders, page stats, and FormatStatValue) often assume INT96 is stored in a consistent LE layout regardless of host. When the value arrives in native order instead, those helpers sometimes reconstruct days/nanos differently across architectures.

On the decode side, the arithmetic around days_since_epoch and the nanos split makes sense, but it might still surface small cross-host differences if the limbs weren’t normalized first. That’s where I ended up relying on FromLittleEndian before doing the conversion math.

The stats formatting changes here also caught my eye. Integral paths now swap on BE, which lines up with what I’ve observed. On floats, though, FormatNumericValue tends to expose host-order bytes unless the limbs were normalized earlier in the pipeline. Just mentioning it because stats comparisons have been one of the more “quirky” areas on BE.

None of this is blocking feedback — mostly sharing observations from hardware testing, in case any of it helps stress-test these code paths further. Happy to compare outputs if it’d be useful.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants