Skip to content

feat: Add shared array and buffer to nanoarrow.h#864

Open
paleolimbot wants to merge 13 commits into
apache:mainfrom
paleolimbot:shared-array-and-buffer
Open

feat: Add shared array and buffer to nanoarrow.h#864
paleolimbot wants to merge 13 commits into
apache:mainfrom
paleolimbot:shared-array-and-buffer

Conversation

@paleolimbot
Copy link
Copy Markdown
Member

For #861 (actual decoding of dictionary arrays), we really need shared arrays for this to be reasonable (e.g., to not deep copy each dictionary for each batch!).

This PR (1) moves shared buffers to the C library instead of the IPC extension, (2) implements a second kind of shared buffer, which is borrowed from an owned reference-counted array, and (3) implements a "clone" that mostly uses the second concept to explode an array into 100% reference counted buffers that we can clone.

I had hoped we could replace the R and Python versions of these but the logic is fuzzier there because once an array has been referenced we can't mess with it or any of its parents (or it will cause a crash). My mind explodes (and I get a lot of failing R tests cases) whenever I mess with that piece and so I'll try to deal with that a different day.

This still needs tests for the shared array move and clone.

paleolimbot added a commit that referenced this pull request Apr 18, 2026
This PR uses the previous steps to output arrays (including the
ArrowArrayStream reader). This lets us wire it in to all the tests as
well.

The main follow up is that this PR currently deep copies the dictionary
for every batch that arrives, negating much of the point of dictionary
encoding. This is a fairly self-contained change that I'll do
separately: #864

I also added dictionary index validation while I was here! It is fairly
compact (compared to the other code).

Closes #845.
@paleolimbot paleolimbot force-pushed the shared-array-and-buffer branch from b6c2269 to a49c349 Compare May 3, 2026 03:01
Copy link
Copy Markdown
Contributor

@WillAyd WillAyd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nice work - i think this looks pretty good

@paleolimbot
Copy link
Copy Markdown
Member Author

Thank you for the review! I'll beef up the tests on this in the next few days 🙂 .

@codecov-commenter
Copy link
Copy Markdown

Codecov Report

❌ Patch coverage is 81.81818% with 2 lines in your changes missing coverage. Please review.
✅ Project coverage is 78.59%. Comparing base (f11d517) to head (ab6a844).
⚠️ Report is 45 commits behind head on main.

Files with missing lines Patch % Lines
src/nanoarrow/ipc/decoder.c 71.42% 0 Missing and 2 partials ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #864      +/-   ##
==========================================
- Coverage   79.96%   78.59%   -1.38%     
==========================================
  Files         105      106       +1     
  Lines       15461    16118     +657     
  Branches     1738     1858     +120     
==========================================
+ Hits        12364    12668     +304     
- Misses       1998     2283     +285     
- Partials     1099     1167      +68     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@paleolimbot paleolimbot marked this pull request as ready for review May 13, 2026 19:17
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants