feat(puffin): Add PuffinWriter #959

Merged: 2 commits merged on Apr 1, 2025


@fqaiser94 (Contributor) commented Feb 9, 2025

Part of #744

Summary

  • Add PuffinWriter

Context

  • This is the fifth in a series of PRs adding support for the Iceberg Puffin file format.
  • To see how these changes fit into the larger picture, it may help to refer to the overarching PR feat(puffin): Add reader and writer #714, from which they were split.
  • The Java reference implementation of PuffinWriter may also be a useful reference.

@fqaiser94 changed the title from "Add PuffinWriter" to "feat(puffin): Add PuffinWriter" on Feb 9, 2025
@fqaiser94 mentioned this pull request on Feb 8, 2025 (8 tasks)
@fqaiser94 force-pushed the puffin_writer branch 3 times, most recently from af396c4 to d3749f4 on February 9, 2025 21:26
@fqaiser94 marked this pull request as ready for review on February 10, 2025 11:01
@fqaiser94 marked this pull request as draft on February 10, 2025 11:03
@jonathanc-n (Contributor):

@fqaiser94 What's the current idea for this? I can offer help if needed.

@fqaiser94 (Contributor Author):

> @fqaiser94 What's the current idea for this? I can offer help if needed.

Forgot about this; let me try to revive it over the weekend.

@fqaiser94 marked this pull request as ready for review on March 23, 2025 14:51
@fqaiser94 (Contributor Author):

cc: @Xuanwo @liurenjie1024

@liurenjie1024 requested a review from Copilot on March 31, 2025 13:32

Copilot AI left a comment:

Pull Request Overview

This PR introduces the PuffinWriter for writing Iceberg Puffin files with a focus on thread safety via a mutex lock for file writes. Key changes include:

  • Implementation of the PuffinWriter with header and footer handling.
  • Integration of blob compression and metadata tracking.
  • Addition of tests to validate behavior across various scenarios.
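The overview mentions header and footer handling and per-blob metadata tracking. As a rough illustration of what that involves, the sketch below writes the byte layout described in the Iceberg Puffin spec: the magic bytes `PFA1`, the blobs back to back, then a footer made of magic, a UTF-8 JSON payload listing each blob's offset and length, a 4-byte little-endian payload size, 4 flag bytes, and magic again. It is a simplified, synchronous, in-memory sketch under stated assumptions: `SketchWriter`, `Blob`, and `BlobMetadata` are hypothetical names, not the API added in this PR, and compression, async file I/O, and the mutex mentioned above are all omitted.

```rust
use std::fmt::Write as _;

/// Puffin magic bytes ("PFA1"), written at the start of the file and twice in the footer.
const MAGIC: [u8; 4] = [0x50, 0x46, 0x41, 0x31];

/// Hypothetical blob: an uncompressed payload plus the metadata the footer needs.
struct Blob {
    blob_type: String,
    fields: Vec<i32>,
    snapshot_id: i64,
    sequence_number: i64,
    data: Vec<u8>,
}

/// Metadata tracked for each blob after it has been written.
struct BlobMetadata {
    blob_type: String,
    fields: Vec<i32>,
    snapshot_id: i64,
    sequence_number: i64,
    offset: u64,
    length: u64,
}

/// Hypothetical in-memory writer (illustration only, not the iceberg crate's PuffinWriter).
struct SketchWriter {
    buf: Vec<u8>,
    blobs: Vec<BlobMetadata>,
}

impl SketchWriter {
    fn new() -> Self {
        // Header: the file begins with the magic bytes.
        SketchWriter { buf: MAGIC.to_vec(), blobs: Vec::new() }
    }

    /// Appends the blob bytes and records where they landed in the file.
    fn add(&mut self, blob: Blob) {
        let offset = self.buf.len() as u64;
        let length = blob.data.len() as u64;
        self.buf.extend_from_slice(&blob.data);
        self.blobs.push(BlobMetadata {
            blob_type: blob.blob_type,
            fields: blob.fields,
            snapshot_id: blob.snapshot_id,
            sequence_number: blob.sequence_number,
            offset,
            length,
        });
    }

    /// Hand-rolled JSON footer payload; assumes blob type strings need no escaping.
    fn footer_payload(&self) -> String {
        let mut json = String::from("{\"blobs\":[");
        for (i, b) in self.blobs.iter().enumerate() {
            if i > 0 {
                json.push(',');
            }
            let fields: Vec<String> = b.fields.iter().map(|f| f.to_string()).collect();
            write!(
                json,
                "{{\"type\":\"{}\",\"fields\":[{}],\"snapshot-id\":{},\"sequence-number\":{},\"offset\":{},\"length\":{}}}",
                b.blob_type, fields.join(","), b.snapshot_id, b.sequence_number, b.offset, b.length
            )
            .expect("writing to a String cannot fail");
        }
        json.push_str("]}");
        json
    }

    /// Writes the footer and returns the finished file bytes.
    fn close(mut self) -> Vec<u8> {
        let payload = self.footer_payload();
        self.buf.extend_from_slice(&MAGIC);
        self.buf.extend_from_slice(payload.as_bytes());
        // Payload size as a 4-byte little-endian integer.
        self.buf.extend_from_slice(&(payload.len() as u32).to_le_bytes());
        // Flags: all zero, meaning the footer payload is not compressed.
        self.buf.extend_from_slice(&[0u8; 4]);
        self.buf.extend_from_slice(&MAGIC);
        self.buf
    }
}
```

Because the footer sits at the end and its size is stored just before the trailing flags and magic, a reader can locate the metadata by seeking from the end of the file without scanning the blobs.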

Reviewed Changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.

Files:
  • crates/iceberg/src/puffin/writer.rs: introduces the new PuffinWriter and its API for writing blobs.
  • crates/iceberg/src/puffin/mod.rs: registers the new writer module.
Comments suppressed due to low confidence (1)

crates/iceberg/src/puffin/writer.rs:196

  • [nitpick] The test asserts on an exact error message string when add is called after closing. Consider matching on a dedicated error type or variant instead, which is more robust than comparing message text.
assert_eq!(writer.add(blob_0(), CompressionCodec::None).await.unwrap_err().to_string(),
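A minimal sketch of Copilot's suggestion, assuming a hypothetical `WriterError` that carries a `WriterErrorKind` alongside its message (none of these names come from the PR or the iceberg crate). The test then checks the kind, not the rendered string. The writer here is synchronous and stores blobs in memory purely for illustration.

```rust
use std::error::Error;
use std::fmt;

/// Hypothetical error kinds for a simplified writer (not the iceberg crate's API).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum WriterErrorKind {
    /// `add` was called after `close`.
    WriterClosed,
}

/// Error with a machine-checkable kind plus a human-readable message.
#[derive(Debug)]
struct WriterError {
    kind: WriterErrorKind,
    message: String,
}

impl fmt::Display for WriterError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "{:?}: {}", self.kind, self.message)
    }
}

impl Error for WriterError {}

/// Hypothetical writer that remembers whether it has been closed.
struct ClosableWriter {
    closed: bool,
    blobs: Vec<Vec<u8>>,
}

impl ClosableWriter {
    fn new() -> Self {
        ClosableWriter { closed: false, blobs: Vec::new() }
    }

    /// Rejects new blobs once the writer is closed, tagging the error with a kind.
    fn add(&mut self, blob: Vec<u8>) -> Result<(), WriterError> {
        if self.closed {
            return Err(WriterError {
                kind: WriterErrorKind::WriterClosed,
                message: "cannot add a blob after the writer is closed".to_string(),
            });
        }
        self.blobs.push(blob);
        Ok(())
    }

    fn close(&mut self) {
        self.closed = true;
    }
}
```

A test can then assert `err.kind == WriterErrorKind::WriterClosed`, which keeps passing even if the message wording changes later.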

@liurenjie1024 (Contributor) left a comment:
Thanks @fqaiser94 for this great PR! Just some minor suggestions. Maybe we can remove the WriterState?
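The thread does not say exactly what WriterState tracked. Assuming it recorded whether the writer was still open, one common Rust alternative is to have `close` take `self` by value: after closing, the writer is moved, so a later `add` is rejected by the compiler rather than by a runtime state check. `ConsumingWriter` below is a hypothetical, synchronous sketch of that idea, not the code that was merged.

```rust
/// Hypothetical writer whose `close` consumes `self` (not the iceberg crate's API).
struct ConsumingWriter {
    buf: Vec<u8>,
    blob_count: usize,
}

impl ConsumingWriter {
    fn new() -> Self {
        ConsumingWriter { buf: Vec::new(), blob_count: 0 }
    }

    /// Appends a blob; no "is it closed?" check is needed because a closed
    /// writer no longer exists.
    fn add(&mut self, blob: &[u8]) {
        self.buf.extend_from_slice(blob);
        self.blob_count += 1;
    }

    /// Takes ownership of the writer, so `w.close(); w.add(..)` fails to
    /// compile with a "use of moved value" error.
    fn close(self) -> (Vec<u8>, usize) {
        (self.buf, self.blob_count)
    }
}
```

The trade-off is that misuse becomes a compile error instead of a testable runtime error, so a test for "add after close" is no longer needed.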

@liurenjie1024 (Contributor) left a comment:

Thanks @fqaiser94 for this great PR, LGTM!

@liurenjie1024 merged commit 76e87a8 into apache:main on Apr 1, 2025
17 checks passed

3 participants