Added ExpireSnapshots Feature #1880
…h a new Expired Snapshot class. updated tests.
ValueError: Cannot expire snapshot IDs {3051729675574597004} as they are currently referenced by table refs.
Thanks @ForeverAngry for raising this PR! I'll go into details tomorrow morning (UTC+2 here). Could you resolve the conflicts to get the CI running?
Applied initial suggestions so that CI can run on the PR.
rebuilt the poetry lock file.
Moved expiration-related methods from `ExpireSnapshots` to `ManageSnapshots` for improved organization and clarity. Updated corresponding pytest tests to reflect these changes.
Re-ran the `poetry run pre-commit run --all-files` command on the project.
After looking at the way the action here was implemented, I refined the changes. Let me know if these make sense :)
@kevinjqliu thoughts?
Reviewed.
rebased from main
I think there's a merge conflict somewhere; there is a lot of code here from other PRs. Perhaps you need to update your fork's main branch.
I think this commit should fix the test error. I also added additional tests; all passed and appear to be in good order. 🤞 This time is the charm.
Updated test_expire_snapshots.py to use model_copy(update={...}) for modifying the refs attribute of the table metadata, ensuring compatibility with Pydantic's frozen models.
Fixed all snapshot expiration tests to avoid direct assignment to frozen attributes.
All tests now pass, confirming correct behavior for protected and unprotected snapshot expiration.
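For readers unfamiliar with the pattern in the commit message above: frozen models reject attribute assignment, so the tests build a modified copy instead. The sketch below uses a stdlib frozen dataclass and `dataclasses.replace` as a stand-in for the Pydantic model and `model_copy(update={...})`. The `Metadata` class and its fields are hypothetical and are not PyIceberg's table metadata.

```python
from dataclasses import FrozenInstanceError, dataclass, field, replace


@dataclass(frozen=True)
class Metadata:
    """Hypothetical stand-in for a frozen table-metadata model."""

    current_snapshot_id: int
    refs: dict = field(default_factory=dict)


original = Metadata(current_snapshot_id=1)

# Direct assignment to a frozen attribute is rejected.
try:
    original.refs = {"main": 1}
    assignment_failed = False
except FrozenInstanceError:
    assignment_failed = True

# Copy-and-update returns a new object and leaves the original untouched,
# analogous to metadata.model_copy(update={"refs": ...}) in Pydantic v2.
updated = replace(original, refs={"main": 1})
```

With Pydantic the equivalent call would be `model_copy(update={...})` on the model instance, as the commit message says; only the stand-in types differ here.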
This would be great. What's the status of this?
I see that in #1958, for orphaned file removal, we decided to have a |
Well, right now this PR doesn't do anything with the newly orphaned files. It just handles the metadata operation.
Yes, I agree with you there. Before doing the 0.10.0 release, we need to ensure we align on this and make proper docs. I have a slight preference towards |
I'm happy to follow the '.maintenance' api design if there is a strong preference toward it.
Co-authored-by: Fokko Driesprong <[email protected]>
Added ExpireSnapshots.expire_snapshots_older_than and expire_snapshots_by_ids methods to support expiring snapshots by timestamp and by multiple IDs. Ensured that protected snapshots (branch/tag heads) cannot be expired, both at the API and commit stages. Updated the expiration logic to always skip protected snapshots, even if they are accidentally included. Added and fixed tests to verify that protected snapshots are never expired and that expiration works as expected for unprotected snapshots. Improved test setup to accurately reflect post-commit metadata and to assert correct expiration behavior.
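The protection rule described in the commit message above can be sketched in plain Python. This is only an illustration of the rule, assuming snapshots are simple records and refs map a branch or tag name to a snapshot ID; `Snapshot`, `protected_ids`, `ids_older_than`, and `check_expirable` are hypothetical names, not PyIceberg's API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Snapshot:
    """Hypothetical minimal snapshot record (not PyIceberg's Snapshot)."""

    snapshot_id: int
    timestamp_ms: int


def protected_ids(refs):
    """Snapshot IDs that a branch or tag currently points at."""
    return set(refs.values())


def ids_older_than(snapshots, refs, cutoff_ms):
    """IDs older than the cutoff, silently skipping protected snapshots."""
    protected = protected_ids(refs)
    return {
        s.snapshot_id
        for s in snapshots
        if s.timestamp_ms < cutoff_ms and s.snapshot_id not in protected
    }


def check_expirable(ids, refs):
    """Explicit requests for protected IDs fail loudly."""
    blocked = set(ids) & protected_ids(refs)
    if blocked:
        raise ValueError(
            f"Cannot expire snapshot IDs {blocked} as they are currently referenced by table refs."
        )
    return set(ids)


snapshots = [Snapshot(1, 1000), Snapshot(2, 2000), Snapshot(3, 3000)]
refs = {"main": 3, "v1": 1}  # branch head and tag, both protected
expirable = ids_older_than(snapshots, refs, cutoff_ms=2500)
```

In this toy setup only snapshot 2 qualifies: snapshot 1 is old enough but is held by the `v1` tag, and snapshot 3 is the `main` head.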
Thanks @ForeverAngry for working on this, and I think it is ready to go 👍
Great! I think @kevinjqliu is still listed as needing approval. @kevinjqliu can you put your stamp on this as well?
@ForeverAngry I think we can move this one forward. Before the release, we need to follow up on two things:
Thanks again @ForeverAngry for working on this 🚀
Thank you for being such a supportive and inspiring member to work with!
I'll work on this over the weekend!
@ForeverAngry Appreciate that, thanks! 🙌
@ForeverAngry Thank you for this feature ❤️ Just one question/comment: It seems this only supports expiration time/age, and does not support other retention policies. For example, the Java API's ExpireSnapshots supports retainLast, and ManageSnapshots supports setMinSnapshotsToKeep. Any plans to add support for these features, by chance?
Yeah, those slipped my mind when I originally did it. I'd be happy to implement those. :)
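To make the retention question concrete, here is a rough sketch of how a "retain last N" rule could combine with age-based expiration. The function `select_expired` and its parameters are hypothetical, and it simplifies by ranking snapshots purely by timestamp. It is not the Java implementation and not an existing PyIceberg API.

```python
def select_expired(timestamps_by_id, cutoff_ms, retain_last=1):
    """Return snapshot IDs older than cutoff_ms, always keeping the newest N.

    timestamps_by_id maps a snapshot ID to its commit timestamp in ms.
    """
    # Rank IDs newest first and shield the first `retain_last` of them.
    newest_first = sorted(timestamps_by_id, key=timestamps_by_id.get, reverse=True)
    retained = set(newest_first[:retain_last])
    return {
        sid
        for sid, ts in timestamps_by_id.items()
        if ts < cutoff_ms and sid not in retained
    }


history = {1: 100, 2: 200, 3: 300}
expired = select_expired(history, cutoff_ms=1000, retain_last=2)
```

Here every snapshot is older than the cutoff, but the two newest are retained, so only the oldest one is selected.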
## Summary

This PR Closes issue apache#516 by implementing support for the `ExpireSnapshot` table metadata action.

## Rationale

The `ExpireSnapshot` action is a core part of Iceberg’s table maintenance APIs. Adding support for this action in PyIceberg helps ensure feature parity with other language implementations (e.g., Java) and supports users who want to programmatically manage snapshot retention using PyIceberg’s public API.

## Testing

- Unit tests have been added to cover the initial expected usage paths.
- Additional feedback on edge cases, missing test scenarios or corrections to the setup test logic is greatly welcome during the review process.

## User-facing changes

- This change introduces a new public API: `ExpireSnapshot`.
- No breaking changes or modifications to existing APIs were made.

---

---------

Co-authored-by: Fokko Driesprong <[email protected]>
<!-- Thanks for opening a pull request! -->
<!-- Closes #2150 -->

# Rationale for this change

- Consolidates snapshot expiration functionality from the standalone `ExpireSnapshots` class into the `MaintenanceTable` class for a unified maintenance API.
- Resolves planned work left over from #1880, and closes #2142
- Achieves feature and API parity with the Java implementation for snapshot retention and table maintenance.

# Features & Enhancements

- Introduces `table.maintenance.expire_snapshots()` as the unified entry point for snapshot expiration and future maintenance operations.
- Retains the existing `ExpireSnapshots` implementation internally. The `expire_snapshots()` method on `MaintenanceTable` now returns an `ExpireSnapshots` object, preserving transaction semantics and supporting context manager usage:

  ```python
  with table.maintenance.expire_snapshots() as expire_snapshots:
      expire_snapshots.by_id(1)
      expire_snapshots.by_id(2)
  ```

- Focuses this PR on refactoring and documentation improvements, while maintaining compatibility with the prior `ExpireSnapshots` interface.
- Sets a foundation for future expansion of the `MaintenanceTable` abstraction to encapsulate additional maintenance operations.

# Bug Fixes & Cleanups

- **ManageSnapshots Cleanup (#2151)**
  - Removes an unrelated instance variable from the `ManageSnapshots` class, aligning with the Java reference implementation.

# Testing & Documentation

- **Testing:**
  - Tested the new API interface including:
    - Expiration by ID
    - Protection of branch/tag snapshots
- **Documentation:**
  - Added and updated documentation to describe:
    - API usage examples

Preview:

<img width="1686" height="1015" alt="Screenshot 2025-08-11 at 1 37 04 PM" src="https://github.com/user-attachments/assets/f469f3fc-b4b1-4ec9-b1ca-b9185e22643e" />

# Are these changes tested?

Yes. All changes are tested.~, with this PR predicated on the final changes from #1200.~ This work builds on the framework introduced by @jayceslesar in #1200 for the `MaintenanceTable`.

# Are there any user-facing changes?

---

**Closes:**

- Closes #2151
- Closes #2142

---------

Co-authored-by: Fokko Driesprong <[email protected]>
Co-authored-by: Kevin Liu <[email protected]>
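The context-manager usage described above typically follows a "stage, then commit on clean exit" pattern. The sketch below is a toy model of that pattern only: `PendingExpiration` and the `committed` list are hypothetical and do not reflect how PyIceberg implements `ExpireSnapshots`.

```python
class PendingExpiration:
    """Toy builder that stages snapshot IDs and commits them on clean exit."""

    def __init__(self, committed):
        self._committed = committed  # stands in for the table's metadata log
        self._staged = []

    def by_id(self, snapshot_id):
        # Stage only; nothing is applied until commit().
        self._staged.append(snapshot_id)
        return self

    def commit(self):
        self._committed.extend(self._staged)
        self._staged.clear()

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        # Commit only when the block finished without an exception.
        if exc_type is None:
            self.commit()
        return False  # never swallow exceptions


committed = []
with PendingExpiration(committed) as expire:
    expire.by_id(1)
    expire.by_id(2)
```

If the body of the `with` block raises, nothing is committed, which is the property that makes the context-manager form convenient for grouping several expirations into one update.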
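## Summary

This PR Closes issue #516 by implementing support for the `ExpireSnapshot` table metadata action.

## Rationale

The `ExpireSnapshot` action is a core part of Iceberg’s table maintenance APIs. Adding support for this action in PyIceberg helps ensure feature parity with other language implementations (e.g., Java) and supports users who want to programmatically manage snapshot retention using PyIceberg’s public API.

## Testing

- Unit tests have been added to cover the initial expected usage paths.
- Additional feedback on edge cases, missing test scenarios or corrections to the setup test logic is greatly welcome during the review process.

## User-facing changes

- This change introduces a new public API: `ExpireSnapshot`.
- No breaking changes or modifications to existing APIs were made.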