-
Notifications
You must be signed in to change notification settings - Fork 270
Added ExpireSnapshots Feature #1880
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
…h a new Expired Snapshot class. updated tests.
ValueError: Cannot expire snapshot IDs {3051729675574597004} as they are currently referenced by table refs.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @ForeverAngry for raising this PR! I'll go into details tomorrow morning (UTC+2 here). Could you resolve the conflicts to get the CI running?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Applied intial suggestions so that CI can run on the PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
rebuilt the poetry lock file.
Moved expiration-related methods from `ExpireSnapshots` to `ManageSnapshots` for improved organization and clarity. Updated corresponding pytest tests to reflect these changes.
Re-ran the `poetry run pre-commit run --all-files` command on the project.
Re-ran the `poetry run pre-commit run --all-files` command on the project.
After looking at the way the action here was implemented, I refined the changes. Let me know if these make sense :) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@kevinjqliu thoughts?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
rebased from main
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think there's a merge conflict somewhere, there are a lot of code here from other PRs. Perhaps you need to update your fork's main branch.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Moved: methods to expire snapshots to their own class so that it could implment the _commit() method for any new functions added to the class.
Moved: the functions for expiring snapshots to their own class.
pyiceberg/table/update/snapshot.py
Outdated
Returns: | ||
This for method chaining. | ||
""" | ||
# Collect IDs of snapshots to be expired |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unfortunally, it is not that simple to just look at the time alone. Instead, there are some rules, for example:
- We want to make sure that we don't remove the head snapshot of each of the branches: https://github.com/apache/iceberg/blob/3f661d5c6657542538a1e944db57405efdefea29/core/src/main/java/org/apache/iceberg/RemoveSnapshots.java#L297-L310
- Don't remove any snapshots that are tagged: https://github.com/apache/iceberg/blob/3f661d5c6657542538a1e944db57405efdefea29/core/src/main/java/org/apache/iceberg/RemoveSnapshots.java#L255-L278
The easiest way of going through the logic is following this method: https://github.com/apache/iceberg/blob/3f661d5c6657542538a1e944db57405efdefea29/core/src/main/java/org/apache/iceberg/RemoveSnapshots.java#L179
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I might just pull this out into another issue, separate from this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm hestitant to do that, because when folks would run it, it might break their tables 😱
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I meant, for now, ill remove the expire_snapshots_older_than
method, for now, and contribute that in another PR.
…ng it in a separate issue. Fixed: unrelated changes caused by afork/branch sync issues.
Co-authored-by: Fokko Driesprong <[email protected]>
Implemented logic to protect the HEAD branches or Tagged branches from being expired by the `expire_snapshot_by_id` method.
@@ -55,6 +55,7 @@ | |||
from pyiceberg.partitioning import ( | |||
PartitionSpec, | |||
) | |||
from pyiceberg.table.refs import SnapshotRefType |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wasnt sure if it was preferred to use a string literal or the SnapshotRefType type. I used the latter, let me know if this isnt preferred.
Summary
This PR Closes issue #516 by implementing support for the
ExpireSnapshot
table metadata action.Rationale
The
ExpireSnapshot
action is a core part of Iceberg’s table maintenance APIs. Adding support for this action in PyIceberg helps ensure feature parity with other language implementations (e.g., Java) and supports users who want to programmatically manage snapshot retention using PyIceberg’s public API.Testing
User-facing changes
ExpireSnapshot
.