
Added ExpireSnapshots Feature #1880

Open
wants to merge 27 commits into
base: main

ForeverAngry

Summary

This PR closes issue #516 by implementing support for the ExpireSnapshot table metadata action.

Rationale

The ExpireSnapshot action is a core part of Iceberg’s table maintenance APIs. Adding support for this action in PyIceberg helps ensure feature parity with other language implementations (e.g., Java) and supports users who want to programmatically manage snapshot retention using PyIceberg’s public API.

Testing

  • Unit tests have been added to cover the initial expected usage paths.
  • Additional feedback on edge cases, missing test scenarios, or corrections to the test setup logic is very welcome during the review process.

User-facing changes

  • This change introduces a new public API: ExpireSnapshot.
  • No breaking changes or modifications to existing APIs were made.

Contributor

@Fokko Fokko left a comment

Thanks @ForeverAngry for raising this PR! I'll go into details tomorrow morning (UTC+2 here). Could you resolve the conflicts to get the CI running?

Author

@ForeverAngry ForeverAngry left a comment

Applied initial suggestions so that CI can run on the PR.

Author

@ForeverAngry ForeverAngry left a comment

Rebuilt the Poetry lock file.

@ForeverAngry ForeverAngry requested a review from Fokko April 4, 2025 20:36
Moved expiration-related methods from `ExpireSnapshots` to `ManageSnapshots` for improved organization and clarity.

Updated corresponding pytest tests to reflect these changes.
Re-ran the `poetry run pre-commit run --all-files` command on the project.
@ForeverAngry
Author

After looking at the way the action here was implemented, I refined the changes. Let me know if these make sense :)

Author

@ForeverAngry ForeverAngry left a comment

@kevinjqliu thoughts?

Author

@ForeverAngry ForeverAngry left a comment

Reviewed.

Author

@ForeverAngry ForeverAngry left a comment

Rebased from main.

Contributor

@kevinjqliu kevinjqliu left a comment

I think there's a merge conflict somewhere; there is a lot of code here from other PRs. Perhaps you need to update your fork's main branch.

Author

@ForeverAngry ForeverAngry left a comment

Moved: the methods for expiring snapshots to their own class so that it can implement the `_commit()` method for any new functions added to the class.

Moved: the functions for expiring snapshots to their own class.
@ForeverAngry ForeverAngry requested a review from kevinjqliu April 22, 2025 01:25
Returns:
This for method chaining.
"""
# Collect IDs of snapshots to be expired
Contributor

Unfortunately, it is not that simple to just look at the time alone. Instead, there are some rules, for example:

The easiest way of going through the logic is following this method: https://github.com/apache/iceberg/blob/3f661d5c6657542538a1e944db57405efdefea29/core/src/main/java/org/apache/iceberg/RemoveSnapshots.java#L179
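
To illustrate why a timestamp cutoff alone is not enough, here is a minimal, self-contained sketch. It is not the logic of the linked Java method and it does not use PyIceberg; the `Snapshot` record, the `select_expired` helper, and the `min_snapshots_to_keep` parameter are hypothetical names chosen for this example. The assumption is that snapshots referenced by a tag, and at least a minimum number of snapshots on each branch's ancestry, should survive regardless of age.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional, Set


@dataclass(frozen=True)
class Snapshot:
    """Hypothetical, simplified stand-in for a table snapshot."""
    snapshot_id: int
    parent_id: Optional[int]
    timestamp_ms: int


def select_expired(
    snapshots: List[Snapshot],
    branch_heads: Iterable[int],
    tagged_ids: Iterable[int],
    older_than_ms: int,
    min_snapshots_to_keep: int = 1,
) -> Set[int]:
    """Return IDs that are old AND not retained by any branch or tag."""
    by_id = {s.snapshot_id: s for s in snapshots}

    # Tagged snapshots are always retained in this sketch.
    retained: Set[int] = set(tagged_ids)

    # Walk each branch from its head towards the root. Keep walking while
    # the minimum count is not yet reached or the snapshot is still young.
    for head_id in branch_heads:
        kept = 0
        current = by_id.get(head_id)
        while current is not None:
            if kept >= min_snapshots_to_keep and current.timestamp_ms < older_than_ms:
                break
            retained.add(current.snapshot_id)
            kept += 1
            current = by_id.get(current.parent_id) if current.parent_id is not None else None

    return {
        s.snapshot_id
        for s in snapshots
        if s.snapshot_id not in retained and s.timestamp_ms < older_than_ms
    }


# Example history: 1 <- 2 <- 3 is the main branch, 4 is tagged, 5 is orphaned.
history = [
    Snapshot(1, None, 100),
    Snapshot(2, 1, 200),
    Snapshot(3, 2, 300),
    Snapshot(4, 1, 150),
    Snapshot(5, 1, 120),
]
expired = select_expired(history, branch_heads=[3], tagged_ids=[4], older_than_ms=1_000)
```

With a cutoff of 1,000 ms every snapshot in `history` counts as old, so a purely age-based filter would select all five, including the current branch head and the tagged snapshot. The sketch keeps snapshots 3 and 4 and selects only 1, 2, and 5.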

Author

I might just pull this out into another issue, separate from this.

Contributor

I'm hesitant to do that, because when folks run it, it might break their tables 😱

Author

I meant that, for now, I'll remove the `expire_snapshots_older_than` method and contribute it in another PR.

…ng it in a separate issue.

Fixed: unrelated changes caused by a fork/branch sync issue.
@ForeverAngry ForeverAngry requested a review from Fokko April 24, 2025 00:53
ForeverAngry and others added 4 commits April 26, 2025 08:49
Implemented logic to protect snapshots referenced by a branch HEAD or a tag from being expired by the `expire_snapshot_by_id` method.
@@ -55,6 +55,7 @@
from pyiceberg.partitioning import (
PartitionSpec,
)
from pyiceberg.table.refs import SnapshotRefType
Author

I wasn't sure whether a string literal or the `SnapshotRefType` type was preferred. I used the latter; let me know if that isn't preferred.
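
For context, the kind of check being discussed can be sketched as follows. This is a standalone illustration, not the code in this PR: `RefType`, `Ref`, `protected_snapshot_ids`, and `check_can_expire` are hypothetical names, and `RefType` is a local stand-in that is only assumed to resemble `SnapshotRefType` (a string-valued enum with branch and tag members).

```python
from dataclasses import dataclass
from enum import Enum
from typing import Dict, Set


class RefType(str, Enum):
    """Local stand-in for a ref-type enum (assumed shape, not PyIceberg's class)."""
    BRANCH = "branch"
    TAG = "tag"


@dataclass(frozen=True)
class Ref:
    """Hypothetical snapshot reference: a named pointer to one snapshot."""
    snapshot_id: int
    ref_type: RefType


def protected_snapshot_ids(refs: Dict[str, Ref]) -> Set[int]:
    """Collect snapshot IDs that a branch head or a tag currently points at."""
    return {
        ref.snapshot_id
        for ref in refs.values()
        if ref.ref_type in (RefType.BRANCH, RefType.TAG)
    }


def check_can_expire(snapshot_id: int, refs: Dict[str, Ref]) -> None:
    """Raise ValueError if the snapshot is referenced by a branch or tag."""
    if snapshot_id in protected_snapshot_ids(refs):
        raise ValueError(f"Snapshot {snapshot_id} is referenced by a branch or tag and cannot be expired.")


refs = {"main": Ref(3, RefType.BRANCH), "v1": Ref(4, TAG := RefType.TAG)}
```

Because the stand-in enum mixes in `str`, `RefType.BRANCH == "branch"` evaluates to true, so comparing against the enum member or against a string literal gives the same result here. The enum form mainly guards against typos in the literal.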

@ForeverAngry ForeverAngry requested a review from Fokko April 26, 2025 13:59