@ForeverAngry
Contributor

Summary

This PR closes issue #516 by implementing support for the ExpireSnapshot table metadata action.

Rationale

The ExpireSnapshot action is a core part of Iceberg’s table maintenance APIs. Adding support for this action in PyIceberg helps ensure feature parity with other language implementations (e.g., Java) and supports users who want to programmatically manage snapshot retention using PyIceberg’s public API.

Testing

  • Unit tests have been added to cover the initial expected usage paths.
  • Additional feedback on edge cases, missing test scenarios, or corrections to the test setup logic is very welcome during the review process.

User-facing changes

  • This change introduces a new public API: ExpireSnapshot.
  • No breaking changes or modifications to existing APIs were made.

Contributor

@Fokko Fokko left a comment

Thanks @ForeverAngry for raising this PR! I'll go into details tomorrow morning (UTC+2 here). Could you resolve the conflicts to get the CI running?

Contributor Author

@ForeverAngry ForeverAngry left a comment

Applied initial suggestions so that CI can run on the PR.

Contributor Author

@ForeverAngry ForeverAngry left a comment

rebuilt the poetry lock file.

@ForeverAngry ForeverAngry requested a review from Fokko April 4, 2025 20:36
Moved expiration-related methods from `ExpireSnapshots` to `ManageSnapshots` for improved organization and clarity.

Updated corresponding pytest tests to reflect these changes.
Re-ran the `poetry run pre-commit run --all-files` command on the project.
@ForeverAngry
Contributor Author

After looking at the way the action here was implemented, I refined the changes. Let me know if these make sense :)

Contributor Author

@ForeverAngry ForeverAngry left a comment

@kevinjqliu thoughts?

Contributor Author

@ForeverAngry ForeverAngry left a comment

Reviewed.

Contributor Author

@ForeverAngry ForeverAngry left a comment

rebased from main

Contributor

@kevinjqliu kevinjqliu left a comment

I think there's a merge conflict somewhere; there is a lot of code here from other PRs. Perhaps you need to update your fork's main branch.

@ForeverAngry
Contributor Author

ForeverAngry commented May 17, 2025

> @ForeverAngry Sorry for the late reply, it looks like there is a test failing now 👀

I think this commit should fix the test error; I also added additional tests. All passed and appear to be in good order. 🤞 This time is the charm.

Updated test_expire_snapshots.py to use model_copy(update={...}) for modifying the refs attribute of the table metadata, ensuring compatibility with Pydantic's frozen models.
Fixed all snapshot expiration tests to avoid direct assignment to frozen attributes.
All tests now pass, confirming correct behavior for protected and unprotected snapshot expiration.
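
The copy-instead-of-assign pattern described in this commit can be sketched with the standard library alone. This is only an analogy: it uses a frozen `dataclass` and `dataclasses.replace` as a stand-in for a frozen Pydantic model and `model_copy(update={...})`. The `Metadata` class and its fields are hypothetical and are not PyIceberg's table metadata.

```python
from dataclasses import FrozenInstanceError, dataclass, field, replace


@dataclass(frozen=True)
class Metadata:
    """Hypothetical stand-in for a frozen table-metadata model."""

    current_snapshot_id: int
    refs: dict = field(default_factory=dict)


original = Metadata(current_snapshot_id=3)

# Direct assignment to a frozen instance is rejected.
try:
    original.refs = {"main": 3}
    assignment_failed = False
except FrozenInstanceError:
    assignment_failed = True

# Build a modified copy instead; the original object stays untouched.
updated = replace(original, refs={"main": 3, "tag-v1": 1})
```

With Pydantic v2 the corresponding call would be `metadata.model_copy(update={"refs": new_refs})`, which is what the commit message refers to.
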
@zschumacher

this would be great - what's the status of this?

@smaheshwar-pltr
Contributor

I see that in #1958, for orphaned file removal, we decided to have a table.maintenance API returning a MaintenanceTable. As a user, if I have to do orphaned file removal via that table.maintenance API, but snapshot expiration instead via table.expire_snapshots, I might be confused (they both feel like table maintenance to me, though snapshot expiration is admittedly a bit stronger). Curious about people's thoughts here.

@ForeverAngry
Contributor Author

Well, right now this PR doesn't do anything with the newly orphaned files. It just handles the metadata operation.

@Fokko
Contributor

Fokko commented Jun 11, 2025

> I see that in #1958, for orphaned file removal, we decided to have a table.maintenance API returning a MaintenanceTable. As a user, if I have to do orphaned file removal via that table.maintenance API, but snapshot expiration instead via table.expire_snapshots, I might be confused (they both feel like table maintenance to me, though snapshot expiration is admittedly a bit stronger). Curious about people's thoughts here.

Yes, I agree with you there. Before doing the 0.10.0 release, we need to ensure we align on this and write proper docs. I have a slight preference towards .maintenance, to have a clear distinction between maintenance and the regular operations (such as creating a tag or branch).

@ForeverAngry
Contributor Author

I'm happy to follow the '.maintenance' API design if there is a strong preference toward it.

ForeverAngry and others added 3 commits June 13, 2025 21:08
Co-authored-by: Fokko Driesprong <[email protected]>
Added ExpireSnapshots.expire_snapshots_older_than and expire_snapshots_by_ids methods to support expiring snapshots by timestamp and by multiple IDs.

Ensured that protected snapshots (branch/tag heads) cannot be expired, both at the API and commit stages.

Updated the expiration logic to always skip protected snapshots, even if they are accidentally included.

Added and fixed tests to verify that protected snapshots are never expired and that expiration works as expected for unprotected snapshots.

Improved test setup to accurately reflect post-commit metadata and to assert correct expiration behavior.
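
The protection rule described in these commits can be illustrated with a small toy model. This is a sketch under stated assumptions, not PyIceberg's code: the `Snapshot` record, the `refs` mapping of ref name to head snapshot ID, and the helper functions `protected_ids`, `ids_older_than`, and `expire_by_ids` are all hypothetical names introduced for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Snapshot:
    """Hypothetical, simplified snapshot record."""

    snapshot_id: int
    timestamp_ms: int


def protected_ids(refs: dict[str, int]) -> set[int]:
    """Snapshot IDs that are the head of a branch or tag."""
    return set(refs.values())


def ids_older_than(snapshots, refs, cutoff_ms):
    """IDs of unprotected snapshots created before the cutoff."""
    protected = protected_ids(refs)
    return [
        s.snapshot_id
        for s in snapshots
        if s.timestamp_ms < cutoff_ms and s.snapshot_id not in protected
    ]


def expire_by_ids(snapshots, refs, ids):
    """Drop the requested snapshots, silently skipping protected ones."""
    to_remove = set(ids) - protected_ids(refs)
    return [s for s in snapshots if s.snapshot_id not in to_remove]


snapshots = [Snapshot(1, 1000), Snapshot(2, 2000), Snapshot(3, 3000)]
refs = {"main": 3, "release-tag": 1}  # snapshots 1 and 3 are ref heads

expirable = ids_older_than(snapshots, refs, cutoff_ms=2500)
remaining = expire_by_ids(snapshots, refs, ids=[1, 2])
```

Here snapshot 1 is older than the cutoff but stays because a tag points at it, which mirrors the "skip protected snapshots even if they are accidentally included" behaviour described above.
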
Contributor

@Fokko Fokko left a comment

Thanks @ForeverAngry for working on this, and I think it is ready to go 👍

@ForeverAngry
Contributor Author

> Thanks @ForeverAngry for working on this, and I think it is ready to go 👍

Great! I think @kevinjqliu is still listed as needing approval. @kevinjqliu can you put your stamp on this as well?

@Fokko
Contributor

Fokko commented Jun 19, 2025

@ForeverAngry I think we can move this one forward. Before the release, we need to follow up on two things:

  • Add a new Maintenance doc section with a subsection that explains the expire snapshots operation.
  • Move the expire snapshots operation under maintenance: tbl.maintenance.expire_snapshots()

@Fokko Fokko merged commit 1bdc24e into apache:main Jun 19, 2025
10 checks passed
@Fokko
Contributor

Fokko commented Jun 19, 2025

Thanks again @ForeverAngry for working on this 🚀

@ForeverAngry
Contributor Author

> Thanks again @ForeverAngry for working on this 🚀

Thank you for being such a supportive and inspiring member to work with!

@ForeverAngry
Contributor Author

> @ForeverAngry I think we can move this one forward. Before the release, we need to follow up on two things:
>
>   • Add a new Maintenance doc section with a subsection that explains the expire snapshots operation.
>   • Move the expire snapshots operation under maintenance: tbl.maintenance.expire_snapshots()

I'll work on this, this weekend!

@Fokko
Contributor

Fokko commented Jun 20, 2025

@ForeverAngry Appreciate that, thanks! 🙌

@greenlaw

@ForeverAngry Thank you for this feature ❤️

Just one question/comment: It seems this only supports expiration time/age, and does not support other retention policies. For example, the Java API's ExpireSnapshots supports retainLast, and ManageSnapshots supports setMinSnapshotsToKeep. Any plans to add support for these features, by chance?

@ForeverAngry
Contributor Author

> @ForeverAngry Thank you for this feature ❤️
>
> Just one question/comment: It seems this only supports expiration time/age, and does not support other retention policies. For example, the Java API's ExpireSnapshots supports retainLast, and ManageSnapshots supports setMinSnapshotsToKeep. Any plans to add support for these features, by chance?

Yeah, those slipped my mind when I originally did it. I'd be happy to implement those. :)
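
To make the requested policy concrete, here is a rough sketch of how an age cutoff could be combined with a "retain last N" guard. It is a deliberate simplification and an assumption, not the Java or PyIceberg behaviour: it ranks snapshots by timestamp only and ignores branches and ancestry. The function name `select_expired` and the `(snapshot_id, timestamp_ms)` tuple layout are hypothetical.

```python
def select_expired(
    snapshots: list[tuple[int, int]],
    older_than_ms: int,
    retain_last: int = 1,
) -> list[int]:
    """Return IDs to expire: older than the cutoff, excluding the newest N.

    `snapshots` holds (snapshot_id, timestamp_ms) pairs.
    """
    if retain_last < 1:
        raise ValueError("retain_last must be at least 1")

    # The `retain_last` most recent snapshots are always kept.
    newest_first = sorted(snapshots, key=lambda s: s[1], reverse=True)
    keep = {snapshot_id for snapshot_id, _ in newest_first[:retain_last]}

    return sorted(
        snapshot_id
        for snapshot_id, timestamp_ms in snapshots
        if timestamp_ms < older_than_ms and snapshot_id not in keep
    )


snapshots = [(10, 100), (11, 200), (12, 300), (13, 400)]

# Every snapshot is older than the cutoff, but the two newest survive.
expired = select_expired(snapshots, older_than_ms=1000, retain_last=2)
```

The point of the guard is that an aggressive cutoff can never leave the table with fewer than `retain_last` snapshots.
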

@ForeverAngry ForeverAngry deleted the 1880-add-expire-snapshots branch June 25, 2025 22:31
amitgilad3 pushed a commit to amitgilad3/iceberg-python that referenced this pull request Jul 7, 2025
Fokko added a commit that referenced this pull request Aug 12, 2025
# Rationale for this change

- Consolidates snapshot expiration functionality from the standalone
`ExpireSnapshots` class into the `MaintenanceTable` class for a unified
maintenance API.
- Resolves planned work left over from #1880, and closes
#2142
- Achieves feature and API parity with the Java implementation for
snapshot retention and table maintenance.

# Features & Enhancements

- Introduces `table.maintenance.expire_snapshots()` as the unified entry
point for snapshot expiration and future maintenance operations.
- Retains the existing `ExpireSnapshots` implementation internally. The
`expire_snapshots()` method on `MaintenanceTable` now returns an
`ExpireSnapshots` object, preserving transaction semantics and
supporting context manager usage:
  ```python
  with table.maintenance.expire_snapshots() as expire_snapshots:
      expire_snapshots.by_id(1)
      expire_snapshots.by_id(2)
  ```
- Focuses this PR on refactoring and documentation improvements, while
maintaining compatibility with the prior `ExpireSnapshots` interface.
- Sets a foundation for future expansion of the `MaintenanceTable`
abstraction to encapsulate additional maintenance operations.
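
One way to picture the transaction semantics mentioned above is a small stand-in in which IDs are staged inside the `with` block and applied only on a clean exit. All classes here (`ToyTable`, `ToyMaintenance`, `ToyExpireSnapshots`) are hypothetical and are not PyIceberg's implementation; only the names `maintenance`, `expire_snapshots`, and `by_id` are taken from the example in this description.

```python
class ToyExpireSnapshots:
    """Hypothetical stand-in: stages snapshot IDs and applies them on exit."""

    def __init__(self, table):
        self._table = table
        self._staged = set()

    def by_id(self, snapshot_id):
        self._staged.add(snapshot_id)
        return self

    def commit(self):
        self._table.snapshot_ids -= self._staged

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        # Apply staged changes only if the block finished without an error.
        if exc_type is None:
            self.commit()
        return False


class ToyMaintenance:
    """Hypothetical namespace object grouping maintenance operations."""

    def __init__(self, table):
        self._table = table

    def expire_snapshots(self):
        return ToyExpireSnapshots(self._table)


class ToyTable:
    def __init__(self, snapshot_ids):
        self.snapshot_ids = set(snapshot_ids)

    @property
    def maintenance(self):
        return ToyMaintenance(self)


table = ToyTable([1, 2, 3])
with table.maintenance.expire_snapshots() as expire_snapshots:
    expire_snapshots.by_id(1)
    expire_snapshots.by_id(2)
    staged_view = set(table.snapshot_ids)  # nothing has been applied yet
```

In this toy, the table still holds all three IDs inside the block and only loses 1 and 2 once the block exits cleanly; an exception inside the block leaves the table unchanged.
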


# Bug Fixes & Cleanups

- **ManageSnapshots Cleanup (#2151):**
- Removes an unrelated instance variable from the `ManageSnapshots`
class, aligning with the Java reference implementation.

# Testing & Documentation

- **Testing:**
  - Tested the new API interface including:
    - Expiration by ID 
    - Protection of branch/tag snapshots
- **Documentation:**
  - Added and updated documentation to describe:
    - API usage examples

Preview:
<img width="1686" height="1015" alt="Screenshot 2025-08-11 at 1 37 04 PM" src="https://github.com/user-attachments/assets/f469f3fc-b4b1-4ec9-b1ca-b9185e22643e" />


# Are these changes tested?

Yes. All changes are tested. This work builds on the framework introduced by
@jayceslesar in #1200 for the `MaintenanceTable`.

# Are there any user-facing changes?

---

**Closes:**  
- Closes #2151
- Closes #2142

---------

Co-authored-by: Fokko Driesprong <[email protected]>
Co-authored-by: Kevin Liu <[email protected]>
gabeiglio pushed a commit to Netflix/iceberg-python that referenced this pull request Aug 13, 2025
gabeiglio pushed a commit to Netflix/iceberg-python that referenced this pull request Aug 13, 2025
…he#2143)
