Conversation

@ForeverAngry
Contributor

@ForeverAngry ForeverAngry commented Jun 23, 2025

Rationale for this change

Features & Enhancements

  • Introduces table.maintenance.expire_snapshots() as the unified entry point for snapshot expiration and future maintenance operations.
  • Retains the existing ExpireSnapshots implementation internally. The expire_snapshots() method on MaintenanceTable now returns an ExpireSnapshots object, preserving transaction semantics and supporting context manager usage:
    with table.maintenance.expire_snapshots() as expire_snapshots:
        expire_snapshots.by_id(1)
        expire_snapshots.by_id(2)
  • Focuses this PR on refactoring and documentation improvements, while maintaining compatibility with the prior ExpireSnapshots interface.
  • Sets a foundation for future expansion of the MaintenanceTable abstraction to encapsulate additional maintenance operations.
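
The context-manager and chaining usage described above can be sketched with a small, self-contained toy model. This does not use PyIceberg: `ToyTable`, `ToyMaintenance`, and `ToyExpireSnapshots` are hypothetical stand-ins, and the assumption that leaving the `with` block commits the staged changes (and that an exception skips the commit) is inferred from the description rather than taken from the actual implementation.

```python
from __future__ import annotations


class ToyExpireSnapshots:
    """Hypothetical stand-in: stages snapshot IDs and removes them on commit."""

    def __init__(self, table: ToyTable) -> None:
        self._table = table
        self._pending: set[int] = set()

    def by_id(self, snapshot_id: int) -> ToyExpireSnapshots:
        self._pending.add(snapshot_id)
        return self  # returning self is what allows method chaining

    def commit(self) -> None:
        # Apply every staged removal in a single step.
        self._table.snapshot_ids -= self._pending
        self._pending.clear()

    def __enter__(self) -> ToyExpireSnapshots:
        return self

    def __exit__(self, exc_type, exc, tb) -> None:
        # Assumption: commit only if the block finished without an error.
        if exc_type is None:
            self.commit()


class ToyMaintenance:
    """Hypothetical namespace object that groups maintenance operations."""

    def __init__(self, table: ToyTable) -> None:
        self._table = table

    def expire_snapshots(self) -> ToyExpireSnapshots:
        return ToyExpireSnapshots(self._table)


class ToyTable:
    def __init__(self, snapshot_ids: list[int]) -> None:
        self.snapshot_ids = set(snapshot_ids)

    @property
    def maintenance(self) -> ToyMaintenance:
        return ToyMaintenance(self)


table = ToyTable([1, 2, 3])

# Context-manager style: staged IDs are applied when the block exits.
with table.maintenance.expire_snapshots() as expire_snapshots:
    expire_snapshots.by_id(1)
    expire_snapshots.by_id(2)
remaining_after_with = set(table.snapshot_ids)

# Chained style: the caller commits explicitly.
table.maintenance.expire_snapshots().by_id(3).commit()
```

Keeping the staging object separate from the table is what lets both styles share one code path: the `with` block simply calls the same `commit()` that the chained form calls by hand.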

Bug Fixes & Cleanups

  • ManageSnapshots Cleanup (#2151)
    • Removes an unrelated instance variable from the ManageSnapshots class, aligning with the Java reference implementation.

Testing & Documentation

  • Testing:
    • Tested the new API interface including:
      • Expiration by ID
      • Protection of branch/tag snapshots
  • Documentation:
    • Added and updated documentation to describe:
      • API usage examples

Preview:
Screenshot 2025-08-11 at 1 37 04 PM

Are these changes tested?

Yes. All changes are tested. This work builds on the framework introduced by @jayceslesar in #1200 for the MaintenanceTable.

Are there any user-facing changes?


Closes:

ForeverAngry and others added 25 commits March 28, 2025 20:23
…h a new Expired Snapshot class. updated tests.
 ValueError: Cannot expire snapshot IDs {3051729675574597004} as they are currently referenced by table refs.
Moved expiration-related methods from `ExpireSnapshots` to `ManageSnapshots` for improved organization and clarity.

Updated corresponding pytest tests to reflect these changes.
Re-ran the `poetry run pre-commit run --all-files` command on the project.
Moved: the functions for expiring snapshots to their own class.
…ng it in a separate issue.

Fixed: unrelated changes caused by a fork/branch sync issues.
Implemented logic to protect the HEAD branches or Tagged branches from being expired by the `expire_snapshot_by_id` method.
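
The protection described in the commit messages above can be illustrated with a minimal sketch. It is not the PR's code: `check_not_referenced` and the `refs` mapping (ref name to snapshot ID) are hypothetical, and only the error text is taken from the `ValueError` quoted in the commit list.

```python
from __future__ import annotations


def check_not_referenced(snapshot_ids: set[int], refs: dict[str, int]) -> None:
    """Raise if any requested snapshot is still the target of a branch or tag.

    Hypothetical helper: `refs` maps a ref name to the snapshot ID it points at.
    """
    protected = set(refs.values())
    conflicts = snapshot_ids & protected
    if conflicts:
        raise ValueError(
            f"Cannot expire snapshot IDs {conflicts} as they are currently "
            "referenced by table refs."
        )


# Illustrative refs: one branch head and one tag (values are made up,
# except the ID that appears in the commit message above).
refs = {"main": 3051729675574597004, "release-tag": 42}

try:
    check_not_referenced({3051729675574597004}, refs)
    message = ""
except ValueError as err:
    message = str(err)

# A snapshot that no ref points at passes the check without raising.
check_not_referenced({7}, refs)
```

Running the check before any snapshot is staged means a protected ID fails fast instead of surfacing at commit time.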
@ForeverAngry
Contributor Author

@Fokko @jayceslesar let me know whether you'd prefer that I stack this PR onto #1200, or wait until #1200 is merged into main, rebase on the updated upstream/main, and then open the PR against apache/iceberg-python:main!

@Fokko
Contributor

Fokko commented Jun 24, 2025

Great seeing this PR @ForeverAngry, thanks again for working on this! I'm okay with first merging #1200, but we could also merge this first, and adapt the remove orphan files routine to use .maintenance. Let me follow up on the remove orphan files, because there are some open questions there.

@Fokko Fokko added this to the PyIceberg 0.10.0 milestone Jun 24, 2025
@ForeverAngry
Contributor Author

@Fokko did you decide if you wanted me to stay stacked on the delete orphans pr, or go ahead and prepare the pr for this, to the main branch?

@ForeverAngry ForeverAngry force-pushed the refactor/consolidate-snapshot-expiration branch from a6c3b63 to 9937894 Compare July 5, 2025 01:10
keep the table.maintenance.expire_snapshots() API signature
Return the existing ExpireSnapshots class that extends UpdateTableMetadata
Enable transaction semantics with context manager support
Focus this PR on API refactoring, move complex retention logic to separate PR
@ForeverAngry ForeverAngry requested a review from Fokko August 10, 2025 11:22
@ForeverAngry
Contributor Author

@Fokko @kevinjqliu let me know if this commit looks like what you both were expecting.

Contributor

@kevinjqliu kevinjqliu left a comment

thanks @ForeverAngry i think this pr is very close. we need to get it to a good shape for merge and then we can release 0.10 :)

@ForeverAngry
Contributor Author

@Fokko can you kick off the workflows for me?

Contributor

@kevinjqliu kevinjqliu left a comment

lgtm, let's revert the irrelevant docs and this thing is ready to merge!!

@kevinjqliu
Contributor

caught up with @ForeverAngry on slack, helped fix the merge issue with the documentation

Contributor

@kevinjqliu kevinjqliu left a comment

LGTM!! Thanks a bunch for the persistence to get this PR into a good state :)

Comment on lines +1307 to +1309

# Method chaining
table.maintenance.expire_snapshots().by_id(12345).commit()
Contributor

This is the same example as above?

Suggested change
- # Method chaining
- table.maintenance.expire_snapshots().by_id(12345).commit()

Comment on lines +1322 to +1323
for snapshot_id in snapshot_ids:
    expire.by_id(snapshot_id)
Contributor

Why not use by_ids? This makes the example a bit more concise:

Suggested change
- for snapshot_id in snapshot_ids:
-     expire.by_id(snapshot_id)
+ expire.by_ids(snapshot_ids)
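
As a rough illustration of why the two forms in this suggestion are interchangeable, the toy class below implements `by_ids` as a thin loop over `by_id`. `ToyExpire` is a hypothetical stand-in, and treating `by_ids` as equivalent to repeated `by_id` calls is an assumption drawn from the review comment, not from the library source.

```python
from __future__ import annotations

from typing import Iterable


class ToyExpire:
    """Hypothetical stand-in that only records which snapshot IDs were staged."""

    def __init__(self) -> None:
        self.staged: set[int] = set()

    def by_id(self, snapshot_id: int) -> ToyExpire:
        self.staged.add(snapshot_id)
        return self

    def by_ids(self, snapshot_ids: Iterable[int]) -> ToyExpire:
        # Assumption: the batch form stages each ID exactly like by_id does.
        for snapshot_id in snapshot_ids:
            self.by_id(snapshot_id)
        return self


snapshot_ids = [101, 102, 103]

# Original form from the documentation example: an explicit loop.
looped = ToyExpire()
for snapshot_id in snapshot_ids:
    looped.by_id(snapshot_id)

# Suggested form: a single call.
batched = ToyExpire().by_ids(snapshot_ids)
```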

Contributor

@Fokko Fokko left a comment

This looks good @ForeverAngry

Thanks for your work and all the patience 👍

@Fokko Fokko merged commit 4b961f7 into apache:main Aug 12, 2025
11 of 19 checks passed
gabeiglio pushed a commit to Netflix/iceberg-python that referenced this pull request Aug 13, 2025
…he#2143)

# Rationale for this change

- Consolidates snapshot expiration functionality from the standalone
`ExpireSnapshots` class into the `MaintenanceTable` class for a unified
maintenance API.
- Resolves planned work left over from apache#1880, and closes
apache#2142
- Achieves feature and API parity with the Java implementation for
snapshot retention and table maintenance.

# Features & Enhancements

- Introduces `table.maintenance.expire_snapshots()` as the unified entry
point for snapshot expiration and future maintenance operations.
- Retains the existing `ExpireSnapshots` implementation internally. The
`expire_snapshots()` method on `MaintenanceTable` now returns an
`ExpireSnapshots` object, preserving transaction semantics and
supporting context manager usage:
  ```python
  with table.maintenance.expire_snapshots() as expire_snapshots:
      expire_snapshots.by_id(1)
      expire_snapshots.by_id(2)
  ```
- Focuses this PR on refactoring and documentation improvements, while
maintaining compatibility with the prior `ExpireSnapshots` interface.
- Sets a foundation for future expansion of the `MaintenanceTable`
abstraction to encapsulate additional maintenance operations.


# Bug Fixes & Cleanups

- **ManageSnapshots Cleanup** (apache#2151)
  - Removes an unrelated instance variable from the `ManageSnapshots`
    class, aligning with the Java reference implementation.

# Testing & Documentation

- **Testing:**
  - Tested the new API interface including:
    - Expiration by ID 
    - Protection of branch/tag snapshots
- **Documentation:**
  - Added and updated documentation to describe:
    - API usage examples

Preview:
<img width="1686" height="1015" alt="Screenshot 2025-08-11 at 1 37
04 PM"
src="https://github.com/user-attachments/assets/f469f3fc-b4b1-4ec9-b1ca-b9185e22643e"
/>


# Are these changes tested?

Yes. All changes are tested.~, with this PR predicated on the final
changes from apache#1200.~ This work builds on the framework introduced by
@jayceslesar in apache#1200 for the `MaintenanceTable`.

# Are there any user-facing changes?

---

**Closes:**  
- Closes apache#2151
- Closes apache#2142

---------

Co-authored-by: Fokko Driesprong <[email protected]>
Co-authored-by: Kevin Liu <[email protected]>

Development

Successfully merging this pull request may close these issues.

Remove unrelated instance variable from the ManageSnapshots class
refactor: consolidate snapshot expiration into MaintenanceTable

5 participants