Refactor Metadata in Transaction #1903
Conversation
Force-pushed from 8642065 to 8fe2ecd
```python
@property
def table_metadata(self) -> TableMetadata:
    return update_table_metadata(self._table.metadata, self._updates)
```
Curious about the performance implications of this change for when metadata gets large. With it, and assuming no autocommit, it seems that each use of `table_metadata` will start with the original metadata of the table, not the current one of the transaction (which is what was done before), and apply all updates to it, not just the most recent one, copying it every time via `model_copy`?

I think what was here before, `self.table_metadata = update_table_metadata(self.table_metadata, updates)`, applied only the necessary updates within `_apply`, storing the result in a field along the way so that just the current transactional metadata was continually updated. Because of that, repeated use of `table_metadata`, whether from PyIceberg code or user code, seemed cheap before, but maybe no longer.
(Probably missing something, though, because I don't fully follow the behaviour change this causes.)
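For illustration, here is a minimal sketch of the two approaches being compared. It is simplified: the real `update_table_metadata` applies typed update objects immutably via Pydantic, while here updates are modeled as plain callables so the sketch is self-contained and runnable.

```python
# Stand-in for the real function, which copies metadata via Pydantic's
# model_copy for each typed update; here we just fold callables left to right.
def update_table_metadata(base, updates):
    for update in updates:
        base = update(base)
    return base


class TransactionBefore:
    """Old behaviour: cache the updated metadata on the transaction."""

    def __init__(self, table):
        self.table_metadata = table.metadata

    def _apply(self, updates):
        # Applies only the new batch of updates, on top of the cached copy.
        self.table_metadata = update_table_metadata(self.table_metadata, updates)


class TransactionAfter:
    """New behaviour: keep only the update list; recompute on every read."""

    def __init__(self, table):
        self._table = table
        self._updates = ()

    def _apply(self, updates):
        self._updates += tuple(updates)

    @property
    def table_metadata(self):
        # Replays all accumulated updates against the table's original
        # metadata each time the property is accessed.
        return update_table_metadata(self._table.metadata, self._updates)
```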
Yes, your assessment is correct. The main issue that it tackles is that we remove the `Metadata` state on the `Transaction` layer. When we start implementing optimistic concurrency, before applying the commit, we can refresh the underlying table when we do a retry.

I think the code will still have pretty decent performance, since it uses Pydantic under the hood, which delegates everything to its Rust layer, and the `singledispatch` logic is also pretty performant.
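For context, the `singledispatch` pattern referred to here looks roughly like the following. This is a generic sketch with made-up update types (`SetCurrentSchemaUpdate`, `SetPropertiesUpdate` and a bare `Metadata` record are illustrative); the actual PyIceberg registrations and signatures differ.

```python
from dataclasses import dataclass, replace
from functools import singledispatch


@dataclass(frozen=True)
class Metadata:
    schema_id: int
    properties: dict


@dataclass(frozen=True)
class SetCurrentSchemaUpdate:
    schema_id: int


@dataclass(frozen=True)
class SetPropertiesUpdate:
    updates: dict


@singledispatch
def apply_update(update, base: Metadata) -> Metadata:
    # Fallback for update types without a registered handler.
    raise NotImplementedError(f"Unsupported update: {update}")


@apply_update.register
def _(update: SetCurrentSchemaUpdate, base: Metadata) -> Metadata:
    return replace(base, schema_id=update.schema_id)


@apply_update.register
def _(update: SetPropertiesUpdate, base: Metadata) -> Metadata:
    return replace(base, properties={**base.properties, **update.updates})


base = Metadata(schema_id=0, properties={})
print(apply_update(SetPropertiesUpdate({"owner": "fokko"}), base))
```

Dispatch happens once per update on the type of the first argument, so applying a list of updates stays a flat loop rather than a chain of `isinstance` checks.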
Thanks for clarifying and updating the test - sounds good!
I think this is a great idea @Fokko and I agree that the performance impact should be minimal. 💯
Awesome, thanks for the review @sungwy and @smaheshwar-pltr! Let's move this forward 👍
Rationale for this change
Today, we have a copy of the `TableMetadata` on the `Table` and the `Transaction`. This PR changes that logic to re-use the one on the table and apply the changes to the one on the `Transaction`.

This also allows us to stack changes, for example, to first change a schema and then write data with the new schema right away.

Also a prerequisite for #1772
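As a sketch of the stacking this enables, assuming PyIceberg's public transaction API (the catalog name, table identifier, and column are illustrative, and in practice the appended Arrow table must match the table's full schema):

```python
import pyarrow as pa

from pyiceberg.catalog import load_catalog
from pyiceberg.types import StringType

catalog = load_catalog("default")          # illustrative catalog
table = catalog.load_table("db.events")    # illustrative table

with table.transaction() as txn:
    # First change the schema...
    with txn.update_schema() as update:
        update.add_column("category", StringType())
    # ...then write data that already uses the new schema, within the
    # same transaction: txn.table_metadata reflects the pending update.
    txn.append(pa.table({"category": ["a", "b"]}))
```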
Are these changes tested?
Includes a new test :)
Are there any user-facing changes?