-
Notifications
You must be signed in to change notification settings - Fork 74
feat: transactional UpdateProperties method support #321
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
zhjwpku
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is consistent with the rest catalog open api, so +1, thanks.
src/iceberg/transaction.h
Outdated
| /// \param schema_ids the schema ids to remove | ||
| /// \return a new RemoveSchemas | ||
| virtual std::shared_ptr<RemoveSchemas> RemoveSchemas( | ||
| const std::vector<int32_t>& schema_ids) = 0; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| const std::vector<int32_t>& schema_ids) = 0; | |
| std::span<const int32_t> schema_ids) = 0; |
Should we use span to accept broader range of input types for here and below?
23bf661 to
0c1f156
Compare
|
Could you rebase to resolve conflict? |
aa4cadb to
14bd4f8
Compare
done |
| const TableIdentifier& identifier, | ||
| const std::vector<std::unique_ptr<TableRequirement>>& requirements, | ||
| const std::vector<std::unique_ptr<TableUpdate>>& updates) = 0; | ||
| std::vector<std::unique_ptr<TableRequirement>> requirements, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we revert this change? If later we support retry on failed commits, these requirements and updates cannot be reused since they are moved away.
| /// \brief Set whether the last operation in a transaction has been committed | ||
| /// | ||
| /// \param committed true if the last operation has been committed, false otherwise | ||
| virtual void SetLastOperationCommitted(bool committed) = 0; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This function is only used by transaction catalog so it should not appear here.
|
|
||
| std::unique_ptr<Transaction> Table::NewTransaction() const { | ||
| throw NotImplemented("Table::NewTransaction is not implemented"); | ||
| return std::make_unique<BaseTransaction>(shared_from_this(), catalog_); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd recommend directly pass Table* to it. Similarly, for catalog_ we can use catalog_.get() to just use pointer. The created transaction object should not outlive the table and catalog objects. It is unnecessary to increase the reference counter of them.
I'm open to keep the current design. WDYT @HuaHuaY?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When the table‘s content changes, or we drop a table, what should the executing transaction hold?
| std::unique_ptr<::iceberg::UpdateProperties> UpdateProperties() override; | ||
|
|
||
| std::unique_ptr<AppendFiles> NewAppend() override; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| std::unique_ptr<::iceberg::UpdateProperties> UpdateProperties() override; | |
| std::unique_ptr<AppendFiles> NewAppend() override; | |
| std::shared_ptr<UpdateProperties> NewUpdateProperties() override; | |
| std::shared_ptr<AppendFiles> NewAppend() override; |
It seems that returning shared_ptr will make it easier to collect and transform the update internally. Adding New prefix also can avoid iceberg:: prefix in the return type.
| namespace iceberg { | ||
|
|
||
| /// \brief Base class for transaction implementations | ||
| class ICEBERG_EXPORT BaseTransaction : public Transaction { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If BaseTransaction is the only subclass, we can just implement all features to Transaction just like Table.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
BTW, do we want to introduce TransactionType from the Java impl as well? If would be good if we can add this to make it clear.
| auto update = CheckAndCreateUpdate<::iceberg::UpdateProperties>( | ||
| table_->name(), catalog_, CurrentMetadata()); | ||
| if (!update.has_value()) { | ||
| ERROR_TO_EXCEPTION(update.error()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We cannot throw. Perhaps we should also use Result wrapper for these update return types.
| * For mutating calls such as UpdateTable or HasLastOperationCommitted, it delegates back | ||
| * to the owning BaseTransaction so staged updates remain private until commit. | ||
| */ | ||
| class ICEBERG_EXPORT TransactionCatalog : public Catalog { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we move this class to base_transaction.cc?
|
|
||
| // update table to the new version | ||
| if (updated_table) { | ||
| table_ = std::shared_ptr<Table>(std::move(updated_table)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do we need this table_ after commit has been done? It seems useless?
| context_.pending_updates.emplace_back(std::move(update)); | ||
| } | ||
|
|
||
| return std::make_unique<Table>( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do we need to return a table?
| return {}; | ||
| } | ||
|
|
||
| auto pending_requirements = ConsumePendingRequirements(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What if we need to support retry on failed commits? In the current approach, these two lists are gone.
feat: transactional UpdateProperties method support
Table/Catalog produce a Transaction (e.g., NewTransaction or staged create/replace).
Clients build PendingUpdate actions (e.g., UpdateProperties, AppendFiles), each validating via Apply() (conflict checks, reserved keys, format-version parsing, schema/metrics validation).
Transaction::CommitTransaction() is expected to gather pending updates and call the catalog’s UpdateTable (with derived TableUpdate objects and requirements) in one atomic commit; on success, metadata refreshes; on failure, errors propagate