Skip to content

Conversation

@lishuxu
Copy link
Contributor

@lishuxu lishuxu commented Nov 15, 2025

feat: transactional UpdateProperties method support

Table/Catalog produce a Transaction (e.g., NewTransaction or staged create/replace).
Clients build PendingUpdate actions (e.g., UpdateProperties, AppendFiles), each validating via Apply() (conflict checks, reserved keys, format-version parsing, schema/metrics validation).
Transaction::CommitTransaction() is expected to gather pending updates and call the catalog’s UpdateTable (with derived TableUpdate objects and requirements) in one atomic commit; on success, metadata refreshes; on failure, errors propagate

Copy link
Collaborator

@zhjwpku zhjwpku left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is consistent with the rest catalog open api, so +1, thanks.

/// \param schema_ids the schema ids to remove
/// \return a new RemoveSchemas
virtual std::shared_ptr<RemoveSchemas> RemoveSchemas(
const std::vector<int32_t>& schema_ids) = 0;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
const std::vector<int32_t>& schema_ids) = 0;
std::span<const int32_t> schema_ids) = 0;

Should we use span to accept broader range of input types for here and below?

@lishuxu lishuxu force-pushed the feature/transaction-api branch from 23bf661 to 0c1f156 Compare November 29, 2025 13:50
@lishuxu lishuxu changed the title feat: add table-update api to transaction interface feat: transactional UpdateProperties method support Nov 29, 2025
@wgtmac
Copy link
Member

wgtmac commented Dec 3, 2025

Could you rebase to resolve conflict?

@lishuxu lishuxu force-pushed the feature/transaction-api branch from aa4cadb to 14bd4f8 Compare December 6, 2025 13:38
@lishuxu
Copy link
Contributor Author

lishuxu commented Dec 8, 2025

Could you rebase to resolve conflict?

done

const TableIdentifier& identifier,
const std::vector<std::unique_ptr<TableRequirement>>& requirements,
const std::vector<std::unique_ptr<TableUpdate>>& updates) = 0;
std::vector<std::unique_ptr<TableRequirement>> requirements,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we revert this change? If later we support retry on failed commits, these requirements and updates cannot be reused since they are moved away.

/// \brief Set whether the last operation in a transaction has been committed
///
/// \param committed true if the last operation has been committed, false otherwise
virtual void SetLastOperationCommitted(bool committed) = 0;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This function is only used by transaction catalog so it should not appear here.


std::unique_ptr<Transaction> Table::NewTransaction() const {
throw NotImplemented("Table::NewTransaction is not implemented");
return std::make_unique<BaseTransaction>(shared_from_this(), catalog_);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd recommend directly pass Table* to it. Similarly, for catalog_ we can use catalog_.get() to just use pointer. The created transaction object should not outlive the table and catalog objects. It is unnecessary to increase the reference counter of them.

I'm open to keep the current design. WDYT @HuaHuaY?

Copy link
Contributor

@HuaHuaY HuaHuaY Dec 9, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When the table‘s content changes, or we drop a table, what should the executing transaction hold?

Comment on lines +41 to +43
std::unique_ptr<::iceberg::UpdateProperties> UpdateProperties() override;

std::unique_ptr<AppendFiles> NewAppend() override;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
std::unique_ptr<::iceberg::UpdateProperties> UpdateProperties() override;
std::unique_ptr<AppendFiles> NewAppend() override;
std::shared_ptr<UpdateProperties> NewUpdateProperties() override;
std::shared_ptr<AppendFiles> NewAppend() override;

It seems that returning shared_ptr will make it easier to collect and transform the update internally. Adding New prefix also can avoid iceberg:: prefix in the return type.

namespace iceberg {

/// \brief Base class for transaction implementations
class ICEBERG_EXPORT BaseTransaction : public Transaction {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If BaseTransaction is the only subclass, we can just implement all features to Transaction just like Table.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

BTW, do we want to introduce TransactionType from the Java impl as well? If would be good if we can add this to make it clear.

auto update = CheckAndCreateUpdate<::iceberg::UpdateProperties>(
table_->name(), catalog_, CurrentMetadata());
if (!update.has_value()) {
ERROR_TO_EXCEPTION(update.error());
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We cannot throw. Perhaps we should also use Result wrapper for these update return types.

* For mutating calls such as UpdateTable or HasLastOperationCommitted, it delegates back
* to the owning BaseTransaction so staged updates remain private until commit.
*/
class ICEBERG_EXPORT TransactionCatalog : public Catalog {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we move this class to base_transaction.cc?


// update table to the new version
if (updated_table) {
table_ = std::shared_ptr<Table>(std::move(updated_table));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we need this table_ after commit has been done? It seems useless?

context_.pending_updates.emplace_back(std::move(update));
}

return std::make_unique<Table>(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we need to return a table?

return {};
}

auto pending_requirements = ConsumePendingRequirements();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What if we need to support retry on failed commits? In the current approach, these two lists are gone.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants