Fix: Group transaction changes by table #3360
base: main
Conversation
Force-pushed from 3035e74 to 2380846
Force-pushed from 2380846 to e0f05c0
dimas-b left a comment
Nice fix. Thanks, @singhpk234 !
Just a couple of minor comments :)
    if (!updatedMetadata.changes().isEmpty()) {
      tableOps.commit(currentMetadata, updatedMetadata);
    // Process each table's changes in order
    changesByTable.forEach(
The new code logic looks correct to me. I think it's a worthy change to merge in its own right.
However, as for the issue discussed in #3352 (comment), I think this fix is effective, but it's not apparent that it will work correctly.
The basic problem is that Persistence is called with multiple entity objects for the same ID. This means that the updateEntitiesPropertiesIfNotChanged call on line must contain at most one entity per ID. This may or may not be true depending on what the catalog forwards to transactionMetaStoreManager for each commit on line 1108.
I do believe that one tableOps.commit() will result in one entity update, so it may be sufficient to just add a comment about that around line 1113. WDYT?
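The "at most one entity per ID" constraint is exactly what grouping changes by table buys. A minimal sketch of that grouping, using hypothetical stand-ins (plain strings like "A1" for a change to table A, not the actual Polaris change or identifier types):

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class GroupChangesSketch {
  // Hypothetical convention: the first character names the table ("A1" -> table A).
  static String tableOf(String change) {
    return change.substring(0, 1);
  }

  // Group interleaved changes by table, preserving each table's change order.
  // LinkedHashMap keeps first-seen table order so downstream commits are deterministic.
  static Map<String, List<String>> groupByTable(List<String> changes) {
    Map<String, List<String>> byTable = new LinkedHashMap<>();
    for (String change : changes) {
      byTable.computeIfAbsent(tableOf(change), k -> new ArrayList<>()).add(change);
    }
    return byTable;
  }

  public static void main(String[] args) {
    // One entry per table means one entity update per ID downstream.
    System.out.println(groupByTable(List.of("A1", "B1", "A2", "C1", "A3")));
  }
}
```

With one commit issued per grouped table, the batched persistence call then sees each entity ID at most once.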
Yeah, the logic also looks correct to me. +1 to adding a comment on the subtlety, though: we're coalescing all the updates for a given table into a single Polaris entity update, which is slightly different behavior than if the caller expected the various UpdateTableRequests in this commitTransaction to behave as if each were applied independently (but under a lock).
Note that this issue was a known limitation, and referenced in a TODO in TransactionWorkspaceMetaStoreManager:
Line 84 in 8abf19a
// TODO: If we want to support the semantic of opening a transaction in which multiple
// reads and writes occur on the same entities, where the reads are expected to see the writes
// within the transaction workspace that haven't actually been committed, we can augment this
// class by allowing these pendingUpdates to represent the latest state of the entity if we
// also increment entityVersion. We'd need to store both a "latest view" of all updated entities
// to serve reads within the same transaction while also storing the ordered list of
// pendingUpdates that ultimately need to be applied in order within the real MetaStoreManager.
The alternative "fix" described there, actually queuing up the sequential mutations per entity in that "uncommitted persistence layer", is more general but also more complex and probably has pitfalls.
The main implication is that by plugging into the MetaStoreManager layer we could intercept writes while still inheriting other relevant hooks, such as generating events and having entityVersion increments directly match actual update requests.
But I'm in favor of this more targeted change-coalescing fix here for now. We could either update/remove the TODO in TransactionWorkspaceMetaStoreManager and/or leave a comment in the code here referencing the other approach so any future changes to the way we handle these can more easily sort through it.
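The TODO's alternative could look roughly like this sketch (hypothetical names and a toy entity type, not the actual TransactionWorkspaceMetaStoreManager API): keep a "latest view" map to serve intra-transaction reads, alongside the ordered pendingUpdates that would replay in order at commit time:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class TxnWorkspaceSketch {
  // Hypothetical entity: an id plus a version that bumps on every staged write.
  record Entity(long id, int entityVersion, String payload) {}

  // Latest staged state per entity id, serving reads within the transaction.
  private final Map<Long, Entity> latestView = new HashMap<>();
  // Ordered list of every staged write, to be replayed at commit time.
  private final List<Entity> pendingUpdates = new ArrayList<>();

  // Stage a write: bump entityVersion and record it in both structures.
  Entity stageUpdate(long id, String payload) {
    Entity prior = latestView.get(id);
    int nextVersion = prior == null ? 1 : prior.entityVersion() + 1;
    Entity staged = new Entity(id, nextVersion, payload);
    latestView.put(id, staged);
    pendingUpdates.add(staged);
    return staged;
  }

  // Reads inside the transaction see uncommitted writes.
  Entity read(long id) {
    return latestView.get(id);
  }

  List<Entity> pendingUpdates() {
    return pendingUpdates;
  }
}
```

This is the "more general but more complex" shape: reads see staged state, while the real MetaStoreManager would still apply pendingUpdates in order.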
    throw new BadRequestException(
        "Unsupported operation: commitTranaction with updateForStagedCreate: %s", change);
    }
    changesByTable.computeIfAbsent(change.identifier(), k -> new ArrayList<>()).add(change);
Should we check and throw in case the table exists already? It'd also be nice to have a test case.
Does the Iceberg REST Catalog spec allow more than one table update in one commitTransaction operation?
The spec doesn't mention the uniqueness of the table identifier:
polaris/spec/iceberg-rest-catalog-open-api.yaml
Line 3499 in 70ad92f
CommitTransactionRequest:
Within one CommitTableRequest, multiple updates are allowed:
polaris/spec/iceberg-rest-catalog-open-api.yaml
Line 3478 in 70ad92f
type: array
With that, I think it's undefined behavior whether the same table identifier may appear more than once. We could explicitly disallow it in Polaris, though.
I'd rather not disable this in Polaris unless the spec explicitly flagged it as a disallowed use case (which it did not).
Absent explicit spec guidance, applying multiple updates in sequence is a fairly straightforward operation, since each update is self-contained and will be validated against the current metadata.
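That sequential-application idea can be sketched with a toy metadata model (a version plus a property map standing in for Iceberg's TableMetadata; the base-version check stands in for real requirement validation, and all names here are hypothetical):

```java
import java.util.List;
import java.util.Map;

public class SequentialApplySketch {
  // Toy metadata: a version and a property map stand in for TableMetadata.
  record Metadata(int version, Map<String, String> props) {}

  // Toy self-contained update: expects a base version, sets one property.
  record Update(int expectedVersion, String key, String value) {}

  // Each update is validated against the *current* metadata, then applied,
  // so later updates to the same table see the effects of earlier ones.
  static Metadata applyAll(Metadata base, List<Update> updates) {
    Metadata current = base;
    for (Update u : updates) {
      if (u.expectedVersion() != current.version()) {
        throw new IllegalStateException(
            "Update expected version " + u.expectedVersion()
                + " but current is " + current.version());
      }
      Map<String, String> props = new java.util.HashMap<>(current.props());
      props.put(u.key(), u.value());
      current = new Metadata(current.version() + 1, props);
    }
    return current;
  }
}
```

Because each update is validated against the metadata produced by the previous one, duplicate table identifiers in a transaction compose naturally instead of needing to be rejected.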
SGTM.
    }
    });
    // Apply updates to builder
Maybe add a // TODO to refactor this to better share/reconcile with the update-application logic in CatalogHandlerUtils (I understand this divergence was already latent and not introduced by this PR, but as the update-application logic grows in complexity here it's going to start getting a lot worse).
adnanhemani left a comment
Awesome change!
Problem
1. Groups ALL changes by table, even if they appear interleaved in the input: [A1, B1, A2, C1, A3] → {A: [A1, A2, A3], B: [B1], C: [C1]}
2. For each table, processes changes sequentially:
- Validate R1 against base metadata → Apply U1 → update currentMetadata
- Validate R2 against updated metadata → Apply U2 → update currentMetadata
- Validate R3 against updated metadata → Apply U3 → update currentMetadata
3. Single commit per table prevents duplicate entity IDs in pendingUpdates
This ensures:
related discussion: #3352 (comment)
Checklist
- CHANGELOG.md (if needed)
- site/content/in-dev/unreleased (if needed)