Core: Change Delete granularity to file for new tables #11478
base: main
Conversation
Force-pushed from 69a8821 to e4e1f0f
Force-pushed from 95301d5 to 66f8307
@@ -90,6 +90,8 @@ private static Map<String, String> persistedProperties(Map<String, String> rawPr
     persistedProperties.put(
         TableProperties.PARQUET_COMPRESSION,
         TableProperties.PARQUET_COMPRESSION_DEFAULT_SINCE_1_4_0);
+    persistedProperties.put(
With this addition, we need to update a few tests (like Hive), since "write.delete.granularity"="file" now appears in the persisted metadata.
I propose updating those tests.
It might also be worth adding a test to check how "previous" metadata files, which don't contain the property, are read?
Yes, we'll definitely need to update the failing Hive tests in their current form, since it looks like they currently have expectations on the persisted properties.

It might also be worth adding a test to check how "previous" metadata files, which don't contain the property, are read?

In terms of general metadata file reading, the core library will read the metadata file with or without the property (the properties are just a string-to-string map). In terms of how engine integrations handle the missing property, at the moment only Spark really leverages this property, falling back to a default when it isn't defined, and we already have a test verifying that default handling when the property is missing.
I'll double check the other integrations though.
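For illustration, a minimal sketch of that kind of fallback read (the helper name is made up, but it uses existing core utilities): a property map that predates this change simply lacks the key, and the reader resolves it to the old partition-scoped behavior.

```java
// Minimal sketch (helper name is made up): reading the granularity from a metadata
// property map that may not contain "write.delete.granularity".
import java.util.Map;

import org.apache.iceberg.TableProperties;
import org.apache.iceberg.deletes.DeleteGranularity;
import org.apache.iceberg.util.PropertyUtil;

class GranularityFallbackSketch {
  static DeleteGranularity granularityOf(Map<String, String> tableProperties) {
    String value =
        PropertyUtil.propertyAsString(
            tableProperties,
            TableProperties.DELETE_GRANULARITY,
            DeleteGranularity.PARTITION.toString()); // default for older tables without the property
    return DeleteGranularity.fromString(value);
  }
}
```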
Can we add a simple test to TestTableMetadata to make sure this is properly set on new tables and not set on existing ones?
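For example, a rough sketch of such a check could look like the following (fixtures and the class name are illustrative, not the existing TestTableMetadata setup; DELETE_GRANULARITY_DEFAULT_SINCE_1_8_0 is the constant introduced in this PR). The "existing table" half would assert against metadata parsed from an older metadata.json that predates the change.

```java
// Rough sketch of the suggested check; fixtures are illustrative, not the real test setup.
import static org.assertj.core.api.Assertions.assertThat;

import java.util.Collections;
import org.apache.iceberg.PartitionSpec;
import org.apache.iceberg.Schema;
import org.apache.iceberg.SortOrder;
import org.apache.iceberg.TableMetadata;
import org.apache.iceberg.TableProperties;
import org.apache.iceberg.types.Types;
import org.junit.jupiter.api.Test;

public class TestDeleteGranularityDefault {
  private static final Schema SCHEMA =
      new Schema(Types.NestedField.required(1, "id", Types.LongType.get()));

  @Test
  public void newTablesPersistFileGranularity() {
    TableMetadata created =
        TableMetadata.newTableMetadata(
            SCHEMA,
            PartitionSpec.unpartitioned(),
            SortOrder.unsorted(),
            "file:/tmp/new_table",
            Collections.emptyMap());

    // newly created tables should carry the file-scoped default in their persisted properties
    assertThat(created.properties())
        .containsEntry(
            TableProperties.DELETE_GRANULARITY,
            TableProperties.DELETE_GRANULARITY_DEFAULT_SINCE_1_8_0);
  }
}
```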
Force-pushed from 66f8307 to 84e133d
Force-pushed from fd8b4bf to 07c0b77
Force-pushed from 07c0b77 to b484a7e
There are tests in the Spark integration that needed to be updated since they had expectations on the number of output files, which are now different. My general approach was to set partition-scoped deletes in the table setup whenever the test class already has coverage of the file-scoped case, or when a given test is only useful in the context of partition-scoped deletes. For the rewrite procedure/action, the action tests already cover both the file- and partition-scoped cases. The procedure tests also have expectations on the number of files, but in their setup I use partition-scoped deletes since the underlying action already tests both. This minimizes how many tests need to be unnecessarily rewritten with new table layouts or new expectations on output files.
        TableProperties.DELETE_GRANULARITY,
        DeleteGranularity.PARTITION.toString());
This is just the default setup for a table in this test; we also test file-scoped deletes for this action in testFileGranularity in this test class.
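For context, a minimal sketch (assuming a Table handle from the test harness) of how a test flips the same table to file-scoped deletes, as the testFileGranularity-style cases do, in contrast to the partition-scoped default shown above:

```java
// Minimal sketch: switching a test table to file-scoped deletes via the table properties API.
import org.apache.iceberg.Table;
import org.apache.iceberg.TableProperties;
import org.apache.iceberg.deletes.DeleteGranularity;

class GranularityToggleSketch {
  static void useFileScopedDeletes(Table table) {
    table
        .updateProperties()
        .set(TableProperties.DELETE_GRANULARITY, DeleteGranularity.FILE.toString())
        .commit();
  }
}
```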
@@ -49,7 +49,7 @@ private void createTable(boolean partitioned) throws Exception {
     String partitionStmt = partitioned ? "PARTITIONED BY (id)" : "";
     sql(
         "CREATE TABLE %s (id bigint, data string) USING iceberg %s TBLPROPERTIES"
-            + "('format-version'='2', 'write.delete.mode'='merge-on-read')",
+            + "('format-version'='2', 'write.delete.mode'='merge-on-read', 'write.delete.granularity'='partition')",
See below: the tests for the underlying action already cover both partition and file granularity, so we have coverage of both cases. I don't see much value in rewriting all the tests here, which have expectations based on partition granularity, to expect file granularity instead, or in adding procedure-level tests for both.
@@ -154,7 +155,7 @@ public void testDeleteWithVectorizedReads() throws NoSuchTableException {
   }

   @Test
-  public void testCoalesceDelete() throws Exception {
+  public void testCoalesceDeleteWithPartitionGranularity() throws Exception {
I updated this test to use partition-scoped deletes since I think the original intent of the test was to verify the output files when the shuffle blocks are small and are coalesced into a single task (with an unpartitioned table) as part of AQE.
Without making this test specific to partition-scoped deletes, we wouldn't really be testing how AQE coalescing to a single task affects the number of output files, and it would be undifferentiated from the other tests where we already cover file-scoped deletes.
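For reference, a rough sketch of the scenario this test exercises; the config values and the SparkSession/table handles here are assumptions from the surrounding test harness, not the actual test body.

```java
// Rough sketch: AQE coalesces many small shuffle blocks into one task before the delete write.
import org.apache.spark.sql.SparkSession;

class CoalesceDeleteSketch {
  static void runCoalescedDelete(SparkSession spark, String tableName) {
    spark.conf().set("spark.sql.shuffle.partitions", "200");
    spark.conf().set("spark.sql.adaptive.enabled", "true");
    spark.conf().set("spark.sql.adaptive.coalescePartitions.enabled", "true");
    // a large advisory size lets AQE merge all of the small shuffle blocks into a single task
    spark.conf().set("spark.sql.adaptive.advisoryPartitionSizeInBytes", "256MB");

    spark.sql(String.format("DELETE FROM %s WHERE id %% 2 = 0", tableName));
    // with partition-scoped deletes on an unpartitioned table, the single coalesced task is
    // expected to produce a single position delete file
  }
}
```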
This makes sense to me.
@amogh-jahagirdar I don't see #11273 being backported to Spark 3.3 and 3.4 yet. Shall we skip changes to Spark 3.3 and 3.4 until that is done? Meanwhile, do we have some benchmark numbers?
@@ -90,6 +90,8 @@ private static Map<String, String> persistedProperties(Map<String, String> rawPr
     persistedProperties.put(
         TableProperties.PARQUET_COMPRESSION,
         TableProperties.PARQUET_COMPRESSION_DEFAULT_SINCE_1_4_0);
+    persistedProperties.put(
+        TableProperties.DELETE_GRANULARITY, TableProperties.DELETE_GRANULARITY_DEFAULT_SINCE_1_8_0);
This matches what we did for zstd in Parquet. I think it should be fine as is.
An alternative is to change the default in our Spark conf classes, but it will be hard to differentiate between new and existing tables. Switching to file-scoped deletes for existing workloads may be a bigger disruption.
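To illustrate why (simplified, assumed resolution order, and the "delete-granularity" option key is hypothetical, not the exact SparkWriteConf code): a default baked into the write path is evaluated on every write, so it applies to old and new tables alike, whereas a property persisted at table creation only reaches tables created after the change.

```java
// Simplified sketch of write-path resolution: write option -> table property -> conf-class default.
import java.util.Map;

import org.apache.iceberg.TableProperties;
import org.apache.iceberg.deletes.DeleteGranularity;

class WritePathDefaultSketch {
  static DeleteGranularity resolve(Map<String, String> writeOptions, Map<String, String> tableProps) {
    String fromOption = writeOptions.get("delete-granularity"); // hypothetical write option key
    if (fromOption != null) {
      return DeleteGranularity.fromString(fromOption);
    }

    String fromTable = tableProps.get(TableProperties.DELETE_GRANULARITY);
    if (fromTable != null) {
      return DeleteGranularity.fromString(fromTable);
    }

    return DeleteGranularity.FILE; // conf-class default: hits existing and new tables alike
  }
}
```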
@manuzhang Definitely agree that we shouldn't merge this change to 3.3/3.4 until the maintenance piece is backported; I'm working on the backport. I think we can leave this PR open for discussion, though. I also wanted to raise this to see from the CI results which tests would need to be updated (and to get some clarity on whether the 3.3/3.4 tests differ somehow). What I'd propose is that we backport the sync maintenance to 3.3/3.4, since I think we'll need that anyway, and then, if there's consensus here or on the mailing list to change the default table property, we can update all the Spark tests in one go.
I'm in favor of doing a normal backport so that all Spark versions get the same improvement.
This changes the default delete granularity for Spark writes to file-scoped for new tables. Since synchronous maintenance from #11273 was merged, we should now be able to get the best of both targeted reads and fewer files on disk, which makes file-scoped deletes more compelling as a default.