Accept Type Widening RFC #4094
Conversation
- `Byte`, `Short` or `Int` -> `Decimal(10 + k1, k2)` where `k1 >= k2 >= 0`.
- `Long` -> `Decimal(20 + k1, k2)` where `k1 >= k2 >= 0`.
---
I don't remember seeing this `10 + k1` bit before? Whenever it was added, I think it invalidated the `k1 >= k2` inequality? (Maybe we just need `10 + k1 >= k2 >= 0`?)
---
You do need `k1 >= k2` here, otherwise `Decimal(10, 5)` would be an allowed promotion, even though it can't store all possible `int` values: 100000 -> 100000.00000 already doesn't fit.

A slightly different way of saying it is that `int` -> `Decimal(10, 0)` is a supported type widening. Then you can further widen to larger decimals: `Decimal(p, s)` -> `Decimal(p + k1, s + k2)`.
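The constraint being discussed can be sketched as a small validity check (a hypothetical helper, not part of any Delta implementation): widening `Int` to `Decimal(precision, scale)` is only lossless when at least 10 integer digits remain, i.e. `precision - scale >= 10`, which is exactly the `k1 >= k2` condition with `precision = 10 + k1` and `scale = k2`.

```python
def can_widen_int_to_decimal(precision: int, scale: int) -> bool:
    """Check whether a 32-bit Int can be losslessly widened to
    Decimal(precision, scale).

    Int needs 10 digits of precision. Writing the target as
    Decimal(10 + k1, k2) with k1 >= k2 >= 0 is equivalent to
    requiring precision - scale >= 10 (with scale >= 0).
    """
    return scale >= 0 and precision - scale >= 10

# Decimal(10, 5) leaves only 5 integer digits: 100000 doesn't fit.
assert not can_widen_int_to_decimal(10, 5)
# Decimal(10, 0) is the minimal widening target for Int.
assert can_widen_int_to_decimal(10, 0)
# Further widening Decimal(10 + k1, k2) with k1 >= k2, e.g. Decimal(15, 3).
assert can_widen_int_to_decimal(15, 3)
```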
---
Ah, I didn't catch that detail before, thanks for clarifying.
## Reader Requirements for Type Widening
When Type Widening is supported (when the `readerFeatures` field of a table's `protocol` action contains `typeWidening`), then:
- Readers must allow reading data files written before the table underwent any supported type change, and must convert such values to the current, wider type.
- Readers must validate that they support all type changes in the `delta.typeChanges` field in the table schema for the table version they are reading and fail when finding any unsupported type change.
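The two reader requirements could be sketched as follows (hypothetical helper names and a simplified metadata shape; the actual layout of `delta.typeChanges` entries is defined elsewhere in the spec):

```python
# Type changes a hypothetical reader implementation supports (illustrative
# subset, not the full list from the spec).
SUPPORTED_CHANGES = {
    ("byte", "short"), ("byte", "int"), ("short", "int"),
    ("int", "long"), ("date", "timestamp_ntz"),
}

def validate_type_changes(type_changes):
    """Fail if the schema records a type change this reader cannot apply.

    `type_changes` mimics entries of the `delta.typeChanges` field,
    e.g. {"fromType": "int", "toType": "long"}.
    """
    for change in type_changes:
        pair = (change["fromType"], change["toType"])
        if pair not in SUPPORTED_CHANGES:
            raise ValueError(
                f"Unsupported type change: {pair[0]} -> {pair[1]}")

def read_value(raw, from_type, to_type):
    """Upcast a value from a pre-change data file to the current, wider type."""
    if (from_type, to_type) in {("byte", "short"), ("byte", "int"),
                                ("short", "int"), ("int", "long")}:
        return int(raw)  # integer widenings are lossless
    raise ValueError(f"Cannot convert {from_type} to {to_type}")
```

A reader following this pattern validates the recorded changes once per snapshot, then upcasts values file by file while scanning.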
---
What is the impact on other table features that have cross-table-feature dependencies? For example, IcebergCompatV1/V2: since Iceberg's type widening will only land in Iceberg V3+, Delta's IcebergCompatV1/V2 will not be compatible with the type widening feature. I think we should update their protocols to clarify this.
---
For context, what we're currently doing in Delta spark is:
- Block applying type changes unsupported by Iceberg if Uniform with Iceberg compat is enabled.
- Block enabling Uniform with Iceberg compat if type changes not supported by Iceberg have been applied.
Users can always DROP the type widening table feature to make the table compatible with Iceberg again, at the cost of having to rewrite the data.
This would translate into the following requirements:
## Writer Requirements for Type Widening
- Writers must reject applying type changes not supported by [Iceberg V2](https://iceberg.apache.org/spec/#schema-evolution)
when either the `IcebergCompatV1` or `IcebergCompatV2` table feature is supported:
- `Byte`, `Short` or `Int` -> `Double`
- `Date` -> `Timestamp without timezone`
- Decimal scale increase
- `Byte`, `Short`, `Int` or `Long` -> `Decimal`
## Writer Requirements for IcebergCompatV1 (/IcebergCompatV2)
- If Type Widening is supported, require that no type changes not supported by
[Iceberg V2](https://iceberg.apache.org/spec/#schema-evolution) were applied on the table,
based on the [Type Change Metadata](https://github.com/delta-io/delta/pull/4094/files#diff-62d56fa1c25dda8d49ccdbf9ea31df4576f84625706d54613dff0acc8f974d97R1415)
recorded in the table schema.
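A writer-side guard along these lines (a hypothetical helper, not the actual delta-spark implementation) would block the Iceberg-V2-incompatible changes listed above whenever IcebergCompatV1 or IcebergCompatV2 is supported:

```python
# Type changes NOT supported by Iceberg V2 schema evolution,
# per the list in the proposed requirements above.
ICEBERG_V2_INCOMPATIBLE = {
    ("byte", "double"), ("short", "double"), ("int", "double"),
    ("date", "timestamp_ntz"),
    ("byte", "decimal"), ("short", "decimal"),
    ("int", "decimal"), ("long", "decimal"),
}

def check_type_change(from_type, to_type, table_features,
                      from_scale=0, to_scale=0):
    """Reject a requested type change if it would break Iceberg compat.

    `table_features` is the set of supported writer features for the table.
    """
    if not table_features & {"icebergCompatV1", "icebergCompatV2"}:
        return  # no Iceberg compat enabled: nothing to enforce here
    if (from_type, to_type) in ICEBERG_V2_INCOMPATIBLE:
        raise ValueError(
            f"Type change {from_type} -> {to_type} is not supported "
            "by Iceberg V2; incompatible with IcebergCompatV1/V2")
    # Decimal scale increases are likewise unsupported by Iceberg V2.
    if from_type == "decimal" and to_type == "decimal" \
            and to_scale > from_scale:
        raise ValueError("Decimal scale increase is not supported "
                         "by Iceberg V2")
```

The mirror-image check (rejecting enablement of IcebergCompatV1/V2 when incompatible changes are already recorded in the Type Change Metadata) would iterate the same set over the recorded changes.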
---
This looks better, but I would like Iceberg compat experts to sign off on this. @fred?
Furthermore, should we start writing down in the protocol how to drop a feature? What do writers need to do to drop a feature?
---
lgtm.
---
I'll update the specs to call out the cross-requirements between type widening and uniform + add requirements for dropping the type widening table feature
---
my main comment is about iceberg compatibility
…pping Type Widening
---
LGTM! Thank you for doing this! This is a great addition to the Delta protocol.
---
Good catch about the interaction w/ Iceberg compat. LGTM now.
Description
The type widening table feature has had a stable implementation in delta-spark since Delta 3.2 and in Kernel Java/Rust.
This PR proposes to accept the RFC specification of the feature and add it to the Delta protocol.
Type Widening feature request: #2623
Related: transitioning table feature from preview to stable: #4094 (comment)