Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Accept Type Widening RFC #4094

Merged
merged 4 commits into from
Mar 10, 2025
Merged

Conversation

johanl-db
Copy link
Collaborator

@johanl-db johanl-db commented Jan 28, 2025

Description

The type widening table feature has a stable implementation in delta-spark since Delta 3.2 an in Kernel Java/Rust.

This PR proposes to accept the RFC specification of the feature and add it to the Delta protocol.

Type Widening feature request: #2623
Related: transitioning table feature from preview to stable: #4094 (comment)

@johanl-db johanl-db self-assigned this Feb 17, 2025
Comment on lines +1404 to +1405
- `Byte`, `Short` or `Int` -> `Decimal(10 + k1, k2)` where `k1 >= k2 >= 0`.
- `Long` -> `Decimal(20 + k1, k2)` where `k1 >= k2 >= 0`.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't remember seeing this 10 + k1 bit before?
Whenever it was added, I think it invalidated the k1 >= k2 inequality?
(maybe just need 10 + k1 >= k2 >= 0?)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You do need k1 >= k2 here, otherwise Decimal(10, 5) would be an allowed promotion, even though it can't store all possible int values: 100000 -> 100000.00000 doesn't fit already

A slightly different way of saying it is that int->Decimal(10,0) is a supported type widening. Then you can further widen to larger decimals Decimal(p, s) -> Decimal(p + k1, s + k2)

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, I didn't catch that detail before, thanks for clarifying.

## Reader Requirements for Type Widening
When Type Widening is supported (when the `readerFeatures` field of a table's `protocol` action contains `typeWidening`), then:
- Readers must allow reading data files written before the table underwent any supported type change, and must convert such values to the current, wider type.
- Readers must validate that they support all type changes in the `delta.typeChanges` field in the table schema for the table version they are reading and fail when finding any unsupported type change.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what is the impact on other table features that have cross table feature dependencies. For example, IcebergCompatV1/V2. Since Iceberg's Type widening is only going to be in IcebergV3+, Delta's IcebergCompatV1/V2 will not be compatible to type widening feature. I think we should update their protocols to clarify this.

what do you think @scovich @lzlfred ?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For context, what we're currently doing in Delta spark is:

  1. Block applying type changes unsupported by Iceberg if Uniform with Iceberg compat is enabled.
  2. Block enabling Uniform with Iceberg compat if type changes not supported by Iceberg have been applied.

Users can always DROP the type widening table feature to make the table compatible with Iceberg again, at the cost of having to rewrite the data.

This would translate into the following requirements:

## Writer Requirements for Type Widening
- Writers must reject applying type changes not supported by [Iceberg V2](https://iceberg.apache.org/spec/#schema-evolution)
  when either the `IcebergCompatV1` or `IcebergCompatV2` table feature is supported:
  - `Byte`, `Short` or `Int` -> `Double`
  - `Date`  -> `Timestamp without timezone`
  - Decimal scale increase
  - `Byte`, `Short`, `Int` or `Long` -> `Decimal`
## Writer Requirements for IcebergCompatV1 (/IcebergCompatV2)
- If Type Widening is supported, require that no type changes not supported by
  [Iceberg V2](https://iceberg.apache.org/spec/#schema-evolution) were applied on the table,
  based on the [Type Change Metadata](https://github.com/delta-io/delta/pull/4094/files#diff-62d56fa1c25dda8d49ccdbf9ea31df4576f84625706d54613dff0acc8f974d97R1415)
  recorded in the table schema.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this looks better. but i would like iceberg compat experts to sign off on this. @fred?

furthermore, should we start writing down in the protocol about how to drop a feature? what do writers need to do drop a feature?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll update the specs to call out the cross-requirements between type widening and uniform + add requirements for dropping the type widening table feature

Copy link
Contributor

@tdas tdas left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

my main comment is about iceberg compatibility

@johanl-db johanl-db requested review from tdas, lzlfred and scovich March 5, 2025 09:48
@johanl-db
Copy link
Collaborator Author

I updated the spec to add:

  • cross-requirements with Iceberg Compat V1/V2 for type changes not supported by Iceberg
  • Requirements when dropping type widening

@tdas, @lzlfred, @scovich PTAL

Copy link
Contributor

@tdas tdas left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! Thank you for doing this! This is a great addition to the Delta protocol.

Copy link
Collaborator

@scovich scovich left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch about the interaction w/ Iceberg compat. LGTM now.

@tdas tdas merged commit 181226a into delta-io:master Mar 10, 2025
18 of 21 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants