
[PROTOCOL] [BUG] Inconsistent behavior for binary type partition value serialization #4189

felipepessoto opened this issue Feb 26, 2025 · 2 comments
Labels
bug Something isn't working


@felipepessoto
Contributor

Bug

Which Delta project/connector is this regarding?

  • Spark
  • Standalone
  • Flink
  • Kernel
  • Other (fill in here)

Describe the problem

The PROTOCOL says binary partition values are "encoded as a string of escaped binary values", for example "\u0001\u0002\u0003".

This does not happen for values that can be represented as a string:

INSERT INTO TestBinary VALUES (10, X'123456'); -- OK

INSERT INTO TestBinary VALUES (10, CAST('Hello' AS BINARY)); -- NOT OK

Steps to reproduce

CREATE TABLE TestBinary (id INT, value BINARY) PARTITIONED BY (value);
INSERT INTO TestBinary VALUES (10, CAST('Hello' as BINARY));
INSERT INTO TestBinary VALUES (10, X'123456');
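One way to check the serialized values directly is to read the commit files in the table's `_delta_log` directory. The following is a minimal Python sketch that assumes a local table path. It returns the raw, still-escaped `partitionValues` text of each `add` action. It deliberately avoids taking the value from the parsed JSON, because a JSON parser decodes `\u` sequences and would make `Hello` and `\u0048\u0065...` look identical. The helper name `raw_partition_values` is hypothetical, and the regex assumes no `}` character appears inside the partition values.

```python
import json
import re
from pathlib import Path

# Matches the raw (still-escaped) partitionValues object in one commit line.
# Assumption: no '}' character appears inside the partition values themselves.
_PARTITION_VALUES_RE = re.compile(r'"partitionValues"\s*:\s*(\{[^}]*\})')


def raw_partition_values(table_path):
    """Return (commit file name, raw partitionValues text) for every add action."""
    results = []
    log_dir = Path(table_path) / "_delta_log"
    # Commit files are newline-delimited JSON, one action per line.
    for commit in sorted(log_dir.glob("*.json")):
        for line in commit.read_text(encoding="utf-8").splitlines():
            if not line.strip():
                continue
            action = json.loads(line)  # parsed only to find the action type
            if "add" not in action:
                continue
            match = _PARTITION_VALUES_RE.search(line)  # search the raw text
            if match:
                results.append((commit.name, match.group(1)))
    return results
```

For example, `for name, raw in raw_partition_values("/tmp/TestBinary"): print(name, raw)` prints one line per added file. The path is hypothetical and should point at wherever the `TestBinary` table is stored.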

Observed results

Hello

Expected results

\u0048\u0065\u006c\u006c\u006f
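The expected result follows from reading the protocol example literally: every byte is written as a `\uXXXX` escape whose code point equals the byte value. The following is a minimal Python sketch of that reading. The function name is hypothetical. It uses lowercase hex to match the expected result above, and it assumes the `'Hello'` literal is encoded as UTF-8 before the cast.

```python
def escape_binary_partition_value(data: bytes) -> str:
    """Render every byte as a \\uXXXX escape, as in the protocol example."""
    # Assumption: one escape per byte, code point equal to the byte value.
    return "".join(f"\\u{byte:04x}" for byte in data)


# Protocol example: bytes 0x01 0x02 0x03
print(escape_binary_partition_value(bytes([1, 2, 3])))  # \u0001\u0002\u0003
# Bytes of CAST('Hello' AS BINARY), assuming UTF-8 encoding of the literal
print(escape_binary_partition_value("Hello".encode("utf-8")))  # \u0048\u0065\u006c\u006c\u006f
```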

Environment information

  • Delta Lake version: 3.2
  • Spark version: 3.5
  • Scala version: 2.12

Willingness to contribute

The Delta Lake Community encourages bug fix contributions. Would you or another member of your organization be willing to contribute a fix for this bug to the Delta Lake code base?

  • Yes. I can contribute a fix for this bug independently.
  • Yes. I would be willing to contribute a fix for this bug with guidance from the Delta Lake community.
  • No. I cannot contribute a bug fix at this time.
@felipepessoto felipepessoto added the bug Something isn't working label Feb 26, 2025
@felipepessoto
Contributor Author

@vkorukanti @scottsand-db @richardc-db, do you have any insights?

@felipepessoto
Contributor Author

Also, do you know why empty strings are not supported as partition values?

@marmbrus and @zsxwing might also have some context: https://github.com/delta-io/delta/pull/153/files#r321406842
