GH-48254: [Python][Parquet] Support extension types in read_schema #48255
Kuinox wants to merge 1 commit into apache:main
I had issues running the tests on my machines (it was reported as green). I now have a non-Windows machine, so I'll try on it.
@github-actions crossbow submit -g python
Revision: 966df38 Submitted crossbow builds: ursacomputing/crossbow @ actions-e7fd264d23
Are the errors expected? The build errors don't seem related to my change.
Some of them are, but not that many. Could you first try to rebase again please?
@github-actions crossbow submit -g python
Revision: 808df3d Submitted crossbow builds: ursacomputing/crossbow @ actions-8aeeb0e39d
    data = [
        b'\xe4`\xf9p\x83QGN\xac\x7f\xa4g>K\xa8\xcb',
        b'\x1et\x14\x95\xee\xd5C\xea\x9b\xd7s\xdc\x91BK\xaf',
nit: can we add a comment on what this is and how it was generated?
If we ever want to change it or fix a bug in the future, it could be useful.
Those are two UUIDs; I'll add comments.
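As a small illustration of the reply above, the two byte strings from the test data can be decoded with the standard library's `uuid` module. This is only a sketch for readers of the thread, not the comment that was added to the PR; the variable names are chosen here for illustration.

```python
import uuid

# The two 16-byte values quoted from the test data in this review thread.
data = [
    b'\xe4`\xf9p\x83QGN\xac\x7f\xa4g>K\xa8\xcb',
    b'\x1et\x14\x95\xee\xd5C\xea\x9b\xd7s\xdc\x91BK\xaf',
]

# Interpret each 16-byte string as a UUID in big-endian byte order.
uuids = [uuid.UUID(bytes=raw) for raw in data]

for u in uuids:
    print(u, "version", u.version)
# e460f970-8351-474e-ac7f-a4673e4ba8cb version 4
# 1e741495-eed5-43ea-9bd7-73dc91424baf version 4
```

Going the other way, `uuid.UUID("e460f970-8351-474e-ac7f-a4673e4ba8cb").bytes` reproduces the first byte string, which is one way such test data could be regenerated.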
    file_path = tmp_path / "uuid.parquet"
    file_path_str = str(file_path)
    pq.write_table(table, file_path_str, store_schema=False)
just curious, is store_schema=False relevant?
It was 6 months ago, so I'm only guessing now:
I remember that the behavior differed depending on whether Arrow loaded its stored schema or not.
I don't remember if it was needed here, but store_schema=False makes sure that a UUID logical type is detected as such, without Arrow getting the information from its own stored schema.
I can confirm it if you want.
Rationale for this change
pq.read_schema drops extension types (UUID comes back as fixed_size_binary[16]), while ParquetFile.schema_arrow and read_table preserve them. Schema inspection via pq.read_schema should report the same extension types as ParquetFile.schema_arrow and read_table.
What changes are included in this PR?
Are these changes tested?
Are there any user-facing changes?
Notes: