GPKG: setting fid when writing file via arrow gives an error #11527
Comments
There was a stupidly embarrassing logic error in the GPKG driver that is fixed by #11528, but your issue is different. It can be solved independently of that bugfix by changing your code snippet so that it does not create an attribute field corresponding to the FID column of the source. So
Dealing with FIDs in OGR has always been a bit tricky, because the OGR abstraction gives them a special status while database-like formats make them a regular field. That isn't helped by Arrow not having an FID concept itself, and thus sometimes needing to synthesize a column. I suspect the WriteArrowBatch() logic could perhaps be enhanced to support a destination layer that has both a named FID column and a regular field of the same name, but I'm not totally sure we want to make that work: even if the GPKG driver handled it, it could cause issues in other drivers. So better not to call CreateFieldFromArrowSchema() with a name that matches the name of the FID column.
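A minimal sketch of that advice, in Python for illustration: before creating attribute fields from an Arrow schema, drop the field whose name matches the destination layer's FID column. The helper `attribute_fields_to_create` is hypothetical; it works on plain field names rather than on a real Arrow schema, and the exact (case-sensitive) name match is an assumption.

```python
from typing import Iterable, List, Optional


def attribute_fields_to_create(
    arrow_field_names: Iterable[str], fid_column: Optional[str]
) -> List[str]:
    """Return the Arrow field names that should become regular attribute fields.

    The column holding the FIDs is skipped, so the destination layer does not
    end up with both a named FID column and a regular field of the same name.
    The comparison is exact (case-sensitive), which is an assumption.
    """
    if not fid_column:
        # No FID column to protect: every field becomes an attribute field.
        return list(arrow_field_names)
    return [name for name in arrow_field_names if name != fid_column]


# Hypothetical usage: in real code, each returned name would correspond to a
# CreateFieldFromArrowSchema() call; here we only compute the names.
print(attribute_fields_to_create(["fid", "name", "value"], "fid"))  # ['name', 'value']
```

Working on names keeps the sketch independent of any particular binding; the point is only that the FID column is never turned into an ordinary field.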
How does the destination layer know that that column is the FID column? (Because the Arrow batch you are writing still has that field in its schema.)
Ah, but then, for pyogrio's use of the Arrow writer, I suppose we should let the user specify the name of the FID column when writing.
There's none. Some drivers (like GPKG) let you customize its name by offering a
@rouault thanks for the advice... The code in pyogrio did the same thing as the sample code above: it also added the "fid" column as an ordinary column, triggering the issue fixed in #11528, so that is now being fixed in pyogrio (geopandas/pyogrio#511). Following up on that, I noticed that the "fid" column detection is case-insensitive, which is fine. However, the issue fixed in #11528 doesn't seem to be triggered when the column to take the FIDs from is called "FID" instead of "fid"... which seems a bit odd?
@rouault hmm... I was too fast... the behaviour still seems slightly different... I just made the check that an "fid" column should not be added as an ordinary column case-insensitive, but this leads to another error: So it seems that the primary discovery of the FID column is case-sensitive, but the discovery of "ordinary" columns that could be used as FID is not... or something like that? I tried to make an overview of the different situations and the behaviour as I interpret it:
To be honest, I'm lost :-) Some things might be fixable, others not. Not sure I'm in the mood to investigate that... I do remember struggling with mapping Arrow to OGR regarding FIDs, and yes, things are going to be a bit messy. I wish someone could take the lead in GDAL for "Arrow-related activities"...
:-) No problem. It confirms what I thought... I had been reading some code, but didn't (immediately) find the case-sensitive comparison. I found where the
And... I also found the code that seems to set the FID to that column if there is no FID column found, here:
So putting everything together explains the behaviour I'm seeing: the "real" FID column detection is case-sensitive, but the "FID regular column" detection is case-insensitive, and if only the second kind is found, it is "recuperated" as the FID column. So, no worries... I'm already happy that there is a logic and that the behaviour can be explained :-)
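The two-step behaviour summarized in the comment above can be modelled as follows. This is a simplified, hypothetical Python model (`resolve_fid_column` is not a GDAL function): it assumes the layer's FID column name is known and the batch is described only by its column names, and it does not reproduce GDAL's actual C++ code.

```python
from typing import Optional, Sequence


def resolve_fid_column(
    layer_fid_column: Optional[str], batch_columns: Sequence[str]
) -> Optional[str]:
    """Simplified model of the FID column resolution described in the thread.

    This is NOT GDAL's implementation; it only mirrors the observed behaviour.
    Returns the batch column used as FID, or None if there is none.
    """
    if not layer_fid_column:
        return None
    # Step 1: "real" FID column detection, a case-sensitive exact match.
    if layer_fid_column in batch_columns:
        return layer_fid_column
    # Step 2: a regular column whose name matches case-insensitively is
    # "recuperated" and used as the FID column instead.
    wanted = layer_fid_column.lower()
    for name in batch_columns:
        if name.lower() == wanted:
            return name
    return None
```

For example, `resolve_fid_column("fid", ["FID", "name"])` returns `"FID"` only through step 2, which is why "fid" and "FID" can end up on different code paths.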
Is there anything left actionable in this ticket now that #11532 has been committed?
No, it's OK as it is... Thanks!
What is the bug?
Setting an FID explicitly and writing the row to e.g. GPKG gives an error:
Inconsistent values of FID and field of same name
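Judging from its wording, the message refers to a layer that has both an FID and an attribute field with the same name, whose values disagree. A minimal Python model of such a row-by-row check, assuming that interpretation (`check_fid_consistency` is hypothetical and is not GDAL's implementation):

```python
from typing import Sequence


def check_fid_consistency(
    fid_values: Sequence[int], same_name_field_values: Sequence[int]
) -> None:
    """Hypothetical model of a check that could produce the error above.

    Raises ValueError when the FID values and the values of the attribute
    field that has the same name as the FID column disagree.
    """
    if len(fid_values) != len(same_name_field_values):
        raise ValueError("FID column and field of same name differ in length")
    for row, (fid, value) in enumerate(zip(fid_values, same_name_field_values)):
        if fid != value:
            raise ValueError(
                "Inconsistent values of FID and field of same name "
                f"(row {row}: {fid} != {value})"
            )
```

With this model, `check_fid_consistency([1, 2], [1, 2])` passes silently, while `check_fid_consistency([1, 2], [1, 3])` raises `ValueError`.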
Steps to reproduce the issue
Script to reproduce the error:
Versions and provenance
Tested on Windows 11, using gdal 3.10.0 installed via conda-forge. Can also be reproduced in e.g. GDAL 3.8.5 and 3.9.2 (geopandas/pyogrio#511).
Additional context
No response