Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Convert numpy arrays to ASCII strings #956

Merged
merged 11 commits into from
Oct 31, 2023
Merged

Convert numpy arrays to ASCII strings #956

merged 11 commits into from
Oct 31, 2023

Conversation

brryan
Copy link
Collaborator

@brryan brryan commented Oct 12, 2023

PR Summary

Newer versions of h5py appear to convert string values in HDF5 attributes to numpy arrays of ASCII values (e.g. [10 200 45 109 ...]). This PR returns these values to strings.

Currently this PR acts on all numpy arrays that appear as values in the Params attribute; I think this is OK because Params doesn't support vector input but it is worth thinking about in case there is a better way to deal with this.

PR Checklist

  • Code passes cpplint
  • New features are documented.
  • Adds a test for any bugs fixed. Adds tests for new features.
  • Code is formatted
  • Changes are summarized in CHANGELOG.md
  • CI has been triggered on Darwin for performance regression tests.
  • Docs build
  • (@lanl.gov employees) Update copyright on changed files

@brryan brryan requested review from Yurlungur and pgrete October 12, 2023 16:28
@Yurlungur
Copy link
Collaborator

Params in C++ land at least does support vector I.O. I don't know what the right check is in python land, though.

@Yurlungur Yurlungur closed this Oct 12, 2023
@Yurlungur
Copy link
Collaborator

Yurlungur commented Oct 12, 2023

oops did not mean to close. Sorry.

@Yurlungur Yurlungur reopened this Oct 12, 2023
@pgrete
Copy link
Collaborator

pgrete commented Oct 18, 2023

Is this actually an h5py issue?
e.g., when I use h5dump on some dataset, I see

      ATTRIBUTE "Coordinates" {
         DATATYPE  H5T_STRING {
            STRSIZE H5T_VARIABLE;
            STRPAD H5T_STR_NULLTERM;
            CSET H5T_CSET_ASCII;
            CTYPE H5T_C_S1;
         }
         DATASPACE  SCALAR
         DATA {
         (0): "UniformCartesian"
         }

which, in the code, is dumped as

    HDF5WriteAttribute("Coordinates", std::string(first_block.coords.Name()).c_str(),
                       info_group);

where I get

      ATTRIBUTE "x_var_name" {
         DATATYPE  H5T_STD_I8LE
         DATASPACE  SIMPLE { ( 12 ) / ( 12 ) }
         DATA {
         (0): 72, 73, 83, 84, 95, 67, 79, 79, 82, 68, 95, 82
         }
      }

for

      HDF5WriteAttribute("x_var_name", hist.x_var_name, hist_group);

So the "char/int" interpretation stems from the missing .c_str() in Parthenon.

In case, it's not an issue on how the Attribute is written, does file["my_group"].attrs["my_attribute"].tobytes().decode("ascii") also work (rather than the for loop)?

@pgrete
Copy link
Collaborator

pgrete commented Oct 18, 2023

Actually, we could consider adding an additional overload for HDF5WriteAttribute that forwards the input as c_str.

@Yurlungur
Copy link
Collaborator

If the fix is in parthenon, I would prefer to make it there, as that seems like it is more likely to be maintainable, rather than a conversion on the python side.

@brryan
Copy link
Collaborator Author

brryan commented Oct 25, 2023

Thanks for noticing the value is already mangled as seen by h5dump @pgrete -- you are right this is not an h5py issue. I took care of this with a one-line fix using the existing std::string function overload (just calling .c_str() on that string and calling HDF5WriteAttribute again rather than interpreting as a vector), and reverted my phdf modification. For a string parameter in the advection example, Parthenon was producing

   ATTRIBUTE "advection_package/profile" {
      DATATYPE  H5T_STD_U8LE
      DATASPACE  SIMPLE { ( 11 ) / ( 11 ) }
      DATA {
      (0): 104, 97, 114, 100, 95, 115, 112, 104, 101, 114, 101
      }
   }

but with this change we now get

  ATTRIBUTE "advection_package/profile" {
      DATATYPE  H5T_STRING {
         STRSIZE H5T_VARIABLE;
         STRPAD H5T_STR_NULLTERM;
         CSET H5T_CSET_ASCII;
         CTYPE H5T_C_S1;
      }
      DATASPACE  SCALAR
      DATA {
      (0): "hard_sphere"
      }
   }

Similarly, phdf.Params['advection_package/profile'] now gives hard_sphere

@brryan
Copy link
Collaborator Author

brryan commented Oct 25, 2023

I also just checked and the code correctly recovers hard_sphere inside the advection_package Params after restart.

Copy link
Collaborator

@Yurlungur Yurlungur left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the fix @brryan this is a clean fix. Probably my bad for the bug in the first place... I was the last one to touch the attributes stuf...

@Yurlungur Yurlungur enabled auto-merge October 25, 2023 22:07
Copy link
Collaborator

@pgrete pgrete left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great, this definitely looks like the right fix.

@brryan brryan disabled auto-merge October 31, 2023 18:38
@brryan brryan enabled auto-merge (squash) October 31, 2023 18:38
@brryan brryan merged commit fb17370 into develop Oct 31, 2023
49 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants