Add more Pandas-based Checkpointing and Save/Load Functions #16
base: develop
Adds all possible optional dependencies for checkpointing to CI
Changes keyword argument parsing for to_hdf and from_hdf to use **kwargs
…g nodes and converting string representations of lists back to lists
…d nodes because of the lack of bytes type in Python 2
…nges merged from develop
Originally from hatchet/hatchet on May 18, 2021: I might wait until hatchet/hatchet#377 is merged before marking this PR ready-for-review. This PR adds some global configuration type data to all the
Originally from May 22, 2021: Implementation and testing are now complete. This PR depends on hatchet/hatchet#272, so it definitely shouldn't be reviewed or merged until hatchet/hatchet#272 is merged. I also want to integrate hatchet/hatchet#377, but I might do that in a separate PR.
74d7f3e to 837e5e3
b461833 to 48d44ce
Follow up to hatchet/hatchet#272
This PR adds the following new functions for checkpointing GraphFrames (i.e., saving to/reading from files):
- `to_pickle` and `from_pickle` (Pickle Format)
- `to_csv` and `from_csv`
- `to_excel` and `from_excel`

These functions use similar read/write functions from Pandas. In many cases, these Pandas functions require additional dependencies. Those dependencies will not be required by Hatchet. If the dependency for a particular function is not installed, Pandas will raise an `ImportError`.

This PR also adds new `save` and `load` functions to the `GraphFrame` class. These functions simplify checkpointing. Both require only one argument: the filename. If the filename contains a recognized extension, that format will be used. Otherwise, the optional `fileformat` parameter can be provided to specify the desired format. If the necessary dependencies are not installed, the `ImportError` raised by Pandas will be caught. In that case, all remaining formats will be attempted. If no supported format succeeds, an `IOError` will be raised.

All the new functions added in this PR accept keyword arguments (i.e., `**kwargs`). These arguments will be passed to the Pandas function that is eventually invoked to read/write the file. Documentation (i.e., docstrings) will be added that links to the associated functions' documentation.

Other file formats (e.g., Parquet and Feather) will be added in future PRs.