Nested Dataframes causes exception #51

Open
pjmattingly opened this issue May 1, 2021 · 0 comments

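import pandas as pd
import pystore

pystore.set_path("./pystore")
store = pystore.store('test')
collection = store.collection('demo collection')

# Inner frame that will be stored as a single cell of the outer frame
df_path_and_hash = pd.DataFrame({'path': ['path1', 'path2'], 'hash': [0, 1]})
# Outer frame: the 'dfs' column holds the inner DataFrame as a cell value
d_container = {'idx': [1], 'dfs': [df_path_and_hash]}
df_container = pd.DataFrame(d_container)

collection.write('test item', df_container)

# Reading the nested frame back is where the failure appears
item = collection.item('test item')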

Reading the item back raises the following exception:

Traceback (most recent call last):
  File "<user>\anaconda3\envs\dev1\lib\site-packages\fastparquet\api.py", line 110, in __init__
    with open_with(fn2, 'rb') as f:
  File "<user>\anaconda3\envs\dev1\lib\site-packages\fsspec\spec.py", line 940, in open
    f = self._open(
  File "<user>\anaconda3\envs\dev1\lib\site-packages\fsspec\implementations\local.py", line 118, in _open
    return LocalFileOpener(path, mode, fs=self, **kwargs)
  File "<user>\anaconda3\envs\dev1\lib\site-packages\fsspec\implementations\local.py", line 200, in __init__
    self._open()
  File "<user>\anaconda3\envs\dev1\lib\site-packages\fsspec\implementations\local.py", line 205, in _open
    self.f = open(self.path, mode=self.mode)
FileNotFoundError: [Errno 2] No such file or directory: '<user>/Documents/Professional Work/Work/dev, keyword check; a tool to help find text in files/pystore/test/demo collection/test item/part.0.parquet/_metadata'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "KC_dev.py", line 2329, in <module>
    main()
  File "KC_dev.py", line 54, in main
    item = collection.item('test item')
  File "<user>\anaconda3\envs\dev1\lib\site-packages\pystore\collection.py", line 78, in item
    return Item(item, self.datastore, self.collection,
  File "<user>\anaconda3\envs\dev1\lib\site-packages\pystore\item.py", line 60, in __init__
    self.data = dd.read_parquet(
  File "<user>\anaconda3\envs\dev1\lib\site-packages\dask\dataframe\io\parquet\core.py", line 307, in read_parquet
    read_metadata_result = engine.read_metadata(
  File "<user>\anaconda3\envs\dev1\lib\site-packages\dask\dataframe\io\parquet\fastparquet.py", line 678, in read_metadata
    parts, pf, gather_statistics, base_path = _determine_pf_parts(
  File "<user>\pmatt\anaconda3\envs\dev1\lib\site-packages\dask\dataframe\io\parquet\fastparquet.py", line 159, in _determine_pf_parts
    pf = ParquetFile(paths, open_with=fs.open, **kwargs.get("file", {}))
  File "<user>\anaconda3\envs\dev1\lib\site-packages\fastparquet\api.py", line 90, in __init__
    basepath, fmd = metadata_from_many(fn, verify_schema=verify,
  File "<user>\anaconda3\envs\dev1\lib\site-packages\fastparquet\util.py", line 134, in metadata_from_many
    pfs = [api.ParquetFile(fn, open_with=open_with) for fn in file_list]
  File "<user>\anaconda3\envs\dev1\lib\site-packages\fastparquet\util.py", line 134, in <listcomp>
    pfs = [api.ParquetFile(fn, open_with=open_with) for fn in file_list]
  File "<user>\anaconda3\envs\dev1\lib\site-packages\fastparquet\api.py", line 116, in __init__
    self._parse_header(f, verify)
  File "<user>\anaconda3\envs\dev1\lib\site-packages\fastparquet\api.py", line 133, in _parse_header
    f.seek(-(head_size+8), 2)
OSError: [Errno 22] Invalid argument
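
Because the error only surfaces when the item is read back, a guard run before collection.write could fail earlier and name the offending column. The following is a minimal sketch using pandas only; columns_with_nested_frames is a hypothetical helper, not part of pystore, and it assumes the nested cells are pandas DataFrame or Series objects.

```python
import pandas as pd


def columns_with_nested_frames(df):
    """Return the names of columns where at least one cell is a pandas object.

    Hypothetical helper for illustration; not part of pystore.
    """
    return [
        name
        for name in df.columns
        if any(isinstance(cell, (pd.DataFrame, pd.Series)) for cell in df[name])
    ]


df_path_and_hash = pd.DataFrame({'path': ['path1', 'path2'], 'hash': [0, 1]})
df_container = pd.DataFrame({'idx': [1], 'dfs': [df_path_and_hash]})

nested = columns_with_nested_frames(df_container)
flat = columns_with_nested_frames(df_path_and_hash)
```

Checking that the returned list is empty before writing would turn the late read failure into an early, readable error.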

An un-nested dataframe causes no issue:

import pandas as pd
import pystore

pystore.set_path("./pystore")
store = pystore.store('test')
collection = store.collection('demo collection')

df_path_and_hash = pd.DataFrame({'path': ['path1', 'path2'], 'hash': [0, 1]})
# The nested container is still built here, but it is not written this time
d_container = {'idx': [1], 'dfs': [df_path_and_hash]}
df_container = pd.DataFrame(d_container)

# Write the flat (un-nested) frame instead of the container
collection.write('test item', df_path_and_hash)

item = collection.item('test item')
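
Since the flat frame round-trips, one possible workaround is to turn each nested frame into a JSON string before writing, so the container holds only plain values, and to rebuild the frames after reading. This is a sketch under assumptions: pack_nested_frames and unpack_nested_frames are hypothetical helpers, only the pandas round trip is shown, and it has not been tested against pystore itself.

```python
import io

import pandas as pd


def pack_nested_frames(df, column):
    """Return a copy of df with each DataFrame cell in `column` replaced by a JSON string."""
    packed = df.copy()
    packed[column] = [cell.to_json(orient="split") for cell in df[column]]
    return packed


def unpack_nested_frames(df, column):
    """Return a copy of df with each JSON string in `column` parsed back into a DataFrame."""
    unpacked = df.copy()
    unpacked[column] = [
        pd.read_json(io.StringIO(text), orient="split") for text in df[column]
    ]
    return unpacked


df_path_and_hash = pd.DataFrame({'path': ['path1', 'path2'], 'hash': [0, 1]})
df_container = pd.DataFrame({'idx': [1], 'dfs': [df_path_and_hash]})

flat_container = pack_nested_frames(df_container, 'dfs')    # safe-to-write candidate
restored = unpack_nested_frames(flat_container, 'dfs')      # nested frames rebuilt
```

In use, flat_container would be passed to collection.write in place of df_container, and unpack_nested_frames would be applied to the frame read back from the item.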