A pandas DataFrame with a date index (type DatetimeIndex) is stored with PyStore. The last entered date is e.g. 2020-01-01.
I call append to add a row with index 2020-01-02 whose values in all columns are identical (np.nan) to those of the row with index 2020-01-01. Only the last row (2020-01-02) ends up stored.
I suspect the line `combined = dd.concat([current.data, new]).drop_duplicates(keep="last")` in collection.py is the reason.
In real life it is perhaps unlikely that two days have 100% identical data (EOD stock data), but shouldn't the time series index be honored in this case?
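For reference, the behavior is easy to reproduce in plain pandas, since `drop_duplicates` compares column values only, ignores the index, and treats NaN values as equal. This is a minimal sketch of the semantics (the `old`/`new` names and the single `close` column are just illustrative), not PyStore's actual code path:

```python
import numpy as np
import pandas as pd

# two rows with identical (all-NaN) values but distinct dates
old = pd.DataFrame({"close": [np.nan]},
                   index=pd.DatetimeIndex(["2020-01-01"], name="date"))
new = pd.DataFrame({"close": [np.nan]},
                   index=pd.DatetimeIndex(["2020-01-02"], name="date"))

# mirrors the suspect line: the index is ignored and NaN == NaN for
# duplicate detection, so only the 2020-01-02 row survives
combined = pd.concat([old, new]).drop_duplicates(keep="last")
print(combined)

# one possible fix: pull the index into the columns so it takes part
# in the duplicate comparison, then restore it
fixed = (
    pd.concat([old, new])
    .reset_index()
    .drop_duplicates(keep="last")
    .set_index("date")
)
print(fixed)  # both rows are kept
```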
I can't reproduce your issue, but I get something quite similar: basically, after an append the index gets out of order. The data is OK-ish, but not the index.
With a small file this is not an issue, as the results can be sorted, but on large files sorting is very slow.
PyStore manages overlapping data, so I'm quite sure that's not the issue. The issue seems to be coming from Dask; however, there is no sort_index in Dask, only sort_values and set_index, which is not what we need here.
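For what it's worth, one workaround (a sketch, assuming the frame is small enough that a shuffle is acceptable) is to round-trip through `reset_index` and `set_index`, since Dask's `set_index` sorts the data by the new index:

```python
import dask.dataframe as dd
import pandas as pd

# a hypothetical out-of-order frame, standing in for what append leaves behind
pdf = pd.DataFrame({"close": [2.0, 1.0]},
                   index=pd.DatetimeIndex(["2020-01-02", "2020-01-01"], name="date"))
ddf = dd.from_pandas(pdf, npartitions=1, sort=False)

# set_index shuffles/sorts the data by the new index, which effectively
# gives us the missing sort_index
ddf = ddf.reset_index().set_index("date")
print(ddf.compute())  # rows come back in date order
```

On large frames this triggers a full shuffle, so it is slow, which matches what was observed above with sorting big files.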
I wonder if this is caused by Dask dropping support for fastparquet.
Given that pystore is still maintained...