I have a lot of CSV files that I need to import into a store,
but the dataset doesn't grow.
In my testing, append seems to overwrite the data: the item never ends up with more rows than the frame with the largest index.
For example,
df1 has index np.arange(10) and 4 other columns
df2 has index np.arange(12) and 4 other columns
df3 has index np.arange(11) and 4 other columns
There are no duplicate rows across df1, df2 and df3; only some index values overlap.
item.write(df1)
item.append(df2)
item.append(df3)
In the end, the item is the same size as df2.
After some digging, I found that pystore applies data = data[~data.index.isin(old_index)], so only rows with a new index value are inserted!
I think this is a bad assumption; users wouldn't know about it unless they read the code.
def append(...)
    ...
    try:
        if epochdate or ("datetime" in str(data.index.dtype) and
                         any(data.index.nanosecond) > 0):
            data = utils.datetime_to_int64(data)
        old_index = dd.read_parquet(self._item_path(item, as_string=True),
                                    columns=[], engine=self.engine
                                    ).index.compute()
        data = data[~data.index.isin(old_index)]
    except Exception:
        return
Append should never remove rows by default, only when the user asks for it; that is the plain meaning of append.
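To illustrate the effect without needing a store, here is a minimal sketch in plain pandas. It does not call pystore at all; `make_frame`, `append_dropping_known_index` and `append_keeping_all` are hypothetical helpers I made up for this example, and the first one only mimics the `isin` filter quoted above. The column names and random values are placeholders.

```python
import numpy as np
import pandas as pd


def make_frame(n_rows, seed):
    """Build a frame with index 0..n_rows-1 and 4 columns of random values."""
    rng = np.random.default_rng(seed)
    return pd.DataFrame(
        rng.random((n_rows, 4)),
        columns=list("abcd"),
        index=np.arange(n_rows),
    )


def append_dropping_known_index(stored, data):
    """Mimic the quoted filter: keep only rows whose index value is new."""
    new_rows = data[~data.index.isin(stored.index)]
    return pd.concat([stored, new_rows])


def append_keeping_all(stored, data):
    """Plain append: keep every incoming row, even if its index repeats."""
    return pd.concat([stored, data])


df1, df2, df3 = make_frame(10, 1), make_frame(12, 2), make_frame(11, 3)

# With the index filter, df2 contributes only index 10 and 11, df3 nothing.
filtered = append_dropping_known_index(append_dropping_known_index(df1, df2), df3)

# Without the filter, all 10 + 12 + 11 rows are kept.
plain = append_keeping_all(append_keeping_all(df1, df2), df3)

print(len(filtered))  # 12
print(len(plain))     # 33
```

With the filter, the result has 12 rows, the same size as df2, which matches what I see in the store. Without it, all 33 rows survive, which is what I would expect from an append.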
eromoe changed the title from "Append lose data by default concern on index value is a problem." to "Append lose data : by default remove duplicted indices." on Dec 10, 2022