Search before asking
I searched the issues and found no similar issues.
Component
Transforms/universal/ededup
What happened + What you expected to happen
I used doc_id to create a doc_id_int_column and used that as the ededup_doc_id_column, and ededup deleted all my samples. I expected only the duplicates to be removed.
Reproduction script
I changed test_ededup_python.py to use an integer column, e.g.
int_column_name_cli_param: "Unnamed: 0"
With that change ededup deleted all 5 samples, whereas after my fix it removed just the 2 duplicates. The test still fails because the "removed" column, which now holds the 2 integer doc ids, doesn't match the expected one.
Anything else
This can be fixed by changing line 157 in ededup_transform_base.py so that string ids are cached:
hd[h] = str(doc_id)
(Note that the variable holding the column name has "int" in its name.)
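To illustrate the kind of type mismatch that could produce this behaviour, here is a minimal standalone sketch. It is not the code from ededup_transform_base.py: the function name dedup, the cast_ids_to_str flag, and the assumption that a later step compares ids as strings are all hypothetical and only serve to show why caching str(doc_id) could matter.

```python
import hashlib


def dedup(docs, cast_ids_to_str):
    """Return the ids of the documents kept after exact dedup.

    docs is a list of (doc_id, text) pairs. This is a hypothetical
    stand-in for the transform, not the real ededup implementation.
    """
    first_seen = {}  # content hash -> id of the first document with that content
    for doc_id, text in docs:
        h = hashlib.sha256(text.encode("utf-8")).hexdigest()
        if h not in first_seen:
            # The proposed fix corresponds to caching str(doc_id) here.
            first_seen[h] = str(doc_id) if cast_ids_to_str else doc_id

    # Assumption: the later filtering step looks ids up as strings.
    keep_ids = set(first_seen.values())
    return [doc_id for doc_id, _ in docs if str(doc_id) in keep_ids]


# Five samples with integer ids, two of which are exact duplicates.
docs = [(0, "a"), (1, "b"), (2, "a"), (3, "c"), (4, "b")]

kept_without_cast = dedup(docs, cast_ids_to_str=False)
kept_with_cast = dedup(docs, cast_ids_to_str=True)
print(kept_without_cast, kept_with_cast)
```

In this sketch, integer ids stored without the cast never equal their string form, so every sample is dropped. With the cast, only the two duplicates are removed.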
OS
Red Hat Enterprise Linux (RHEL)
Python
3.11.x
Are you willing to submit a PR?
Yes I am willing to submit a PR!