Supports list-like Python objects for Series comparison. #2022
Found a bug:

```python
>>> pser = pd.Series([1,2,3], index=[10,20,30])
>>> pser == [3, 2, 1]
10    False
20     True
30    False
dtype: bool
```

whereas:

```python
>>> kser = ks.Series([1,2,3], index=[10,20,30])
>>> kser == [3, 2, 1]
0     False
1     False
10    False
2     False
30    False
20    False
dtype: bool
```
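pandas pairs a plain list with the Series by position rather than by index label, so `[3, 2, 1]` lines up with labels 10, 20 and 30 in order and the result keeps those labels. A minimal pure-Python sketch of those semantics, using a hypothetical helper `positional_eq` over plain lists (not pandas' or Koalas' actual implementation):

```python
from typing import Any, List, Sequence, Tuple


def positional_eq(
    index: Sequence[Any], values: Sequence[Any], other: Sequence[Any]
) -> List[Tuple[Any, bool]]:
    """Compare `values` with `other` element by element, matching by position.

    Hypothetical helper: the result keeps the original index labels, the way
    pandas keeps the Series index when comparing against a plain list.
    """
    if len(values) != len(other):
        # pandas likewise rejects a list whose length differs from the Series.
        raise ValueError("Lengths must match to compare")
    return [(label, v == o) for label, v, o in zip(index, values, other)]


print(positional_eq([10, 20, 30], [1, 2, 3], [3, 2, 1]))
# [(10, False), (20, True), (30, False)]
```

The labels 10, 20, 30 never take part in the matching; only the order of the values does.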
Codecov Report

```diff
@@            Coverage Diff             @@
##           master    #2022      +/-   ##
==========================================
- Coverage   94.70%   93.18%   -1.53%
==========================================
  Files          54       54
  Lines       11480    11393      -87
==========================================
- Hits        10872    10616     -256
- Misses        608      777     +169
```
Btw, should we also support binary operations with list-like Python objects? cc @HyukjinKwon

```python
>>> pser + [3, 2, 1]
10    4
20    4
30    4
dtype: int64
>>> pser - [3, 2, 1]
10   -2
20    0
30    2
dtype: int64
>>> [3, 2, 1] + pser
10    4
20    4
30    4
dtype: int64
```
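Supporting these operations needs both the normal and the reflected dunder methods, since `[3, 2, 1] + pser` has the list on the left. A minimal sketch, assuming a hypothetical `MiniSeries` class that stores values and index labels as plain lists (not Koalas' `Series`):

```python
import operator
from typing import Any, Callable, Sequence


class MiniSeries:
    """Hypothetical, minimal stand-in for a Series: values plus index labels."""

    def __init__(self, values: Sequence[Any], index: Sequence[Any]) -> None:
        self.values = list(values)
        self.index = list(index)

    def _positional_op(
        self, other: Any, op: Callable[[Any, Any], Any], reflected: bool = False
    ) -> Any:
        if not isinstance(other, (list, tuple)):
            return NotImplemented  # let Python fall back / raise TypeError
        if len(other) != len(self.values):
            raise ValueError("Lengths must match")
        # For reflected ops (e.g. `[3, 2, 1] - s`) the list is the left operand.
        pairs = zip(other, self.values) if reflected else zip(self.values, other)
        # The result keeps this object's index labels, as in the pandas output.
        return MiniSeries([op(a, b) for a, b in pairs], self.index)

    def __add__(self, other: Any) -> Any:
        return self._positional_op(other, operator.add)

    def __radd__(self, other: Any) -> Any:
        return self._positional_op(other, operator.add, reflected=True)

    def __sub__(self, other: Any) -> Any:
        return self._positional_op(other, operator.sub)

    def __rsub__(self, other: Any) -> Any:
        return self._positional_op(other, operator.sub, reflected=True)


s = MiniSeries([1, 2, 3], index=[10, 20, 30])
print((s - [3, 2, 1]).values)  # [-2, 0, 2]
print(([3, 2, 1] - s).values)  # [2, 0, -2]
```

In CPython, `[3, 2, 1] + s` reaches `MiniSeries.__radd__` because `list` has no numeric `+` slot, so the right operand's reflected method is tried before list concatenation. The operand order only matters for non-commutative operations such as subtraction.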
FYI: pandas seems to have some inconsistent behavior, as shown below:

```python
>>> a = pd.Series([1, 1, 1, np.nan], index=['a', 'b', 'c', 'd'])
>>> b = pd.Series([1, np.nan, 1, np.nan], index=['a', 'b', 'd', 'e'])
>>> a.eq(b)
a     True
b    False
c    False
d    False
e    False
dtype: bool
>>> a == b
Traceback (most recent call last):
...
ValueError: Can only compare identically-labeled Series objects
```

However, in their API doc for … I posted a question to the pandas repo and will share it if they respond.
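The difference between `a.eq(b)` and `a == b` above comes down to label alignment: `eq` aligns both sides on the union of labels, while `==` refuses to compare unless the labels are identical. A rough sketch that reproduces both outputs with plain dicts standing in for Series; `aligned_eq` and `strict_eq` are hypothetical helpers, not pandas' implementation, and the sketch assumes the labels are mutually sortable:

```python
import math
from typing import Any, Dict, Hashable


def aligned_eq(left: Dict[Hashable, Any], right: Dict[Hashable, Any]) -> Dict[Hashable, bool]:
    """Mimic `a.eq(b)`: align on the union of labels, then compare.

    A label missing on one side becomes NaN, and NaN never equals anything
    (not even NaN), so those positions come out False.
    """
    labels = sorted(set(left) | set(right))  # sorted union, as in the output above
    return {k: left.get(k, math.nan) == right.get(k, math.nan) for k in labels}


def strict_eq(left: Dict[Hashable, Any], right: Dict[Hashable, Any]) -> Dict[Hashable, bool]:
    """Mimic `a == b`: refuse to compare unless the labels are identical."""
    if list(left) != list(right):
        raise ValueError("Can only compare identically-labeled Series objects")
    return {k: left[k] == right[k] for k in left}


a = {"a": 1, "b": 1, "c": 1, "d": math.nan}
b = {"a": 1, "b": math.nan, "d": 1, "e": math.nan}
print(aligned_eq(a, b))
# {'a': True, 'b': False, 'c': False, 'd': False, 'e': False}
```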
Let me do this in a separate PR, since there may be inconsistent cases like …
databricks/koalas/series.py
```python
def __eq__(self, other):
    if isinstance(other, (list, tuple)):
        other = ks.Index(other, name=self.name)
    # pandas always returns False for all items with dict and set.
```
I wonder why pandas behaves like this...
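The reviewed `__eq__` dispatches on the type of `other`: lists and tuples are compared positionally, while, per the code comment, pandas returns `False` for every item when `other` is a `dict` or `set`. A pure-Python sketch of that dispatch, using a hypothetical `eq_list_like` function over plain values rather than the `ks.Index`-based approach in the diff:

```python
from typing import Any, List, Sequence


def eq_list_like(values: Sequence[Any], other: Any) -> List[bool]:
    """Hypothetical dispatch mirroring the reviewed `__eq__` for list-likes."""
    if isinstance(other, (dict, set)):
        # Per the code comment: pandas returns False for every item here.
        return [False] * len(values)
    if isinstance(other, (list, tuple)):
        if len(other) != len(values):
            raise ValueError("Lengths must match to compare")
        # Positional comparison, as the reviewed code does for lists and tuples.
        return [v == o for v, o in zip(values, other)]
    # Anything else is treated as a scalar and compared with every element.
    return [v == other for v in values]


print(eq_list_like([1, 2, 3], [3, 2, 1]))  # [False, True, False]
print(eq_list_like([1, 2, 3], {1, 2, 3}))  # [False, False, False]
```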
databricks/koalas/series.py
```python
equals = eq

def __eq__(self, other):
```
Does `Index` support this case too? It might be best to move this to `base.py`.
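Moving the logic to `base.py` would let `Series` and `Index` share one implementation through a common base class. A minimal sketch of that pattern; `ListLikeEqMixin`, `SeriesLike` and `IndexLike` are hypothetical names, not Koalas classes. It also shows that `equals = eq` in a class body simply binds a second name to the same function:

```python
from typing import Any, List, Sequence


class ListLikeEqMixin:
    """Hypothetical shared base: subclasses only need to provide `_values()`."""

    def _values(self) -> List[Any]:
        raise NotImplementedError

    def eq(self, other: Any) -> List[bool]:
        values = self._values()
        if isinstance(other, (list, tuple)):
            if len(other) != len(values):
                raise ValueError("Lengths must match to compare")
            return [v == o for v, o in zip(values, other)]
        return [v == other for v in values]

    # Class-level alias: `equals` and `eq` are the same function object.
    equals = eq

    def __eq__(self, other: Any) -> List[bool]:  # type: ignore[override]
        return self.eq(other)


class SeriesLike(ListLikeEqMixin):
    def __init__(self, values: Sequence[Any]) -> None:
        self.values = list(values)

    def _values(self) -> List[Any]:
        return self.values


class IndexLike(ListLikeEqMixin):
    def __init__(self, labels: Sequence[Any]) -> None:
        self.labels = list(labels)

    def _values(self) -> List[Any]:
        return self.labels


print(SeriesLike([1, 2, 3]) == [3, 2, 1])   # [False, True, False]
print(IndexLike([10, 20, 30]) == (10, 0, 30))  # [True, False, True]
```

Both subclasses get list-like comparison without duplicating it; each only says where its values live.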
…parison

### What changes were proposed in this pull request?

This PR proposes to implement `Series` comparison with list-like Python objects.

Currently `Series` doesn't support comparison to list-like Python objects such as `list`, `tuple`, `dict`, `set`.

**Before**

```python
>>> psser
0    1
1    2
2    3
dtype: int64

>>> psser == [3, 2, 1]
Traceback (most recent call last):
...
TypeError: The operation can not be applied to list.
...
```

**After**

```python
>>> psser
0    1
1    2
2    3
dtype: int64

>>> psser == [3, 2, 1]
0    False
1     True
2    False
dtype: bool
```

This was originally proposed in databricks/koalas#2022, and all reviews in the original PR have been resolved.

### Why are the changes needed?

To follow pandas' behavior.

### Does this PR introduce _any_ user-facing change?

Yes, `Series` comparison with list-like Python objects is now possible.

### How was this patch tested?

Unit tests.

Closes #34114 from itholic/SPARK-36438.

Authored-by: itholic <[email protected]>
Signed-off-by: Hyukjin Kwon <[email protected]>
Currently `Series` doesn't support comparison to list-like Python objects such as `list`, `tuple`, `dict`, and `set`. This PR proposes supporting them for `Series` comparison.
This should resolve #2018.