Supports list-like Python objects for Series comparison. #2022

itholic · 2021-01-27T03:35:54Z

Currently, Series doesn't support comparison with list-like Python objects such as list, tuple, dict, and set.

>>> kser
0    1
1    2
2    3
dtype: int64

>>> kser == [3, 2, 1]
Traceback (most recent call last):
...
    raise Py4JJavaError(
py4j.protocol.Py4JJavaError: An error occurred while calling o77.equalTo.
...

This PR proposes supporting them as well for Series comparison.

>>> kser
0    1
1    2
2    3
dtype: int64

>>> kser == [3, 2, 1]
0    False
1     True
2    False
dtype: bool

This should resolve #2018

ueshin · 2021-01-27T03:43:36Z

Found a bug:

>>> pser = pd.Series([1,2,3], index=[10,20,30])
>>> pser == [3, 2, 1]
10    False
20     True
30    False
dtype: bool

whereas:

>>> kser = ks.Series([1,2,3], index=[10,20,30])
>>> kser == [3, 2, 1]
0     False
1     False
10    False
2     False
30    False
20    False
dtype: bool
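For reference, the pandas semantics that the koalas result is expected to match can be sketched as below. This is a pandas-only illustration under the assumption that positional matching is the intended behavior; it is not the koalas implementation, and the variable names are arbitrary.

```python
import pandas as pd

pser = pd.Series([1, 2, 3], index=[10, 20, 30])

# A list operand is matched to the Series by position, not by label,
# so the custom index is carried over to the result unchanged.
result = pser == [3, 2, 1]
print(result.tolist())        # [False, True, False]
print(result.index.tolist())  # [10, 20, 30]

# A list of a different length cannot be matched positionally.
try:
    pser == [3, 2]
    length_mismatch_raises = False
except ValueError:
    length_mismatch_raises = True
```

The koalas output above differs because the list values appear to be aligned by label (0, 1, 2) against the Series index (10, 20, 30), which produces the union of both indexes.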

codecov-io · 2021-01-27T03:59:32Z

Codecov Report

Merging #2022 (44e34f6) into master (901cea6) will decrease coverage by 1.52%.
The diff coverage is 100.00%.

@@            Coverage Diff             @@
##           master    #2022      +/-   ##
==========================================
- Coverage   94.70%   93.18%   -1.53%     
==========================================
  Files          54       54              
  Lines       11480    11393      -87     
==========================================
- Hits        10872    10616     -256     
- Misses        608      777     +169

Impacted Files	Coverage Δ
databricks/koalas/indexes/base.py	`97.62% <100.00%> (+0.19%)`	⬆️
databricks/koalas/series.py	`96.72% <100.00%> (-0.07%)`	⬇️
databricks/koalas/usage_logging/__init__.py	`27.58% <0.00%> (-64.92%)`	⬇️
databricks/koalas/usage_logging/usage_logger.py	`47.82% <0.00%> (-52.18%)`	⬇️
databricks/koalas/__init__.py	`82.66% <0.00%> (-8.38%)`	⬇️
databricks/koalas/accessors.py	`86.43% <0.00%> (-7.04%)`	⬇️
databricks/conftest.py	`93.22% <0.00%> (-6.78%)`	⬇️
databricks/koalas/namespace.py	`79.91% <0.00%> (-4.50%)`	⬇️
databricks/koalas/generic.py	`90.57% <0.00%> (-2.68%)`	⬇️
databricks/koalas/typedef/typehints.py	`92.03% <0.00%> (-1.77%)`	⬇️
... and 21 more

ueshin · 2021-01-27T04:10:11Z

Btw, we might also want to support binary operations with list-like Python objects? cc @HyukjinKwon

>>> pser + [3, 2, 1]
10    4
20    4
30    4
dtype: int64
>>> pser - [3, 2, 1]
10   -2
20    0
30    2
dtype: int64
>>> [3, 2, 1] + pser
10    4
20    4
30    4
dtype: int64

itholic · 2021-02-04T01:45:28Z

FYI: pandas seems to behave inconsistently, as shown below.

>>> a = pd.Series([1, 1, 1, np.nan], index=['a', 'b', 'c', 'd'])
>>> b = pd.Series([1, np.nan, 1, np.nan], index=['a', 'b', 'd', 'e'])

>>> a.eq(b)
a     True
b    False
c    False
d    False
e    False
dtype: bool

>>> a == b
Traceback (most recent call last):
...
ValueError: Can only compare identically-labeled Series objects

However, in their API doc for Series.eq, it says "Equivalent to series == other".

I posted a question to the pandas repo and will share their response if they reply.
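The difference can be reproduced as a standalone script. This is a minimal sketch using only pandas and NumPy; the flag name `raised` is introduced here purely for illustration.

```python
import numpy as np
import pandas as pd

a = pd.Series([1, 1, 1, np.nan], index=["a", "b", "c", "d"])
b = pd.Series([1, np.nan, 1, np.nan], index=["a", "b", "d", "e"])

# Series.eq aligns both operands on the union of their labels first,
# so labels missing from one side compare as False.
aligned = a.eq(b)
print(aligned.index.tolist())  # ['a', 'b', 'c', 'd', 'e']
print(aligned.tolist())        # [True, False, False, False, False]

# The == operator refuses to compare Series whose labels differ.
try:
    a == b
    raised = False
except ValueError:
    raised = True
print(raised)  # True
```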

itholic · 2021-02-04T02:18:08Z

Regarding @ueshin's suggestion above to also support binary operations with list-like Python objects: let me do this in a separate PR, since there may be inconsistent cases like eq.

ueshin · 2021-02-04T06:42:05Z

The eq case sounds different from the topic here, which is binary operations between a Series and a list.

HyukjinKwon · 2021-02-17T04:51:09Z

databricks/koalas/series.py

+    def __eq__(self, other):
+        if isinstance(other, (list, tuple)):
+            other = ks.Index(other, name=self.name)
+        # pandas always returns False for all items with dict and set.


I wonder why pandas behaves like this...

HyukjinKwon · 2021-02-17T04:51:23Z

databricks/koalas/series.py


    equals = eq

+    def __eq__(self, other):


Does Index support this case too? If so, it might be best to move this to base.py.
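For comparison, pandas itself accepts a list operand on an Index. The sketch below shows only the pandas behavior and makes no claim about what koalas Index does; the name `pidx` is illustrative.

```python
import pandas as pd

pidx = pd.Index([1, 2, 3])

# Comparing a pandas Index with a list is also positional, but the
# result is a NumPy boolean array rather than a Series.
result = pidx == [3, 2, 1]
print(result.tolist())  # [False, True, False]
```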

xinrong-meng · 2021-08-05T21:55:38Z

https://issues.apache.org/jira/browse/SPARK-36438

…parison

### What changes were proposed in this pull request?

This PR proposes to implement `Series` comparison with list-like Python objects.

Currently `Series` doesn't support the comparison to list-like Python objects such as `list`, `tuple`, `dict`, `set`.

**Before**

```python
>>> psser
0    1
1    2
2    3
dtype: int64

>>> psser == [3, 2, 1]
Traceback (most recent call last):
...
TypeError: The operation can not be applied to list.
...
```

**After**

```python
>>> psser
0    1
1    2
2    3
dtype: int64

>>> psser == [3, 2, 1]
0    False
1     True
2    False
dtype: bool
```

This was originally proposed in databricks/koalas#2022, and all reviews in origin PR has been resolved.

### Why are the changes needed?

To follow pandas' behavior.

### Does this PR introduce _any_ user-facing change?

Yes, the `Series` comparison with list-like Python objects now possible.

### How was this patch tested?

Unittests

Closes #34114 from itholic/SPARK-36438.

Authored-by: itholic <[email protected]>
Signed-off-by: Hyukjin Kwon <[email protected]>

itholic added 4 commits January 27, 2021 11:50

Series.eq supports list-like python objects

708968a

Override the __eq__

d2b23b3

Addressed name of Series & added more tests

06438c8

Addressed the comments

9a5d249

itholic added 2 commits February 1, 2021 15:52

Resolve conflicts & trigger the test

8f2280c

Merge branch 'master' of https://github.com/databricks/koalas into series_eq_list

3720756

itholic added 2 commits February 8, 2021 14:21

Fix bug

e484e6d

Add tuple type for Index initializer

44e34f6

This was referenced Feb 9, 2021

Fix binary operations Index by Series. #2046

Merged

Enabling binary operations with list-like Python objects. #2054

Open

itholic requested review from xinrong-meng, ueshin and HyukjinKwon February 17, 2021 03:06

HyukjinKwon reviewed Feb 17, 2021

itholic added 3 commits February 18, 2021 14:37

Use internal pandas

54dcc6b

Fix tests

4915afd

Merge branch 'master' of https://github.com/databricks/koalas into series_eq_list

42dae8f

xinrong-meng reviewed Feb 18, 2021

itholic added 2 commits March 5, 2021 10:48

Merge branch 'master' of https://github.com/databricks/koalas into series_eq_list

3fd7511

Add docstirng list-like

31e7647

itholic mentioned this pull request Sep 27, 2021

[SPARK-36438][PYTHON] Support list-like Python objects for Series comparison apache/spark#34114

Closed
