
implement searchsorted for MultiIndex (bug #14833) #23210

Closed
wants to merge 3 commits into from

Conversation

btel

@btel btel commented Oct 17, 2018

Proof-of-concept to fix #14833.

This works:

>>> import pandas
>>> pandas.MultiIndex.from_tuples([('a', 0), ('b', 1)]).searchsorted(('b', 0))
1
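
For readers unfamiliar with the semantics being requested, the following is a minimal sketch of the expected result using only the standard library. It is not the code in this PR: it treats the index as a plain list of tuples, and `tuple_searchsorted` is a hypothetical helper name. It assumes the rows are already lexicographically sorted.

```python
from bisect import bisect_left


def tuple_searchsorted(rows, key):
    """Return the leftmost position at which `key` could be inserted
    into `rows` (a lexsorted list of tuples) while keeping it sorted.

    Hypothetical helper for illustration only; not a pandas API.
    """
    # Python compares tuples lexicographically, level by level,
    # which matches the ordering of a lexsorted MultiIndex.
    return bisect_left(rows, key)


rows = [("a", 0), ("b", 1)]
position = tuple_searchsorted(rows, ("b", 0))
print(position)
```

Here `("b", 0)` sorts after `("a", 0)` and before `("b", 1)`, so the insertion position matches the value shown in the example above.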

@pep8speaks

pep8speaks commented Oct 17, 2018

Hello @btel! Thanks for updating the PR.

Comment last updated on October 17, 2018 at 19:42 Hours UTC

@codecov

codecov bot commented Oct 17, 2018

Codecov Report

Merging #23210 into master will increase coverage by <.01%.
The diff coverage is 100%.

@@            Coverage Diff             @@
##           master   #23210      +/-   ##
==========================================
+ Coverage   92.19%   92.19%   +<.01%     
==========================================
  Files         169      169              
  Lines       50954    50961       +7     
==========================================
+ Hits        46975    46982       +7     
  Misses       3979     3979

Flag         Coverage Δ
#multiple    90.61% <100%> (ø) ⬆️
#single      42.27% <28.57%> (-0.01%) ⬇️

Impacted Files                  Coverage Δ
pandas/core/indexes/multi.py    95.48% <100%> (+0.02%) ⬆️

Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 9285820...e946e08. Read the comment docs.

Contributor

@jreback jreback left a comment

I am not convinced of the utility of searchsorted here. What are you going to do with this?

@@ -2899,6 +2900,13 @@ def isin(self, values, level=None):
         else:
             return np.lib.arraysetops.in1d(labs, sought_labels)
 
     def searchsorted(self, arr):
         dtype = [l.dtype.descr for l in self.levels]
Contributor

It would be much more performant to do this level by level (as you iterate over the key), which, by the way, must be fully specified. This also needs a docstring and error checking.
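
One way to read the level-by-level suggestion is sketched below. This is an illustration only, not code from the PR or from pandas: it assumes each level is available as a plain sequence of per-row values (`columns`), that the rows are lexsorted, and the name `searchsorted_by_level` is hypothetical. At each level the search is restricted to the rows that already match the key prefix, so every step is a binary search over a shrinking range.

```python
from bisect import bisect_left, bisect_right


def searchsorted_by_level(columns, key):
    """Leftmost insertion position of a fully specified `key`.

    `columns` holds one sequence of row values per level; the rows
    (taken across all levels) are assumed to be lexsorted.
    Hypothetical helper for illustration only; not a pandas API.
    """
    if len(key) != len(columns):
        raise ValueError("key must specify a value for every level")

    lo = 0
    hi = len(columns[0]) if columns else 0
    for level_values, k in zip(columns, key):
        # Within [lo, hi) all earlier levels equal the key prefix,
        # so this level's values are sorted inside that range.
        left = bisect_left(level_values, k, lo, hi)
        right = bisect_right(level_values, k, lo, hi)
        if left == right:
            # k does not occur here: the insertion point is settled.
            return left
        lo, hi = left, right
    # Every level matched: return the first matching row.
    return lo


# Rows: ('a', 0), ('a', 1), ('b', 0), ('b', 2)
columns = [["a", "a", "b", "b"], [0, 1, 0, 2]]
print(searchsorted_by_level(columns, ("b", 1)))
```

The early return when a level value is absent is what makes the fully specified key requirement natural: a partial key would leave the remaining levels ambiguous.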

@@ -346,3 +346,8 @@ def test_get_indexer_categorical_time():
         Categorical(date_range("2012-01-01", periods=3, freq='H'))])
     result = midx.get_indexer(midx)
     tm.assert_numpy_array_equal(result, np.arange(9, dtype=np.intp))


Contributor

This needs error checking and a full test where the element is present. To be honest, you should simply call .get_loc() first to find it and, only if it is not found, fall back to searchsorted. You must also assert that the index is lexsorted.
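
The control flow requested here (check sortedness, try an exact lookup, then fall back to a sorted search) can be sketched in plain Python. This is an assumption-laden illustration, not the PR's code: rows are a list of tuples, `list.index` stands in for an exact lookup such as `.get_loc()`, and `locate_or_insertion_point` is a hypothetical name.

```python
from bisect import bisect_left


def locate_or_insertion_point(rows, key):
    """Return (position, found) for `key` in lexsorted `rows`.

    Hypothetical helper for illustration only; not a pandas API.
    """
    # Error checking: a sorted search is meaningless on unsorted rows.
    if any(a > b for a, b in zip(rows, rows[1:])):
        raise ValueError("rows must be lexsorted")

    try:
        # Stand-in for an exact lookup; returns the first match.
        return rows.index(key), True
    except ValueError:
        # Not present: fall back to the leftmost insertion point.
        return bisect_left(rows, key), False


rows = [("a", 0), ("a", 1), ("b", 1)]
print(locate_or_insertion_point(rows, ("a", 1)))
print(locate_or_insertion_point(rows, ("b", 0)))
```

A test along these lines would cover both branches: one key that is present and one that is not, plus an unsorted input that must raise.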

@jreback jreback added the MultiIndex and Indexing (related to indexing on series/frames, not to indexes themselves) labels Oct 26, 2018
@jreback
Contributor

jreback commented Oct 26, 2018

cc @toobaz

@WillAyd
Member

WillAyd commented Nov 24, 2018

Can you merge master and address comments?

@jreback
Contributor

jreback commented Dec 14, 2018

Closing as stale, but ping to reopen if you can continue.

@jreback jreback closed this Dec 14, 2018
Labels
MultiIndex, Indexing (related to indexing on series/frames, not to indexes themselves)

Successfully merging this pull request may close these issues.

Multiarray searchsorted fails
4 participants