Multiarray searchsorted fails #14833

Anaphory · 2016-12-08T18:04:17Z

Code Sample, a copy-pastable example if possible

import pandas

pandas.MultiIndex([[0], ["a"]], [[0], [0]]).searchsorted((1, "b"))

Problem description

The entry (1, "b") should come after the existing (0, "a") in the MultiIndex. (Alternatively, MultiIndex could raise a clean error message.) Instead, an opaque exception is raised:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python3.5/site-packages/pandas/core/base.py", line 1156, in searchsorted
    return self.values.searchsorted(key, side=side, sorter=sorter)
TypeError: unorderable types: tuple() > str()

This is because Index.searchsorted naïvely passes its arguments to numpy.searchsorted, which is unaware that the key is a single tuple: it treats (1, "b") as an array of two separate values and compares each of them against the tuples stored in the index.
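A minimal sketch of that mechanism, assuming a current pandas/NumPy (where the old `labels` keyword is called `codes`). It shows that NumPy turns the tuple key into two keys, and that wrapping the key in a one-element object array, so NumPy sees a single tuple and falls back to Python's tuple ordering, yields the expected 1. This is an illustration of the cause, not a proposed fix:

```python
import numpy as np
import pandas as pd

# The index from the report; `codes` is the modern name of the old `labels`.
mi = pd.MultiIndex(levels=[[0], ["a"]], codes=[[0], [0]])
values = mi.values  # 1-d object array holding the tuple (0, "a")

# NumPy converts the tuple key into a 2-element array, i.e. two separate keys,
# so it ends up comparing 1 and "b" against the tuple (0, "a").
print(np.asarray((1, "b"), dtype=object).shape)  # (2,)

# Wrapping the key in a 1-element object array keeps it a single tuple,
# and NumPy then uses Python's tuple comparison.
needle = np.empty(1, dtype=object)
needle[0] = (1, "b")
print(values.searchsorted(needle))  # [1]
```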

Expected Output

1

Output of `pd.show_versions()`

INSTALLED VERSIONS
------------------
commit: None
python: 3.5.2.final.0
python-bits: 64
OS: Linux
OS-release: 4.8.10-1-ARCH
machine: x86_64
processor:
byteorder: little
LC_ALL: None
LANG: en_GB.UTF-8
LOCALE: en_GB.UTF-8

pandas: 0.19.1
nose: 1.3.7
pip: 9.0.1
setuptools: 28.8.0
Cython: 0.25.1
numpy: 1.11.2
scipy: None
statsmodels: None
xarray: None
IPython: None
sphinx: None
patsy: None
dateutil: 2.6.0
pytz: 2016.7
blosc: None
bottleneck: None
tables: None
numexpr: None
matplotlib: 1.5.3
openpyxl: None
xlrd: 1.0.0
xlwt: 0.8.0
xlsxwriter: None
lxml: None
bs4: 4.5.1
html5lib: 0.9999999
httplib2: 0.9.2
apiclient: None
sqlalchemy: 1.1.4
pymysql: None
psycopg2: None
jinja2: 2.8
boto: None
pandas_datareader: None


jreback · 2016-12-08T18:51:02Z

I suppose we could just disable this. numpy doesn't understand searchsorted on object arrays in general:

In [4]: pandas.MultiIndex([[0],["a"]], [[0],[0]]).values.searchsorted((1,"b"))
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-4-b9352c7b8bea> in <module>()
----> 1 pandas.MultiIndex([[0],["a"]], [[0],[0]]).values.searchsorted((1,"b"))

TypeError: unorderable types: tuple() > str()

Maybe just raise a NotImplementedError. This is pretty much a useless operation anyhow; you would always search by levels via indexing.
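To illustrate what searching by levels via indexing looks like, here is a small sketch using the existing `MultiIndex.get_loc` and `Series.loc`, assuming a sorted index; the Series values are made up for illustration:

```python
import numpy as np
import pandas as pd

mi = pd.MultiIndex.from_tuples([(0, "a"), (0, "b"), (1, "a")])
s = pd.Series([10, 20, 30], index=mi)

print(mi.get_loc((0, "b")))  # 1: position of an exact (full) key
print(mi.get_loc(0))         # positions of all rows whose first level is 0
print(s.loc[0])              # the same rows, selected by first-level value
```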

ritwickdsouza · 2017-02-01T17:10:02Z

Is this issue still available to fix?

jreback · 2017-02-01T17:15:54Z

Yes.

Note that we should simply define .searchsorted in pandas/indexes/multi.py and use the direct indexer, .get_indexer, which is far more efficient (as it's hashtable-based).

ritwickdsouza · 2017-02-03T11:37:31Z

@jreback I am new to pandas; could you shed some light on how I can use .get_indexer to implement .searchsorted?

jreback · 2017-02-03T13:38:56Z

.get_indexer returns the indexer, IOW the location of each point; -1 marks items that are not found. This works on any MultiIndex. Note that the index doesn't even have to be sorted (but it is more efficient if it is).

In [2]: i = pd.MultiIndex.from_tuples([(0, 'a'), (0, 'b'), (1, 'a')])

In [3]: i
Out[3]: 
MultiIndex(levels=[[0, 1], ['a', 'b']],
           labels=[[0, 0, 1], [0, 1, 0]])

In [4]: i.values
Out[4]: array([(0, 'a'), (0, 'b'), (1, 'a')], dtype=object)

In [5]: i.get_indexer([(0,'b'), (1, 'a'), (2, 'c')])
Out[5]: array([ 1,  2, -1])
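The same lookup as a self-contained script for a current pandas (where the repr shows `codes=` instead of `labels=`), plus a shuffled index to show that `get_indexer` does not depend on sort order:

```python
import pandas as pd

mi = pd.MultiIndex.from_tuples([(0, "a"), (0, "b"), (1, "a")])
print(mi.get_indexer([(0, "b"), (1, "a"), (2, "c")]))  # [ 1  2 -1]

# get_indexer is hash-based, so an unsorted index works too.
shuffled = pd.MultiIndex.from_tuples([(1, "a"), (0, "b"), (0, "a")])
print(shuffled.get_indexer([(0, "b"), (0, "a")]))  # [1 2]
```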

Here's what searchsorted does; I am using an integer array because numpy doesn't play nice with tuples. It returns the indexer of the match (IOW where it is in the array). Note that if something is not found, it returns the position where it would have to be inserted to keep the array sorted (here 3, one past the last index), which is really unintuitive!

In [6]: np.array([1, 2, 3]).searchsorted([2, 3])
Out[6]: array([1, 2])

In [7]: np.array([1, 2, 3]).searchsorted([2, 3, 5])
Out[7]: array([1, 2, 3])
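A short script making the insertion-point semantics explicit, including a value that falls between existing entries and the `side` parameter, which is what a MultiIndex searchsorted would need to reproduce:

```python
import numpy as np

a = np.array([1, 2, 3])
print(a.searchsorted([2, 3]))           # [1 2]   exact matches
print(a.searchsorted([2, 3, 5]))        # [1 2 3] 5 would be appended at the end
print(a.searchsorted([0, 2.5]))         # [0 2]   insertion points for missing values
print(a.searchsorted(2, side="right"))  # 2       just after the existing 2
```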

bhavybarca · 2018-03-02T03:39:48Z

@jreback is this issue still open?

jreback · 2018-03-02T03:53:36Z

yes

bhavybarca · 2018-03-03T17:14:58Z

@TomAugspurger @jreback I am a little confused about what exactly I should do. Should I simply raise a NotImplementedError, as @jreback suggested, or should I replace searchsorted with get_indexer and somehow return a value even for tuples that are not in the index, as in the issue's example?

TomAugspurger · 2018-03-16T12:05:56Z

If supporting searchsorted is an option, that makes sense; #14833 (comment) indicates that it can be done using get_indexer.

btel · 2018-10-16T23:05:08Z

Since there has been no activity since March, I would be interested in working on this issue. It might be nice to implement searchsorted as suggested by @jreback, but the issue I see is that numpy's searchsorted gives the position at which an element could be inserted into a sorted array while keeping the sort order, even if the element does not exist. Here is an example:

>>> np.searchsorted([1, 3, 5], [2])
array([1])

From what I understand, .get_indexer will simply return -1 (element not found). A naive implementation of numpy's behaviour might use the bisect module from Python's standard library for the elements that are not found, but it would be rather inefficient.
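The naive approach described above might look like the following sketch. `hybrid_searchsorted` is a hypothetical helper, not a pandas API; it assumes a unique, sorted MultiIndex whose tuples are mutually comparable in Python. Exact hits come from the hash-based `get_indexer`, and misses fall back to `bisect` on a materialized list of tuples, which is the O(n) part that makes it inefficient:

```python
import bisect

import pandas as pd


def hybrid_searchsorted(mi, keys, side="left"):
    """Hypothetical helper: get_indexer for hits, bisect for misses.

    Assumes `mi` is unique and sorted and its tuples compare in Python.
    """
    keys = [tuple(k) for k in keys]
    hits = mi.get_indexer(keys)
    tuples = None  # only materialize the tuples if some key is missing
    result = []
    for key, pos in zip(keys, hits):
        if pos >= 0:
            # Unique index: "left" is the match itself, "right" is just after it.
            result.append(int(pos) if side == "left" else int(pos) + 1)
        else:
            if tuples is None:
                tuples = list(mi)
            # Key is absent, so bisect_left and bisect_right agree.
            result.append(bisect.bisect_left(tuples, key))
    return result


mi = pd.MultiIndex.from_tuples([(0, "a"), (0, "b"), (1, "a")])
print(hybrid_searchsorted(mi, [(0, "b"), (0, "c"), (2, "a")]))  # [1, 2, 3]
```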

Interestingly, numpy's searchsorted can also work with tuples if we define an appropriate dtype:

>>> dtype = [("int", 'i8'), ("str", "U1" )]
>>> arr = np.array([(0, 'a'), (0, 'b'), (1, 'c')], dtype=dtype)
>>> arr.searchsorted(np.array([(1, 'a')], dtype=dtype))
array([2])

A possible implementation of searchsorted would then coerce the MultiIndex to an ndarray with an adapted structured dtype and use numpy's built-in searchsorted. What do you think?
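A sketch of that coercion, generalized to any number of levels. `structured_searchsorted` and the `level_{i}` field names are hypothetical. It assumes the MultiIndex is sorted and that each level holds values of a single NumPy-friendly type (ints or strings). Index values and keys are converted together so they share one field dtype; otherwise a key string longer than every level string would be truncated by a fixed-width `U` field:

```python
import numpy as np
import pandas as pd


def structured_searchsorted(mi, keys, side="left"):
    """Hypothetical helper: searchsorted via a NumPy structured dtype.

    Assumes `mi` is sorted and each level holds only ints or only strings.
    """
    n = len(mi)
    names = [f"level_{i}" for i in range(mi.nlevels)]
    columns = []
    for i in range(mi.nlevels):
        # Build one array from index values and key values so both get the
        # same dtype (e.g. a string width wide enough for every key).
        columns.append(np.array(list(mi.get_level_values(i)) + [k[i] for k in keys]))
    dtype = [(name, col.dtype) for name, col in zip(names, columns)]
    combined = np.empty(n + len(keys), dtype=dtype)
    for name, col in zip(names, columns):
        combined[name] = col
    # Structured arrays compare field by field, i.e. lexicographically.
    return combined[:n].searchsorted(combined[n:], side=side)


mi = pd.MultiIndex.from_tuples([(0, "a"), (0, "b"), (1, "c")])
print(structured_searchsorted(mi, [(1, "a")]))  # [2], as in the example above
```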

jreback · 2018-10-16T23:19:45Z

So this might be easier now, as the MultiIndex implementation was recently refactored to keep the underlying codes directly in the Cython table.
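Working on the codes could look roughly like this sketch: narrow a window of rows level by level with `numpy.searchsorted` on the integer codes, and stop at the first label that is absent from its level. `codes_searchsorted` is hypothetical, not pandas' implementation; it assumes every level is sorted, the index is monotonic increasing and there are no missing values (no -1 codes):

```python
import numpy as np
import pandas as pd


def codes_searchsorted(mi, key, side="left"):
    """Hypothetical helper: searchsorted for one tuple key using level codes."""
    if not mi.is_monotonic_increasing or not all(
        lev.is_monotonic_increasing for lev in mi.levels
    ):
        raise ValueError("index and levels must be sorted")
    start, end = 0, len(mi)
    for lev, level_codes, label in zip(mi.levels, mi.codes, key):
        # Inside [start, end) the earlier levels are fixed, so these codes are sorted.
        section = np.asarray(level_codes[start:end])
        if label in lev:
            code = lev.get_loc(label)
            # Narrow the window to the run of rows carrying this label.
            new_start = start + section.searchsorted(code, side="left")
            end = start + section.searchsorted(code, side="right")
            start = new_start
        else:
            # Label absent: rows with a smaller code sort before the key and
            # all others sort after it, so `side` no longer matters.
            code = lev.searchsorted(label, side="left")
            return int(start + section.searchsorted(code, side="left"))
    # Every label matched: the window holds exactly the matching rows.
    return int(start if side == "left" else end)


# The index from the original report: the key (1, "b") belongs after (0, "a").
print(codes_searchsorted(pd.MultiIndex(levels=[[0], ["a"]], codes=[[0], [0]]), (1, "b")))  # 1
```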

SaturnFromTitan · 2020-03-04T13:35:21Z

take

Condielj · 2022-03-28T18:19:22Z

take

GSAUC3 · 2025-05-10T07:35:58Z

Hi, is anyone still working on this, or may I take it up?
If the answer is no, i.e. no one is working on this, then I have a couple of questions:

btel suggested using a structured numpy dtype in #14833 (comment).

This could be one way to handle it, but it assumes the input array is 2-dimensional.
Should the input array be restricted to two dimensions?
May I go ahead with this implementation, or should I simply raise NotImplementedError?

GSAUC3 · 2025-05-11T14:25:37Z

take

jreback added the Difficulty Novice, Error Reporting (Incorrect or improved errors from pandas), and MultiIndex labels Dec 8, 2016

jreback added this to the Next Major Release milestone Dec 8, 2016

ShaharBental mentioned this issue Dec 26, 2016

ERR: MultiIndex searchsorted #14990 (closed)

TomAugspurger added the good first issue label Oct 11, 2017

jreback removed the Difficulty Novice label Dec 15, 2017

btel added a commit to btel/pandas that referenced this issue Oct 17, 2018

implement searchsorted for multindex (bug pandas-dev#14833)

4251f99

btel mentioned this issue Oct 17, 2018

implement searchsorted for multindex (bug #14833) #23210 (closed)

jbrockmendel removed the Effort Low label Oct 21, 2019

SaturnFromTitan mentioned this issue Mar 2, 2020

TST: Using more fixtures in of tests/base/test_ops.py #32313 (merged)

github-actions bot assigned SaturnFromTitan Mar 4, 2020

SaturnFromTitan removed their assignment Mar 4, 2020

mroeschke added the Enhancement label May 2, 2021

github-actions bot assigned Condielj Mar 28, 2022

mroeschke removed this from the Contributions Welcome milestone Oct 13, 2022

github-actions bot assigned GSAUC3 May 11, 2025

GSAUC3 linked a pull request May 12, 2025 that will close this issue

ENH: Implemented MultiIndex.searchsorted method ( GH14833) #61435 (open)
