Multiarray searchsorted fails #14833
Comments
I suppose we could just disable this. numpy doesn't understand object array searchsorted generally
maybe just raise a |
Is this issue still available to fix? |
yes, note that we should simply define |
@jreback I am new to pandas, could you throw some light on how I can use |
Here's what searchsorted does; I am using an integer array because numpy doesn't play nice with tuples. It returns the indexer of the match (IOW where it is in the array). Note if something is not found it returns the last index before that (which is really unintuitive!)
|
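A minimal sketch of the behaviour discussed above, assuming a made-up sorted integer array (this is not the snippet from the original comment). For a value that is present, numpy's searchsorted returns its position; for a value that is absent, it returns the position at which the value would have to be inserted to keep the array sorted:

```python
import numpy as np

# Hypothetical sorted integer array, chosen only for illustration.
arr = np.array([10, 20, 30, 40])

found = int(np.searchsorted(arr, 30))      # value present: its position
missing = int(np.searchsorted(arr, 25))    # value absent: insertion point
past_end = int(np.searchsorted(arr, 99))   # larger than every element
right = int(np.searchsorted(arr, 30, side="right"))  # just after the match

print(found, missing, past_end, right)
```

Note that a present value and an absent one can yield the same index, so the result alone does not say whether the value was found.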
@jreback is this issue still open? |
yes |
@TomAugspurger @jreback I am a little confused about what exactly I should do. Should I simply raise a NotImplementedError, as @jreback said? Or should I replace searchsorted with get_indexer and somehow return a value even for tuples, as in the issue example? |
If supporting |
Since there was no activity since March, I would be interested in working on this issue. It might be nice to implement searchsorted as suggested by @jreback, but the issue I see is that numpy's searchsorted can give the location of an element in a sorted array that would keep the sort order even if the element does not exist. Here is an example:
From what I understand Interestingly, numpy's searchsorted can also work with tuples if we define an appropriate dtype:
A possible implementation of searchsorted would then coerce the MultiIndex to an ndarray with a suitable dtype and use numpy's built-in searchsorted. What do you think? |
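The structured-dtype idea above could look roughly like the following sketch. The field names, the field types, and the helper `search_tuple` are assumptions made for illustration; they are not taken from the original comment or from pandas:

```python
import numpy as np

# Assumed field names and types; a real MultiIndex would need its own mapping.
dtype = np.dtype([("level_0", np.int64), ("level_1", "U1")])
arr = np.array([(0, "a"), (2, "c")], dtype=dtype)


def search_tuple(sorted_arr, key, side="left"):
    """Hypothetical helper: wrap the key in a 0-d structured array, then search."""
    needle = np.array(key, dtype=sorted_arr.dtype)
    return int(np.searchsorted(sorted_arr, needle, side=side))


existing = search_tuple(arr, (0, "a"))  # key already in the array
between = search_tuple(arr, (1, "b"))   # key falls between the two entries
after = search_tuple(arr, (3, "a"))     # key sorts after every entry

print(existing, between, after)
```

The key is wrapped in an array with the same dtype so that numpy treats the tuple as one record and not as a sequence of two separate values.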
so this might be easier now as the implementation of MI was recently refactored to directly keep the underlying codes in the cython table |
take |
take |
Hi, is anyone still working on this, or may I take it up? btel suggested an approach in #14833 (comment). This can be one way to handle it, but it assumes the input array is 2-dimensional. |
take |
Code Sample, a copy-pastable example if possible
Problem description
The entry (1,"b") should come after the existing (0,"a") in the MultiIndex. (Alternatively, MultiIndex could throw a clean error message.) Instead, an opaque exception is raised:
This is because Index.searchsorted naïvely passes its arguments to numpy.searchsorted, which is unaware that its second argument is a sequence of tuples, not a plain array of one dimension higher.
Expected Output
1
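To illustrate why 1 is the expected result: Python compares tuples lexicographically, so a binary search over plain tuples gives the insertion point directly. The sketch below uses the standard-library bisect module on a plain list standing in for the MultiIndex entries; it is not how pandas implements searchsorted, and the one-entry list is assumed from the problem description:

```python
from bisect import bisect_left

# Entries of the MultiIndex written out as plain tuples (assumed data,
# matching the (0, "a") entry mentioned in the problem description).
entries = [(0, "a")]

# (1, "b") sorts after (0, "a"), so it would be inserted at position 1.
position = bisect_left(entries, (1, "b"))
print(position)
```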
Output of pd.show_versions()
pandas: 0.19.1
nose: 1.3.7
pip: 9.0.1
setuptools: 28.8.0
Cython: 0.25.1
numpy: 1.11.2
scipy: None
statsmodels: None
xarray: None
IPython: None
sphinx: None
patsy: None
dateutil: 2.6.0
pytz: 2016.7
blosc: None
bottleneck: None
tables: None
numexpr: None
matplotlib: 1.5.3
openpyxl: None
xlrd: 1.0.0
xlwt: 0.8.0
xlsxwriter: None
lxml: None
bs4: 4.5.1
html5lib: 0.9999999
httplib2: 0.9.2
apiclient: None
sqlalchemy: 1.1.4
pymysql: None
psycopg2: None
jinja2: 2.8
boto: None
pandas_datareader: None