Inconsistencies in nansum for float32 dtype compared to numpy #193
Comments
I think numpy uses a more robust algorithm: https://en.wikipedia.org/wiki/Pairwise_summation. Bottleneck doesn't. Is bottleneck used at JPL?!
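For reference, here is a minimal Python sketch of pairwise summation (not numpy's actual C implementation, which unrolls the base case for speed): the array is split recursively and the halves are summed separately, so rounding error grows roughly like O(log n) instead of the O(n) of a naive running sum.

```python
import numpy as np

def pairwise_sum(a, block=128):
    # Recursively split the array and sum the halves; rounding error then
    # grows like O(log n) instead of the O(n) of a naive left-to-right loop.
    n = len(a)
    if n <= block:
        total = a.dtype.type(0)  # accumulate in the array's own dtype
        for x in a:
            total += x
        return total
    mid = n // 2
    return pairwise_sum(a[:mid], block) + pairwise_sum(a[mid:], block)

a = np.full(10**6, 1.1, dtype=np.float32)
print(pairwise_sum(a))  # stays close to the true 1.1e6
print(np.sum(a))        # numpy's sum uses pairwise summation internally
```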
@kwgoodman Thanks for the quick response! To be precise, I have been using it indirectly through xarray for multiple JPL projects. I take it this is a well-known issue, then, and it isn't going to be addressed anytime soon? I ended up discovering this when I wanted to calculate some statistics for a dataset similar in size to the one in the example I posted, and got some pretty serious errors. In this case, the dataset had a min and max that were very close to each other, but the mean ended up being lower than the min, and consequently the standard deviation was over an order of magnitude larger than its actual value.
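That symptom is exactly what naive float32 accumulation predicts. A small artificial illustration (not the JPL dataset): once a float32 running sum passes 2^24, adding values near 1 no longer changes it, so the computed mean can fall below the array's minimum.

```python
import numpy as np

# float32 has 24 significand bits, so at 2**24 the spacing between
# representable values reaches 1.0 and a running sum of ones stalls:
s = np.float32(2**24)            # 16777216.0
print(s + np.float32(1.0) == s)  # True: adding 1.0 changes nothing

# For an array of fifty million ones (min == max == true mean == 1.0),
# a naive float32 running sum saturates at 2**24, so the computed mean is
n = 50_000_000
print(float(s) / n)              # ~0.336, well below the minimum of 1.0
```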
Happy to hear that bottleneck is used, even if indirectly, at JPL. I'd be interested in trying pairwise summation in bn.nansum and bn.nanmean. I get paid for releases of numerox but have not found funding for bottleneck development.
Hi,
I don't think it did and I have no bandwidth to tackle this. #424
This might also be related: #414
Consider this simple example:
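(The code block from the original report was not preserved in this copy; the following is a minimal sketch of the kind of comparison described, with an array size and values that are illustrative only, not the reporter's actual dataset.)

```python
import numpy as np
import bottleneck as bn

# Hypothetical stand-in for the original example: a large random float32 array.
rng = np.random.RandomState(0)
a = rng.rand(10**7).astype(np.float32)

print(np.nansum(a))  # numpy: pairwise summation, error grows ~O(log n)
print(bn.nansum(a))  # bottleneck: naive accumulation; on arrays this large
                     # the result typically drifts away from numpy's

# The discrepancy largely disappears in double precision:
b = a.astype(np.float64)
print(np.nansum(b), bn.nansum(b))
```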
Looks like errors in the computation are compounding due to loss of precision, as the problem becomes much less apparent for smaller datasets. Repeating the above for the float64 dtype gives me much more consistent results. I tested this example with bottleneck 1.1.0 and 1.2.1.