Improve image combination performance #741
Codecov Report
@@            Coverage Diff             @@
##             main     #741      +/-   ##
==========================================
- Coverage   96.01%   94.93%    -1.09%
==========================================
  Files          30       30
  Lines        3942     4049      +107
==========================================
+ Hits         3785     3844       +59
- Misses        157      205       +48
Performance improvements (no clipping done). Some benchmark plots are below: the left-most commit is master, the right-most uses bottleneck, and the second-to-last uses …
Ping @crawfordsm @MSeifert04 @saimn @cmccully @ysBach -- you are either a ccdproc maintainer or someone who is working hard, separately from this, to improve image combination speed. Keep an eye out for an email invite later today to talk about whether we can pool efforts on this. I don't need a detailed review of this -- but if you have objections to this stop-gap approach to improving performance, please speak up!
Great to see progress on this. I started to do some comparison with what we have in DRAGONS, but it is still a WIP. About bottleneck: I had the opportunity to look at it more closely recently (astropy/astropy#10553 (comment)). Other than that, 👍 to use bottleneck.
Looking in more detail at the code, it seems that ccdproc converts the data to np.float64 by default unless a dtype is specified. The good news is that it will not be affected by the bottleneck issue; but in terms of performance, it would be faster to use the original data's dtype.
One other bottleneck note: there are some precision issues with sums of float32 (pydata/bottleneck#193).
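For illustration, a minimal sketch of that precision issue, assuming bottleneck is installed: numpy sidesteps it with pairwise summation, while bottleneck accumulates naively in the input dtype.

```python
import numpy as np
import bottleneck as bn

# 20 million ones: the exact sum is 20_000_000.
data = np.ones(20_000_000, dtype=np.float32)

# numpy's pairwise summation keeps the float32 sum accurate:
print(np.sum(data))     # 20000000.0

# bottleneck accumulates naively in float32, so the running total
# stalls at 2**24 = 16777216, the limit of the 24-bit mantissa:
print(bn.nansum(data))  # 16777216.0
```

Since ccdproc converts to np.float64 by default, as noted above, it avoids this in practice.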
@saimn @ysBach -- any chance either of you can take a look at this? It has become a little convoluted because I'm trying to make sure I don't change the API. This is only a small step towards improving performance, but it is a step... For median, the only improvement comes when bottleneck is installed. Once this is wrapped up (hopefully this week) I'd like to come back to further improvements, likely the second week of June.
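As a rough sketch of why median only improves when bottleneck is installed (the names here are illustrative, not ccdproc's actual internals), the usual pattern is to pick the fastest available nanmedian at import time and fall back to numpy's:

```python
import numpy as np

try:
    import bottleneck as bn
    _nanmedian = bn.nanmedian   # considerably faster than numpy's
except ImportError:
    _nanmedian = np.nanmedian   # pure-numpy fallback

def median_of_stack(stack):
    """Median-combine a (n_images, ny, nx) stack, treating NaN as masked."""
    return _nanmedian(np.asarray(stack, dtype=np.float64), axis=0)
```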
Yes, I changed one test to get this to pass. Note, though, that the test was of the return value of a completely masked result.
Also fix a couple of small sphinx-related issues
Includes using bottleneck for performance when it is available. Implement weighted sum and test weighted sum; this includes factoring out the guts of the weighted sum for use in a couple of combination methods.
Putting it all in one place is the current practice.
The worry is that users may be passing in functions that expect masked data, and we don't want to break that; it would be an API change that requires a new major release.
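A hypothetical sketch of how that constraint shapes the code (`_combine` and its arguments are illustrative, not the actual ccdproc API): user-supplied functions keep receiving the masked array, and only the built-in path switches to NaN-based reductions.

```python
import numpy as np

def _combine(masked_stack, user_func=None):
    if user_func is not None:
        # Legacy path: the user's function may rely on the mask, so it
        # gets the MaskedArray unchanged -- no API break.
        return user_func(masked_stack, axis=0)
    # Fast path: replace masked entries with NaN and use nan-aware
    # reductions, avoiding MaskedArray overhead.
    data = masked_stack.filled(np.nan)
    return np.nanmean(data, axis=0)

stack = np.ma.masked_invalid(np.random.rand(5, 32, 32))
fast = _combine(stack)                  # nan-based path
legacy = _combine(stack, np.ma.median)  # user function still sees the mask
```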
Coming a bit late, but I don't see any issue (and my knowledge of ccdproc's combine code is limited). 🎉
This pull request attempts to improve the performance of ccdproc by using numpy's `nan*` functions instead of numpy's `MaskedArray`, and by using `bottleneck`. It is not intended (yet) to change the API for `Combiner`.

So far I have only switched `average_combine` to do this. If I get a couple of 👍 on this I'll do the same for `median_combine` and the clipping routines, and also combine the implementations of `sum_combine` and `average_combine` to the extent possible. A rough sketch of the approach is below.

To do:

- Use `nan*` or `bottleneck` for clipping
- ~~Consider making dtypes settable (might belong in `CCDData`)~~ -- this is a separate issue, really, and `CCDData` lives in astropy core, not here.

Edit: Fixes #719
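To make the approach concrete, here is a hedged sketch of a nan-based average combine, with an optional bottleneck fast path (`average_combine_sketch` is an illustrative name, not the `Combiner` API):

```python
import numpy as np

try:
    import bottleneck as bn
    nansum, nanmean = bn.nansum, bn.nanmean
except ImportError:
    nansum, nanmean = np.nansum, np.nanmean

def average_combine_sketch(stack, mask=None, weights=None):
    """Average-combine a (n_images, ny, nx) stack along axis 0.

    Masked pixels are replaced with NaN so the nan-aware reductions
    skip them, instead of paying MaskedArray overhead.
    """
    data = np.array(stack, dtype=np.float64)  # copy; float64 is the default
    if mask is not None:
        data[mask] = np.nan
    if weights is None:
        return nanmean(data, axis=0)
    # Weighted mean: weight the data, then normalize by the total
    # weight of the pixels that actually contributed.
    w = np.asarray(weights, dtype=np.float64)[:, None, None]
    contributed = ~np.isnan(data)
    return nansum(data * w, axis=0) / np.sum(w * contributed, axis=0)
```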