Boosting the performance of pyfftw_sdp.py
#20
|
Review these changes at https://app.gitnotebooks.com/stumpy-dev/sliding_dot_product/pull/20 |
|
Tests are passing! Now, let's look at the performance. The data is saved in timing.csv.
We can see some improvement particularly for cases where |
|
Now, let's check the performance on MATLAB Online. I've checked the following cases:
We will use the first case, i.e. The code for Each of the images below is for one
|
|
@NimaSarajpoor Can you please provide some conclusions/observations based on the outputs above? |
|
The following observation is based on the results I obtained for I think there are still two items that we need to check:
@seanlaw |
Yes, let's see the results for
Do we know if the saved wisdom is portable/transferable across different architectures/OS (i.e., is it simply a file that gets read)? I'd assume that the object is not portable/transferable. Given that we might use a So, creating the wisdom at runtime might be slow but probably still fine if the wisdom is reused many times. Then, the user can choose to precompute the wisdom themselves BEFORE they call SDP for a range of My guess is that importing is negligible compared to the time it takes to compute the wisdom and/or to complete Does that help? |
Awesome! One thing that I worry about is the installation of |
|
That speedup from storing the rfft object is pretty insane! |
|
I(R)FFT
codes: For now, I am not that worried about the performance of arrays with length <= 2^5 as we should be good when length is |
For small lengths (length <= 2^16), njit might still be the fastest (at least it was according to this plot). We should check this at the end. Btw, I don't mean to go around in circles! I want to find the simplest solution |
I took another look at that plot... noticed that the spike in the beginning is for Have shared code below to not lose it for now. Need to re-design it to make it cleaner as it currently has stand-alone functions to store rfft/irfft objects. As discussed before, I can/should try to include that in the original For MATLAB's, we have this: According to the following plot, where color "red" means Python's SDP outperforms MATLAB's SDP.
If we do not store rfft/irfft objects and just create them when needed, then we will get the following plot. The color
Note: Need to wait a few days till I can work on MATLAB Online again to double check the performance of the original code shared above. In the meantime, I will work on cleaning up the code. |
|
(1) I added (2) Added According to the plot shared in my previous comment, I can put my focus on
Observations: @seanlaw Or.... after step 2, once we know that we are good for the SDP part, we can just leave this PR as is and then I can resume my work on DAMP. |
|
@NimaSarajpoor Maybe it's because I haven't looked at SDP in a long time but I'm finding it hard to follow all of the code changes/files that you are mentioning so maybe you can help guide my brain as it feels like I'm jumping all over the place, which is probably a bad sign that we are trying to do too many things simultaneously.
So, things are much faster when we store the rfft/irfft objects? Help me understand why we would ever NOT store the objects? Maybe if we are only calling SDP once? Having said that, it almost feels like the same problem as
It's not clear what this is and why do we need it? Did we discuss this in the past? Is this an edge case of some sort that differs from what we've already been doing? I just don't like seeing one-offs.
In many cases, I think that since you've done the work and are intimately aware of the details, you know what to do next. Additionally, you've done a better job in summarizing the observations. However, you are still missing the "and so what?" after stating the observations (e.g., "so based on this observation, the conclusion might be that we should go with I think what's missing is that the goals of your experiments aren't clear to the reader (they might be clear to you) and so it comes across as mixing and matching multiple experiments which is confusing (and I may be adding to the confusion with my comments).
Similar to the above, I'm missing the common thread/story that walks us through everything. I know that we've branched a lot and some things didn't work and so can you clearly describe all of the things that were tried and succeeded? First, start with describing your ONE CLEAR GOAL and then everything else in your experimentation should be focused on whether or not you've achieved that goal. Perhaps, what we need is to break things down into smaller digestible PRs?
So, right now, I don't feel like I have a clear enough picture or understanding to answer this question. In the past, I would've also attempted to write a bunch of code to better understand the problem but I have not done that in this case so I rely on you to shave away the noise and to communicate exactly what I should (and shouldn't) focus on. This means that you'll need to take a step back and look at the broader/bigger picture and only focus on the essential experiment(s) that helped provide clarity to reach your final goal. |

We may end up using different algos for calculating the sliding dot product for arrays with different lengths. However, it would be great to see if one algo, like pyfftw_sdp, could outperform MATLAB's FFTW-based sliding dot product. It seems that pyfftw_sdp is slightly outperformed by MATLAB's fftw_sdp when the length of the array is < 2^8.

After checking the performance of (R)FFT and I(R)FFT in #19 and having some discussion in pyFFTW/pyFFTW#425 (comment), I decided to see if I can boost the performance of pyfftw_sdp.
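For readers new to the thread, here is a minimal sketch of what an FFT-based sliding dot product computes. It uses `numpy.fft` rather than pyFFTW so that it runs anywhere, and the function names `naive_sdp` and `fft_sdp` are illustrative, not the ones in this repository. The naive version is included only as a correctness reference.

```python
import numpy as np


def naive_sdp(Q, T):
    """Reference: dot product of Q with every length-len(Q) window of T."""
    m = len(Q)
    return np.array([np.dot(Q, T[i : i + m]) for i in range(len(T) - m + 1)])


def fft_sdp(Q, T):
    """Sliding dot product via the convolution theorem (rfft/irfft)."""
    n, m = len(T), len(Q)
    # Reverse Q and zero-pad it to the length of T, so that circular
    # convolution with T yields the sliding dot products.
    Qr = np.zeros(n)
    Qr[:m] = np.asarray(Q, dtype=np.float64)[::-1]
    QT = np.fft.irfft(np.fft.rfft(T) * np.fft.rfft(Qr), n=n)
    # Entries before index m - 1 are contaminated by wrap-around; drop them.
    return QT[m - 1 : n]


Q = np.array([1.0, 2.0])
T = np.array([1.0, 2.0, 3.0, 4.0])
print(fft_sdp(Q, T))  # approximately [5, 8, 11], up to floating-point error
```

The two `rfft` calls and the `irfft` call are the steps a pyFFTW-based version would replace with planned transforms.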