Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Batch FFT Settings #10

Open
cancan101 opened this issue Aug 19, 2014 · 4 comments
Open

Batch FFT Settings #10

cancan101 opened this issue Aug 19, 2014 · 4 comments

Comments

@cancan101
Copy link

I have a 3D data set that is formatted such that the fastest dimension is 1024 complex numbers single precision, the second fastest dimension is 256 and the third fastest is 13. I would like to run a batch of FFTs over the the 2nd dimension (size 256).
That would be a total of 1024*13 256-point FFTs.

If this were written in MATLAB it would be: fft(array,[],2); where size(array) is 1024 256 13

My question is how do I set this FFT operation up on gpyfft? It seems the FFT class might not be rich enough for this type of batch operation: https://github.com/geggo/gpyfft/blob/master/gpyfft/fft.py#L12

@geggo
Copy link
Owner

geggo commented Aug 19, 2014

Am 19.08.2014 um 04:25 schrieb Alex Rothberg [email protected]:

I have a 3D data set that is formatted such that the fastest dimension is 1024 complex numbers single precision, the second fastest dimension is 256 and the third fastest is 13. I would like to run a batch of FFTs over the the 2nd dimension (size 256).

That would be a total of 1024*13 256-point FFTs.

If this were written in MATLAB it would be: fft(array,[],2); where size(array) is 1024 256 13

My question is how do I set this FFT operation up on gpyfft? It seems the FFT class might not be rich enough for this type of batch operation: https://github.com/geggo/gpyfft/blob/master/gpyfft/fft.py#L12


Reply to this email directly or view it on GitHub.

Hi Alex,

yes, you hit a current limitation of the high-level wrapper in gpyfft. The clAmdFft library has the constraint that for a batched transform it only allows a single value for the distance between the individual subarrays, which shall be transformed. For a 3d array, where the transform should take place only along a single axis, this constraint cannot be fulfilled for the general case with arbitrary strides. But in your case it is possible to collapse the 2 non-transformed axes into a single axis. In numpy you could do this e.g. with

data2 = np.rollaxis(data, 1) #bring second axis to front
data3 = np.reshape(data2, (256, -1)) #collaps non-transformed axis to a single axis

then perform the transform, and finally revert the shape transforms. Unfortunately, for a pyopencl array these methods are not supported, so you have to manually modify the strides attribute.

Implementing this automatic axis collapsing for cases where it is possible is an open issue on the TODO list. Since some rainy days are expected, I perhaps manage to do this the next days.

For optimum performance it could be better if you rearrange your data such that the transform takes place along the fastest varying axis, I remember having seen huge differences.

Gregor

@cancan101
Copy link
Author

Okay, cool. Thanks for looking into this.

To double check: this is a limitation with the clFFT library that can hopefully be "emulated" around in either gpyfft or with numpy to re-arrange the data in advance?

Or is this a limitation with the gpyfft implementation?

@geggo
Copy link
Owner

geggo commented Aug 19, 2014

Correct, this is a limitation of the clFFT library, which could be hidden by the gpyfft wrapper, but this needs to be implemented.
I have to correct myself, in your specific case it is not possible to express all transforms as a single batched transform. (It would be possible if your data layout would be 256x1024x13). Instead you have to perform a sequence of 13 batched transforms of 1024 256-point transforms. Possible, but needs some fiddling with sub-buffers, if you keep all your data in a single OpenCL buffer.

@SyamGadde
Copy link
Contributor

I know this is an old issue, but thought it might be an appropriate place to provide a wrapper I have written for automatically processing data in situations like this, so you don't have to "fiddle" with separate calls. It is all in one file:

https://github.com/SyamGadde/gpyfft/blob/autobatch/gpyfft/batch.py

and the usage would be something like this:

import numpy
import numpy.fft
import numpy.random
import pyopencl
import pyopencl.array

# this is the new file -- change this import if you put batch.py somewhere else
import gpyfft.batch

context = pyopencl.create_some_context()
queue = pyopencl.CommandQueue(context)

axis = 1
l = 1024; m = 256; n = 13
a = numpy.random.random_sample([l, m, n]).astype(numpy.float32)
a = a + numpy.random.random_sample([1024, 256,13]).astype(numpy.float32) * 1j
a_g = pyopencl.array.to_device(queue, a)

batchplan = gpyfft.batch.FFTBatchPlan(context, queue, a_g, [axis])
es = batchplan.fft_loop(forward=True)
pyopencl.wait_for_events(es)

b_gpyfft = a_g.get()
b_numpy = numpy.fft.fft(a, axis=axis)

assert numpy.allclose(b_gpyfft, b_numpy, atol=5e-05)

FFTBatchPlan attempts to collapse non-transformed dimensions (gpyfft.FFT() does this too, but FFTBatchPlan needs to do it itself) and arrange for batching or looping across any remaining non-transformed dimensions, even separating the transformed axes into separate transforms if needed (may not ever be necessary).

As shown above, it works with in-place transforms, should work with out-of-place if you send the out_array parameter to FFTBatchPlan. real-to-complex should work too but you need to be careful to make your "real" transform axis contiguous.

Theoretically it should work with the input array being a sliced and/or transposed version of another array (which is how you could have it still do a real transform of axis 1, but only if it is the fastest-moving axis).

Theoretically FFTBatchPlan takes care of making sure any batches it sends follow the MEM_BASE_ADDR_ALIGN (varies between 1024 and 2048 on my hardware) requirement, and will try its best to chunk the processing of a dimension so that each chunk is aligned. It will complain if it can't figure out a reasonable solution.

I have tried out all these possibilities, but haven't extensively tested them. Consider this a proof of concept. I hope this is useful!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants