
Give user the ability to specify buffer offsets in enqueue_transform #29

Merged 11 commits into geggo:master on Jan 26, 2017

Conversation

@SyamGadde (Contributor)

Sorry for the raft of pull requests -- these are all modifications I've been using successfully for a while and have found very useful; I'd welcome your input.

This set of commits adds in_offsets and out_offsets arguments to enqueue_transform(); when specified, they trigger the creation of sub-buffers under the hood. For arrays with multiple non-transformed axes that can't be "collapsed" into a single axis, the user can now slice the input array into appropriate chunks (using the standard slice operator) and simply pass x.base_data and x.offset as the inputs/outputs, as long as each offset is a multiple of the device's CL_DEVICE_MEM_BASE_ADDR_ALIGN. It also allows transforming arrays that have non-zero offsets. For example, the following partially addresses issue #10 without creating copies of the original array or creating our own OpenCL sub-buffers:

import numpy
import pyopencl
import pyopencl.array
import gpyfft

context = pyopencl.create_some_context()
queue = pyopencl.CommandQueue(context)

a = numpy.zeros([1024, 256, 13], order='F', dtype=numpy.complex64)
a[:, :, :] = numpy.arange(256)[numpy.newaxis, :, numpy.newaxis]
a_gpu = pyopencl.array.to_device(queue, a)
b_gpu = pyopencl.array.zeros_like(a_gpu)

# notice here we use a sliced array which has a non-zero offset
transform = gpyfft.fft.FFT(context, queue, a_gpu[:,:,1], axes = (1,), out_array=b_gpu[:,:,1])

# here we transform using the high-level API (the non-zero offset would have failed here before these commits)
(event,) = transform.enqueue()
event.wait()
result1 = b_gpu.get()[:,:,1]

# here we transform all 13 chunks using the low-level API (so we don't have to re-create the plan)
# (all but the first chunk have non-zero offsets)
events = [
    transform.plan.enqueue_transform(
        (queue,), (a_gpu.base_data,), (b_gpu.base_data,),
        in_offsets=(a_gpu[:, :, i].offset,),
        out_offsets=(b_gpu[:, :, i].offset,),
    )[0]
    for i in range(13)
]
for event in events:
    event.wait()
result2 = b_gpu.get()
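As a side note on the alignment caveat above: CL_DEVICE_MEM_BASE_ADDR_ALIGN is reported by OpenCL in bits (pyopencl exposes it as device.mem_base_addr_align), so a quick host-side check could look like the sketch below. The helper name is my own, not part of gpyfft:

```python
def offset_is_subbuffer_safe(offset_bytes, mem_base_addr_align_bits):
    """Return True if a byte offset is a legal sub-buffer origin.

    CL_DEVICE_MEM_BASE_ADDR_ALIGN is reported in *bits*, so convert
    to bytes before testing divisibility.
    """
    align_bytes = mem_base_addr_align_bits // 8
    return offset_bytes % align_bytes == 0

# Each (1024, 256) complex64 chunk starts at a multiple of
# 1024 * 256 * 8 bytes, which satisfies a typical 1024-bit alignment:
print(offset_is_subbuffer_safe(1024 * 256 * 8, 1024))  # True
print(offset_is_subbuffer_safe(100, 1024))             # False
```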

@geggo (Owner) commented Jan 25, 2017

Hi,
thanks for providing a way to allow more general memory layouts and resolve issue #10. I need more time to look into this.

After having a quick glance at the code I have some comments/questions:

  • I would like to keep the low-level wrapper a simple, thin layer around the clFFT library, without adding much functionality not present in the library itself, just for the sake of maintainability. Therefore I would prefer that all the sub-buffer handling take place in the high-level wrapper, ideally in pure Python, relying on pyopencl. Do you think that is possible and makes sense?

@SyamGadde (Contributor, Author)

Sure, I don't see why it can't be done all in the high-level API -- in that case the same thing could be done in enqueue() instead. I'll amend this in a bit...

@SyamGadde (Contributor, Author)

OK, one reason I changed the low-level enqueue_transform was that it was the only way I could re-use the baked plan for multiple arrays (or multiple slices of the same array). Would it be acceptable to extend the high-level enqueue() to take optional data/result keyword arguments? If specified, they would override the stored data and result attributes and allow users to call enqueue() repeatedly; it could double-check that the shape/strides/dtype match the original data used to create the plan. If those arguments are not specified, it would use the stored values as normal.
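The double-check proposed here could look roughly like the following sketch (check_layout_matches is a hypothetical helper name, not existing gpyfft API):

```python
def check_layout_matches(planned, candidate):
    """Raise ValueError unless `candidate` can reuse a plan baked for
    `planned`: a baked clFFT plan fixes shape, strides, and dtype."""
    for attr in ('shape', 'strides', 'dtype'):
        if getattr(planned, attr) != getattr(candidate, attr):
            raise ValueError(
                '%s mismatch: plan expects %r, got %r'
                % (attr, getattr(planned, attr), getattr(candidate, attr)))
```

enqueue() could call this with the stored array and the override before submitting the transform, so a mismatched array fails fast instead of producing garbage.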

@geggo (Owner) commented Jan 25, 2017 via email

@SyamGadde (Contributor, Author)

I agree that automatically batching across non-transformed axes would be great to integrate into gpyfft. I've created a somewhat generic mechanism to do so outside of gpyfft, but I'm afraid it's not very elegant -- perhaps it's because I haven't yet stumbled on the "trivial" solution. Anyway, when I clean it up, I'll be happy to share it.

There may still be other reasons to re-use a plan created through the high-level API, so I'll create a separate enqueue function (that takes arrays), as you suggest, and test it.

@SyamGadde (Contributor, Author)

The latest commits limit the entirety of the changes to fft.py. I created a new function, enqueue_arrays, which takes data and (optional) result arrays that override the stored data/result arrays. enqueue() now simply calls enqueue_arrays() with the stored arrays, since they otherwise do the same thing.

I also found a pyopencl function that creates the sub-buffer for me, so there is no need to call into the OpenCL library directly.

The responsibility for automatic non-transformed-axis batching would need to be shared between __init__ and enqueue_arrays, because __init__ creates the clFFT plan, and you need to know how the array will be batched before creating the plan.
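To illustrate the bookkeeping involved (this is my own sketch, not code from this pull request): a deliberately naive chunk enumerator that loops over every non-transformed axis might look like this. A real implementation would first collapse whatever axes clFFT can batch directly, and __init__ would size the plan accordingly:

```python
import itertools

def chunk_slices(shape, fft_axes):
    """Yield one index tuple per chunk, fixing every non-transformed
    axis and keeping the transformed axes whole (slice(None))."""
    leftover = [ax for ax in range(len(shape)) if ax not in fft_axes]
    for combo in itertools.product(*(range(shape[ax]) for ax in leftover)):
        idx = [slice(None)] * len(shape)
        for ax, i in zip(leftover, combo):
            idx[ax] = i
        yield tuple(idx)

# For the (1024, 256, 13) example with axes=(1,), this naive version
# yields 1024 * 13 chunks; collapsing axis 0 into the clFFT batch
# would reduce that to 13 enqueues.
```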

Here is an example of usage:

import gpyfft
import numpy
import numpy.fft
import pyopencl
import pyopencl.array

context = pyopencl.create_some_context()
queue = pyopencl.CommandQueue(context)

a = numpy.zeros([1024, 256, 13], order='F', dtype=numpy.complex64)
a[:, :, :] = numpy.arange(256)[numpy.newaxis, :, numpy.newaxis]
a_gpu = pyopencl.array.to_device(queue, a)
b_gpu = pyopencl.array.zeros_like(a_gpu)
b2_gpu = pyopencl.array.zeros_like(a_gpu)

# notice here we use a sliced array which has a non-zero offset
transform = gpyfft.fft.FFT(context, queue, a_gpu[:,:,1], axes = (1,), out_array=b_gpu[:,:,1])

# here we transform using the high-level API (the non-zero offset would have failed here before these commits)
(event,) = transform.enqueue()
event.wait()
result1 = b_gpu.get()[:,:,1]
assert numpy.allclose(result1, numpy.fft.fftn(a[:,:,1], axes=(1,)))

# here we transform all 13 chunks
# (all but the first chunk have non-zero offsets)
events = [
    transform.enqueue_arrays(data=a_gpu[:,:,i], result=b2_gpu[:,:,i])[0]
    for i in range(13)
]
for event in events:
    event.wait()
result2 = b2_gpu.get()
assert numpy.allclose(result2, numpy.fft.fftn(a, axes=(1,)))
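The per-chunk offsets involved here can be verified on the host with plain NumPy; the byte offset that pyopencl's Array.offset reports for a_gpu[:, :, i] corresponds to the same arithmetic:

```python
import numpy

a = numpy.zeros([1024, 256, 13], order='F', dtype=numpy.complex64)
base_addr = a.__array_interface__['data'][0]

for i in range(13):
    chunk = a[:, :, i]
    # In an 'F'-ordered array the last axis varies slowest, so each
    # chunk is a contiguous block i * 1024 * 256 * itemsize bytes in.
    byte_offset = chunk.__array_interface__['data'][0] - base_addr
    assert byte_offset == i * 1024 * 256 * a.itemsize
```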

@geggo (Owner) commented Jan 26, 2017

Excellent, thanks for your contribution!

@geggo geggo merged commit 7a8bc41 into geggo:master Jan 26, 2017
geggo added a commit that referenced this pull request Jan 26, 2017
* 'master' of https://github.com/geggo/gpyfft:
  Merge PR #29: Accept pyopencl arrays with nonzero offsets (PR #29), add enqueue_arrays method
  Allow buffers to be of PyOpenCL-type PooledBuffer (in addition to the standard Buffer)
  Don't send an empty event array (clFFT thinks the sky is falling)