Give user the ability to specify buffer offsets in enqueue_transform #29
… we create sub-buffers, and release them when done).
…'t make sense for sub-buffer.
Hi, after a quick glance at the code, I have some comments/questions:
Sure, I don't see why it can't all be done in the high-level API -- in that case the same thing could be done in enqueue() instead. I'll amend this in a bit...
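OK, one reason I changed the low-level enqueue_transform was that it was the only way I could re-use the baked plan for multiple arrays (or multiple slices of the same array). Would it be acceptable to extend the high-level enqueue() to take optional data/result keyword arguments? If specified, they would override the stored data and result attributes and allow users to call enqueue repeatedly. It could double-check that the shape/strides/dtype match the original data used to create the plan. If those arguments are not specified, it would use the stored values as normal.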
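A minimal sketch of the proposed behaviour, assuming a plan object that remembers the arrays it was baked for. The class `ToyPlan` and its methods are hypothetical (this is not gpyfft's API), and `numpy.fft` stands in for the clFFT transform so the sketch runs without an OpenCL device. Optional `data`/`result` arguments replace the stored arrays for a single call, after checking that shape, strides and dtype match.

```python
import numpy as np


class ToyPlan:
    """Hypothetical stand-in for a baked FFT plan (not gpyfft's API).

    It remembers the arrays it was created for; numpy.fft replaces the
    actual clFFT transform so the sketch runs without an OpenCL device.
    """

    def __init__(self, data, result, axes=(-1,)):
        self.data = data
        self.result = result
        self.axes = tuple(axes)

    @staticmethod
    def _check_layout(new, ref, name):
        # A baked plan encodes shape, strides and dtype, so only arrays with
        # an identical memory layout can safely reuse it.
        if (new.shape, new.strides, new.dtype) != (ref.shape, ref.strides, ref.dtype):
            raise ValueError(f"{name} does not match the layout the plan was baked for")

    def enqueue(self, data=None, result=None):
        # Optional arguments override the stored arrays for this call only;
        # omitting them falls back to the arrays given at construction.
        if data is None:
            data = self.data
        else:
            self._check_layout(data, self.data, "data")
        if result is None:
            result = self.result
        else:
            self._check_layout(result, self.result, "result")
        result[...] = np.fft.fftn(data, axes=self.axes)
        return result


# Usage: one plan, baked for x[0], reused for every slice along axis 0.
x = np.arange(4 * 3 * 8).reshape(4, 3, 8).astype(np.complex64)
out = np.empty_like(x)
plan = ToyPlan(x[0], out[0], axes=(-1,))
for i in range(x.shape[0]):
    plan.enqueue(x[i], out[i])
```

Because every `x[i]` and `out[i]` has the same shape, strides and dtype as `x[0]` and `out[0]`, the layout check passes and the single plan covers all four slices; an array with a different layout raises `ValueError` instead of silently producing wrong results.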
On 25.01.2017 at 16:21, Syam Gadde <***@***.***> wrote:
OK, one reason I changed the low-level enqueue_transform was because that was the only way I could re-use the baked plan for multiple arrays (or multiple slices of the same array). Would it be acceptable to extend the high-level enqueue() to take optional data/result keyword arguments? If specified, they would override the stored data and result attributes and allow users to call enqueue repeatedly. It could double check that the shape/strides/dtype match the original data used to create the plan. If those arguments are not specified, it would use the stored values as normal.
Ah, I see, you want to iterate over slices while keeping the plan. Adding optional in/out arrays to the high-level enqueue as replacements for the stored ones is a possibility; that's OK with me.
Ultimately, as I understand it, all this is needed to allow for an extended batching scheme with more than one non-transformed axis. In the long term, the high-level enqueue should take care of this; perhaps it is better to just add an "enqueue_transform_with_arrays_given" method as a step towards this.
Gregor
I agree that automatically batching across non-transformed axes would be great to integrate into gpyfft. I've created a somewhat generic mechanism to do so outside of gpyfft, but I'm afraid it's not very elegant -- perhaps it's because I haven't yet stumbled on the "trivial" solution. Anyway, when I clean it up, I'll be happy to share it. There may still be other reasons to re-use a plan created through the high-level API, so I'll create a separate enqueue function (that takes arrays), as you suggest, and test it.
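Any such batching mechanism first has to decide whether the non-transformed axes can be folded into the single batch dimension a clFFT plan describes (one batch size plus one distance, set via `clfftSetPlanBatchSize` and `clfftSetPlanDistance`). The helper below is a hypothetical sketch of that check, not code from gpyfft or from the mechanism mentioned above. It works on NumPy-style shapes and byte strides and assumes positive strides; when it returns `None`, the caller has to fall back to looping over slices.

```python
import math


def collapse_batch_axes(shape, strides, batch_axes):
    """Hypothetical helper: fold several non-transformed axes into one batch axis.

    shape and strides follow NumPy/pyopencl conventions (strides in bytes).
    Returns (batch_count, batch_stride_bytes) if the batch axes can be walked
    with one constant stride, otherwise None. Assumes positive strides.
    """
    # Length-1 axes do not move through memory, so they can be ignored.
    axes = [a for a in batch_axes if shape[a] > 1]
    if not axes:
        return 1, 0
    # Outermost (largest stride) first.
    axes.sort(key=lambda a: strides[a], reverse=True)
    for outer, inner in zip(axes, axes[1:]):
        # The outer axis must step exactly over one full run of the inner axis.
        if strides[outer] != shape[inner] * strides[inner]:
            return None
    return math.prod(shape[a] for a in axes), strides[axes[-1]]


# Contiguous complex64 array of shape (4, 5, 8), transformed along axis 2:
print(collapse_batch_axes((4, 5, 8), (320, 64, 8), (0, 1)))  # (20, 64)
```

For a view such as `x[:, :3, :]` of that array (shape `(4, 3, 8)`, same strides) the call returns `None`, because the outer stride of 320 bytes no longer equals 3 * 64 bytes. The returned stride is in bytes; clFFT distances are given in elements, so a caller would divide by the item size.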
…yopencl buffer slicing -- much simpler now.
The latest commits confine all of the changes to fft.py. I created a new function … I also found a pyopencl function that creates the sub-buffer for me, so there's no need to call out to the OpenCL library directly. The responsibility for automatic non-transformed axis batching would need to be shared between … Here is an example of usage:
Excellent, thanks for your contribution!
* 'master' of https://github.com/geggo/gpyfft:
  Merge PR #29: Accept pyopencl arrays with nonzero offsets (PR #29), add enqueue_arrays method
  Allow buffers to be of PyOpenCL-type PooledBuffer (in addition to the standard Buffer)
  Don't send an empty event array (clFFT thinks the sky is falling)
Sorry for the raft of pull requests -- these are all modifications I've been using successfully for a while and have found very useful, and would welcome your input.
This set of commits adds `in_offsets` and `out_offsets` to `enqueue_transform()`, which, if specified, trigger the creation of sub-buffers under the hood. For arrays with multiple non-transformed axes that can't be "collapsed" into a single axis, this change allows the user to slice the input array into appropriate chunks (using the standard slice operator) and simply send `x.base_data` and `x.offset` as the inputs/outputs, though only as long as the offset is a multiple of `CL_DEVICE_MEM_BASE_ADDR_ALIGN` (which OpenCL reports in bits, so the byte offset must be a multiple of that value divided by 8). It also allows for transforming arrays that have non-zero offsets. For example, to partially address issue #10 without needing to create copies of the original array or create our own OpenCL sub-buffers:
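Before handing `x.base_data` and `x.offset` to `enqueue_transform`, a caller can check each slice's offset against the device alignment. OpenCL reports `CL_DEVICE_MEM_BASE_ADDR_ALIGN` in bits, while a pyopencl array's `offset` is in bytes. The two helpers below are a hypothetical sketch (the names are illustrative, not part of gpyfft or pyopencl): one computes the byte offsets of `x[0]`, `x[1]` and so on, the other tests whether an offset is a valid sub-buffer origin. The alignment value would come from the device, e.g. `device.mem_base_addr_align` in pyopencl; the 1024 bits used here is an assumed figure, not a real device query.

```python
def slice_offsets(outer_stride, count, base_offset=0):
    """Byte offsets of x[0], ..., x[count - 1] within x's underlying buffer.

    outer_stride is x.strides[0] in bytes; base_offset is x's own offset.
    """
    return [base_offset + i * outer_stride for i in range(count)]


def subbuffer_offset_ok(offset_bytes, mem_base_addr_align_bits):
    """True if offset_bytes is a valid sub-buffer origin on the device.

    CL_DEVICE_MEM_BASE_ADDR_ALIGN is given in bits, so convert to bytes.
    """
    align_bytes = mem_base_addr_align_bits // 8
    return offset_bytes % align_bytes == 0


if __name__ == "__main__":
    # A (4, 3, 8) complex64 array: each x[i] spans 3 * 8 * 8 = 192 bytes.
    # 1024 bits (128 bytes) is an assumed alignment, not a device query.
    for off in slice_offsets(outer_stride=192, count=4):
        print(off, subbuffer_offset_ok(off, 1024))
```

With these assumed numbers, `x[1]` and `x[3]` (offsets 192 and 576 bytes) are not 128-byte aligned, so only `x[0]` and `x[2]` could be passed directly; the others would still need a copy.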