Array Constructors - `similar` #130
Memory management is pretty central to performance, so it'd be great if we can find a way to make optimizations easy. As a brief aside,

```julia
mutable struct MemoryBuffer{L,T} <: DenseVector{T}
    data::NTuple{L,T}
    @inline function MemoryBuffer{L,T}(::UndefInitializer) where {L,T}
        @assert isbitstype(T) "Memory buffers must point to bits types, but `isbitstype($T) == false`."
        new{L,T}()
    end
end
```

is something Julia's compiler will stack allocate if it doesn't escape. The memory field could be anything, so long as it's memory we can get a pointer to, and preferably plays well with Julia's compiler and GC. Something that is needed here, however, is the ability to extract the memory. Hence, to expressly support stack allocation of statically sized mutables, we need to make extracting the minimal memory object a part of the API.

Something else that I'd really like, but would need to actually spend some time with AbstractInterpreter or other tools before seeing what it'd take to make an API convenient, is the ability to swap out memory allocators contextually. For example, maybe we have hot code where we know arrays don't escape. But perhaps the arrays are fairly large, and/or passed to lots of functions we don't want to …

```julia
const MYSTACK = Libc.malloc(1 << 30); # 1 GiB

function myfunction_using_custom_stack(args...)
    mystack = MYSTACK
    # every allocation in this function and the functions it calls uses
    # `mystack` and then increments by the amount of memory used
    ...
end
```

This would make writing "in place" code much easier. I don't really know how to make this easy. Would also be nice to finally get …
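As a sketch of what "extracting the minimal memory object" could look like, here is one hypothetical way to get a raw pointer into a `MemoryBuffer` (the `unsafe_convert` method is an illustration, not an existing API):

```julia
# Illustrative only: a hypothetical way to obtain a raw pointer into the
# `MemoryBuffer` defined above. `pointer_from_objref` works because the
# struct is mutable (and so has a stable address while GC-preserved).
Base.unsafe_convert(::Type{Ptr{T}}, m::MemoryBuffer{L,T}) where {L,T} =
    Ptr{T}(pointer_from_objref(m))

buf = MemoryBuffer{8,Float64}(undef)
GC.@preserve buf begin
    p = Base.unsafe_convert(Ptr{Float64}, buf)
    unsafe_store!(p, 1.0, 1)   # write into the buffer's memory
    unsafe_load(p, 1)          # read it back
end
```

The `GC.@preserve` is essential here: without it, nothing roots `buf` while the raw pointer is in use.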
---

I was trying to solve this with …

---

This is actually the big thing preventing me from getting generic …. I've been playing around with a generic set of buffers based off of code like …
---

Yeah, we could define specific overloads for

```julia
@inline preserve_buffer(A::AbstractArray) = A
@inline preserve_buffer(A::SubArray) = preserve_buffer(parent(A))
@inline preserve_buffer(A::PermutedDimsArray) = preserve_buffer(parent(A))
@inline preserve_buffer(A::Union{LinearAlgebra.Transpose,LinearAlgebra.Adjoint}) = preserve_buffer(parent(A))
@inline preserve_buffer(x) = x
```

which a few other libraries, including …
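With overloads like these in scope, nested wrappers unwrap down to the object that actually owns the memory. A small usage sketch:

```julia
using LinearAlgebra

A = rand(4, 4)
B = transpose(view(A, 1:2, :))  # two layers of wrappers around `A`

# `preserve_buffer` peels off Transpose, then SubArray, and returns `A`
# itself, so preserving `buf` keeps the underlying memory alive.
buf = preserve_buffer(B)
@assert buf === A

GC.@preserve buf begin
    # safe to work through raw pointers into `A`'s data here
end
```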
---

I've always been split on these two approaches:

1.

```julia
A # ::SArray
rA = Ref(A)
GC.@preserve rA begin
    p = Base.unsafe_convert(Ptr{eltype(A)}, rA)
    # can build a `VectorizationBase.StridedPointer` from this
end
```

2.

```julia
A # ::SArray
struct PretendPointer{....all the parameters from ArrayInterface....} <: VectorizationBase.AbstractStridedPointer
    data::D
    offset::Int
end
# can add methods to support the interface, e.g. `vload` mapping to `getindex`
```

"2." has the advantage of being much easier to implement, in particular because it can be ignorant of all the details about what the data it's wrapping is, and doesn't need me to change any of the current code calling ….

"1." has the advantage of probably being much better, but the compiler does a bad job of optimizing both of these. I'd think "1." would be easier to optimize, but for some reason it really likes to actually copy ….

```julia
sp, b = stridedpointer_and_preserve(A)
GC.@preserve b begin
    ...
end
```

whereas

```julia
sp = stridedpointer(A)
b = preserve_buffer(A)
GC.@preserve b begin
    ...
end
```

won't work, since we need the …. I think this is fine, but something to keep in mind. Besides getting it so I can load from …
---

Yeah, I copy-pasted the above from Octavian.jl. StrideArrays.jl itself just has:

```julia
using Octavian
using Octavian: MemoryBuffer
```

Octavian uses it raw to allocate memory on the stack, without actually wrapping it in an array. We could probably look to …
---

I'm not sure either.

But I think we should also be cautious about turning ArrayInterface into a kitchen sink. We already have one issue about it being too heavy to take on as a dependency. On non-…
---

I think we could make something agnostic to just using approach 1 or 2, so that when someone wants to go crazy with optimization they can dig in. I was thinking the first level of interaction for a lot of users would be calling:

```julia
instantiate(A) do
    # ...do stuff to a pointer/pseudo-pointer...
end
```

Internally, buffer allocation and pointer constructors would just have sensible defaults that can then be changed. I'm still thinking through how to make it both composable and flexible though. I'll hack out some more ideas. It might be worth having something like …
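As a minimal sketch of what a sensible default could look like for a plain `Array` (this `instantiate` method is hypothetical, not an agreed API), the do-block form might simply hand the user a raw pointer while rooting the array:

```julia
# Hypothetical default method: for a dense `Array`, "instantiating" can just
# pass a raw pointer to the user's function while keeping `A` preserved.
function instantiate(f, A::Array)
    GC.@preserve A begin
        f(pointer(A))
    end
end

A = collect(1.0:4.0)
instantiate(A) do p
    unsafe_store!(p, 10.0, 1)  # mutate through the pointer
end
# A[1] is now 10.0
```

Fancier array types would override this to supply approach-1 or approach-2 pointer objects instead of a bare `Ptr`.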
---

For a first step to start experimenting with it and checking performance, it'd be nice to have a way to distinguish between when we should take approach 1 vs 2, which is what #131 is meant to indicate. What is …

```julia
function rotr90(A)
    ind1, ind2 = axes(A)
    return instantiate(A, x -> allocate_memory(x, (ind2, ind1))) do a, b
        m = first(ind1) + last(ind1)
        for i in ind1, j in axes(A, 2)
            b[j, m - i] = a[i, j]
        end
    end
end
```

Here it looks like …
---

I do think a combination of … covers most use cases. Should we ping anyone from …
---

TBH, …

```julia
function instantiate(f::Function, allocator::Function, preserver::Function, initializer::Function, args...)
    buffer = allocator()
    preserver(f, buffer, args...)
    return initializer(buffer)
end
```

So:

```julia
buffer_ptr = maybe_pointer(buffer)
data_ptr = maybe_pointer(data)
GC.@preserve buffer data begin
    f(buffer_ptr, data_ptr)
end
```

And I still need to put more thought into how we get from …
---

It doesn't create a generator like

```julia
rotr90(x) = instantiate(_rotr90, x)
_rotr90(x_ptr) = ...
```

I know it seems a bit silly to focus on avoiding that extra line of code … could with …
---

If we can get a size and eltype …

I really would encourage people to write in-place functions, though, and then wrap the in-place version to make the convenient-but-allocating version.
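That pattern might look like the following sketch (names are illustrative; `myrotr90` avoids clashing with `Base.rotr90`):

```julia
# In-place kernel: all allocation decisions live with the caller.
function myrotr90!(B, A)
    ind1 = axes(A, 1)
    m = first(ind1) + last(ind1)
    for i in ind1, j in axes(A, 2)
        B[j, m - i] = A[i, j]
    end
    return B
end

# Convenient-but-allocating wrapper around the in-place version;
# a 90-degree rotation swaps the two axes of the output.
myrotr90(A) = myrotr90!(similar(A, reverse(axes(A))), A)
```

The allocating wrapper is then the one obvious place to plug in a custom allocator, while the kernel stays allocation-free.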
---

I asked @maleadt on Slack about memory allocation on GPUs, and he pointed me to the code here: …

Concerning garbage collection, he said "...we use Julia's GC and in kernels we currently don't really support dynamic allocations," which makes me think that we could rely on the …
---

That's exactly what I was thinking. I don't think it would be unreasonable to expect users to do something like …
---

I think we are in a good place to start working on array constructors (it would also make documenting examples a whole lot easier). This brings up stuff related to `similar`: I'm not convinced there's a silver bullet for array constructors, but I think we could at least find a solution to what `similar` is often trying to do, namely allow generic method definitions without worrying about array-specific constructors. I think we actually need to break up what `similar` does into several pieces, though. I haven't worked out all the details, but here's some of the cleaner code I have so far that might support this: …

The reason I think separating things out like this is helpful is that it turns this function from Base, like this …

into this …

This means that new array types typically wouldn't need to change `rotr90` but would just change their allocators and initializers. I'm not super familiar with how jagged arrays work, but we could have `allocate_memory` take in device and memory layout info for this.
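One way `allocate_memory` could take in device info (all names here are hypothetical; a GPU package would own the GPU method) is plain dispatch on a device type:

```julia
# Hypothetical device types for dispatching allocation decisions.
abstract type AbstractDevice end
struct CPU <: AbstractDevice end
struct GPU <: AbstractDevice end

# Default CPU allocation: an ordinary GC-managed Array.
allocate_memory(::CPU, ::Type{T}, dims) where {T} = Array{T}(undef, dims)

# A GPU package could extend this without ArrayInterface depending on it:
# allocate_memory(::GPU, ::Type{T}, dims) where {T} = CuArray{T}(undef, dims)

buf = allocate_memory(CPU(), Float64, (3, 4))
size(buf)  # (3, 4)
```

Memory layout info could be threaded through the same way, as an extra trait argument.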