Avoid repeating the calculation, retain the results #23
A calculation that takes 1/3 of a second to complete is not typical of how this package is used. Just curious, have you benchmarked this suggestion for general usage? I highly recommend doing so 😄.

(On modern CPUs the time to fetch from "stale" memory (not in the L1 cache) is equivalent to hundreds of flops. It's almost never worth caching anything unless it's a really slow calculation.)
No, I just played around with it. OK, so really, the only situation where a "senile" mappedarray (i.e., as they are now) would be less performant than a regular array is if we access the same cell multiple times.
As I understand it (now), neither the function's speed nor the array's size affects the question of using mappedarrays versus regular ones.
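To make the repeated-access point concrete, here is a small sketch (the `NCALLS` counter and `counted_sqrt` are ad hoc names for illustration) that counts how often the element function actually runs in each approach:

```julia
using MappedArrays

# Count how many times the element function actually executes.
const NCALLS = Ref(0)
counted_sqrt(x) = (NCALLS[] += 1; sqrt(x))

a = rand(100)

# Lazy mappedarray: the function runs on every access, repeats included.
m = mappedarray(counted_sqrt, a)
NCALLS[] = 0
m[1]; m[1]; m[1]
@show NCALLS[]   # 3 calls for three accesses of one cell

# Materialized array: one call per cell up front, repeats are then free.
NCALLS[] = 0
v = map(counted_sqrt, a)
v[1]; v[1]; v[1]
@show NCALLS[]   # 100 calls up front, none of them for the repeats
```

Note that the per-call cost multiplies both counts equally, which is why only the access pattern, not the function's speed, decides which approach wins.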
Just want to make sure, you realize you can call
Well, if you're accessing the majority of elements repeatedly and the calculation is slow, you're better off realizing it by just creating a new array the first time. For example:

```julia
julia> using MappedArrays

julia> a = rand(10^6);

julia> m = mappedarray(sqrt, a);

julia> v = map(sqrt, a);

julia> using BenchmarkTools

julia> @btime sum($m)
  2.538 ms (0 allocations: 0 bytes)
666744.570730794

julia> @btime sum($v)
  416.189 μs (0 allocations: 0 bytes)
666744.570730794

julia> @btime map($sqrt, $a);
  2.583 ms (3 allocations: 7.63 MiB)

julia> @btime mappedarray($sqrt, $a);
  22.613 ns (1 allocation: 16 bytes)
```

You can see that creating the mapped array is essentially free, whereas materializing with `map` pays the full computation and allocation cost up front.
Thank you for the explanation, I think I got it. But I still think my statement above holds.

Background: I'm using OnlineStats to fit a distribution to each of the pixels in an image time-lapse of a beetle walking in a dark scene. This gives me an estimate of the noise level in the background. I then use mappedarrays to convert each fitted distribution to a noise level (mean + 2*std). I don't have to know the noise level for each and every one of the pixels, just the ones I think the beetle might have crossed (I'm tracking the beetle). These are few in number, and I probably need to revisit some pixels more than once. This led me to think about the cost of rerunning the noise calculation. But I'm sure the number of times I need to access the same index is far, far lower than the number of pixels I don't check, so it's worth it.
Thanks for the explanation. So you're in a situation where a few elements get accessed many times and the cost is relatively high. That's an important use case, but it's not really the bread and butter of the types in this package so far, which in practice get used mostly for fairly trivial calculations (e.g., eltype changes, bringing 3 grayscale arrays into a single RGB array, etc.). It's not possible to retain good performance for trivial calculations while also memoizing: you have to choose one or the other. It may be worth your while to develop a type that's optimized for your situation, e.g.

```julia
struct MemoizedMappedArray{T,N,F,A<:AbstractArray} <: AbstractArray{T,N}
    f::F
    parentarray::A
    memoizedvals::OffsetArray{Union{T,Missing},N}
end
```

where the memoized values start out `missing` and get filled in on first access.
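A minimal sketch of how such a type could work, assuming a plain `Array` cache instead of an `OffsetArray` to keep it self-contained (all names here are illustrative, not an existing API):

```julia
# Hedged sketch of the memoizing wrapper idea discussed above.
struct MemoizedMappedArray{T,N,F,A<:AbstractArray} <: AbstractArray{T,N}
    f::F
    parentarray::A
    memoizedvals::Array{Union{T,Missing},N}
end

function MemoizedMappedArray(f, parent::AbstractArray)
    T = typeof(f(first(parent)))  # crude element-type inference, sketch only
    cache = Array{Union{T,Missing}}(missing, size(parent))
    MemoizedMappedArray{T,ndims(parent),typeof(f),typeof(parent)}(f, parent, cache)
end

Base.size(M::MemoizedMappedArray) = size(M.parentarray)
Base.IndexStyle(::Type{<:MemoizedMappedArray}) = IndexLinear()

function Base.getindex(M::MemoizedMappedArray{T}, i::Int) where T
    v = M.memoizedvals[i]
    v === missing || return v::T   # cache hit: skip the calculation
    val = M.f(M.parentarray[i])    # cache miss: compute once and store
    M.memoizedvals[i] = val
    return val
end

m = MemoizedMappedArray(sqrt, [4.0, 9.0, 16.0])
m[2]  # computes sqrt(9.0) and stores it in memoizedvals
m[2]  # returned straight from the cache
```

The `Union{T,Missing}` cache trades a type-instability on every read for avoiding repeated calls, which is exactly why it cannot match a plain mappedarray on trivial functions.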
I think this is the best solution. It's nicely modular.
It is, but of course this is getting to be quite a niche case, so you're right that one would need to think carefully about whether it's worth it.
Memoize in theory supports arbitrary dict types, e.g. `@memoize MyDict f(x) = x+2`, and it would be straightforward to do a vector-based AbstractDict. Unfortunately, it's broken, and the package is semi-abandoned. I'm trying to push for this functionality in Anamnesis.jl instead.
I feel like I've convinced myself that neither the size of the array nor the speed of the function matters. It's just the number of repeated/extra times we access pre-accessed cells compared to the number of never-accessed cells. If the former exceeds the latter, then we ought to have used a regular array. This does not describe my current case. I'll keep benchmarking it, but I think that the modular solution @cstjean suggests with Anamnesis.jl will be dandy if and when it comes into play.
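That break-even point can be written down directly. A back-of-the-envelope sketch (the cost functions and variable names here are ad hoc, not part of any package):

```julia
#   c = cost of one call, N = number of cells,
#   u = distinct cells accessed, r = repeat (pre-accessed) accesses.
eager_cost(N, c)   = N * c         # regular array: compute every cell once
lazy_cost(u, r, c) = (u + r) * c   # mappedarray: recompute on every access

# c appears on both sides, so the function's speed cancels out of the
# comparison; lazy wins exactly when r < N - u, i.e. when repeat accesses
# are fewer than the never-accessed cells.
N, u, c = 10^6, 10^3, 0.33
@show lazy_cost(u, (N - u) - 1, c) < eager_cost(N, c)  # true: few repeats
@show lazy_cost(u, (N - u) + 1, c) < eager_cost(N, c)  # false: too many
```

Since `c` multiplies both sides, the array size and the function's speed only scale the absolute savings, never the direction of the comparison, matching the conclusion above.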
It would be really cool if the calculated result of some cell in an array was somehow saved for repeated access:
The rationale behind this is that with normal arrays you pay computation time for the map for ALL the cells (while you really might not need all of them), but that's a one-time cost; repeated access is cheap. With mapped arrays you pay the calculation price only for the cells you need, winning, but if you need to access them repeatedly then this gain in time can be quickly lost.
Note that I think I can get this by using Memoize and `@memoize`-ing the mapped function...