wip: CachedUpdatelessFunction #28
Draft
I tried to see if AllocArrays would work with Lux/Boltz, since it seems to be working for ObjectDetector. There, the only issue I had was a performance problem: mixing UnsafeArrays and matrices leads to the generic matmul being used instead of BLAS. This is somewhat similar to the issue from JuliaLang/julia#57799 (though technically quite different), and this time I came up with a better workaround: we can adapt the function within the `with_allocator` block to homogenize the types. As long as adapting is fast relative to the runtime, that's fine.

However, this means the function we evaluate does not persist, so it might not work well for ObjectDetector, which updates the model struct even on the forward pass via its `W` parameter.

So here I have the terrible name `CachedUpdatelessFunction`, where `Cached` means it reuses allocations (not values), `Updateless` means it won't persist updates to the model, and `Function` stands for a generic callable (it doesn't have to be an ML model).

With this, I get

yielding
So we were able to eliminate GC and most allocations, though possibly with a small runtime cost. Maybe we are still not hitting some optimized path.
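To make the naming concrete, here is a minimal, language-agnostic sketch of the intended semantics. It is written in Python rather than Julia and is not the implementation in this PR. `Arena` is a hypothetical stand-in for the allocator behind `with_allocator` (its memory is kept between calls and handed out again after a reset), and `convert` is a hypothetical stand-in for adapting the function (here just `copy.deepcopy`). Each call evaluates a freshly converted copy, so any state the wrapped callable updates on the forward pass (like ObjectDetector's `W`) is dropped.

```python
import copy


class Arena:
    """Hypothetical stand-in for a bump allocator: buffers are kept between calls
    and handed out again after reset(), so repeated same-shaped calls reuse memory."""

    def __init__(self):
        self._slabs = []  # buffers owned by the arena
        self._next = 0    # index of the next buffer to hand out
        self.fresh = 0    # number of buffers newly created so far

    def alloc(self, n):
        if self._next < len(self._slabs) and len(self._slabs[self._next]) == n:
            buf = self._slabs[self._next]  # reuse an existing buffer of the right size
        else:
            buf = [0.0] * n                # fall back to a fresh allocation
            self.fresh += 1
            if self._next < len(self._slabs):
                self._slabs[self._next] = buf
            else:
                self._slabs.append(buf)
        self._next += 1
        return buf

    def reset(self):
        self._next = 0  # keep the memory, just start handing it out from the top


class Model:
    """Toy callable that, like ObjectDetector, mutates itself on the forward pass."""

    def __init__(self, weights):
        self.weights = list(weights)
        self.calls = 0  # state updated during the forward pass

    def __call__(self, x, alloc):
        self.calls += 1
        out = alloc(len(x))  # output buffer comes from the allocator
        for i, (w, xi) in enumerate(zip(self.weights, x)):
            out[i] = w * xi
        return out


class CachedUpdatelessFunction:
    """Sketch only: reuses allocations across calls ("Cached") and discards
    updates the callable makes to itself ("Updateless")."""

    def __init__(self, f, convert=copy.deepcopy):
        self.f = f
        self.convert = convert  # hypothetical stand-in for adapting the function
        self.arena = Arena()

    def __call__(self, x):
        self.arena.reset()
        g = self.convert(self.f)             # converted per call; never written back to self.f
        out = g(list(x), self.arena.alloc)
        return list(out)                     # copy out: the arena reuses this buffer next call


m = Model([1.0, 2.0, 3.0])
f = CachedUpdatelessFunction(m)
y1 = f([1.0, 1.0, 1.0])
y2 = f([2.0, 2.0, 2.0])
print(y1, y2, m.calls, f.arena.fresh)  # [1.0, 2.0, 3.0] [2.0, 4.0, 6.0] 0 1
```

The second call allocates nothing new (`fresh` stays at 1), and `m.calls` stays at 0 because only the per-call copy was updated. The sketch copies the result out of arena memory before returning, since the arena hands the same buffer out again on the next call; whether the real wrapper needs to do the same is an assumption here.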
This feels a bit clunky and hard to describe currently, and it moves Adapt from an extension to a regular dependency, so I'm not sure we want this, but I thought I'd put it up.