⚡️ Speed up method `WithFixedSizeCache.add_model` by 50% in PR #1373 (`feat/pass-countinference-to-serverless-getweights`) #1385
+58 −33
⚡️ This pull request contains optimizations for PR #1373
If you approve this dependent PR, these changes will be merged into the original PR branch `feat/pass-countinference-to-serverless-getweights`.
📄 50% (0.50x) speedup for `WithFixedSizeCache.add_model` in `inference/core/managers/decorators/fixed_size_cache.py`
⏱️ Runtime: 1.08 seconds → 722 milliseconds (best of 12 runs)
📝 Explanation and details
Here's an optimized rewrite of your program, addressing profiling hot spots and general efficiency improvements.

Optimization Summary:
- In `WithFixedSizeCache.add_model`, avoid the repeated `self._key_queue.remove(queue_id)` by checking position or maintaining a set for fast membership checks (not strictly necessary here, since that call only happens when the key is known to be present and the branch is rare), though the code can still be reduced for clarity.
- Issue one `gc.collect()` call after eviction completes, instead of once per iteration.

Key Runtime Optimizations:
- `gc.collect()` runs once after all removals in a batch, not after every single model eviction; a sketch of this pattern follows the list below.
- No changes are needed to `_resolve_queue_id` for this use case.
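
A minimal sketch of the batched-eviction pattern described above, assuming a `_key_queue` deque and a wrapped `model_manager` exposing `remove` and `add_model`. The class skeleton, `max_size` default, and method signature are illustrative stand-ins; the real implementation lives in `inference/core/managers/decorators/fixed_size_cache.py`.

```python
import gc
from collections import deque


class WithFixedSizeCache:
    """Sketch of the decorator's cache logic; the real class wraps a ModelManager."""

    def __init__(self, model_manager, max_size: int = 8):
        self.model_manager = model_manager
        self.max_size = max_size
        self._key_queue: deque = deque()  # LRU order: oldest entry on the left

    def add_model(self, model_id: str, api_key: str, model_id_alias=None) -> None:
        queue_id = model_id_alias or model_id  # stand-in for _resolve_queue_id
        if queue_id in self._key_queue:
            # Cache hit: refresh the LRU position, no eviction or GC needed.
            self._key_queue.remove(queue_id)
            self._key_queue.append(queue_id)
            return
        # Evict as many models as needed, deferring gc.collect() until the
        # whole batch is done instead of collecting once per eviction.
        evicted = False
        while len(self._key_queue) >= self.max_size:
            oldest = self._key_queue.popleft()
            self.model_manager.remove(oldest)
            evicted = True
        if evicted:
            gc.collect()  # one collection for the entire eviction batch
        self._key_queue.append(queue_id)
        self.model_manager.add_model(model_id, api_key, model_id_alias=model_id_alias)
```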
Performance Note:
If you profile again after these changes, most of the time will now be spent in actual model loading and removal; that is, this code will no longer be a noticeable bottleneck in the workflow. If the LRU cache size were much larger, further data-structure optimizations such as a dict for constant-time eviction and presence checking would be worth considering (see the sketch below), but for N ≈ 8 this is not needed.
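
For illustration of that last point, an `OrderedDict` gives O(1) presence checks, reordering, and eviction, where a deque pays O(n) for `remove`. This is a hypothetical sketch of the data structure the note suggests, not code from the PR:

```python
from collections import OrderedDict


class LRUKeys:
    """Constant-time LRU bookkeeping via OrderedDict (illustrative only)."""

    def __init__(self, max_size: int = 8):
        self.max_size = max_size
        self._keys: "OrderedDict[str, None]" = OrderedDict()

    def touch(self, key: str) -> list:
        """Mark `key` most recently used; return any keys evicted to make room."""
        if key in self._keys:            # O(1) membership test
            self._keys.move_to_end(key)  # O(1) reorder, vs O(n) deque.remove()
            return []
        evicted = []
        while len(self._keys) >= self.max_size:
            oldest, _ = self._keys.popitem(last=False)  # O(1) LRU eviction
            evicted.append(oldest)
        self._keys[key] = None
        return evicted
```

At the current cache size (N ≈ 8) the linear deque scan is already cheap, which is why the PR stops short of this change.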
✅ Correctness verification report:
🌀 Generated Regression Tests and Runtime
To edit these changes, run `git checkout codeflash/optimize-pr1373-2025-06-24T21.57.17` and push.