-
-
Notifications
You must be signed in to change notification settings - Fork 3.9k
Optimise utils.find and specialise utils.as_chunks #10351
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
Instead of doing |
Yeah, unfortunately it is. I think sticking with |
|
don't have a quiet local machine to benchmark on right now, but I suspect using itertools.islice will be better for as_chunks: import sys
from itertools import islice
def _chunk(iterable: Iterable[T], max_size: int) -> Iterable[list[T]]:
iterator = iter(iterable)
while batch := list(islice(iterator, n)):
yield batch |
Yes, it does seem like using def _chunk(iterator: Iterable[T], max_size: int) -> Iterator[List[T]]:
# Specialise iterables/iterators that can be sliced which is much faster
if isinstance(iterator, Sequence):
for i in range(0, len(iterator), max_size):
yield list(iterator[i : max_size + i])
else:
# Fallback to slower path
iterator = iter(iterator)
while batch := list(islice(iterator, max_size)):
yield batchThis newer version is around |
|
FYI walrus operator is banned in this codebase. |
|
oh alr, this should be the equivalent I believe? I don't use the walrus operator often iterator = iter(iterator)
while True:
batch = list(islice(iterator, max_size))
if not batch:
break
yield batch |
|
Yeah that works too. |
Summary
This PR attempts to optimise
utils.findandutils.as_chunks.You can read more here, but just to summarise:
utils.find, but it is still used a lot in dpy and user code, so speeding it up will still help overall if you know what I mean. Benchmarks on all versions (3.8 - 3.15) are around2% - 20%fasterAlso, for anyone seeing this, feel free to benchmark these changes and see if it matches. Benchmarks can be found here. It is greatly appreciated, thanks!
Note:
utlils.as_chunksimplementation changed, so use the new one in this PR when benchmarking.Checklist