Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add Iterable<T>.windows() to apply the sliding window algorithm on Iterables. #724

Open
vxern opened this issue Nov 23, 2024 · 5 comments
Open

Comments

@vxern
Copy link

vxern commented Nov 23, 2024

Example signature:

extension IterableWindows<T> on Iterable<T> {
  Iterable<List<T>> windows(int size) sync* {
    ...
  }
}

Given the following call:

final list = [1, 2, 3, 4, 5];
final size = 2;

print(list.windows(size));

windows() produces a list of slices, where every slice is of size size, and where every consecutive slice begins one element later:

[[1, 2], [2, 3], [3, 4], [4, 5]]

Notes:

  • The number of produced windows is equal to list.length - (size - 1).
    • So, given a list of length 4 and window size 2, there would be 3 windows.

Constraints:

  • size must be a positive, non-zero integer.
  • size must not exceed the length of list.

If accepted, I'd be happy to implement this myself!

@vxern vxern changed the title Add an helper for getting sliding windows on collections. Add an extension for getting sliding windows on collections. Nov 23, 2024
@vxern
Copy link
Author

vxern commented Nov 23, 2024

A sample implementation could look something like this:

extension IterableExtension<T> on Iterable<T> {
  Iterable<List<T>> windows(int size) sync* {
    if (size <= 0) {
      throw StateError('The size must be a positive, non-zero integer.');
    }
    
    if (size > length) {
      throw StateError('The size must not exceed the number of elements.');
    }
  
    final iterator = this.iterator;
    final window = <T>[];
    while (iterator.moveNext()) {
      window.add(iterator.current);
      if (window.length != size) {
        continue;
      }
      
      yield [...window];
      
      window.removeAt(0);
    }
  }
}

@lrhn
Copy link
Member

lrhn commented Nov 23, 2024

Seems fairly inefficient.
If it cannot be optimized then I don't see much reason to have it as a library function.

So, can it be optimized?

If we assume that each element must be a persistent shareable list, that shouldn't keep any other values alive than the ones on that list, then we do need to create one new list per element.

Inside the iteration, it only needs to remember the n-1 last elements. Those can be stored in a cyclic buffer, rather than using .remove(0), so there is no extra copying other than into the new list.
Something like:

extension<T> on Iterable<T> {
  Iterable<List<T>> window(int size) sync* {
    if (size < 1) throw RangeError.range(size, 1, null, "size");
    List<T>? buffer; // Cyclic buffer.
    var cursor = -size;
    for (var element in this) {
      if (cursor >= 0) {
        buffer![cursor] = element;
        var cursorAfter = cursor + 1;
        yield List<T>.filled(size, element)
          ..setRange(0, size - cursorAfter, buffer, cursorAfter)
          ..setRange(size - cursorAfter, size - 1, buffer);
        cursor = cursorAfter;
        if (cursor >= size) cursor = 0;
      } else {
        (buffer ??= List<T>.filled(size, element))[size + cursor] = element;
        if (++cursor == 0) yield buffer.toList(growable: false);
      }
    }
  }
}

Possible. It's still somewhat overkill to create a new list for each element if it doesn't have to survive past a synchronous computation.
If every step instead provided the same Queue, and the moveNext removed one element and added another, then it would be more efficient.
The queue is only guaranteed to be correct until the next call to moveNext
If the user needs a list, they can call toList on the queue.
If we only expose the queue as an Iterable, they can't mess with it either.

I think that would be a better API. More error prone, if someone holds on to an element after calling moveNext, but less unnecessary overhead of you don't need it.

import "dart:collection" show ListQueue;
extension <T> on Iterable<T> {
  Iterable<Iterable<T>> windows(int size) sync* {
    if (size < 1) throw RangeError.value(size, 1, null, "size");
    var queue = ListQueue<T>(size + 1); 
    var count = 0;
    for(var e in this) {
      if (count == size) {
        queue..removeFirst()..add(e);
        yield IterableView(queue);
      } else {
        queue.add(e);
        count++;
        if (count == size) yield IterableView(queue);
      }
    }
  }
}

(Where IterableView is just something that hides the Queue API. Or it could be made so that the object can be disabled when you call moveNext again.)

That's what I would want from a function like this.

@vxern
Copy link
Author

vxern commented Nov 23, 2024

I only gave a sample implementation, so naturally it wouldn't be quite there when it comes to performance. I was actually hoping I'd get some feedback about how it could be implemented in reality. Though, ultimately, I was rather more curious about what the sentiment would be on adding such a function, since the actual implementation details could be worked out once decided that 'sure, this is useful enough, let's think to add it'.

Regarding your implementation, where does IterableView come from?

@lrhn
Copy link
Member

lrhn commented Nov 23, 2024

The IterableView is just some hypothetical wrapper that hides the Queue API and only exposes the Iterable API.
It could probably be DelegatingIterable from package:collection.

My point here is that it can be optimized (that's good, otherwise I'd be more worried), but also that even the optimized version that returns a new List for each element is more costly than what I'd prefer.
It's not doing allocation per element that worries me, as much as copying size elements. If that's not necessary, then it's a non-constant unnecessary overhead. If it is necessary, doing .map((i) => i.toList()) would make that list for you.

(I would probably make a wrapper that can also be disabled, something like:

class TemporaryIterable<T> extends Iterable<T> {
  Iterable<T>? _source;
  Iterable<T> get _checkSource => _source ?? (throw StateError("No longer valid"));
  TemporaryIterable(Iterable<T> this._source);
  void _dispose() { _source = null; }
  int get length => _checkSource.length;
  Iterator<T> get iterator => _TemporaryIterator<T>(this, _checkSource.iterator);
  // every iteratable member forwards to `_checkSource.member`.
}
class _TemporaryIterator<T> implements Iterator<T> {
  final _TemporaryIterable<T> _source;
  final Iterator<T> _iterator;
  bool moveNext() {
    _source._checkSource;
    return _iterator.moveNext();
  }
  T get current {
    _source._checkSource;
    return _iterator.current;
  }
}

Then I'd do: var iterable = TemporaryIterable<T>(queue); yield iterable; iterable._dispose(); to make sure you can't keep using the old view after calling moveNext.

It can probably be specialized for List to just provide a lazy sublist-view of the range. But then it's probably not even worth using, if you already know it's a list.

@vxern vxern changed the title Add an extension for getting sliding windows on collections. Add Iterable<T>.windows() to apply the sliding window algorithm on Iterables. Nov 24, 2024
@lrhn
Copy link
Member

lrhn commented Feb 13, 2025

Maybe we should introduce some notion of a "state iterator" which does not guarantee that the value of current stays valid after calling moveNext.

Current iterators generally create a single stand-alone current value after each successful moveNext(), and you can use that value however you want.

A StateIterator, could be a marker interface, instead gives you a current value that is a view on some internal state of the StateIterator, and where you can't assume anything about the current view after you've called moveNext.

Then any iterator wrapper which needs to produce a non-volatile current value will need to make a copy of the current value of a StateIterator.
The problem with that is that "doing nothing" is the default, and if someone doesn't know to look for a StateIterator, which no current code does, they'll just do nothing, and maybe fail the first time they see a StateIterator.
Another issue is that something like Iterable.map could be given a function that returns a view on the source iterable's current, assuming that to be stable, and then that view becomes invalid when you move, but the iterable returned by Iterable.map is not a StateIterable in general. It's only unstable if the source iterable's iterator is, and if the convert function creates a view on the value. So the StateIteartor type will not be safe, you can't assume it's propagated correclty, and it's easy to just not look for it, so maybe it's not worth having as a concept.

So maybe an individual iterable can just say that its iterator's current is a view on mutable state that changes when you call moveNext, and no guarantees are made about what the value of prior reads will be.
(It can be a completely invalid object that throws if you try to use it, or it can be the new value, or something in a completely inconsistent state. Although the first option is the nicest, it also implies an extra overhead.)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants