Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make Array iterator classes private to hide implementation from client #193

Open
wants to merge 28 commits into
base: main
Choose a base branch
from
Open
Changes from 4 commits
Commits
Show all changes
28 commits
Select commit Hold shift + click to select a range
e6691a1
Initial creation
pmetras Jan 9, 2022
84535ad
Remove RFC number
pmetras Jan 9, 2022
42b09a7
Update RFC to apply API change to all collection classes
pmetras Jan 19, 2022
5f5d79f
Line break on column 80
pmetras Jan 19, 2022
a52eb5f
List signature changes in collections
pmetras Jan 19, 2022
73099ee
https://github.com/ponylang/rfcs/pull/193#discussion_r787306884
pmetras Jan 19, 2022
873c378
https://github.com/ponylang/rfcs/pull/193#discussion_r787309484
pmetras Jan 19, 2022
b0fd354
https://github.com/ponylang/rfcs/pull/193#discussion_r787298485
pmetras Jan 19, 2022
f492370
https://github.com/ponylang/rfcs/pull/193#discussion_r787310073
pmetras Jan 19, 2022
62a7b03
https://github.com/ponylang/rfcs/pull/193#discussion_r787311214
pmetras Jan 19, 2022
48fc8bd
https://github.com/ponylang/rfcs/pull/193#discussion_r787302750
pmetras Jan 19, 2022
255c487
https://github.com/ponylang/rfcs/pull/193#discussion_r787303413
pmetras Jan 19, 2022
869eb82
Include changes from Github comments
pmetras Feb 10, 2022
2bfd088
Merge branch 'ponylang:main' into rfc-array
pmetras Feb 10, 2022
7bf7641
Generate HTML
pmetras Feb 23, 2022
06018fc
Merge branch 'rfc-array' of https://github.com/pmetras/rfcs into rfc-…
pmetras Feb 23, 2022
e60bb5a
Move to docs
pmetras Feb 23, 2022
16ec9ef
Set theme jekyll-theme-slate
pmetras Feb 23, 2022
5b91ce2
Merge branch 'ponylang:main' into rfc-array
pmetras Feb 24, 2022
7fda29d
Rename to index.html
pmetras Feb 24, 2022
4363ab0
Merge branch 'rfc-array' of https://github.com/pmetras/rfcs into rfc-…
pmetras Feb 24, 2022
5dbe1ca
Rename to index.html
pmetras Feb 24, 2022
68b7695
With Front Matter
pmetras Feb 24, 2022
c0da56d
Added Front Matter
pmetras Feb 24, 2022
b539018
Change layout to default
pmetras Feb 24, 2022
44103dd
Improve Front Matter
pmetras Feb 24, 2022
d61d656
Add rendered page and updates
pmetras Feb 24, 2022
18f8d49
Use Github rendering instead of pandoc
pmetras Feb 24, 2022
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
198 changes: 198 additions & 0 deletions text/0000-array-private-classes
Original file line number Diff line number Diff line change
@@ -0,0 +1,198 @@
- Feature Name: array_private_classes
- Start Date: 2022-01-08
- RFC PR: (leave this empty)
- Pony Issue: (leave this empty)

# Summary

Collection classes should not expose internal classes through iterators functions.

Following information hiding design principle, the builtin classes of collection
data structures must not be made visible through iterator functions like `ArrayKeys`,
`ArrayValues` and `ArrayPairs`. These classes can be made private as they are
only used as return types for `Array` functions `keys`, `values` and `pairs`.
The return values for these functions are changed to the more general interface
`Iterator`.

A new interface `RewindableIterator` is defined to allow for rewindable iterators,
like it is the case for `Array` `values`.

This design principle is applied to the other collection classes that expose
internals too like:

* `List`
* `Map`
* persistent `Map`
* `Vec`
* persistent `Vec`
* `Set`
* `Itertools`

# Motivation

This change brings:

- Applying the design principle of
[hiding implementation details](https://en.wikipedia.org/wiki/Information_hiding)
but offer a general and stable interface. Returning interfaces instead of concrete
classes allows changing the implementation. Usually, one must return the most
general type that fullfils the contract of the function (in the case of the
functions discussed in this RFC, iteration).
- Collections' functions `keys`, `values` and `pairs` definitions are made more
general. Iterators implementation details are not public. Internal classes used
by implementation like `*Keys`, `*Values` and `*Pairs` are now
[opaque data types](https://en.wikipedia.org/wiki/Opaque_data_type). Generally,
when using these collection classes, clients are not interested by the iterators
implementation, but by the types these iterators return and that is provided by
the generic parameters.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Motivation can be used a precedence for future RFCs so I want this clarified to state that we are not setting a precedence to generally avoid returning concrete types in the stdlib -- which would be a mistake in my opinion. This reasoning to use opaque types due to client disinterest does not extend beyond this RFC -- specifically we are stating that we have those expectations of disinterest for iterator implementations.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The objective of the RFC is not about concrete type vs virtual ones, but more about applying Liskov principle. And it's difficult to say that a principle applies only on a special RFC... My initial questioning was about only about Array iterators and now it extends to all collection iterators, where we have to define what collection means (because collection data structures are not limited to the collection package).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would vote against the RFC on those grounds. If you want to include it in the motivation you can, but I won't be in support in that case.

- The generic return signature of these 3 iterating functions is simpler to
understand for clients of collection classes.
- Makes the standard library more simple by hiding 18 specialised classes from
stdlib of which 3 are from `builtin`.
- The interface `RewindableIterator` is added to create rewindable iterators
(can be re-start from first value).

This change remains compatible with the existing code base but for client code
that is directly using the classes `*Keys`, `*Values` and `*Pairs`. A search on
Github shows that the impact is very limited.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This doesn't seem to be motivation and should be worked into design as these are notes and statements about a some specific design points.


# Detailed design

Iterating functions in collections `keys`, `values` and `pairs` are changed to
return `Iterator` and the classes that implement these iterators are made private.
Here are the full implementation of these functions for the `Array` class (changes
in other collection classes are identical).

```pony
fun keys(): Iterator[USize]^ =>
"""
Return an iterator over the indices in the array.
"""
_ArrayKeys[A, this->Array[A]](this)

fun values(): RewindableIterator[this->A]^ =>
"""
Return an iterator over the values in the array.
"""
_ArrayValues[A, this->Array[A]](this)

fun pairs(): Iterator[(USize, this->A)]^ =>
"""
Return an iterator over the (index, value) pairs in the array.
"""
_ArrayPairs[A, this->Array[A]](this)
```

As the function `values` of class `Array` uses an iterator with a `rewind` function
that is not part of the `Iterator` interface, a new interface `RewindableIterator`
is added to enable creation of rewindable iterators.

Note: To remain consistent with `Array` behaviour, functions `keys` and `pairs`
should return a `RewindableIterator` too but we limited the API change to minimum
as we did not understood why it was not already the case.

```pony
interface RewindableIterator[A] is Iterator[A]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's no need for is Iterator[A] as far as I can tell here. Is this an attempt to make sure in the future that RewindableIterator doesn't diverge at all from Iterator?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's true for interfacebut I just kept with the coding style of the stdlib where numerous classes (i.e. Array, String, List, Range, Reverse, HashSet, etc.) explicitly name the interface they implement.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So in the case of those, it made sure that if the Iterator interface was changed that all those concrete classes would fail compilation, is that the intention here? If yes, I think there is a larger discussion about the idea of duplicating the methods from Iterator in RewindableIterator and then using is to enforce the Iterator interface.

It feels odd. It might be worth it to avoid having an interface that is only rewind and the returning the two interfaces, but, it does feel odd and I think the reasoning is worth discussion.

I know you had some conversation around this with @jemc. I think you should discuss with him some supporting justification for this. Is this a one off? Is this establishing a pattern that we want to use in the standard library? Is it a pattern that is already being used in the standard library?

Copy link
Member

@jemc jemc Jan 20, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My suggestion would be to keep the is Iterator[A] part and remove the explicit next and has_next definitions.

For an interface, that roughly reads as:

  • "To implement RewindableIterator[A], you need to implement everything Iterator[A] requires (which doesn't need to be repeated here), but you additionally need to implement this rewind method."

Or from the perspective of the interface user:

  • "If you get a RewindableIterator[A], it can do everything Iterator[A] can do (which doesn't need to be repeated here), but you additionally can use it to rewind"

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As noted below by me, I prefer the alternative design of not having RewindableIterator[A] overlap with Iterator[A] and instead have Iterator[A] & Rewindable[A] define RewindableIterator[A].

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't understand what is finally desired here. What would be the RewindableIterator (or CollectionIterator) definition so that I can include it in the RFC text?

"""
A `RewindableIterator` is an iterator that can be rewinded, that is start
again from first item. The data structure being iterated on can't change the
order it return iterated items.
"""
fun has_next(): Bool
"""
Return `true` when function `next` can be called to get next iteration item.
"""

fun ref next(): A ?
"""
Return the next item of the iteration or an error in case there are no other
items. A previous call to `has_next` check if we can continue iteration.
"""

fun ref rewind(): Iterator[A]^
"""
Get a new iterator that can be used to start the iteration again from the
first item.
"""
```

The code of the standard library is adapted to remove use of these now private
classes, mainly in tests. Here are the files that must be changed:

* `packages/builtin/array.pony` as shown above
* `packages/itertools/iter.pony` in function `cycle`
* `packages/collections/heap.pony` in function `values`
* `packages/collection/builtin/_test.pony` in class `_TestArrayValuesRewind`
* `packages/collections/list.pony`
* `packages/collections/map.pony`
* `packages/collections/persistent/map.pony`
* `packages/collections/persistent/vec.pony`
* `packages/collections/set.pony`
* `test/libponyc/util.cc` to change the name of the class to `_ArrayValues`

# How We Teach This
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nothing in this section seems to be about teaching people anything going forward about the pattern nor explaining to people impacted by the breaking change what they need to be aware of. This section should be devoted to what if anything should be done with Pony Patterns, the Pony tutorial, standard library documentation, and release notes to adapt to the new world this RFC would bring about.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@pmetras - "How We Teach This" could include a post on the "Pony Patterns" site that demonstrates how to use a public interface to hide the implementation of a private class, rather than making that class public.

Like you, I see this as a desirable design pattern, so we could have a "Pony Patterns" example showing how we did this in the Pony standard library with Iterators (assuming we can get the RFC approved).


This change keeps the code compatible in the vast majority of cases. When client
classes are defining objects of these now private types, the reason is usually
to get access to the function `rewind` that was not defined in `Iterator`. By
adding the interface `RewindableIterator`, client code can easily be adapted,
replacing `ArrayValues[A]` by `RewindableIterator[A]`.

Also, client code generally uses these functions to iterate on the returned types
and does not try to access the iterator directly but is interested by the iterated
items. When client code refers to the iterator type, that's generally useless and
the code can be rewritten to be made shorter and more future proof.

A [search on Github Pony code](https://github.com/search?q=%22ArrayValues%22+language%3APony&type=code)
finds 24 files using the class `ArrayValues`, of which 6 are copies of `array.pony` file.

For instance, in
[xml2xpath.pony](https://github.com/redvers/pony-libxml2/blob/bbca5d98d48854bfec2c6ee110220873ecc4df34/pony-libxml2/xml2xpath.pony#L41),
the code can be changed from

```pony
fun values(): ArrayValues[Xml2node, this->Array[Xml2node]]^ ? =>
if (allocated) then
ArrayValues[Xml2node, this->Array[Xml2node]](nodearray)
else
error
end
```

to

```pony
fun values(): RewindableIterator[Xml2node]^ ? =>
if (allocated) then
nodearray.values()
else
error
end
```

In this sample, the developer was not really concerned by the type of the iterator
but that the `values` function must return an `RewindableIterator` over `Xml2node`.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do not think you can speak on behalf Red's concerns here; unless I am missing some prior conversation where Red stated this directly.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, it was only an example I found while surveying how existing code were using the concrete class on Github and that I used to demonstrate the type of impact in the RFC. I saw that in the few examples I found, these concrete class were not used for their implementation, but they were used as iterators only. In the example, I showed how an existing piece of code could be rewritten using the new functions signatures while still giving the same results. And as a consequence, the code will be simpler to understand.

The new version makes the code simpler to understand.

This change in `array.pony` and other collections will break such code but it
can be easily adapted to use the new API. And it will make the standard library
easier to learn by reducing the number of public types.

# How We Test This

Pony tests must continue to pass.

# Drawbacks

As said, some client's code must me adapted when using these classes. As these
classes are just concreted implementations of `Iterator` by collection classes,
their use in client code is limited and the code can very easily be changed.

# Alternatives

Stay as is. Continue the
[discussion on Zulip](https://ponylang.zulipchat.com/#narrow/stream/189959-RFCs/topic/Make.20Array.20iterators.20private).

# Unresolved questions

None