Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Equality Types #4198

Open
eernstg opened this issue Dec 6, 2024 · 2 comments
Open

Equality Types #4198

eernstg opened this issue Dec 6, 2024 · 2 comments
Labels
feature Proposed language feature that solves one or more problems

Comments

@eernstg
Copy link
Member

eernstg commented Dec 6, 2024

As a follow-up on Typed Equality Operator, this issue describes a take on this topic area that includes some non-breaking changes to the run-time semantics.

Existing equality

The current typing of operator == is perfectly suitable for the case where we wish to decide the equality of two arbitrary objects. In this situation the "universe of discourse" is the universe of all objects whatsoever. This case occurs when two objects with static type Object? are compared, but a case like 1 == true is also included. The crucial characteristic is that the given equality comparison is intended to occur in said universe.

Equality should be an equivalence relation, and it is a non-trivial task to write implementations of operator == such that the behavior models a relation which is both reflexive, symmetric, and transitive. This requires carefully written implementations of operator ==, but it is in principle doable. So we already have a pretty good approach to equality when the universe of discourse is 'all objects'.

We may also want to use a narrower universe of discourse. We could check statically that e1 == e2 will compare two objects whose equality is plausible because they are both known to have a certain common type. This means that the universe of discourse is only the objects that have this type. This helps us avoiding equality comparisons whose outcome is false in every case because the operands were written incorrectly by accident (that is, in the cases where that incorrect expression has an "unreasonably" different type).

A well-known way to take a step in this direction is to implement operator == such that it tests the run-time type of the operand:

class A {
  ...
  bool operator ==(Object other) {
    if (other is! A) return false;
    ... // Compare states.
  }
}

class B implements A {
  ...
  bool operator ==(Object other) {
    if (other is! B) return false;
    ... // Compare states.
  }
}

This is dangerous because it is asymmetric: myA == myB may be true, but myB == myA will be false, where myA has run-time type A and myB has run-time type B.

A symmetric approach can be achieved by testing the type more strictly: if (other.runtimeType != runtimeType) return false;.

This is dangerous in a couple of ways: Theoretically, any user-written class can implement runtimeType to return whatever it wants, in which case it gets tricky to maintain a well-understood model of equality. More realistically, it is a very restrictive approach, and it breaks encapsulation, because it prevents substitutable subtypes: If other is a subtype of the enclosing class/enum/etc then it cannot possibly be == to this object. For example:

import 'dart:math' as math;

abstract class Point {
  double get x;
  double get y;
  operator ==(other) => other is Point && x == other.x && y == other.y;
}

class CartesianPoint implements Point {
  final double x, y;
  CartesianPoint(this.x, this.y);
  get hashCode => ...;
}

class PolarPoint implements Point {
  final double magnitude;
  final double angle;
  PolarPoint(this.magnitude, this.angle);
  get hashCode => ...;
  get x => magnitude * math.cos(angle);
  get y => magnitude * math.sin(angle);
}

void main() {
  Point p1 = CartesianPoint(0, 0), p2 = PolarPoint(0, 0);
  print(p1 == p2);
}

In this case the CartesianPoint and PolarPoint classes are specializations and implementations of the same concept Point, and it would make sense to decide on equality at the type Point, such that it is possible for a CartesianPoint to be equal to a PolarPoint.

The scenario could also use private subtypes to play the role of PolarPoint and CartesianPoint, but I'd claim that both the private and the public scenario can be legitimate.

It's OK for equality to apply for objects of different types, and hence it's too strict if it is enforced that equality among such objects is guaranteed to be false. Hence, I'd claim that if (other.runtimeType != runtimeType) return false; should not be used in general. It is only acceptable in the case where we can actually justify that no substitutable subtypes should (and do!) exist.

The equality type of a type

Because of the difficulties around symmetry, we introduce the notion of the equality type of a type.

The equality type of all types that are expressible today is Object. This preserves the semantics of current code.

The equality type of a type which is introduced by a class/enum/mixin/mixin-class declaration can be specified explicitly as a clause on the declaration of operator ==:

class A {
  ...
  bool operator ==(A other) at A => ...;
}

It is a compile-time error if this does not have the specified equality type.

The semantics of e1 == e2 is now modified such that (1) null is treated the same as today, (2) if the value v1 of e1 and the value v2 of e2 have different equality types then the result is false, otherwise (3) operator == is invoked with v1 as the receiver and v2 as the argument and the result is the return value from this invocation.

Note that this means that it is statically sound to declare operator == to have the equality type as its parameter type. Also note that there is no need to worry about typing properties of other: We already know that it is some subtype of the equality type, and we also known that other has explicitly declared that it agrees on this type as the basis for the comparison. In particular, we don't need anything like if (other is! MyType) return false; any more.

Next, we could introduce an explicitly typed equality like e1 =<T>= e2 which would require that the static type of e1 and the static type of e2 is a subtype of T?. We could then interpret e1 == e2 to mean e1 =<T>= e2 where T is the the equality type of the non-null type corresponding to the static type of e1. For example p1 == p2 would be a compile-time error if the static type of p1 is CartesianPoint, and the static type of p2 is not a subtype of Point?. If you insist then you can compare p1 and 1 by specifying the type explicitly: p1 =<Object>= 1.

Performance implications

It is possible to implicitly generate the type test as the first statement in the body of operator == (low-level pseudo code: if (class.equalityType != other.class.equalityType) return false;), but in the cases where it is guaranteed that the equality type is different from Object then we could generate code to perform the check inlined at the call site. In this case we would invoke a "raw" version of the operator == which doesn't have this check.

It might then be beneficial to declare a special equality type for null:

class Null {
  ...
  bool operator ==(Object other) at Null => true;
}

No other non-bottom class is a subtype of Null, which implies that the current handling of null at an expression of the form e1 == e2 can be replaced by a comparison of their equality types: If exactly one of the operands is null then they will have different equality types and the result is immediately false. If both are non-null then we may immediately know that the result is false if they have different equality types, and otherwise we'll call operator == as usual. Finally, if both operands are null then we invoke Null.== and get the result true.

All in all, this means that we are replacing the null-handling logic of today by a test whether the two operands have the same equality type or not.

That might be faster than the approach we're using today, in particular because the implementation of operator == will no longer need to do anything like if (other is! MyType) then return false; and because we will avoid the method invocation entirely in the case where the result is false because the equality types are different.

Finally, data structures like sets could rely on equality types in order to speed up lookups and other operations: If a given set is partitioned into subsets with the same equality type then it is already known for a lookup that we only need to look into the subset with the same equality type as the object which is being looked up, if any.

@eernstg eernstg added the feature Proposed language feature that solves one or more problems label Dec 6, 2024
@tatumizer
Copy link

tatumizer commented Dec 6, 2024

I'm not sure I fully (or even partially) understand the idea, but based on the vibes, I wonder if the same could be achieved by

class A {
  @doNotOverride // should be enforced by the compiler
  bool operator ==(other) => ...;
}

Indeed, if both operands of a == b are subtypes of A, then the invocation will land on the above == implementation independent of the order, and now it's a matter of writing it correctly to ensure 3 properties of equivalence. Whether the implementation achieves this goal cannot be guaranteed by the compiler, but the proposed "at A" device doesn't seem to give such guarantees either.

This mechanism can statically detect some bugs when you explicitly compare two uncomparable objects using a == b, but most comparisons (probably) occur in maps, where the mechanism doesn't help.

@lrhn
Copy link
Member

lrhn commented Dec 7, 2024

This still requires cooperation between the classes. They need to agree on the equality type and it becomes a breaking change to change the equality type, fx by introducing a new superclass and using that as equality type. That would break a third-party subclass that used the existing class as equality type.

It still means that a class can only belong to one equality type.
If a ColorPoint wants to be able to be equal to a Point of the coordinates match, acting as a Point, but equal to a ColorPoint only if the color also matches, then it can't be both.
It's not safe to do that with only one equality, because it's not transitive.
But it means it's not possible (or at least easy) to have the concepts of "equal as points" and "equal as colored points" at the same time, and choose between them.

So could a class be allowed to have more than one equality? That would basically be introducing overloading, so

  bool operator==(Point other) as Point;
  bool operator==(ColorPoint other) as ColorPoint =>

would mean that point==colorPoint is allowed, they share an equality type, and colorPoint==point would choose the as Point equality of ColorPoint because that's their most specific (and only) shared equally type.

(Does it choose statistically or dynamically? If statically, are these really just static methods chosen based on both argument types? Like extension methods on a pair of values. If dynamically, how expensive would that be?)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
feature Proposed language feature that solves one or more problems
Projects
None yet
Development

No branches or pull requests

3 participants