Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Relax the spec rule about initializerExpression not deriving a function literal? #4163

Open
eernstg opened this issue Nov 18, 2024 · 8 comments
Labels
question Further information is requested

Comments

@eernstg
Copy link
Member

eernstg commented Nov 18, 2024

The discussion in dart-lang/sdk#54284 showed that the following declarations are accepted by the analyzer as well as the common front end (808fa4ca8b5fbc3d79ade02f607b8533c9714029):

class Foo1 {
  final Function a;
  final int b;
  Foo1()
      : a = <T>() {},
        b = 0;
}

class Foo2 {
  final Function a;
  Foo2() : a = <T>() => 0;
}

void main() {}

However, the language specification has a rule which says that <initializerExpression> does not derive a <functionExpression>. This should cause the above constructor declarations to be syntax errors.

It could be argued, though, that this is benign: The rule against deriving an initializer expression into a function expression is motivated by a readability issue. In particular, it is in some situations impossible to decide whether a given function body is the body of a function literal which is used to initialize an instance variable, or it's the body of the enclosing constructor declaration, until the very end of that body has been reached: If there's a semicolon then it was the function literal body, if there is no semicolon then it was the constructor body. That's a potentially large amount of code to scan, just to see if there is a semicolon (and nobody will do it unless they are syntax nerds ;-).

However, this ambiguity only exists because the formal parameter list may look like an expression (() could be a formal parameter list or the empty record, (x) could be a parameter list or a parenthesized expression, and (x, y) could be a parameter list or a record literal). This is not true when the function literal is generic: <T>(...) is not an expression (for now, at least ;-).

So we could update the spec to say that <initializerExpression> cannot derive a non-generic function expression.

This spec change would not give rise to any implementation effort because the tools already behave in this way.

@dart-lang/language-team, WDYT?

@lrhn
Copy link
Member

lrhn commented Nov 18, 2024

cannot derive a non-generic function expression.

That sounds like an even harder exception to remember than "initializer-list expressions can not be function literals".

@stereotype441
Copy link
Member

cannot derive a non-generic function expression.

That sounds like an even harder exception to remember than "initializer-list expressions can not be function literals".

Agreed. I'm guessing this isn't a common coding pattern, so assuming that's true, I would rather make a breaking change to the language that causes the above examples to become errors. I'd be glad to drive this through the breaking change process, and update the implementations, if there's general agreement that it's a good idea.

(If it turns out that I'm wrong, and it is a common coding pattern, then we will discover that when we try to push out the change, and we can re-evaluate.)

@eernstg
Copy link
Member Author

eernstg commented Nov 20, 2024

OK, I'm fine with any decision. However, let me reiterate a couple of points about this topic, just to make sure we have them on the table.

It could be argued that a function literal with a big body in an initializer list is really confusing even in the case where an initial <T> will technically disambiguate the situation:

class A {
  var noise1, noise2;
  Function f;

  A(): noise1 = "Some long thing", noise2 = [1, 2, 42], f = <X>(X x) {
    // and then a lot of code showing that we're working really hard.
    // and then a lot of code showing that we're working really hard.
    // and then a lot of code showing that we're working really hard.
    // and then a lot of code showing that we're working really hard.
    // and then a lot of code showing that we're working really hard.
    // and then a lot of code showing that we're working really hard.
    // and then a lot of code showing that we're working really hard.
    // and then a lot of code showing that we're working really hard.
    // and then a lot of code showing that we're working really hard.
    // and then a lot of code showing that we're working really hard.
  };
}

However, we will still be able to use this variant:

class A {
  var noise1, noise2;
  Function f;

  A(): noise1 = "Some long thing", noise2 = [1, 2, 42], f = (<X>(X x) {
    // and then a lot of code showing that we're working really hard.
    // and then a lot of code showing that we're working really hard.
    // and then a lot of code showing that we're working really hard.
    // and then a lot of code showing that we're working really hard.
    // and then a lot of code showing that we're working really hard.
    // and then a lot of code showing that we're working really hard.
    // and then a lot of code showing that we're working really hard.
    // and then a lot of code showing that we're working really hard.
    // and then a lot of code showing that we're working really hard.
    // and then a lot of code showing that we're working really hard.
  });
}

So how much of a solution do we actually have for this problem? Would it be sufficient to rely on the good taste of developers and just lift the restriction? (the good taste is needed for every line of code out there, anyway ;-).

@stereotype441
Copy link
Member

stereotype441 commented Nov 20, 2024

@eernstg For me personally it's more than just a readability issue. It's also a parser complexity issue.

When the parser is looking for an expression, and it sees an (, it has to decide, before it can consume any more tokens, whether to parse the text inside the parens as function parameters or expressions. Today, it can do that with a pretty simple heuristic:

  • If mayParseFunctionExpressions is false (meaning, roughly, that the parser is inside a constructor initializer list), then function literals aren't allowed here, so the text inside the parens should be parsed as expressions.
  • Otherwise, if the token following the matching ) is =>, {, or a keyword/identifier*, then the text inside the parens should be parsed as function parameters.
    • *The "keyword/identifier" case handles the tokens async and sync, as well as providing good error recovery in the case where the user has started typing async or sync but hasn't finished yet, e.g. () asy {}.
  • Otherwise, the text inside the parens should be parsed as expressions.

If we allow function expressions inside initializer lists, then this heuristic would have to get more complicated. We already have a lot of parser complexity in this area, so I think there's a relatively high risk of bugs. I would certainly want to think pretty carefully when designing the new heuristic, and I would want to do a lot of testing to make sure it gets all the corner cases right. It would be a nontrivial amount of effort. Personally, I don't think that effort would really buy us much in terms of user satisfaction.

I suppose you could argue that the alternative, where we prohibit generic function literals when mayParseFunctionExpressions is false, doesn't buy us much in terms of user satisfaction either. But it would be dead simple to implement and test, and it would still have the advantage of bringing the implementations more in line with the spec (which is a good thing).

@eernstg
Copy link
Member Author

eernstg commented Nov 21, 2024

OK, so we have the initial suggestion:

So we could update the spec to say that <initializerExpression> cannot derive a non-generic function expression.
This spec change would not give rise to any implementation effort because the tools already behave in this way.

And the new one you mentioned, @stereotype441:

Adjust the implementation such that the error is reported also for generic function literals; do not change the spec.

I'd be happy about the either of those, with a slight preference for the second.

@leafpetersen
Copy link
Member

Fine with either, slight preference for the latter (assuming no meaningful breakage).

@munificent
Copy link
Member

munificent commented Nov 26, 2024

I'm guessing this isn't a common coding pattern, so assuming that's true,

I checked. From a large corpus of pub packages and itsallwidgets.com (48,509,868 lines in 229,473 files):

Of all constructor initializers, none of their right-hand sides were function expressions:

-- Initializer (162824 total) --
 162824 (100.000%): Other  =====================================================

Looking at function expressions more generally:

-- Function expression (798844 total) --
 798609 ( 99.971%): Non-generic  ===============================================
    235 (  0.029%): Generic      =

So generic lambdas are exceedingly rare. Almost all of them are in a small number of packages:

-- Package (235 total) --
     63 ( 26.809%): dart_mappable-4.3.0          =========
     31 ( 13.191%): riverpod-2.6.0               =====
     23 (  9.787%): reflection_factory-2.2.5     ====
     23 (  9.787%): reflection_factory-2.4.2     ====
     14 (  5.957%): zycloud_widget-1.1.2         ==
     13 (  5.532%): shelf_easy-3.2.2             ==
      8 (  3.404%): _fe_analyzer_shared-76.0.0   ==
      6 (  2.553%): bones_api-1.5.10             =
      6 (  2.553%): bones_api-1.7.9              =
      6 (  2.553%): bones_api-1.7.24             =
...

If we allow function expressions inside initializer lists, then this heuristic would have to get more complicated. We already have a lot of parser complexity in this area, so I think there's a relatively high risk of bugs. I would certainly want to think pretty carefully when designing the new heuristic, and I would want to do a lot of testing to make sure it gets all the corner cases right. It would be a nontrivial amount of effort. Personally, I don't think that effort would really buy us much in terms of user satisfaction.

I recall the old language team stressing about exactly this parser complexity when they first added function expressions to the language and discovered the annoying corner case around constructor initializers. So, yeah, I'd definitely want to err on the side of keeping things grammatically simple.

I'd vote just not allowing function expressions in constructor initializers, generic or not.

@eernstg
Copy link
Member Author

eernstg commented Nov 27, 2024

I'd vote just not allowing function expressions in constructor initializers, generic or not.

I think this is the outcome.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
question Further information is requested
Projects
None yet
Development

No branches or pull requests

5 participants