Skip to content

Parsing switch expression with when clause with closure #51482

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
scheglov opened this issue Feb 21, 2023 · 11 comments
Closed

Parsing switch expression with when clause with closure #51482

scheglov opened this issue Feb 21, 2023 · 11 comments
Assignees
Labels
legacy-area-front-end Legacy: Use area-dart-model instead. P2 A bug or feature request we're likely to work on

Comments

@scheglov
Copy link
Contributor

scheglov commented Feb 21, 2023

This is excerpt from language/patterns/guard_error_test.

Object f() {
  return switch (false) {
    var x when (){ return true; }() => x,
    _ => false,
  };
}

This reports

  Found but did not expect:
    ParserErrorCode.EXPECTED_TOKEN [56, 1, "Expected to find '=>'."]
    ParserErrorCode.UNEXPECTED_TOKEN [58, 6, "Unexpected text 'return'.", "Try removing the text."]
    ParserErrorCode.EXPECTED_TOKEN [69, 1, "Expected to find '}'."]
    ParserErrorCode.EXPECTED_TOKEN [75, 2, "Expected to find '}'."]
    CompileTimeErrorCode.NON_BOOL_CONDITION [54, 2, "Conditions must have a static type of 'bool'.", "Try changing the condition."]
    CompileTimeErrorCode.INVOCATION_OF_NON_FUNCTION_EXPRESSION [56, 16, "The expression doesn't evaluate to a function, so it can't be invoked."]
    CompileTimeErrorCode.NON_EXHAUSTIVE_SWITCH [22, 6, "The type 'bool' is not exhaustively matched by the switch cases.", "Try adding a default case or cases that match true."]
    HintCode.UNUSED_LOCAL_VARIABLE [47, 1, "The value of the local variable 'x' isn't used.", "Try removing the variable or using it."]
@scheglov scheglov added P1 A high priority bug; for example, a single project is unusable or has many test failures legacy-area-front-end Legacy: Use area-dart-model instead. labels Feb 21, 2023
@stereotype441
Copy link
Member

This is actually quite tricky, due to how we're handling a related grammar ambiguity.

As explained in the spec, constructs like this are ambiguous:

var x = switch (obj) {
  _ when a + (b) => (c) => body
};

(because either => could be interpreted as separating the guardedPattern from its corresponding expression, and the other => separates function literal parameters from a function literal body).

The spec says that "When => is encountered after when in a guard, if the code between forms a valid expression, then it is interpreted as such and the => is treated as the separator between the guard and case body."

But this is not precisely what I implemented (oops). The rule I actually implemented was: a function expression cannot appear in a guard, unless it is inside grouping operators (parens, brackets, or curly braces). This means that a parenthesized group appearing in a guard is always parsed as a parenthesized expression or a record literal, and never as function literal arguments.

This is my fault--I didn't realize at the time that there were user-visible differences between what I was doing and what was specified. Looking at it now, I think there are two user-visible differences:

  1. According to the spec, it should be possible to put a block-bodied function literal in a guard clause (because it's not ambiguous). But this doesn't work, and the error messages the user gets from the parser aren't very helpful, because the parser tries to parse the parens as a parenthesized expression or record literal, and then it has no idea what to do with the { that follows. That's what's happening in @scheglov's example.

  2. According to the spec, it should be possible to put an expression-bodied function literal in a guard clause, provided that the text before the first => does not form a valid expression. For example, the parser should accept the switch expression switch (...) { var x when y + (int z) => true => x, ...}. (And indeed, for an appropriate definition of y, this could even have well-defined runtime semantics). But this doesn't work, and again the error messages the user gets from the parser aren't very helpful, because the parser tries to intrepret the contents of the parentheses as an expression.

Unfortunately, implementing precisely what was specified would be very challenging, because the parser is event-based, and so it is not easy for it to determine whether a sequence of tokens forms a valid expression without committing to parsing that sequence of tokens. (It's technically possible using the NullListener, but it's tricky to get right and carries performance penalties).

I'm considering a few alternatives:

  1. Change the spec to match what I implemented, and add some error recovery logic to the parser to try to give a more useful error message for @scheglov's example.

  2. Change the spec to say that only expression-bodied function literals are prohibited in guard clauses. This would allow @scheglov's example to be accepted. Unfortunately, it's non-trivial to implement, because in some circumstances the parser commits to parsing a function literal before it's seen the { or => token.

Before I do any coding, I'll gather opinions from some folks about what to do.

@munificent
Copy link
Member

So the spec says:

When => is encountered after when in a guard, if the code between forms a valid expression, then it is interpreted as such and the => is treated as the separator between the guard and case body.

I'm actually a little surprised I wrote it that way because "forms a valid expression" feels very brittle and annoying to parse to me. I recall avoiding a similar formulation to resolve another ambiguity. I definitely wouldn't want a slight tweak to a guard that causes the stuff before => to step outside of the overlap between parameter list and expression to cause the entire thing to be reinterpreted:

// OK, could be an expression but also happens to be a parameter list:
case _ when (a, b, c, {e, f, g}) => true => body;

// Oops, now it must be a parameter list:
case _ when (a, b, c, {e, int f, g}) => true => body;

The rule I actually implemented was: a function expression cannot appear in a guard, unless it is inside grouping operators (parens, brackets, or curly braces).

I like that. My understanding of what you propose here is that there's basically two copies of the entire expression grammar. One allows function expressions and the other does not. All of the places in the grammar delimited by braces take you to the expression grammar that allows function expressions. The grammar rule for guards (and constructor initializers) take you to the one that doesn't. But within that function-less grammar, the rules for parenthesized expressions, argument lists, collection literals, etc. all hop you over back to the grammar where function expressions are allowed.

That would mean these are all OK (syntactically):

case _ when (() => true) => body; // Parenthesized.
case _ when [() => true] => body; // List literal.
case _ when {() => true} => body; // Set literal.
case _ when {k: () => true} => body; // Map literal.
case _ when foo(() => true) => body; // Argument list (including `assert`, `super`, method calls).
case _ when [() => true] => body; // List literal.
case _ when list[() => true] => body; // Index operator.

But these are parse errors:

case _ when () => true => body; // Bare.
case _ when !() => true => body; // Prefix operand.
case _ when a + () => true => body; // Infix operand.
case _ when a = () => true => body; // Assignment value.
case _ when c ? a : () => true => body; // Conditional operand.
case _ when c ? () => true : a => body; // Conditional operand.

Do I have that right?

If so, that sounds reasonable to me, though I worry about the complexity and how we specify it. ECMAScript has a whole thing around grammatical parameters and it would be nice to not have to go there.

Unfortunately, implementing precisely what was specified would be very challenging, because the parser is event-based, and so it is not easy for it to determine whether a sequence of tokens forms a valid expression without committing to parsing that sequence of tokens.

We already know it's going to be parsing an expression. We don't know whether that expression will turn out to be a function expression or something else, but the thing after when is definitely an expression. It's just a question of what kind or expression and whether the => is part of it.

We already resolve that parsing difficulty somehow, because you run into it basically everywhere:

var x = (a, b, c, d, e, f …

When the parser is at (, it doesn't know if that's going to ultimately be a function expression and those identifiers are parameters, or a record literal and those identifiers are expressions. (Even before record literals, it's possible to conjure up an arbitrarily long sequence of tokens that is both a valid parameter list and expression.)

Another approach I could see us taking is that we parse the guard expression with maximal munch: If it's possible for the => to be consumed and used as part of a function expression, then do so. If that's not what you want, then parenthesize it yourself to avoid it.

I think that's more or less how we handle function expression bodies when they have infix or postfix operators following them: the body eats them all so () => a + b is parsed as () => (a + b) and not (() => a) + b. Would that work here? How much churn would that cause?

@lrhn
Copy link
Member

lrhn commented Mar 23, 2023

I like the "only inside delimiters" rule too, like we do in initializer list expressions too. (So it's not a new idea, that's a good thing).

@stereotype441
Copy link
Member

So the spec says:

When => is encountered after when in a guard, if the code between forms a valid expression, then it is interpreted as such and the => is treated as the separator between the guard and case body.

I'm actually a little surprised I wrote it that way because "forms a valid expression" feels very brittle and annoying to parse to me. I recall avoiding a similar formulation to resolve another ambiguity. I definitely wouldn't want a slight tweak to a guard that causes the stuff before => to step outside of the overlap between parameter list and expression to cause the entire thing to be reinterpreted:

// OK, could be an expression but also happens to be a parameter list:
case _ when (a, b, c, {e, f, g}) => true => body;

// Oops, now it must be a parameter list:
case _ when (a, b, c, {e, int f, g}) => true => body;

The rule I actually implemented was: a function expression cannot appear in a guard, unless it is inside grouping operators (parens, brackets, or curly braces).

I like that. My understanding of what you propose here is that there's basically two copies of the entire expression grammar. One allows function expressions and the other does not. All of the places in the grammar delimited by braces take you to the expression grammar that allows function expressions. The grammar rule for guards (and constructor initializers) take you to the one that doesn't. But within that function-less grammar, the rules for parenthesized expressions, argument lists, collection literals, etc. all hop you over back to the grammar where function expressions are allowed.

Yes, that's a good description of what I implemented. Incidentally, as @lrhn mentioned above, I didn't invent this rule from scratch--I repurposed what the parser was already doing for constructor initializers. Curiously, I don't see anything in the spec prohibiting undelimited function expressions inside constructor initializers. I guess this is a piece of technical debt for the spec (@eernstg I'd be curious what the spec parser does 😃).

That would mean these are all OK (syntactically):

case _ when (() => true) => body; // Parenthesized.
case _ when [() => true] => body; // List literal.
case _ when {() => true} => body; // Set literal.
case _ when {k: () => true} => body; // Map literal.
case _ when foo(() => true) => body; // Argument list (including `assert`, `super`, method calls).
case _ when [() => true] => body; // List literal.
case _ when list[() => true] => body; // Index operator.

But these are parse errors:

case _ when () => true => body; // Bare.
case _ when !() => true => body; // Prefix operand.
case _ when a + () => true => body; // Infix operand.
case _ when a = () => true => body; // Assignment value.
case _ when c ? a : () => true => body; // Conditional operand.
case _ when c ? () => true : a => body; // Conditional operand.

Do I have that right?

Yes, I belive so.

If so, that sounds reasonable to me, though I worry about the complexity and how we specify it. ECMAScript has a whole thing around grammatical parameters and it would be nice to not have to go there.

If you're worried about implementation complexity, you don't need to, because it's already implemented, and it's fairly simple (the parser just has a boolean that keeps track of whether we're in a context where function literals are allowed). If you're worried about specification complexity, then yeah, I see where you're coming from. But that horse has already fled the stables because we're already doing this for constructor initializers 😃.

Unfortunately, implementing precisely what was specified would be very challenging, because the parser is event-based, and so it is not easy for it to determine whether a sequence of tokens forms a valid expression without committing to parsing that sequence of tokens.

We already know it's going to be parsing an expression. We don't know whether that expression will turn out to be a function expression or something else, but the thing after when is definitely an expression. It's just a question of what kind or expression and whether the => is part of it.

We already resolve that parsing difficulty somehow, because you run into it basically everywhere:

var x = (a, b, c, d, e, f …

When the parser is at (, it doesn't know if that's going to ultimately be a function expression and those identifiers are parameters, or a record literal and those identifiers are expressions. (Even before record literals, it's possible to conjure up an arbitrarily long sequence of tokens that is both a valid parameter list and expression.)

Right. And the way the parser currently disambiguates that is to look at the token after the matching ). If it's =>, {, async, or sync, then parse the thing between parens as function literal arguments; otherwise parse it as expressions. That works everywhere except in constructor initializers and pattern guards, hence the approach I took.

Another approach I could see us taking is that we parse the guard expression with maximal munch: If it's possible for the => to be consumed and used as part of a function expression, then do so. If that's not what you want, then parenthesize it yourself to avoid it.

I think that's more or less how we handle function expression bodies when they have infix or postfix operators following them: the body eats them all so () => a + b is parsed as () => (a + b) and not (() => a) + b. Would that work here? How much churn would that cause?

Unfortunately I don't think that works, because we need this unambiguous case to parse correctly: switch (...) { pattern when (...) => value, ... }. With the "maximal munch" rule, after consuming the when token, we would see the (, skip to the matching ), see that the next token is =>, and incorrectly conclude that the thing between parens is a set of function literal arguments. In this scenario, there's no way to get the parser to do the right thing by adding parens.

@lrhn
Copy link
Member

lrhn commented Mar 24, 2023

I don't see anything in the spec prohibiting undelimited function expressions inside constructor initializers

We know (even if we occasionally forget): #11509


I think we could make our spec-work easier if we introduce parameterized grammars, say something like:

<list>[element]{allowTrailingComma} ::= 
      element (',' element)* <trailingCommaOpt>{allowTrailingComma}
    | <empty>
    ;
<trailingCommaOpt>{allowTrailingComma} ::= 
      ',' -- if allowTrailingComma
    | <empty>
    ;
<showClause> ::= 'show' <list>[<identifier>]{allowTrailingComma = false} ;
<hideClause> ::= 'hide' <list>[<identifier>]{allowTrailingComma = false} ;
<argumentList> ::= <list>[(<identifier> ':')? <expression>]{allowTrailingComma = true}

where the [...] are production arguments and {...} are value arguments (usually Boolean flags).

Then we wouldn't have to have both expression and expressionNoCascade. It'd just be<expression>{allowCascade} ::= ....
That allows reuse, which avoids accidental drift.

And the we could have <expression>{allowFunctionExpression} ::= ... which would pass down the flag along the productions that can end up in a function expression, until | <functionExpression> -- if allowFunctionExpression.
I'm thinking that's effectively what you have implemented anyway.

stereotype441 added a commit to stereotype441/language that referenced this issue Mar 24, 2023
We now unconditionally prohibit function literals inside guards,
regardless of whether their bodies take the form `=> expression` or a
block.  This matches what was implemented in the parser.

See discussion in dart-lang/sdk#51482.
stereotype441 added a commit to stereotype441/language that referenced this issue Mar 24, 2023
We now unconditionally prohibit function literals inside guards,
regardless of whether their bodies take the form `=> expression` or a
block.  This matches what was implemented in the parser.

See discussion in dart-lang/sdk#51482.
@munificent
Copy link
Member

If you're worried about implementation complexity, you don't need to, because it's already implemented, and it's fairly simple (the parser just has a boolean that keeps track of whether we're in a context where function literals are allowed).

I'm not particularly worried about implementation complexity because it's pretty easy to make a parametric recursive descent parser by just, well, passing around parameters.

If you're worried about specification complexity, then yeah, I see where you're coming from. But that horse has already fled the stables because we're already doing this for constructor initializers 😃.

This one. :)

Parametric grammars are sorely tempting but I can't help but feel like it leads to more syntactic complexity than we want. But in this case, we're already doing it, so I guess it's not harmful. Keeping guards consistent with constructor initializers sounds good to me.

Any thoughts on how I should specify this in the proposal? My understanding is that the spec doesn't actually specify constructor initializers correctly right now.

@stereotype441
Copy link
Member

Any thoughts on how I should specify this in the proposal? My understanding is that the spec doesn't actually specify constructor initializers correctly right now.

I've sent a PR: dart-lang/language#2947

stereotype441 added a commit to dart-lang/language that referenced this issue Mar 25, 2023
…2947)

We now unconditionally prohibit function literals inside guards,
regardless of whether their bodies take the form `=> expression` or a
block.  This matches what was implemented in the parser.

See discussion in dart-lang/sdk#51482.
@stereotype441
Copy link
Member

As of dart-lang/language@46411cb, the specification has been updated to match what was implemented (function literals are now prohibited in guards unless enclosed in parentheses, square brackets, or curly braces).

So now the behaviour of the implementation is technically spec compliant. I'm keeping this issue open because I would still like to improve the error reporting if I can, but I'm reducing the priority to P2 since it's no longer a correctness issue.

@stereotype441 stereotype441 added P2 A bug or feature request we're likely to work on and removed P1 A high priority bug; for example, a single project is unusable or has many test failures labels Mar 25, 2023
@eernstg
Copy link
Member

eernstg commented Mar 26, 2023

@stereotype441 wrote:

I'd be curious what the spec parser does

The spec parser does not include the => function literal as a <primary>, it is included directly as an alternative in <expression>.

expression
    :    patternAssignment
    |    functionExpression // <----- This one derives `=>` function literals.
    |    throwExpression
    |    assignableExpression assignmentOperator expression
    |    conditionalExpression
    |    cascade
    ;

primary
    :    thisExpression
    |    SUPER unconditionalAssignableSelector
    |    SUPER argumentPart
    |    functionPrimary  // <----- This one derives block function literals.
    |    literal
    |    identifier
    |    newExpression
    |    constObjectExpression
    |    constructorInvocation
    |    '(' expression ')'
    |    constructorTearoff
    |    switchExpression
    ;

The fact that => function literals when included in <primary> are wildly ambiguous causes me to recommend that we make this approach the official one, every time this comes up. Here we go again: Let's do that! ;-)

Here is the snippet of the spec parser's grammar that deals with initializer lists:

initializerListEntry
    :    SUPER arguments
    |    SUPER '.' identifierOrNew arguments
    |    fieldInitializer
    |    assertion
    ;

fieldInitializer
    :    (THIS '.')? identifier '=' initializerExpression
    ;

initializerExpression
    :    conditionalExpression
    |    cascade
    ;

(Actually, looking at this, I believe initializerExpression should include throw and assignment expressions; but I've never seen any failures because they aren't currently included).

Interestingly, I haven't seen any failures based on the fact that I've used the patterns feature spec grammar rule exactly as stated:

guardedPattern
    :    pattern (WHEN expression)?
    ;

The reason for this is probably that ANTLR disambiguates the grammar using some techniques that we don't use in the Dart parser. Hence, I changed expression to initializerExpression in the guardedPattern rule above. The resulting treatment of the examples is as follows:

int get body => 2;

void main() {
  var _ = switch (0) {
    1 when (() => true) => body, // Parenthesized.
    1 when [() => true] => body, // List literal.
    1 when {() => true} => body, // Set literal.
    1 when {k: () => true} => body, // Map literal.
    1 when foo(() => true) => body, // Argument list.
    1 when [() => true] => body, // List literal.
    1 when list[() => true] => body, // Index operator.

    // Agree on most error cases mentioned earlier in this issue:
    // Error: 1 when () => true => body, // Bare.
    // Error: 1 when !() => true => body, // Prefix operand.
    // Error: 1 when a + () => true => body, // Infix operand.
    // Error: 1 when a = () => true => body, // Assignment value.
    // Error: 1 when c ? a : () => true => body, // Conditional operand.
    1 when c ? () => true : a => body, // Conditional operand.

    _ => body,
  };
}

@lrhn
Copy link
Member

lrhn commented Mar 26, 2023

(Actually, looking at this, I believe initializerExpression should include throw and assignment expressions; but I've never seen any failures because they aren't currently included).

Not a big loss. We're talking an unconditional throw or assignment here. I guess if you wanted to remember the most recent argument in a static variable, you can do this.foo = _fooCache = foo, but ... why?

@stereotype441
Copy link
Member

Thanks to dart-lang/language@46411cb I believe this issue is now resolved; the code from the initial bug report is now disallowed by the spec, so it's expected that it won't parse successfully.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
legacy-area-front-end Legacy: Use area-dart-model instead. P2 A bug or feature request we're likely to work on
Projects
None yet
Development

No branches or pull requests

6 participants