SwiftSyntax support for module selectors #3091

Open
wants to merge 16 commits into
base: main
from

Conversation

beccadax
Contributor

@beccadax beccadax commented Jun 2, 2025

This PR adds module selectors to SwiftSyntax and SwiftParser (draft of matching compiler feature in swiftlang/swift#34556). This feature was pitched ages ago; a proper proposal is now available.

Note: This PR is still a work in progress—in particular, I need to go back and improve the tests, both by expanding their coverage and by adding assertions about the resulting syntax trees—but I'd appreciate feedback on the design while I'm working on that.

Reviewers: If you review this commit-by-commit, I've separated out the big dumb mechanical changes from the ones that actually change parsing logic.

@nkcsgexi
Contributor

nkcsgexi commented Jun 2, 2025

🥳

///
/// - Precondition: `node` must, at minimum, have a descendant with an unexpected nodes child; it therefore cannot be
/// a token or an empty collection.
func attach<Node: RawSyntaxNodeProtocol>(_ moduleSelector: RawModuleSelectorSyntax?, to node: Node) -> Node {
Contributor Author

Retroactively rewriting a node to insert new information is a pretty unconventional way for SwiftParser to do things, so I thought I should justify this design.

Because a module selector is a prefix on a different syntax and requires two tokens, it's difficult to peek past one and make decisions based on the tokens that follow it. Instead, I found it easier to parse them in relatively high-level productions; that gets the tokens out of the way, but it means they're often parsed a significant distance from the node that will become their parent. I tried two other approaches to coping with this problem:

  1. Adding new ModuleSelectorExprSyntax and ModuleSelectorTypeSyntax nodes. The issue was that I want to make sure we could parse invalid module selectors—even in declaration syntax, patterns, etc.—into unexpected nodes, and that didn't offer a good way to do so. (I also found that the module selectors on member lookups wouldn't be able to be handled in this way; I didn't really like that.)

  2. Passing the ModuleSelectorSyntax node down as a parameter and threading it through to wherever it was needed. This required me to add a lot of parameters and manually insert unexpected module selectors into a lot of nodes, but that's not actually what scuttled the idea—it's that it dramatically increased stack usage. At one point I had to reduce the maximum recursion in development builds to 10, which isn't even enough to handle the swift-syntax repo itself.

Retroactively attaching module selectors in this fashion allows more of the parser to ignore the fact that an invalid module selector might have been parsed earlier on, while also avoiding the stack usage problems I mentioned.
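
As a rough illustration of the retroactive approach, here is a toy model of attaching an already-parsed module selector to a node after the fact. `ToyNode`, its fields, and this `attach(_:to:)` are hypothetical stand-ins written for this sketch; they are not the SwiftParser types, and the real function's precondition and arena handling are not modeled. The selector either lands on the leftmost node that accepts one, or is recorded as unexpected tokens.

```swift
// Toy stand-in for a syntax node. Hypothetical; not SwiftSyntax API.
struct ToyNode: Equatable {
  var kind: String
  var acceptsModuleSelector: Bool
  var moduleSelector: String? = nil
  var unexpected: [String] = []
  var children: [ToyNode] = []
}

// Returns a rewritten copy of `node` with the selector attached.
// The value copy stands in for "rewriting" an already-built node.
func attach(_ selector: String?, to node: ToyNode) -> ToyNode {
  guard let selector = selector else { return node }
  var rewritten = node
  if rewritten.acceptsModuleSelector && rewritten.moduleSelector == nil {
    // Valid position: the selector becomes a proper child.
    rewritten.moduleSelector = selector
  } else if let first = rewritten.children.first {
    // The selector was a prefix, so descend into the leftmost child.
    rewritten.children[0] = attach(selector, to: first)
  } else {
    // Invalid position: keep the tokens, but mark them unexpected.
    rewritten.unexpected += [selector, "::"]
  }
  return rewritten
}

let reference = ToyNode(kind: "declReferenceExpr", acceptsModuleSelector: true)
let call = ToyNode(kind: "functionCallExpr", acceptsModuleSelector: false, children: [reference])
let attached = attach("Foo", to: call)

let pattern = ToyNode(kind: "identifierPattern", acceptsModuleSelector: false)
let recovered = attach("Foo", to: pattern)
```

In this sketch the productions that build `call` and `pattern` never see the selector; only the caller that parsed it does, which is the property the design is after.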

Member

@rintaro rintaro Jun 2, 2025

Could you explain why we want this?

"I want to make sure we could parse invalid module selectors—even in declaration syntax, patterns, etc."

What makes module selectors different from other invalid things? E.g.:

let Foo.Bar = 12

Contributor Author

A couple reasons:

  1. The difference between a name and a name reference is a bit subtle and hasn't previously been grammatically important; I can imagine that developers might be confused about where they are allowed to use module selectors.

  2. If a developer writes Foo::Bar in a place where module selectors are not valid, we can be almost certain that they want to keep Bar (because we know that Foo was supposed to be a module name), but the naïve recovery behavior will treat Foo as the name and ::Bar as unexpected syntax. Tailoring the recovery gives us a tree that better reflects the user's likely intent. (Whereas with let Foo.Bar the developer might have wanted let Bar, or let Foo, or let Foo = .Bar, or any number of other possibilities; it's harder to be certain of their intent.)
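
To make the second point concrete, here is a small sketch over a flat token list. `NameRecovery`, `naiveRecovery`, and `tailoredRecovery` are hypothetical names invented for this illustration; the real parser works on raw syntax nodes, not string arrays.

```swift
// Result of recovering a declared name from a run of tokens.
// Hypothetical type for illustration only.
struct NameRecovery: Equatable {
  var name: String?
  var unexpected: [String]
}

// Naive recovery: the first token is the name, the rest is unexpected.
func naiveRecovery(_ tokens: [String]) -> NameRecovery {
  return NameRecovery(name: tokens.first, unexpected: Array(tokens.dropFirst()))
}

// Tailored recovery: a leading `<module> ::` pair is treated as the
// unexpected part, and the token after it is kept as the name.
func tailoredRecovery(_ tokens: [String]) -> NameRecovery {
  if tokens.count >= 3, tokens[1] == "::" {
    let prefix = Array(tokens[0...1])
    let rest = Array(tokens.dropFirst(3))
    return NameRecovery(name: tokens[2], unexpected: prefix + rest)
  }
  return naiveRecovery(tokens)
}

let tokens = ["Foo", "::", "Bar"]
let naive = naiveRecovery(tokens)       // keeps Foo as the name
let tailored = tailoredRecovery(tokens) // keeps Bar as the name
```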

Member

@rintaro rintaro Jul 16, 2025

I've been thinking about this, but I'm still not convinced. Take let Foo::Bar = <expr>: I don't think we can say the user tried to specify the module name. The user might have mistyped let Foo: Bar = <expr>.

Whether developers might get confused between name declarations and name references is subjective, so I won’t argue that. But I don’t think it’s worth adding this extra complexity to the implementation. It looks overly complex. Also, rewriting parsed nodes leaves abandoned nodes in the arena, which is not great for memory usage. IMO, the module selector parsing should be mostly contained in parseDeclReferenceExpr(), (can)parseTypeIdentifier(), and (can)parseSimpleType() (for member types). Of course we should do our best to emit better diagnostics. But I don't feel this is the way. Also we can improve diagnostics later. Can we make it simple for the initial implementation?

This capability is currently unused, but it’ll matter soon.
@beccadax beccadax force-pushed the mod-squad branch 3 times, most recently from 1f47aa2 to eaa6956 Compare July 9, 2025 22:04
beccadax added 5 commits July 9, 2025 15:24
And make sure it doesn’t break Objective-C selector lexing.
Treating introducers as keywords is now always conditional on whether `shouldParsePatternBinding(introducer:)` returns `true`. That method has also been modified to correctly handle an edge case with wildcard patterns.
Initializers for nodes with experimental node children need to be marked `@_spi`. This PR:

• Adds that attribute.
• Generates an alternative which *doesn’t* use SPI as part of the compatibility layer.
• As a side effect, adds a `Child.Refactoring.introduced` case that can be used to generate compatibility `unexpected` properties.

No functional change in this commit, but it will affect the code generation in the next one.
@beccadax beccadax force-pushed the mod-squad branch 2 times, most recently from da6eff6 to f94b58a Compare July 10, 2025 03:00
@beccadax
Contributor Author

@swift-ci please test

@beccadax
Contributor Author

@swift-ci please test

@beccadax
Contributor Author

@swift-ci please test

@beccadax beccadax marked this pull request as ready for review July 11, 2025 20:50
beccadax added 7 commits July 11, 2025 16:26
Changes the syntax tree to represent module selectors:

• A `ModuleSelectorSyntax` node represents a module selector abstractly.
• The following nodes now have an optional `moduleSelector` child:
    • `DeclReferenceExprSyntax`
    • `IdentifierTypeSyntax`
    • `MacroExpansionExprSyntax`
    • `MemberTypeSyntax`
• BasicFormat knows the preferred format for module selectors.

Other components, particularly the parser, were also updated to continue building, though without any changes in behavior. Parser implementation will come in a future commit.
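
A minimal sketch of what an optional `moduleSelector` child means for a node, using toy types rather than the generated SwiftSyntax ones. `ToyModuleSelector` and `ToyDeclReference` are hypothetical, and printing `::` with no surrounding whitespace is an assumption based on how `Foo::Bar` is written throughout this PR.

```swift
// Hypothetical stand-ins for ModuleSelectorSyntax and DeclReferenceExprSyntax.
struct ToyModuleSelector: Equatable {
  var moduleName: String
}

struct ToyDeclReference: Equatable {
  // nil when the reference is written without a module selector.
  var moduleSelector: ToyModuleSelector?
  var baseName: String

  // Source text of the reference; assumes no spaces around `::`.
  var sourceText: String {
    if let selector = moduleSelector {
      return "\(selector.moduleName)::\(baseName)"
    }
    return baseName
  }
}

let plain = ToyDeclReference(moduleSelector: nil, baseName: "bar")
let qualified = ToyDeclReference(
  moduleSelector: ToyModuleSelector(moduleName: "Foo"),
  baseName: "bar"
)
```

Because the child is optional, existing code that only reads `baseName` keeps working on both values.
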
Changes it to share code with `parseTypeIdentifier()` and clean up the member type parsing a little. Also tweaks call sites of `parseTypeIdentifier()`.
This commit ports over tests from the compiler’s (future) `test/NameLookup/module_selector.swift` file and makes sure the correct uses parse as expected. It also tests that ill-formed module selectors (ones with a missing or non-identifier module name) are diagnosed correctly.

This commit doesn’t fully handle recovery from module selectors inserted at invalid locations; the test cases that require recovery are XFAILed.
Specifically, from module selectors at incorrect locations. This is done through a couple of mechanisms:

• The various `expect(…)` methods consume a module selector as unexpected syntax.
• Various identifier-parsing productions now pre-parse an invalid module selector and convert it to unexpected syntax. In some cases this involves adjusting matching `can`/`at` methods to consume otherwise-invalid module selectors.
• The previously-introduced `attach(_:to:)` mechanism is now used in more places.

This makes all test cases inherited from the Swift tests pass, except for the `import` syntax which I’m a little iffy on.
Since some types and expressions can have module selectors, MissingTypeSyntax and MissingExprSyntax should have a module selector child and `attach(_:to:)` should be able to attach a module selector to them. This keeps the parser from erroring on both the module selector *and* the missing node.
Add tailored diagnostics for unexpected module selectors which offer either one or two fix-its:

• Remove the module selector
• Convert `Foo::bar` to `bar = Foo::bar` (in certain declaration syntaxes)

Making these messages clear required adding `nameForDiagnostics` properties to a bunch of children, which also impacted other existing diagnostics (for the better, IMHO). If we don’t like those changes or think they need more work, this commit can be severed from the rest of the PR.
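
The two fix-its can be pictured with a toy function. `ToyFixIt` and `fixIts(moduleName:name:allowsInitializer:)` are hypothetical, and the message strings are placeholders, not the diagnostic text this commit emits.

```swift
// Hypothetical fix-it model: a label plus the text that would replace
// the offending `Module::name`.
struct ToyFixIt: Equatable {
  var message: String
  var replacement: String
}

// `allowsInitializer` stands for "certain declaration syntaxes" where
// rewriting to `name = Module::name` makes sense.
func fixIts(moduleName: String, name: String, allowsInitializer: Bool) -> [ToyFixIt] {
  var result = [
    ToyFixIt(message: "remove module selector (placeholder text)", replacement: name)
  ]
  if allowsInitializer {
    result.append(
      ToyFixIt(
        message: "use module selector in initializer (placeholder text)",
        replacement: "\(name) = \(moduleName)::\(name)"
      )
    )
  }
  return result
}

let both = fixIts(moduleName: "Foo", name: "bar", allowsInitializer: true)
let removalOnly = fixIts(moduleName: "Foo", name: "bar", allowsInitializer: false)
```
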
beccadax added 2 commits July 11, 2025 16:26
When `parseFunctionParameter()` parses two argument labels and then doesn’t find a type, it applies a heuristic to decide whether to reinterpret the second label as a type (and recover by inserting a colon between the labels). The code in this path could drop unexpected nodes between the two labels. Correct this issue and, if the unexpected syntax includes a module selector, reconstruct it and attach it to the type.

This was probably a pre-existing bug, but the module selector tests managed to hit it through mutation testing.
The code assumed that dropping a `MissingTypeSyntax` wouldn’t lose any tokens.
@beccadax
Contributor Author

@swift-ci please test

@nkcsgexi
Contributor

@swift-ci please test macOS

@nkcsgexi
Contributor

@swift-ci please test Linux

@nkcsgexi
Contributor

This build failure seems to be real:

12:29:57  /Users/ec2-user/jenkins/workspace/swift-syntax-PR-macOS/branch-main/swift-syntax/Sources/SwiftSyntax/generated/SyntaxTraits.swift:209:7: error: protocol requirement 'moduleSelector' cannot be declared '@_spi' without a default implementation in a protocol extension
12:29:57  207 | 
12:29:57  208 |   @_spi(ExperimentalLanguageFeatures)
12:29:57  209 |   var moduleSelector: ModuleSelectorSyntax? {
12:29:57      |       `- error: protocol requirement 'moduleSelector' cannot be declared '@_spi' without a default implementation in a protocol extension
12:29:57  210 |     get
12:29:57  211 |     set

@beccadax
Contributor Author

I can't actually reproduce that failure locally, but I'm pushing a speculative fix.
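
For readers unfamiliar with the diagnostic, it asks for a default implementation of the requirement in a protocol extension. The sketch below shows only that shape, with hypothetical names (`ToyModuleSelectorTrait` and friends) and with the `@_spi` attribute left out so it compiles stand-alone. Whether the speculative fix took this route is not stated in this thread.

```swift
struct ToyModuleSelector: Equatable {
  var moduleName: String
}

// Hypothetical trait protocol. In the generated code the requirement
// carries `@_spi(ExperimentalLanguageFeatures)`; that is omitted here.
protocol ToyModuleSelectorTrait {
  var moduleSelector: ToyModuleSelector? { get set }
}

// Default implementation, so conformers that do not know about the
// requirement still satisfy the protocol.
extension ToyModuleSelectorTrait {
  var moduleSelector: ToyModuleSelector? {
    get { return nil }
    set { }
  }
}

// Relies on the default: always reports no module selector.
struct ToyPlainNode: ToyModuleSelectorTrait {}

// Provides real storage, which takes precedence over the default.
struct ToyDeclReference: ToyModuleSelectorTrait {
  var moduleSelector: ToyModuleSelector?
  var baseName: String
}

var reference = ToyDeclReference(moduleSelector: nil, baseName: "bar")
reference.moduleSelector = ToyModuleSelector(moduleName: "Foo")
```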

@beccadax
Contributor Author

@swift-ci please test

@beccadax
Contributor Author

@swift-ci please test macOS

@beccadax
Contributor Author

@swift-ci please test Linux

@beccadax
Contributor Author

@swift-ci please test Windows

@nkcsgexi
Contributor

We are good now with macOS and Linux. Let's try Windows again. @swift-ci please test Windows.

@beccadax beccadax requested a review from rintaro July 15, 2025 18:21
@nkcsgexi
Contributor

hmm, the Windows CI hit a compiler crasher:

[519/528] Compiling SwiftDiagnostics Convenience.swift
error: compile command failed due to exception 3 (use -v to see invocation)
SIL memory lifetime failure in @$s11SwiftParser13TokenConsumerPAAE27consumeModuleSelectorTokens0C0Qz22moduleNameOrUnexpected_AF010colonColonC0SayAFG5extratSgyF: memory is initialized, but shouldn't be

Member

@rintaro rintaro left a comment

Apologies that I didn't review this sooner. I still haven't looked into the implementation closely, but here's the first round 🙇

documentation:
"A module selector. Some expressions can be prefixed with module selectors, so if one is parsed before an invalid expression, it will be inserted here.",
isOptional: true
),
Member

@rintaro rintaro Jul 16, 2025

I don't feel that using MissingExprSyntax to hold a dangling module selector is the way to go. MissingExprSyntax is a "placeholder" for an expression of unknown kind. But IMO, we can reasonably assume the user intends to add an identifier after a module selector to form a DeclReferenceExprSyntax.
I feel it's more natural to model it as a DeclReferenceExprSyntax with a missing baseName.

Same for MissingTypeSyntax
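
The two modelings under discussion can be contrasted with a toy enum. `ToyExpr` and its cases are hypothetical; a `nil` base name stands in for a missing token.

```swift
// Hypothetical expression model for a dangling `Foo::`.
enum ToyExpr: Equatable {
  // The PR's modeling: a placeholder expression that carries the selector.
  case missingExpr(moduleSelector: String?)
  // The suggested modeling: a declaration reference whose base name is missing.
  case declReference(moduleSelector: String?, baseName: String?)
}

let asMissingExpr = ToyExpr.missingExpr(moduleSelector: "Foo")
let asDeclReference = ToyExpr.declReference(moduleSelector: "Foo", baseName: nil)

// With the first modeling, a client looking for module selectors has to
// inspect two kinds of node; with the second, only declaration references.
func moduleSelector(of expr: ToyExpr) -> String? {
  switch expr {
  case .missingExpr(let selector):
    return selector
  case .declReference(let selector, _):
    return selector
  }
}
```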

Comment on lines +116 to +119
// Technically the current token *should* be an identifier, but we also want to diagnose other tokens that might be
// used by accident (particularly keywords and `_`). However, we don't want to consume tokens which would make the
// surrounding structure mis-parse.
return self.at(anyIn: StructuralTokens.self) == nil
Member

@rintaro rintaro Jul 16, 2025

I don't think it's a good idea to (almost) always consider <token> :: as a module selector. E.g.:

class
::

This looks to me like an incomplete class declaration followed by an orphan ::. Even if the two tokens are on the same line with no space between them, I don't think we need to parse this as a module selector.
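
The difference between the two positions can be sketched as two predicates over the token that precedes `::`. `ToyToken`, `lenientRule`, and `strictRule` are hypothetical; which tokens count as structural is an assumption, and the strict rule is only one possible reading of this comment.

```swift
// Hypothetical token kinds for the token that appears before `::`.
enum ToyToken: Equatable {
  case identifier(String)
  case keyword(String)
  case wildcard               // `_`
  case structural(String)     // assumed examples: braces, parentheses
}

// Roughly the rule in the quoted code comment: anything that is not a
// structural token may start a (possibly ill-formed) module selector.
func lenientRule(before token: ToyToken) -> Bool {
  if case .structural = token { return false }
  return true
}

// A stricter alternative: only an identifier may start a module selector,
// so `class ::` is left to ordinary recovery.
func strictRule(before token: ToyToken) -> Bool {
  if case .identifier = token { return true }
  return false
}

let classKeyword = ToyToken.keyword("class")
let moduleName = ToyToken.identifier("Foo")
```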

3 participants