Ensure github-linguist compatibility for the syntaxes #1659

Draft: wants to merge 1 commit into base: master

Conversation

@huwaireb huwaireb commented Nov 25, 2024

blocking github-linguist/linguist#7126

P.S. Formatting was checked with Biome.

@huwaireb huwaireb changed the title Ensure github-linguist compatibility Ensure github-linguist compatibility for the syntaxes Nov 25, 2024
@smorimoto smorimoto requested a review from mnxn November 25, 2024 18:27
@smorimoto
Collaborator

Could you take a look at this? @mnxn

@huwaireb
Author

huwaireb commented Nov 25, 2024

@mnxn, there is a slight (non-blocking) issue with the linguist PR that I'd love to hear your thoughts on.

github-linguist/linguist#7126 (comment)

Could we have a dune-all grammar that includes all stanzas (dune{-project,-workspace}) and base, given the issue mentioned above?

Edit: See this comment for how we can do it; it'll be zero maintenance.
Edit 2: Pushed it up.

@@ -279,7 +279,7 @@
 },
 {
 "comment": "destructured semantic value capture",
-"begin": "(?<![[:word:]][[:space:]]*)\\(",
+"begin": "(?<!\\w)\\(",
Author

@huwaireb huwaireb Dec 2, 2024

Not sure if this is the correct fix.

It resolves the following error, which is caused by the variable-length [[:space:]]* inside the lookbehind:

- Invalid regex in grammar: `source.ocaml.menhir` (in `syntaxes/menhir.json`) contains a malformed regex (regex "`(?<![[:word:]][[:space:]]*)\(`": lookbehind assertion is not fixed length (at offset 26))

Collaborator

@mnxn mnxn Dec 3, 2024

Previously, the regex matched neither a(b, c) = x nor a (b, c) = x, but now the second string matches.
I don't know what the ideal solution is here, but at the very least, neither of the strings above should match.

Also, the regex should stick with [[:word:]] instead of \\w.
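
The regression described above can be reproduced with a minimal sketch. It assumes Python's `re` as a stand-in engine (the grammar itself is not run by Python), and it uses `\w` in place of `[[:word:]]` because `re` has no POSIX bracket classes. The third sample string is an added illustration, not one taken from the review.

```python
import re

# Assumption: \w stands in for [[:word:]]; Python's re is only an
# approximation of the engine that actually runs the grammar.
PROPOSED = re.compile(r"(?<!\w)\(")

samples = [
    "a(b, c) = x",   # paren directly after a word character
    "a (b, c) = x",  # paren after a word character and one space
    "(b, c) = x",    # paren at the start of the line
]

for text in samples:
    # search() returns a match object only if some "(" passes the lookbehind
    print(f"{text!r}: {PROPOSED.search(text) is not None}")
```

With this stand-in, the first string is rejected and the second is accepted, which is the behavior change the review points out.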

Author

It seems like it accepts (?<![[:word:]])(?<![[:space:]])\\(. Can we do this instead? Rubular parses it fine, but I can't seem to build the .vsix to try it out.

Author

Hi @mnxn, sorry for the ping, but is there any way we can sort this out? This is the only remaining blocker, AFAIK.

Collaborator

It doesn't look like that regex works. I think it might be fundamentally impossible to achieve the same behavior while staying fixed length.

I'm okay with relaxing the constraint a little bit and recognizing only 0 or 1 spaces like the examples I posted above. This regex should accomplish that: (?<![[:word:]]|[[:word:]][[:space:]])\(
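
A minimal sketch of how the relaxed pattern behaves, again assuming Python's `re` as a stand-in engine with `\w` for `[[:word:]]` and `\s` for `[[:space:]]`. Python rejects lookbehind alternatives of different widths, so the alternation is written as two consecutive negative lookbehinds, which is logically equivalent. The two-assertion candidate from earlier in the thread is included for comparison. The last two sample strings are added illustrations.

```python
import re

# Stand-ins (assumptions): \w for [[:word:]], \s for [[:space:]].
# Python's re rejects lookbehind alternatives of different widths, so
# (?<!A|B) is spelled as the equivalent (?<!A)(?<!B).
RELAXED = re.compile(r"(?<!\w)(?<!\w\s)\(")
# The earlier two-assertion candidate, for comparison.
TWO_ASSERTIONS = re.compile(r"(?<!\w)(?<!\s)\(")


def matches(pattern: re.Pattern, text: str) -> bool:
    """Return True if the pattern finds an opening paren it would accept."""
    return pattern.search(text) is not None


samples = [
    "a(b, c) = x",    # no space before the paren
    "a (b, c) = x",   # exactly one space
    "a  (b, c) = x",  # two spaces: outside the relaxed constraint
    "  (b, c) = x",   # paren preceded only by whitespace
]

for text in samples:
    print(f"{text!r}: relaxed={matches(RELAXED, text)}, "
          f"two_assertions={matches(TWO_ASSERTIONS, text)}")
```

Under these stand-ins, the relaxed pattern rejects the first two samples and accepts the last two, while the two-assertion candidate rejects all four, including a paren preceded only by whitespace.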

BTW: you can try the extension by opening the repo in VS Code and pressing the run button. No need to make a .vsix or even build the JS if you're just testing the syntaxes.

@smorimoto smorimoto marked this pull request as draft December 7, 2024 20:26
@smorimoto smorimoto requested a review from Copilot December 28, 2024 17:50

Copilot wasn't able to review any files in this pull request.

Files not reviewed (9)
  • package.json: Language not supported
  • syntaxes/atd.json: Language not supported
  • syntaxes/cram.json: Language not supported
  • syntaxes/dune-all.json: Language not supported
  • syntaxes/dune.json: Language not supported
  • syntaxes/menhir.json: Language not supported
  • syntaxes/ocaml.json: Language not supported
  • syntaxes/ocamlbuild.json: Language not supported
  • syntaxes/ocamllex.json: Language not supported
@huwaireb
Author

A bit of a problem now that I've gotten around to testing it. Priority depends on the order of inclusion, and while some stanzas do work, only the first included grammar (e.g. dune) is highlighted correctly. The rest are only partially highlighted.

I could probably differentiate dune-project from dune by looking for (lang ...) at the top of the file, but I'm not sure about dune-workspace. That would be a less painful way to address it, although not the best.
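
The (lang ...) check could look roughly like the sketch below. It is not linguist's heuristics format or API; `starts_with_lang_form` is a hypothetical helper, and it assumes that `;` starts a line comment and that a leading (lang ...) form is a usable signal. It says nothing about dune-workspace, which stays an open question.

```python
import re

# Hypothetical helper for illustration only; not part of linguist or
# of this repository.
_LANG_FORM = re.compile(r"\(\s*lang\b")


def starts_with_lang_form(text: str) -> bool:
    """Return True if the first non-blank, non-comment line opens a (lang ...) form."""
    for line in text.splitlines():
        stripped = line.strip()
        # Assumption: blank lines and ';' line comments may precede the form.
        if not stripped or stripped.startswith(";"):
            continue
        return _LANG_FORM.match(stripped) is not None
    return False


print(starts_with_lang_form("(lang dune 3.0)\n(name demo)"))
print(starts_with_lang_form("(library\n (name demo))"))
```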

cc @mnxn

@mnxn
Collaborator

mnxn commented Jan 21, 2025

> A bit of a problem now that I've gotten around to testing it. Priority depends on the order of inclusion, and while some stanzas do work, only the first included grammar (e.g. dune) is highlighted correctly. The rest are only partially highlighted.
>
> I could probably differentiate dune-project from dune by looking for (lang ...) at the top of the file, but I'm not sure about dune-workspace. That would be a less painful way to address it, although not the best.
>
> cc @mnxn

Is there no way for linguist to use the filenames to determine which grammar is used?

@huwaireb
Author

> Is there no way for linguist to use the filenames to determine which grammar is used?

That's what I tried initially. Unfortunately, each language can only use one grammar.
