
Support additional languages #47

Open
andrewdea opened this issue Oct 14, 2024 · 3 comments

Comments

@andrewdea
Contributor

andrewdea commented Oct 14, 2024

Related issue: #13

From my understanding, using tree-sitter is most likely the best way to go about this.

When possible, a specialized parser is probably preferable, so I don't think we'd remove the current RustPython parser. But tree-sitter will allow us to support most languages, with minimal specialized treatment of each.

Feedback welcome 🙂

@rohaquinlop
Owner

I would prefer using a dedicated parser instead of tree-sitter, because I've run into breaking changes whenever a language's grammar repository is updated, and it is a bit painful to refactor the code to stay compatible with the new version. But that's just my opinion, and I'm not sure whether a parser for every language is available on crates.io.

Which languages should we prioritize? I was thinking of the following:

  • C++
  • JavaScript
  • Rust

@andrewdea I would like to know which ones you have in mind.

@andrewdea
Contributor Author

@rohaquinlop I think those are great candidates!

Can you say a bit more about tree-sitter's breaking changes? I've used tree-sitter's Python bindings before and did encounter some breaking changes, but earlier versions are usually quite stable, so it's easy to postpone updates until they're absolutely needed and the code is ready.

Here's the approach I'm thinking:

  • a generalized layer using tree-sitter is the default
  • specific languages, like Python (already implemented) and the others we prioritize, have their own specialized parser
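A minimal sketch of that dispatch, written in Rust because complexipy is built on Rust crates. `ParserRegistry`, `Backend` and the parser labels are hypothetical names. No real parser is invoked. The code only shows how a registered specialized parser could take precedence over the tree-sitter default, keyed by file extension (itself an assumption).

```rust
use std::collections::HashMap;
use std::path::Path;

/// Which parsing backend would handle a file (hypothetical names).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Backend {
    /// A dedicated parser, e.g. the existing RustPython-based one for Python.
    Specialized(&'static str),
    /// The generalized tree-sitter layer, used by default.
    TreeSitter,
}

/// Hypothetical registry of languages that have a specialized parser,
/// keyed by file extension.
struct ParserRegistry {
    specialized: HashMap<&'static str, &'static str>,
}

impl ParserRegistry {
    fn new() -> Self {
        let mut specialized = HashMap::new();
        // Python already has a specialized parser (RustPython).
        specialized.insert("py", "rustpython");
        Self { specialized }
    }

    /// Register a specialized parser for a prioritized language.
    fn register(&mut self, ext: &'static str, parser_name: &'static str) {
        self.specialized.insert(ext, parser_name);
    }

    /// Prefer a registered specialized parser; otherwise fall back to tree-sitter.
    fn backend_for(&self, path: &Path) -> Backend {
        path.extension()
            .and_then(|ext| ext.to_str())
            .and_then(|ext| self.specialized.get(ext))
            .copied()
            .map(Backend::Specialized)
            .unwrap_or(Backend::TreeSitter)
    }
}

fn main() {
    let mut registry = ParserRegistry::new();
    // Hypothetical: a future dedicated Rust parser.
    registry.register("rs", "dedicated-rust-parser");
    for file in ["app.py", "lib.rs", "index.js", "Makefile"] {
        println!("{file}: {:?}", registry.backend_for(Path::new(file)));
    }
}
```

With this shape, adding a specialized parser for a prioritized language is one `register` call. Every other file keeps working through the tree-sitter fallback.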

Starting with tree-sitter would have a few advantages:

  • "out-of-the-box" intermediate layer: parsing is specific to each language, but the cognitive-complexity algorithm aims to be language-agnostic, so we need to be able to apply it to any parsed snippet. We can start with tree-sitter's existing tried-and-tested approach and see which changes we need to fit our use case. Otherwise, adding support for each language would involve two implementations (the parsing and the cognitive-complexity computation), which would make it much harder to maintain the algorithm and keep it consistent across languages.
  • stronger impact: once we find the right way to interact with tree-sitter, we will automatically support a lot of languages. Leveraging tree-sitter means complexipy will be useful across many codebases and use cases.
  • easier dependency management: while the tree-sitter libraries may tend to introduce breaking changes, it still seems easier to deal with a single, generalized set of tree-sitter bindings than to manage many completely isolated, language-specific parser libraries.
  • we can (and probably should) still use a specialized parser whenever possible for the languages that matter most (again, I think the ones you mentioned are great starting candidates). A specialized parser will probably be better at catching edge cases. But having the experience of adapting to tree-sitter's intermediate representation will help us integrate specialized parsers in a way that is easier to generalize.
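To make the "intermediate layer" point concrete, here is a minimal sketch. It assumes a hypothetical generic node type (`NodeKind`, `Node`) that each backend, tree-sitter or specialized, would map its own syntax tree onto. It also uses a simplified version of the cognitive-complexity increments: +1 per control-flow structure plus the current nesting level, and a flat +1 per sequence of boolean operators. `else`, recursion and jumps are omitted. The tree-sitter node type names in `kind_from_ts_type` are illustrative assumptions and would need to be checked against each grammar.

```rust
/// Hypothetical language-agnostic node kinds consumed by the algorithm.
#[derive(Debug)]
enum NodeKind {
    /// Structures that break linear flow and also increase nesting.
    If,
    Loop,
    Catch,
    /// A run of like boolean operators (e.g. `a && b && c`): flat +1.
    BoolSequence,
    /// Anything else (statements, blocks, calls, ...): no increment.
    Other,
}

#[derive(Debug)]
struct Node {
    kind: NodeKind,
    children: Vec<Node>,
}

fn node(kind: NodeKind, children: Vec<Node>) -> Node {
    Node { kind, children }
}

/// Hypothetical mapping from a tree-sitter-style node type name to a generic
/// kind. Real node names differ per grammar; these are assumptions.
/// Boolean sequences would need operator inspection, so they are not mapped here.
fn kind_from_ts_type(ty: &str) -> NodeKind {
    match ty {
        "if_statement" => NodeKind::If,
        "for_statement" | "while_statement" => NodeKind::Loop,
        "catch_clause" | "except_clause" => NodeKind::Catch,
        _ => NodeKind::Other,
    }
}

/// Simplified cognitive complexity: structures add 1 + current nesting and
/// raise the nesting level for their children; boolean sequences add 1.
fn cognitive_complexity(n: &Node, nesting: u32) -> u32 {
    let (own, child_nesting) = match n.kind {
        NodeKind::If | NodeKind::Loop | NodeKind::Catch => (1 + nesting, nesting + 1),
        NodeKind::BoolSequence => (1, nesting),
        NodeKind::Other => (0, nesting),
    };
    own + n
        .children
        .iter()
        .map(|c| cognitive_complexity(c, child_nesting))
        .sum::<u32>()
}

fn main() {
    // Roughly: fn f() { for ... { if a && b { ... } } }
    let tree = node(
        NodeKind::Other,
        vec![node(
            NodeKind::Loop,
            vec![node(
                NodeKind::If,
                vec![node(NodeKind::BoolSequence, vec![]), node(NodeKind::Other, vec![])],
            )],
        )],
    );
    // Loop: 1, nested If: 1 + 1, boolean sequence: 1  => 4
    println!("{}", cognitive_complexity(&tree, 0));
    println!("{:?}", kind_from_ts_type("while_statement"));
}
```

The scoring function never sees language-specific syntax. Supporting a new language then means writing only the mapping into `NodeKind`, not a second copy of the algorithm.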

Sorry this is a bit long; I'd love to know what you think 🙂

@rohaquinlop
Owner

@andrewdea You're right, I like this approach.

Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants