Support additional languages #47

andrewdea · 2024-10-14T21:39:42Z

Related issue: #13

From my understanding, using tree-sitter is most likely the best way to go about this.

When possible, a specialized parser is probably preferable, so I don't think we'd remove the current RustPython parser. But tree-sitter will allow us to support most languages with minimal language-specific treatment.

Feedback welcome 🙂

rohaquinlop · 2024-10-15T14:52:46Z

I would prefer using dedicated parsers instead of tree-sitter, because I've run into breaking changes when they update a language's repository, and it's a bit painful to refactor the code to make it compatible with the new version. That's just my opinion, though, and I'm not sure a dedicated parser is available on crates.io for every language.

Which languages should we prioritize? I was thinking of the following:

- C++
- JavaScript
- Rust

@andrewdea I would like to know which ones you have in mind.

andrewdea · 2024-10-15T16:54:04Z

@rohaquinlop I think those are great candidates!

Can you say a bit more about tree-sitter's breaking changes? I've used tree-sitter's Python bindings before and did run into some breaking changes, but the earlier versions are usually quite stable, so it's easy to delay updates until they're absolutely needed and the code is ready.

Here's the approach I'm thinking:

- a generalized layer built on tree-sitter is the default
- specific languages, like Python (already implemented) and the others we prioritize, have their own specialized parser

Starting with tree-sitter would have a few advantages:

- "Out-of-the-box" intermediate layer: parsing is specific to each language, but the cognitive-complexity algorithm aims to be language-agnostic, so we need to be able to apply it to any parsed snippet. We can start from tree-sitter's existing, tried-and-tested approach and see which changes we need to make to fit our use case. Otherwise, adding support for each language would involve two implementations, the parsing and the cognitive-complexity computation, which would make it much harder to maintain the algorithm and keep it consistent across languages.
- Stronger impact: once we find the right way to interact with tree-sitter, we will automatically support a lot of languages. Leveraging tree-sitter means complexipy will be useful across countless code bases and use cases.
- Easier dependency management: while the tree-sitter libraries may tend toward breaking changes, it still seems easier to deal with a single set of generalized tree-sitter bindings than to manage many completely isolated, language-specific parser libraries.
- We can (and probably should) still use a specialized parser whenever possible for the languages that matter most (again, I think the ones you mentioned are great starting candidates). A specialized parser will probably be better at catching edge cases. But the experience of adapting to tree-sitter's intermediate representation will help us integrate specialized parsers in a way that is easier to generalize.
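To make the proposed split a bit more concrete, here is a minimal Rust sketch, assuming a hypothetical shared intermediate representation (`Node`), a hypothetical `LanguageParser` trait, and a hypothetical `Registry` that prefers a specialized parser and otherwise falls back to the generic tree-sitter layer. None of these names exist in complexipy; both parsers are stubs that make no tree-sitter or RustPython calls. The complexity walk is a simplified take on the usual cognitive-complexity increments (structures add 1 plus the nesting depth, `else` adds a flat 1, a boolean-operator sequence adds 1), not the project's actual implementation.

```rust
use std::collections::HashMap;

/// Hypothetical language-agnostic IR that every parser would lower source code into.
enum Node {
    Block(Vec<Node>),
    If { condition: Vec<Node>, body: Vec<Node>, else_body: Option<Vec<Node>> },
    Loop(Vec<Node>),
    /// A run of boolean operators such as `a and b and c` counts once.
    BoolOpSequence,
    Other,
}

/// Hypothetical interface shared by the generic tree-sitter layer and specialized parsers.
trait LanguageParser {
    fn name(&self) -> &'static str;
    /// A real implementation would lower `source` into the shared `Node` IR.
    fn parse(&self, source: &str) -> Result<Node, String>;
}

/// Stub standing in for the generic tree-sitter layer (no tree-sitter calls here).
struct TreeSitterFallback;
/// Stub standing in for a specialized parser such as the existing RustPython-based one.
struct PythonParser;

impl LanguageParser for TreeSitterFallback {
    fn name(&self) -> &'static str { "tree-sitter (generic)" }
    fn parse(&self, _source: &str) -> Result<Node, String> {
        Err("stub: parsing is out of scope for this sketch".to_string())
    }
}

impl LanguageParser for PythonParser {
    fn name(&self) -> &'static str { "python (specialized)" }
    fn parse(&self, _source: &str) -> Result<Node, String> {
        Err("stub: parsing is out of scope for this sketch".to_string())
    }
}

/// Picks a specialized parser when one is registered, else the generic fallback.
struct Registry {
    specialized: HashMap<&'static str, Box<dyn LanguageParser>>,
    fallback: Box<dyn LanguageParser>,
}

impl Registry {
    fn parser_for(&self, language: &str) -> &dyn LanguageParser {
        self.specialized.get(language).map(|p| &**p).unwrap_or(&*self.fallback)
    }
}

fn default_registry() -> Registry {
    let mut specialized: HashMap<&'static str, Box<dyn LanguageParser>> = HashMap::new();
    specialized.insert("python", Box::new(PythonParser));
    Registry { specialized, fallback: Box::new(TreeSitterFallback) }
}

/// Sums the complexity of a list of sibling nodes at the given nesting depth.
fn total_complexity(nodes: &[Node], nesting: u32) -> u32 {
    nodes.iter().map(|n| complexity(n, nesting)).sum()
}

/// Simplified cognitive complexity, written once against the shared IR.
fn complexity(node: &Node, nesting: u32) -> u32 {
    match node {
        Node::Block(children) => total_complexity(children, nesting),
        Node::If { condition, body, else_body } => {
            let mut total = 1 + nesting
                + total_complexity(condition, nesting)
                + total_complexity(body, nesting + 1);
            if let Some(else_body) = else_body {
                total += 1 + total_complexity(else_body, nesting + 1);
            }
            total
        }
        Node::Loop(body) => 1 + nesting + total_complexity(body, nesting + 1),
        Node::BoolOpSequence => 1,
        Node::Other => 0,
    }
}

/// Hand-built IR for roughly: `for x in xs: if x and y: ... else: ...`
fn example_tree() -> Node {
    Node::Block(vec![Node::Loop(vec![Node::If {
        condition: vec![Node::BoolOpSequence],
        body: vec![Node::Other],
        else_body: Some(vec![Node::Other]),
    }])])
}

fn main() {
    let registry = default_registry();
    for lang in ["python", "javascript"] {
        let parser = registry.parser_for(lang);
        println!("{lang}: {} (stub parse ok: {})", parser.name(), parser.parse("").is_ok());
    }
    // loop +1, nested if +2, `and` sequence +1, else +1
    println!("example complexity: {}", complexity(&example_tree(), 0));
}
```

The idea this is meant to show is that `complexity` only ever sees `Node`, so the algorithm is written once no matter which parser produced the tree; adding a language means adding a parser (or relying on the fallback), not a second copy of the algorithm.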

Sorry this is a bit long; I'd love to know what you think 🙂

rohaquinlop · 2024-10-23T22:29:39Z

@andrewdea You're right, I like this approach.

andrewdea mentioned this issue on Oct 14, 2024: Expose library commands #45 (Merged)
