Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Replace tree-sitter bindings with tree-house #12972

Open
wants to merge 5 commits into
base: master
Choose a base branch
from
Open

Conversation

the-mikedavis
Copy link
Member

This PR drops our dependency on the upstream tree-sitter Rust bindings and moves us to a new highlighter crate we've been working on: tree-house.

The main benefit of the new highlighter is that it fixes correctness issues, especially since the reverse query precedence PR. It solves longstanding issues #1151 and #6491, and updates us to the latest tree-sitter C source (v0.22 -> v0.25), which will become important as more grammars adopt the new ABI. It's also a slight performance boost in my benchmarking (5-10%). We've hoped to separate out our highlighting machinery for a while so that other applications could take advantage and the hope is that tree-house is the right way to expose it (\cc @hadronized).

The new highlighter crate makes some abstractions clearer. The types to care about are already established in the syntax module:

  • Loader, an ECS-style holder of all languages (pub struct Language(pub u32)). This stores the language configuration as well as any grammar and query information and allows looking up a language by file-type, language name, shebang, etc.. Instead of being stored on Syntax this is now stored separately on Document and passed to Syntax for updates. In the long run we can eventually store a single ArcSwap<syntax::Loader> on Editor instead of many clones of Arc<ArcSwap<syntax::Loader>>.
  • Syntax, a wrapper of the Syntax type from tree-house. It holds a tree of trees of the injection layers of a document and provides logarithmic lookups to determine an injection layer (including the layer's language, config, tree and queries) for a given byte range. It also has a query iterator that works across injections which can support indents, textobjects, etc. in the future.

There are some differences compared to master that are less significant:

  • The configuration types have been moved to another module, helix_core::syntax::config. This is not strictly necessary for this change but it seems like a good opportunity to move these largely unrelated types away from the syntax module.
  • The new bindings deal in u32s instead of usize. This causes some changes in helix-core and the commands modules but is otherwise an implementation detail.
  • The use of tree_sitter::Point in the indents module has been replaced with byte indices. Generally we can avoid the point types and functions. syntax::generate_edits for example now produces zero points.
  • The TreeCursor type that acts over injection layers is built into tree-house and is basically free to create - it doesn't require any extra sorting of layers. That type in tree-house uses the TreeCursor type from the C library and so it should be more efficient.
  • The Highlight type comes from tree-house now and is represented as a non-max u32 (enables the null pointer optimization for Option<Highlight>).

... and some other changes that are worth reviewing:

  • The abstraction for overlaying highlights onto the syntax highlights has been rewritten to avoid the dynamic dispatch from syntax::merge in favor of a cursor-like API that matches the highlighter. See the OverlayHighlights and OverlayHighlighter types in the rewritten syntax module.
  • Queries may now specify which captures and predicates/properties they use up-front and deny unknown ones. cargo xtask query-check can now point out unknown properties like (#set! priority ..) and unknown predicates like (#has-ancestor? ..) and has a nice visual output. This especially benefits the indent query which previously used runtime panics for the same effect, see IndentQuery.
Example query compilation errors...
Error: Failed to compile highlights for 'julia'

Caused by:
    unknown predicate #has-ancestor?
      --> 246:4
         |
     245 |   (#any-of? @variable.builtin "begin" "end")
     246 |   (#has-ancestor? @variable.builtin index_expression))
         |    ^^^^^^^^^^^^^^
         |

Error: Failed to compile highlights for 'dockerfile'

Caused by:
    unknown property 'priority'
      --> 54:11
        |
     53 |   (heredoc_line) @string)
     54 |   (#set! "priority" 90))
        |           ^^^^^^^^
        |
Error: Failed to compile highlights for 'glimmer'

Caused by:
    unknown predicate #offset!
      --> 21:16
        |
     20 |   arguments: ((template_string) @glimmer
     21 |               (#offset! @glimmer 0 1 0 -1)))
        |                ^^^^^^^^
        |

Error: Failed to compile indents.scm query for 'opencl'

Caused by:
    unknown predicate #not-knd-eq?
      --> 21:4
        |
     20 |   consequence: (_) @indent
     21 |   (#not-knd-eq? @indent "compound_statement")
        |    ^^^^^^^^^^^^
     22 |   (#set! "scope" "all"))
        |

Fixes #1151
Fixes #3391
Fixes #6491

@the-mikedavis the-mikedavis added A-tree-sitter Area: Tree-sitter A-core Area: Helix core improvements labels Feb 27, 2025
This type also exists on `Editor`. This change brings it to the
`Document` as well because the replacement for `Syntax` in the child
commits will eliminate `Syntax`'s copy of `syn_loader`. `Syntax` will
also be responsible for returning the highlighter and query iterators
(which will borrow the loader), so the loader must be separated from
that type.

In the long run, when we make a larger refactor to have
`Document::apply` be a function of the `Editor` instead of the
`Document`, we will be able to drop this field on `Document` - it is
currently only necessary for `Document::apply`. Once we make that
refactor, we will be able to eliminate the surrounding `Arc` in
`Arc<ArcSwap<syntax::Loader>>` and use the `ArcSwap` directly instead.
@nik-rev
Copy link
Contributor

nik-rev commented Feb 27, 2025

This is fantastic!! It's great to have parameters retain their color even when the function definition goes out of the viewport.

I just rebased #12768 on top of your PR and it works flawlessly. We don't have to inject into every argument anymore, positional injections work!!

image

@pascalkuthe
Copy link
Member

pascalkuthe commented Feb 28, 2025

While writing the code for query precedence in the injection query I was assuming I was fixing niche bugs (and sometimes questioned neither it was worth writing that quite subtle code). Seeing an actual usecase for it so quickly is great!

@hadronized
Copy link
Contributor

Thanks a lot for doing this! I’m going to have a look at whether I can use it in kak-tree-sitter!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-core Area: Helix core improvements A-tree-sitter Area: Tree-sitter
Projects
None yet
4 participants