Highlight ROxygen tags #68

wklimowicz · 2024-02-25T09:34:53Z

Is ROxygen tag highlighting something that can be added? With treesitter disabled, the tags get highlighted by the main neovim runtime:

https://github.com/neovim/neovim/blob/c651a0f643e7bd34eb740069a7b5b8c9f8759ecc/runtime/syntax/r.vim#L96-L159

[Screenshots: Treesitter Enabled vs. Treesitter Disabled]

Thanks for your work maintaining this project!

wurli · 2024-04-29T11:56:39Z

+1 for this!

zkamvar · 2024-08-09T14:22:09Z

Another +1!

zkamvar · 2024-08-09T14:38:00Z

I'm not that familiar with treesitter syntax or how it operates, but I wonder if prior art from other languages that have doc comments would help (e.g. Rust seems to have the hang of it).

DavisVaughan · 2024-09-16T21:30:56Z

If we were to support this, I do think we'd use the Rust approach linked above

The key points are:

- There would still be 1 top level comment node, so you don't have to do anything special for roxygen2 if you just want to ignore all comments
- Inside of that comment node, it would be possible to extract an optional field("doc", $.doc_content). If that field exists as a child of the comment node, then you can consider the line a roxygen2 comment line
- $.doc_content would contain the content of the doc line, i.e. it would drop the leading #' and possibly leading whitespace after the '.
  - Possibly it would contain the trailing \n; supposedly this is useful for markdown injection according to the Rust grammar
- tree-sitter-r would not be in charge of further parsing things like @param and other tags

I think that is as far as tree-sitter-r would go. We currently don't have any rules in tree-sitter-r that rely on community conventions and external packages, so I am somewhat hesitant to even do this!

I'm not entirely sure, but I think according to tree-sitter/tree-sitter-rust#212 one thing the neovim community could do is create a roxygen2 tree-sitter grammar that is injected in when it sees a doc_content node. Essentially that would set the target language of the content to be roxygen2, which could then parse things like @param and add syntax highlighting for roxygen2 tags like that.

Alternatively you could probably use an all style query to look for a consecutive block of comments that all have a "doc" field. That would basically let you use that field as a marker: you'd extract all the underlying text from that consecutive range of nodes, and you could probably post process that (I imagine this is already possible today with an all-match based query that looks for #').

If someone who has some neovim developer experience wants to chime in on this plan, that would be helpful! It would be particularly useful to hear if you can already work around this using a match based query to find #' lines, and post process those yourself in some way (that's preferable to me to not rely on a community convention in the grammar), or if this really would be a massive improvement.

Relevant links:

TymekDev · 2024-09-17T09:52:32Z

Hey 👋
I have been playing with the match suggestion in Neovim this morning. It seems that sub-node highlighting is not possible by using tree-sitter queries alone. I don't have anything to back that claim other than my experiments, though.

> We currently don't have any rules in tree-sitter-r that rely on community conventions and external packages, so I am somewhat hesitant to even do this!

That's a fair point. I have roxygen2 so ingrained into my brain that I wouldn't even bat an eye!

With that in mind, I don't think that adding a special treatment for #' would make sense for tree-sitter-r. Instead...

> create a roxygen2 tree-sitter grammar that is injected

I think this is the way to go. Once a tree-sitter-r-roxygen2 grammar exists, all that is left to do is to create queries/injections.scm in tree-sitter-r:

((comment) @injection.content
  (#set! injection.language "r_roxygen2"))

References:

- Language injection docs
  - injection.combined is worth noting - it could help with implicit title and description and @examples
- Injections into HTML (e.g. CSS into <style>) - this is done in nvim-treesitter
- Injections into markdown (e.g. code blocks, HTML) - this is done directly in markdown's grammar

I am interested in giving it a go and creating the grammar for roxygen2. I won't give any ETA, but I will report back if it takes off!

However, my experiments didn't go in vain. If you don't want to wait for tree-sitter-r-roxygen2 to come around, then I have a Lua-based solution that at a glance replicates syntax/r.vim highlighting.

Lua-based Solution for Neovim

This approach relies on an autocommand. The upside? It works. The downside? Manual setup¹.

This approach could be expanded to handle things like @importFrom pkg func1 func2 too. For example, it could differentiate the tag, the package name, and function names with different colors. You can go as far as your string-wrangling skills allow you to.
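To make the @importFrom idea concrete, here is a minimal, editor-independent sketch in Python (not the Lua used in the solution below). It splits one roxygen2 line into labelled column spans that a highlighter could then color differently. The helper name `import_from_spans` and the span labels are hypothetical, and the prefix pattern assumes a `#'` marker followed by at most one space.

```python
import re

# Assumption: the line starts with optional whitespace, one or more '#',
# a single quote, and at most one space, followed by the @importFrom tag.
IMPORT_FROM = re.compile(r"^\s*#+' ?(@importFrom)\s+(\S+)")


def import_from_spans(line):
    """Return (label, start_col, end_col) tuples for an @importFrom line.

    Columns are 0-based and end-exclusive. Returns [] if the line is not
    an @importFrom line. Hypothetical helper, for illustration only.
    """
    m = IMPORT_FROM.match(line)
    if m is None:
        return []
    spans = [
        ("tag", m.start(1), m.end(1)),
        ("package", m.start(2), m.end(2)),
    ]
    # Everything after the package name is treated as function names.
    rest_offset = m.end(2)
    for fn in re.finditer(r"\S+", line[rest_offset:]):
        spans.append(("function", rest_offset + fn.start(), rest_offset + fn.end()))
    return spans
```

Each span could then be passed to a range-highlighting call with a different highlight group per label.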

Demo

[Video: CleanShot.2024-09-17.at.11.02.36.mp4]

Code

Important

Place this code in after/ftplugin/r.lua. It relies on ftplugin to run only in R files.

local get_root = function(bufnr, lang)
  local parser = vim.treesitter.get_parser(bufnr, lang, {})
  local tree = parser:parse()[1]
  return tree:root()
end

local highlight_roxygen2_tags = function(bufnr)
  local query = [[
((comment) @comment.roxygen2
  (#lua-match? @comment.roxygen2 "^#' (@%a+).*$")
  (#gsub! @comment.roxygen2 "^#' (@%a+).*$" "%1"))
]]
  local root = get_root(bufnr, "r")
  local ts_query = vim.treesitter.query.parse("r", query)
  local ns = vim.api.nvim_create_namespace("r.comments.roxygen2")
  -- Clear highlights from the previous run so they don't pile up on every text change
  vim.api.nvim_buf_clear_namespace(bufnr, ns, 0, -1)
  for id, node, metadata in ts_query:iter_captures(root, bufnr) do
    if ts_query.captures[id] == "comment.roxygen2" then
      local start_row, node_col, end_row, _ = vim.treesitter.get_node_range(node)
      local start_col = node_col + 3 -- skip leading "#' ", also for indented comments
      local end_col = start_col + #metadata[id].text -- add tag length

      vim.highlight.range(bufnr, ns, "@operator", { start_row, start_col }, { end_row, end_col })
    end
  end
end

vim.api.nvim_create_autocmd({ "BufWinEnter", "TextChanged", "TextChangedI" }, {
  desc = "Highlight roxygen2 tags",
  buffer = 0,
  callback = function(args)
    highlight_roxygen2_tags(args.buf)
  end,
})

¹ If this could be done with a query alone, then we would have an out-of-the-box solution and everyone would benefit.

DavisVaughan · 2024-09-17T11:55:57Z

For

((comment) @injection.content
  (#set! injection.language "r_roxygen2"))

it feels wrong to me to assert that every comment now has the language of r_roxygen2. That's what that says, right?

That was why I was suggesting that you'd only do that on (doc_content), even though that does require just a little bit of knowledge about roxygen2 in tree-sitter-r

TymekDev · 2024-09-17T12:45:01Z

Right. It could be changed to:

((comment) @injection.content
  (#match? @injection.content "^#' ")
  (#set! injection.language "r_roxygen2"))

I suppose this comes down to roxygen2 grammar design. For the above to work, the grammar would have to handle #' on its own as the entire (comment) would get re-parsed by it. Otherwise, it would depend on (doc_content).

I just did a quick scan through other "doc" grammars. luadoc grammar doesn't include the leading comment string. On the other hand, jsdoc and phpdoc both include it.

Personally, I don't have a strong opinion on which approach a roxygen2 grammar should take, nor do I see any immediate benefit of using one approach over the other.

DavisVaughan · 2024-09-17T13:06:37Z

Personally I think if we can make something like

((comment) @injection.content
  (#match? @injection.content "^#' ")
  (#set! injection.language "r_roxygen2"))

work then that would be greatly preferable to keep tree-sitter-r agnostic to any R packages. What you have there is pretty close to what I thought was possible with the existing setup. And it's nice to see that there is some prior art like jsdoc that pretty much does it exactly this way. I see that tree-sitter-javascript even has an injection query for jsdoc (I don't think jsdoc has any marker character like #', so it makes sense that they just mark all comments as possibly jsdoc comments): https://github.com/tree-sitter/tree-sitter-javascript/blob/b6f0624c1447bc209830b195999b78a56b10a579/queries/injections.scm#L20-L23

Note that the exact rule for "this is a roxygen2 comment" is more flexible than just ^#'. It technically allows:

- Leading whitespace
- One or more leading #, like #####' is valid

The exact rule is here; I imagine that it wouldn't be too hard to find a regex that supports this: https://github.com/r-lib/roxygen2/blob/9652d15221109917d46768e836eaf55e33c21633/src/parser2.cpp#L43-L56
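As a rough illustration of such a regex, here is a small Python sketch. It encodes only the two relaxations listed above (leading whitespace, one or more `#` before the `'`) and has not been checked line by line against the linked C++ source, so treat the pattern as an assumption. The function name `is_roxygen_comment` is hypothetical.

```python
import re

# Assumed rule, taken from the description above: optional leading
# whitespace, one or more '#', then a single quote.
ROXYGEN_PREFIX = re.compile(r"^\s*#+'")


def is_roxygen_comment(line):
    """Return True if the line looks like a roxygen2 comment line."""
    return ROXYGEN_PREFIX.match(line) is not None
```

In a tree-sitter query the (comment) node text presumably starts at the first `#`, so the leading-whitespace part would likely be unnecessary there and a pattern along the lines of `^#+'` might be enough for a `#match?` predicate.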

DavisVaughan · 2024-09-17T13:22:17Z

In theory the roxygen2 grammar could also give us the ability to mark contents inside an @examples block as R code

((examples_content) @injection.content
  (#set! injection.language "r"))

i.e.

#' @param x a param
#'
#' @examples # the lines after this one are R code if you strip the leading `#'`
#' 1 + 1 
#' fn(
#'  a,
#'  b
#' )

Which could allow highlighting of R code in @examples block to maybe just work?

Also worth looking at injection.combined as an option here https://tree-sitter.github.io/tree-sitter/syntax-highlighting#language-injection. It seems like it would smash all roxygen2 comment lines together into one nested document, and then parse that whole document once, which seems like it would be nice? IIUC that would allow the multi-line @param here to be parsed as 1 node containing the tag and its full description

#' @param x this is a long multiline
#'   description of this param
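The effect described above can be sketched outside of tree-sitter. The Python snippet below is only an illustration of the idea, not how injection.combined is implemented: it strips the comment marker from each line, joins the lines into one document, and lets a tag own every following line up to the next tag. The helper names `combine` and `parse_tags` are hypothetical, and stripping one optional space after the `'` is an assumption.

```python
import re

# Assumption: drop the comment marker and at most one following space.
PREFIX = re.compile(r"^\s*#+' ?")


def combine(lines):
    """Join roxygen2 comment lines into a single document string."""
    return "\n".join(PREFIX.sub("", line, count=1) for line in lines if PREFIX.match(line))


def parse_tags(document):
    """Hypothetical mini-parser: a tag owns all text up to the next tag.

    Text before the first tag (e.g. an implicit title) is ignored here.
    """
    tags = []
    for line in document.split("\n"):
        m = re.match(r"\s*@(\w+)\s*(.*)", line)
        if m:
            tags.append([m.group(1), m.group(2).strip()])
        elif tags and line.strip():
            # Continuation line: append to the most recent tag's description.
            tags[-1][1] = (tags[-1][1] + " " + line.strip()).strip()
    return [tuple(t) for t in tags]
```

Parsing line by line instead would leave the second line of the @param description without an owning tag, which is the motivation for handing the nested parser one combined document.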

TymekDev · 2024-09-17T14:15:25Z

> i don't think jsdoc has any marker character like #'

The comment block has to start with /**. The way I understand it, the jsdoc grammar skips the leading asterisk and any subsequent whitespace using extras and then looks for a leading /* (making it /**).

It also looks like as soon as it doesn't match /** it falls back to a regular comment.

> Note that the exact rule for "this is a roxygen2 comment" is more flexible than just ^#'.

Today I learned! Thanks for pointing that out. I think extras could be explored for stripping that, in a manner similar to how jsdoc removes asterisks and whitespace.

> In theory the roxygen2 grammar could also give us the ability to mark contents inside an @examples block as R code

That's what I also thought :-)

> injection.combined

Yes! I would definitely look at how to efficiently pass things around. My intuition is to try passing a contiguous block of comments to the roxygen2 parser and treating it as one document (similarly to what jsdoc does).

I suppose the next step would be sketching an outline of how the roxygen2 grammar would be structured... :-)

TymekDev · 2024-10-02T20:13:52Z

I have been experimenting with how to approach tree-sitter-roxygen2 and ran into an issue. R uses line comments, and tree-sitter cannot combine multiple nodes into a single capture. This means it's not possible to properly inject the roxygen2 parser into the R parser.

For example, given the following comment:

#' @description Lorem ipsum dolor sit amet,
#' mauris elit justo sociosqu, mauris vel at.
#' At quam, amet ultrices at cras et semper.
#' @noRd

The result would be:

(comment
  (tag
    name: (tag_name)                ; (tag_name) is "@description"
    description: (description)))    ; (description) is "Lorem ipsum dolor sit amet,"
(comment)
(comment)
(comment
  (tag
    name: (tag_name)))              ; (tag_name) is "@noRd"

While this is not a showstopper, it means that tree-sitter-roxygen2 won't be as robust as we imagined, because it will work at the line level.

Unless it is possible to make the R grammar join (comment) nodes from adjacent lines into a single node. However, doing that doesn't feel right to me.

DavisVaughan · 2024-10-03T13:03:01Z

I thought this was what injection.combined was for

https://github.com/tree-sitter/tree-sitter/blob/8500e331ebfd49e66dd935b8a9c7a58aba68af37/docs/section-4-syntax-highlighting.md?plain=1#L370-L371

Does that not combine the sequential comments into 1 "document" that tree-sitter-roxygen2 gets?

jranke · 2024-12-11T15:46:05Z

> In theory the roxygen2 grammar could also give us the ability to mark contents inside an @examples block as R code
>
> ...
>
> Which could allow highlighting of R code in @examples block to maybe just work?

I think this would be great, as I assume then there would also be a cleaner solution for R-nvim/R.nvim#106 dealing with the fallout of trying to work around such tags not being recognized by tree-sitter-r.

etiennebacher mentioned this issue Sep 18, 2024

Equivalence with lintr etiennebacher/flint#6

Open

85 tasks

EagleoutIce mentioned this issue Nov 21, 2024

Is special support for line directives needed? #160

Open
