Highlight ROxygen tags #68

Open
wklimowicz opened this issue Feb 25, 2024 · 13 comments

@wklimowicz

Is ROxygen tag highlighting something that could be added? With treesitter disabled, the tags get highlighted by the main neovim runtime:

https://github.com/neovim/neovim/blob/c651a0f643e7bd34eb740069a7b5b8c9f8759ecc/runtime/syntax/r.vim#L96-L159

[Screenshots: Treesitter enabled vs. Treesitter disabled]

Thanks for your work maintaining this project!

@wurli

wurli commented Apr 29, 2024

+1 for this!

@zkamvar

zkamvar commented Aug 9, 2024

Another +1!

@zkamvar

zkamvar commented Aug 9, 2024

I'm not that familiar with treesitter syntax or how it operates, but I wonder if prior art from other languages that have doc comments would help (e.g. rust seems to have the hang of it)

@DavisVaughan
Member

If we were to support this, I do think we'd use the Rust approach linked above

The key points are:

  • There would still be 1 top level comment node, so you don't have to do anything special for roxygen2 if you just want to ignore all comments
  • Inside of that comment node, it would be possible to extract an optional field("doc", $.doc_content). If that field exists as a child of the comment node, then you can consider the line a roxygen2 comment line
  • $.doc_content would contain the content of the doc line. i.e. it would drop the leading #' and possibly leading whitespace after the '.
    • Possibly it would contain the trailing \n, supposedly this is useful for markdown injection according to the Rust grammar
  • tree-sitter-r would not be in charge of further parsing things like @param and other tags
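To make the proposed doc_content field concrete, here is a minimal Lua sketch of the text it might cover for a single comment line. The function is hypothetical (only the name doc_content comes from the proposal), and it assumes the simple `#'` prefix and that a trailing newline is kept.

```lua
-- Hypothetical helper, not part of tree-sitter-r: returns the text a
-- `doc_content` node might cover, or nil for an ordinary comment.
local function doc_content(comment)
  -- Assumption: a roxygen2 line starts with exactly "#'".
  if comment:sub(1, 2) ~= "#'" then
    return nil
  end
  -- Drop the marker and any spaces or tabs after the apostrophe.
  -- A trailing "\n", if present, is left in place.
  local content = comment:sub(3):gsub("^[ \t]*", "")
  return content
end

print(doc_content("#' @param x a param")) -- @param x a param
print(doc_content("# plain comment"))     -- nil
```

Anything beyond this, such as recognising @param, would be left to a separate grammar or to post-processing.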

I think that is as far as tree-sitter-r would go. We currently don't have any rules in tree-sitter-r that rely on community conventions and external packages, so I am somewhat hesitant to even do this!

I'm not entirely sure, but I think according to tree-sitter/tree-sitter-rust#212 one thing the neovim community could do is create a roxygen2 tree-sitter grammar that is injected in when it sees a doc_content node. Essentially that would set the target language of the content to be roxygen2, which could then parse things like @param and add syntax highlighting for roxygen2 tags like that.

Alternatively you could probably use an all style query to look for a consecutive block of comments that all have a "doc" field. That would basically let you use that field as a marker, and you'd extract all the underlying text from that consecutive range of nodes, and you could probably post process that (I imagine this is already possible today with an all-match based query that looks for #').
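The post-processing step could look roughly like the sketch below. It works on plain strings rather than tree-sitter nodes, and the function name doc_blocks and the returned table shape are assumptions made for illustration only.

```lua
-- Hypothetical post-processing: group consecutive "#'" lines into blocks.
-- Each block records its first and last line number and the stripped text.
local function doc_blocks(lines)
  local blocks, current = {}, nil
  for i, line in ipairs(lines) do
    -- Assumption: a doc line is "#'" plus at most one separating space.
    local text = line:match("^#'%s?(.*)$")
    if text then
      if not current then
        current = { first = i, last = i, text = {} }
        blocks[#blocks + 1] = current
      end
      current.last = i
      current.text[#current.text + 1] = text
    else
      current = nil -- any other line ends the current block
    end
  end
  return blocks
end

local blocks = doc_blocks({
  "#' Title",
  "#'",
  "#' @param x a param",
  "f <- function(x) x",
  "# plain comment",
  "#' @noRd",
})
print(#blocks) -- 2
```

Each block's text can then be joined with table.concat and handed to whatever does the tag parsing.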

If someone who has some neovim developer experience wants to chime in on this plan, that would be helpful! It would be particularly useful to hear if you can already work around this using a match based query to find #' lines, and post process those yourself in some way (that's preferable to me to not rely on a community convention in the grammar), or if this really would be a massive improvement.

Relevant links:

@TymekDev

TymekDev commented Sep 17, 2024

Hey 👋
I have been playing with the match suggestion in Neovim this morning. It seems that sub-node highlighting is not possible by using tree-sitter queries alone. I don't have anything to back that claim other than my experiments, though.

> We currently don't have any rules in tree-sitter-r that rely on community conventions and external packages, so I am somewhat hesitant to even do this!

That's a fair point. I have roxygen2 so ingrained into my brain that I wouldn't even bat an eye!

With that in mind, I don't think that adding a special treatment for #' would make sense for tree-sitter-r. Instead...

> create a roxygen2 tree-sitter grammar that is injected

I think this is the way to go. Once tree-sitter-r-roxygen2 grammar exists, all that is left to do is creating queries/injections.scm in tree-sitter-r:

((comment) @injection.content
  (#set! injection.language "r_roxygen2"))

References:

I am interested in giving it a go and creating the grammar for roxygen2. I won't give any ETA, but I will report back if it takes off!

However, my experiments didn't go in vain. If you don't want to wait for tree-sitter-r-roxygen2 to come around, then I have a Lua-based solution that at a glance replicates syntax/r.vim highlighting.

Lua-based Solution for Neovim

This approach relies on an autocommand. The upside? It works. The downside? Manual setup[1].

This approach could be expanded to handle things like @importFrom pkg func1 func2 too. For example, it could differentiate the tag, the package name, and function names with different colors. You can go as far as your string-wrangling skills allow.
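As a rough illustration of that string wrangling, the sketch below splits an @importFrom line into column spans that a highlighter could then color separately. The function name importfrom_spans and the group labels "tag", "package", and "function" are made up for this example; columns are 0-based and end-exclusive.

```lua
-- Hypothetical helper: split "#' @importFrom pkg fn1 fn2" into spans.
local function importfrom_spans(line)
  local tag = "@importFrom"
  -- Assumption: the line starts with exactly "#' " followed by the tag.
  if not line:match("^#' @importFrom%s") then
    return {}
  end
  local spans = { { group = "tag", start_col = 3, end_col = 3 + #tag } }
  local pos = 3 + #tag + 1 -- 1-based index of the first character after the tag
  local word_index = 0
  while true do
    local s, e = line:find("%S+", pos)
    if not s then
      break
    end
    word_index = word_index + 1
    spans[#spans + 1] = {
      -- The first word is the package, the rest are imported functions.
      group = word_index == 1 and "package" or "function",
      start_col = s - 1, -- convert to 0-based
      end_col = e,       -- end-exclusive
    }
    pos = e + 1
  end
  return spans
end

local line = "#' @importFrom dplyr filter mutate"
for _, span in ipairs(importfrom_spans(line)) do
  print(span.group, line:sub(span.start_col + 1, span.end_col))
end
```

Each span could be passed to a range-highlighting call with a highlight group of your choice.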

Demo

[Video attachment: CleanShot.2024-09-17.at.11.02.36.mp4]

Code

Important

Place this code in after/ftplugin/r.lua. It relies on ftplugin to run only in R files.

-- Return the root node of the buffer's syntax tree for `lang`.
local get_root = function(bufnr, lang)
  local parser = vim.treesitter.get_parser(bufnr, lang, {})
  local tree = parser:parse()[1]
  return tree:root()
end

-- Namespace for the tag highlights; reused so they can be cleared on each run.
local ns = vim.api.nvim_create_namespace("r.comments.roxygen2")

local highlight_roxygen2_tags = function(bufnr)
  -- Match comments that start with "#' @tag"; #gsub! stores the tag text
  -- in the capture's metadata.
  local query = [[
((comment) @comment.roxygen2
  (#lua-match? @comment.roxygen2 "^#' (@%a+).*$")
  (#gsub! @comment.roxygen2 "^#' (@%a+).*$" "%1"))
]]
  local root = get_root(bufnr, "r")
  local ts_query = vim.treesitter.query.parse("r", query)
  -- Remove highlights from the previous run so stale ones don't linger.
  vim.api.nvim_buf_clear_namespace(bufnr, ns, 0, -1)
  for id, node, metadata in ts_query:iter_captures(root, bufnr) do
    if ts_query.captures[id] == "comment.roxygen2" then
      local start_row, _, end_row, _ = vim.treesitter.get_node_range(node)
      local start_col = 3 -- skip leading "#' "
      local end_col = start_col + #metadata[id].text -- add tag length

      vim.highlight.range(bufnr, ns, "@operator", { start_row, start_col }, { end_row, end_col })
    end
  end
end

-- Re-run the highlighter when the buffer is shown or its text changes.
vim.api.nvim_create_autocmd({ "BufWinEnter", "TextChanged", "TextChangedI" }, {
  desc = "Highlight roxygen2 tags",
  buffer = 0,
  callback = function(args)
    highlight_roxygen2_tags(args.buf)
  end,
})

Footnotes

  1. If this could be done with a query alone, then we would have an out-of-the-box solution and everyone would benefit.

@DavisVaughan
Member

For

((comment) @injection.content
  (#set! injection.language "r_roxygen2"))

it feels wrong to me to assert that every comment now has the language of r_roxygen2. That's what that says, right?

That was why I was suggesting that you'd only do that on (doc_content), even though that does require just a little bit of knowledge about roxygen2 in tree-sitter-r

@TymekDev

Right. It could be changed to:

((comment) @injection.content
  (#match? @injection.content "^#' ")
  (#set! injection.language "r_roxygen2"))

I suppose this comes down to roxygen2 grammar design. For the above to work, the grammar would have to handle #' on its own as the entire (comment) would get re-parsed by it. Otherwise, it would depend on (doc_content).

I just did a quick scan through other "doc" grammars. luadoc grammar doesn't include the leading comment string. On the other hand, jsdoc and phpdoc both include it.

Personally, I don't have a strong opinion on which approach a roxygen2 grammar should take, nor do I see any immediate benefit of one approach over the other.

@DavisVaughan
Member

Personally I think if we can make something like

((comment) @injection.content
  (#match? @injection.content "^#' ")
  (#set! injection.language "r_roxygen2"))

work, then that would be greatly preferable, since it keeps tree-sitter-r agnostic to any R packages. What you have there is pretty close to what I thought was possible with the existing setup. And it's nice to see that there is some prior art like jsdoc that pretty much does it exactly this way. I see that tree-sitter-javascript even has an injection query for jsdoc (I don't think jsdoc has any marker character like #', so it makes sense that they just mark all comments as possibly jsdoc comments): https://github.com/tree-sitter/tree-sitter-javascript/blob/b6f0624c1447bc209830b195999b78a56b10a579/queries/injections.scm#L20-L23

Note that the exact rule for "this is a roxygen2 comment" is more flexible than just ^#'. It technically allows:

@DavisVaughan
Member

In theory the roxygen2 grammar could also give us the ability to mark contents inside an @examples block as R code

((examples_content) @injection.content
  (#set! injection.language "r"))

i.e.

#' @param x a param
#'
#' @examples # the lines after this one are R code if you strip the leading `#'`
#' 1 + 1 
#' fn(
#'  a,
#'  b
#' )

Which could allow highlighting of R code in @examples block to maybe just work?
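To illustrate which text such an injection would hand to the R parser, here is a small Lua sketch that pulls the @examples code out of a roxygen2 block. It is plain string processing, not a tree-sitter grammar; the function name examples_code is hypothetical, and it assumes that any other tag ends the examples section.

```lua
-- Hypothetical helper: collect the R code that follows "@examples".
local function examples_code(lines)
  local code, in_examples = {}, false
  for _, line in ipairs(lines) do
    -- Strip "#'" and at most one following space.
    local text = line:match("^#' ?(.*)$")
    if not text then
      in_examples = false -- left the roxygen2 block
    elseif text == "@examples" or text:match("^@examples%s") then
      in_examples = true
      -- Text after the tag on the same line already counts as R code.
      local rest = text:match("^@examples%s+(.+)$")
      if rest then
        code[#code + 1] = rest
      end
    elseif text:match("^@%a") then
      in_examples = false -- assumption: any other tag ends the examples
    elseif in_examples then
      code[#code + 1] = text
    end
  end
  return code
end

local code = examples_code({
  "#' @param x a param",
  "#'",
  "#' @examples",
  "#' 1 + 1",
  "#' fn(",
  "#'  a,",
  "#'  b",
  "#' )",
  "f <- function(x) x",
})
print(table.concat(code, "\n"))
```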


Also worth looking at injection.combined as an option here https://tree-sitter.github.io/tree-sitter/syntax-highlighting#language-injection. It seems like it would smash all roxygen2 comment lines together into one nested document, and then parse that whole document once, which seems like it would be nice? IIUC that would allow the multi-line @param here to be parsed as 1 node containing the tag and its full description

#' @param x this is a long multiline
#'   description of this param
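A rough way to picture the combined document is the Lua sketch below. It only imitates the effect on plain strings: the prefix stripping and the parse_combined function are assumptions for illustration, and a real roxygen2 grammar might handle the `#'` markers differently.

```lua
-- Hypothetical sketch: treat consecutive roxygen2 lines as one document so
-- that a tag's description can span several lines.
local function parse_combined(lines)
  local tags = {}
  for _, line in ipairs(lines) do
    local text = line:match("^#' ?(.*)$")
    if text then
      local name, rest = text:match("^(@%a+)%s*(.*)$")
      if name then
        tags[#tags + 1] = { name = name, description = rest }
      elseif #tags > 0 then
        -- Continuation line: append it to the most recent tag.
        local continuation = text:gsub("^%s+", "")
        if continuation ~= "" then
          local tag = tags[#tags]
          tag.description = tag.description .. " " .. continuation
        end
      end
    end
  end
  return tags
end

local tags = parse_combined({
  "#' @param x this is a long multiline",
  "#'   description of this param",
})
print(tags[1].name, tags[1].description)
```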

@TymekDev

> I don't think jsdoc has any marker character like #'

The comment block has to start with /**. The way I understand it, the jsdoc grammar skips leading asterisks and any subsequent whitespace using extras and then looks for a leading /* (making it /**).

It also looks like as soon as it doesn't match /** it falls back to a regular comment.

> Note that the exact rule for "this is a roxygen2 comment" is more flexible than just ^#'.

Today I learned! Thanks for pointing that out. I think extras could be explored for stripping that, similar to how jsdoc removes asterisks and whitespace.

> In theory the roxygen2 grammar could also give us the ability to mark contents inside an @examples block as R code

That's what I also thought :-)

> injection.combined

Yes! I would definitely take a look at how to efficiently pass things around. My intuition is to try passing a contiguous block of comments to the roxygen2 parser and treating it as one document (similarly to what jsdoc does).


I suppose the next step would be sketching an outline of how the roxygen2 grammar would be structured... :-)

@TymekDev

TymekDev commented Oct 2, 2024

I have been experimenting with how to approach tree-sitter-roxygen2 and ran into an issue. R uses line comments, and tree-sitter cannot combine multiple nodes into a single capture. This means it's not possible to properly inject the roxygen2 parser into the R parser.

For example, given the following comment:

#' @description Lorem ipsum dolor sit amet,
#' mauris elit justo sociosqu, mauris vel at.
#' At quam, amet ultrices at cras et semper.
#' @noRd

The result would be:

(comment
  (tag
    name: (tag_name)                ; (tag_name) is "@description"
    description: (description)))    ; (description) is "Lorem ipsum dolor sit amet,"
(comment)
(comment)
(comment
  (tag
    name: (tag_name)))              ; (tag_name) is "@noRd"

While this is not a showstopper, it means that tree-sitter-roxygen2 won't be as robust as we imagined, because it will work at the line level.
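The line-level behaviour can be imitated with a short Lua sketch. It is not the tree-sitter-roxygen2 grammar itself, only a hypothetical per-line parser (parse_line is a made-up name) that shows why continuation lines end up as bare comments.

```lua
-- Hypothetical per-line parser: each comment line is parsed in isolation.
local function parse_line(line)
  local text = line:match("^#' ?(.*)$")
  if not text then
    return nil -- not a roxygen2 line
  end
  local name, description = text:match("^(@%a+)%s*(.*)$")
  if not name then
    return {} -- a comment with no tag; its text is attached to nothing
  end
  return {
    tag_name = name,
    description = description ~= "" and description or nil,
  }
end

local results = {}
for i, line in ipairs({
  "#' @description Lorem ipsum dolor sit amet,",
  "#' mauris elit justo sociosqu, mauris vel at.",
  "#' At quam, amet ultrices at cras et semper.",
  "#' @noRd",
}) do
  results[i] = parse_line(line)
end
print(results[1].tag_name, results[1].description)
print(results[2].tag_name) -- nil: the continuation line has no tag
```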

Unless it is possible to make the R grammar join (comment) nodes from adjacent lines into a single node. However, doing that doesn't feel right to me.

@DavisVaughan
Member

I thought this was what injection.combined was for

https://github.com/tree-sitter/tree-sitter/blob/8500e331ebfd49e66dd935b8a9c7a58aba68af37/docs/section-4-syntax-highlighting.md?plain=1#L370-L371

Does that not combine the sequential comments into 1 "document" that tree-sitter-roxygen2 gets?

@jranke

jranke commented Dec 11, 2024

> In theory the roxygen2 grammar could also give us the ability to mark contents inside an @examples block as R code
>
> ...
>
> Which could allow highlighting of R code in @examples block to maybe just work?

I think this would be great, as I assume then there would also be a cleaner solution for R-nvim/R.nvim#106 dealing with the fallout of trying to work around such tags not being recognized by tree-sitter-r.
