Writing robust, performant linters

Accumulated knowledge on best practices for writing linters

The {lintr} codebase has a lot of accumulated knowledge about how to write robust and fast linters. This Wiki exists as a repository for tidbits on these topics.

It is kept as a Wiki so that anyone can edit it with low overhead.

Tips for robustness

When writing a test for logical constants like TRUE or FALSE, if you want the condition to also match the shorthands T and F, note that TRUE and FALSE are NUM_CONST nodes while T and F are SYMBOL nodes (cf. getParseData(parse(text = "TRUE; T"))).
Keep pipes (%>%, |>) in mind when writing lints based on positional logic (e.g. if it's a lint for the 2nd argument to meet some condition, that will usually become the 1st argument inside a pipe chain).
The magrittr pipe %>% and the "native pipe" |> show up differently on the parse tree: SPECIAL and PIPE, respectively. Note that all %-delimited infix operators (e.g. %%, %in%, %*%) show up as SPECIAL, so for magrittr pipes you'll need to test the text() as well.
Often, it's better to anchor on EQ_SUB instead of SYMBOL_SUB when writing conditions around named arguments. The latter need not be present in all cases, e.g. in foo("a" = 1), which is valid R code, the parse tree will have a STR_CONST for "a", not a SYMBOL_SUB.
Be wary of * searches like preceding-sibling::*[1]. Are you sure everything counts? One common mistake is to include <COMMENT> nodes here, so the XPath lands on a comment instead of the intended expression. Exclude such comments like preceding-sibling::*[not(self::COMMENT)][1]. Be careful! The parser allows comments to show up basically anywhere!
Also be wary of expr searches like preceding-sibling::expr[1]. Are = assignment expressions excluded intentionally? Depending on the R version, the expression a = 1 may not show up as <expr><expr><SYMBOL>a</SYMBOL></expr><EQ_ASSIGN>=</EQ_ASSIGN><expr><NUM_CONST>1</NUM_CONST></expr></expr> (which, with EQ_ASSIGN swapped for LEFT_ASSIGN, is how a <- 1 appears). The outermost <expr> may be <equal_assign> or <expr_or_assign_or_help> instead. preceding-sibling::expr[1] will thus skip such an assignment, which is often a mistake.
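Several of the token claims above can be checked directly for whatever R version is at hand. The following is a minimal sketch using only base R; tokens_of() is a hypothetical helper name introduced here for illustration (it is not part of {lintr}), and the |> line assumes R >= 4.1.

```r
# Hypothetical helper (not part of {lintr}): map each terminal token's
# source text to its token type, as reported by getParseData().
tokens_of <- function(code) {
  # keep.source = TRUE so parse data is recorded even under Rscript.
  pd <- utils::getParseData(parse(text = code, keep.source = TRUE))
  pd <- pd[pd$terminal, ]
  stats::setNames(pd$token, pd$text)
}

# TRUE is a constant, but the shorthand T is an ordinary symbol.
tokens_of("TRUE; T")[c("TRUE", "T")]

# The magrittr pipe is one SPECIAL among many; the native pipe has its own token.
tokens_of("x %>% f()")[["%>%"]]
tokens_of("x %in% y")[["%in%"]]
if (getRversion() >= "4.1.0") tokens_of("x |> f()")[["|>"]]

# Named arguments: EQ_SUB is present either way, SYMBOL_SUB is not.
tokens_of("foo(a = 1)")
tokens_of('foo("a" = 1)')
```

Printing the full named vector, as in the last two calls, is a quick way to see which node a condition can safely anchor on before writing the XPath.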

for loops are a bit of a trap: they appear quite differently in the AST than similar constructs like while() and if() do; see https://github.com/r-lib/lintr/issues/2564#issuecomment-2675831586. Specifically, the AST for a simple for (a in 1:10) 1 looks like:

<expr>
  <FOR>for</FOR>
  <forcond>
    <OP-LEFT-PAREN>(</OP-LEFT-PAREN>
    <SYMBOL>a</SYMBOL>
    <IN>in</IN>
    <expr>
      <expr><NUM_CONST>1</NUM_CONST></expr>
      <OP-COLON>:</OP-COLON>
      <expr><NUM_CONST>10</NUM_CONST></expr>
    </expr>
    <OP-RIGHT-PAREN>)</OP-RIGHT-PAREN>
  </forcond>
  <expr>
    <NUM_CONST>1</NUM_CONST>
  </expr>
</expr>
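
One way to see the difference is to ask which node is the parent of the opening parenthesis and of the loop variable or condition. This is a minimal base-R sketch; parent_token() is a hypothetical helper introduced for illustration, not a {lintr} function.

```r
# Hypothetical helper: token type of the parent of the first terminal
# whose source text equals `text`.
parent_token <- function(code, text) {
  pd <- utils::getParseData(parse(text = code, keep.source = TRUE))
  node <- pd[pd$terminal & pd$text == text, ][1L, ]
  pd$token[pd$id == node$parent]
}

# In a for loop, the parentheses and the loop variable hang off <forcond>,
# and the loop variable is a bare SYMBOL with no wrapping <expr>.
parent_token("for (a in 1:10) 1", "(")  # "forcond"
parent_token("for (a in 1:10) 1", "a")  # "forcond"

# In a while loop, the parentheses are children of the loop's own <expr>,
# and the condition is wrapped in an <expr> like any other expression.
parent_token("while (a) 1", "(")        # "expr"
parent_token("while (a) 1", "a")        # "expr"
```

A practical consequence: a step like following-sibling::expr[1] taken from WHILE lands on the condition, whereas the same step taken from FOR lands on the loop body.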

S4 slot extractions are a lot like dollar extractions in the parse tree (x$y vs. x@y), except that the RHS of @ is always a SLOT node, whereas the RHS of $ is a SYMBOL or SYMBOL_FUNCTION_CALL. That also means we need to take care to distinguish a name from a call on the RHS of @ (x@y vs. x@y()) based on whether a ( follows, whereas for $ we can just rely on the node name.
Some linters work like "Compare expression 1 and expression 2; lint if they match [perhaps with other conditions]". For example, regex_subset_linter() looks for <expr1>[grep(pattern, <expr2>)] and lints only if <expr1> and <expr2> match. XPath basically works here, since = applied to two nodes is evaluated based on the string value of the nodes (see the XPath standard). But beware comments! You might want to exclude any child <COMMENT> nodes before comparing.
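
The effect of a comment on string-value comparison can be sketched with {xml2} on a small hand-written tree. The XML below is an assumption made for illustration (real input would come from the parsed source), and text_without_comments() is a hypothetical helper, not a {lintr} function.

```r
library(xml2)

# Hand-written stand-in for parse-tree XML (an assumption for illustration).
# The second <expr> differs from the first only by a trailing comment.
doc <- read_xml(paste0(
  "<exprlist>",
  "<expr><SYMBOL>x</SYMBOL></expr>",
  "<expr><SYMBOL>x</SYMBOL><COMMENT># why</COMMENT></expr>",
  "</exprlist>"
))
exprs <- xml_find_all(doc, "/exprlist/expr")

# XPath's = compares string values, so the comment text breaks the match.
xml_text(exprs)
xml_find_lgl(doc, "/exprlist/expr[1] = /exprlist/expr[2]")

# Hypothetical helper: string value of a node with COMMENT leaves dropped.
text_without_comments <- function(node) {
  leaves <- xml_find_all(node, ".//*[not(*) and not(self::COMMENT)]")
  paste(xml_text(leaves), collapse = "")
}

# With comments excluded, the two expressions compare equal again.
text_without_comments(exprs[[1L]]) == text_without_comments(exprs[[2L]])
```

Doing the comparison in R after the XPath step, as here, is only one option; the point is that the comment has to be removed from whatever text is being compared.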

Tips for performance

Avoid //* XPaths like the plague! At least in the current {xml2}, it is almost always slower than alternatives. A good example is https://github.com/r-lib/lintr/pull/2025, which shows a 3x speed-up from avoiding //* even though the replacement is a long, inefficient-seeming chain of //A[expr] | //B[expr]-style repetitive expressions.
Similarly, avoid //expr XPaths. See https://github.com/r-lib/lintr/issues/1358 -- more than 1/3 of all nodes are <expr>, so //expr only eliminates a relatively small portion of the parse tree. The more specific a node you can anchor on, the better, but the difference among nodes besides <expr> is not as important, so err on the side of readability/comprehensibility.
If you use //SYMBOL_FUNCTION_CALL as an entry point, use the xml_find_function_calls() helper instead, because it returns cached results much faster, especially when testing for multiple options of text() = 'foo'.
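
As a rough illustration of the first tip, here is the same query written with a //* entry point and as a union of specific entry points. This is a sketch on a hand-written tree (an assumption for illustration, not real parser output); it only shows that the two forms select the same nodes and makes no claim about timings.

```r
library(xml2)

# Hand-written stand-in for parse-tree XML: a == 1, b != 2, c + 3.
doc <- read_xml(paste0(
  "<exprlist>",
  "<expr><expr><SYMBOL>a</SYMBOL></expr><EQ>==</EQ>",
  "<expr><NUM_CONST>1</NUM_CONST></expr></expr>",
  "<expr><expr><SYMBOL>b</SYMBOL></expr><NE>!=</NE>",
  "<expr><NUM_CONST>2</NUM_CONST></expr></expr>",
  "<expr><expr><SYMBOL>c</SYMBOL></expr><OP-PLUS>+</OP-PLUS>",
  "<expr><NUM_CONST>3</NUM_CONST></expr></expr>",
  "</exprlist>"
))

# Broad entry point: visits every node, then filters by name.
broad <- "//*[self::EQ or self::NE]/following-sibling::expr/NUM_CONST"

# Specific entry points: repetitive, but each branch starts from a named node.
specific <- paste(
  "//EQ/following-sibling::expr/NUM_CONST",
  "//NE/following-sibling::expr/NUM_CONST",
  sep = " | "
)

broad_hits <- xml_text(xml_find_all(doc, broad))
specific_hits <- xml_text(xml_find_all(doc, specific))
setequal(broad_hits, specific_hits)
```

On real parse trees the two forms can be timed against each other, for example with system.time(); the size of any difference will depend on the installed {xml2} version.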
