
Writing robust, performant linters

Michael Chirico edited this page Feb 22, 2025 · 10 revisions

Accumulated knowledge on best practices for writing linters

The {lintr} codebase has a lot of accumulated knowledge about how to write robust and fast linters. This Wiki exists as a repository for tidbits on these topics.

It is kept as a Wiki so that anyone can edit it with little overhead.

Tips for robustness

  • When writing a test for logical constants like TRUE or FALSE, decide whether the condition should also match the shorthands T and F. TRUE and FALSE are NUM_CONST tokens, while T and F are SYMBOL tokens (see getParseData(parse(text = "TRUE; T"))).

  • Keep pipes (%>%, |>) in mind when writing lints based on positional logic (e.g. if it's a lint for the 2nd argument to meet some condition, that will usually become the 1st argument inside a pipe chain).

  • The magrittr pipe %>% and the "native pipe" |> show up differently on the parse tree: SPECIAL and PIPE, respectively. Note that all %-delimited infix operators (e.g. %%, %in%, %*%) show up as SPECIAL, so to match magrittr pipes you'll need to test the text() of the node as well.

  • Often, it's better to anchor on EQ_SUB instead of SYMBOL_SUB when writing conditions around named arguments. The latter need not be present in all cases, e.g. in foo("a" = 1), which is valid R code, the parse tree will have a STR_CONST for "a", not a SYMBOL_SUB.

  • Be wary of * searches like preceding-sibling::*[1]. Are you sure everything counts? One common mistake is to include <COMMENT> nodes here, so the XPath lands on a comment instead of the intended expression. Exclude such comments like preceding-sibling::*[not(self::COMMENT)][1]. Be careful! The parser allows comments to show up basically anywhere!

  • Also be wary of expr searches like preceding-sibling::expr[1]. Are = assignment expressions excluded intentionally? Depending on the R version, the expression a = 1 may not show up as <expr><expr><SYMBOL>a</SYMBOL></expr><EQ_ASSIGN>=</EQ_ASSIGN><expr><NUM_CONST>1</NUM_CONST></expr></expr> (if we swap EQ_ASSIGN to LEFT_ASSIGN, that's how a <- 1 would appear). The outermost <expr> may be <equal_assign> or <expr_or_assign_or_help> instead. preceding-sibling::expr[1] will thus skip such an assignment, which is often a mistake.

  • for loops are a bit of a trap: they appear quite differently on the AST than do similar constructs like while() and if(); see https://github.com/r-lib/lintr/issues/2564#issuecomment-2675831586. Specifically, the AST for a simple for (x in 1:10) 1 looks like:

    <expr>
      <FOR>for</FOR>
      <forcond>
        <OP-LEFT-PAREN>(</OP-LEFT-PAREN>
        <SYMBOL>x</SYMBOL>
        <IN>in</IN>
        <expr>
          <expr><NUM_CONST>1</NUM_CONST></expr>
          <OP-COLON>:</OP-COLON>
          <expr><NUM_CONST>10</NUM_CONST></expr>
        </expr>
        <OP-RIGHT-PAREN>)</OP-RIGHT-PAREN>
      </forcond>
      <expr>
        <NUM_CONST>1</NUM_CONST>
      </expr>
    </expr>
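
Several of the token distinctions above can be checked directly with base R's getParseData(). The following is a minimal sketch, not part of {lintr}: terminal_tokens() is a hypothetical helper introduced only for illustration, and the code snippets passed to it are made-up examples.

```r
# Token types of the terminal nodes in `code`, named by their source text.
# keep.source = TRUE is set explicitly because non-interactive R sessions
# drop source references by default, and getParseData() needs them.
# (terminal_tokens() is a hypothetical helper, not a {lintr} function.)
terminal_tokens <- function(code) {
  parsed <- parse(text = code, keep.source = TRUE)
  parse_data <- utils::getParseData(parsed)
  parse_data <- parse_data[parse_data$terminal, ]
  stats::setNames(parse_data$token, parse_data$text)
}

# TRUE is a NUM_CONST, while the shorthand T is an ordinary SYMBOL.
logical_tokens <- terminal_tokens("TRUE; T")
print(logical_tokens[c("TRUE", "T")])

# %>% and %in% share the SPECIAL token, so only their text tells them apart.
# parse() never evaluates the code, so {magrittr} need not be installed.
special_tokens <- terminal_tokens("x %>% f(); x %in% y")
print(special_tokens[c("%>%", "%in%")])

# A quoted argument name yields STR_CONST instead of SYMBOL_SUB,
# but EQ_SUB is present in both calls.
named_arg_tokens <- terminal_tokens("foo(a = 1); foo('a' = 1)")
print(named_arg_tokens[named_arg_tokens %in% c("SYMBOL_SUB", "STR_CONST", "EQ_SUB")])

# A comment in the middle of a call is a COMMENT token among the arguments.
comment_tokens <- terminal_tokens("foo(x, # note\n  y)")
print(comment_tokens)
```

Inspecting a few representative inputs this way before writing an XPath is a cheap way to confirm which token a condition should anchor on.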

Tips for performance

  • Avoid //* XPaths like the plague! At least in the current {xml2}, they are almost always slower than the alternatives. A good example is https://github.com/r-lib/lintr/pull/2025, which shows a 3x speed-up from avoiding //*, even though the replacement is a long, inefficient-seeming chain of repetitive //A[expr] | //B[expr]-style expressions.
  • Similarly, avoid //expr XPaths. See https://github.com/r-lib/lintr/issues/1358 -- more than 1/3 of all nodes are <expr>, so //expr only eliminates a relatively small portion of the parse tree. The more specific a node you can anchor on, the better, but the difference among nodes besides <expr> is not as important, so err on the side of readability/comprehensibility.
  • If you use //SYMBOL_FUNCTION_CALL as an entry point, use the xml_find_function_calls() helper instead, because it returns cached results much faster, especially when testing for multiple options of text() = 'foo'.
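
As a rough illustration of the first two tips, the sketch below compares a wildcard XPath with an equivalent union of specific node names on a tiny, made-up code sample. It assumes the {xml2} and {xmlparsedata} packages are installed, and it only shows that the two queries select the same nodes; it does not measure speed, which should be checked on realistic files (for example with system.time()).

```r
library(xml2)

# A small made-up sample; real linters see much larger parse trees.
code <- "x <- 1\ny = 2\nz <- x + y\n"

# Assumption: xmlparsedata::xml_parse_data() is used here to build the XML
# form of the parse tree from a parsed expression with source references.
xml <- read_xml(
  xmlparsedata::xml_parse_data(parse(text = code, keep.source = TRUE))
)

# How many nodes does each broad entry point have to visit?
n_all <- length(xml_find_all(xml, "//*"))
n_expr <- length(xml_find_all(xml, "//expr"))
cat(sprintf("%d of %d nodes are <expr>\n", n_expr, n_all))

# Wildcard entry point, filtered afterwards by node name.
wildcard_hits <- xml_find_all(xml, "//*[self::LEFT_ASSIGN or self::EQ_ASSIGN]")

# Equivalent union of specific node names: more repetitive, same nodes.
targeted_hits <- xml_find_all(xml, "//LEFT_ASSIGN | //EQ_ASSIGN")

print(xml_text(wildcard_hits))
print(xml_text(targeted_hits))
```

The union form repeats itself, but it lets the XPath engine start from specific node names instead of scanning every node.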