(WIP) Allows the use of = instead of <- #2521

Draft
wants to merge 6 commits into
base: main

Conversation

J-Moravec

@J-Moravec J-Moravec commented Feb 21, 2024

This is a tiny change that solves #2441.

By adding allow_equal_assignment parameter to the assignment_linter which defaults to FALSE, we give an option to use = instead of <-.

Associated tests are passing. Some other non-associated tests are failing though. But that seems unrelated.

Potential improvements:

  • perhaps a better name would be equality instead of equal? The latter might be already in use.
  • I modified only xpath, not trailing_assign_xpath. Since the linter assumes <- instead of = by default, the trailing_assign_xpath might warrant changes as well. But this goes beyond my current knowledge of how linters actually work.

Review is warranted.

Thanks.

edit: Fixed issues highlighted by GitHub Actions (I hadn't run document() and lintr) and force-pushed.
For some reason, test-lint_dir fails for me, but doesn't fail on GitHub Actions:

Error (test-lint_dir.R:98:3): lint_dir works with specific linters without specifying other arguments
Error: Found unknown arguments in ...: NA.

Error (test-lint_dir.R:107:3): linting empty directory passes
Error: Found unknown arguments in ...: NA.

and lintr::lint_package() complained about not being able to read the .lintr config, but it somehow works now so I have no idea why it failed in the first place.

@J-Moravec J-Moravec changed the title Allows the use = instead of <- Allows the use of = instead of <- Feb 21, 2024
@J-Moravec J-Moravec force-pushed the equal_assignment_lintr branch from 63f122f to e547608 Compare February 21, 2024 23:44
@codecov-commenter

codecov-commenter commented Feb 21, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 97.95%. Comparing base (f0d9407) to head (a4fb906).
Report is 11 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #2521      +/-   ##
==========================================
- Coverage   97.96%   97.95%   -0.02%     
==========================================
  Files         126      126              
  Lines        5760     5770      +10     
==========================================
+ Hits         5643     5652       +9     
- Misses        117      118       +1     
Flag Coverage Δ
97.95% <100.00%> (?)


@MichaelChirico
Collaborator

Thanks for filing!

WDYT about two new arguments instead:

  • allow_left_assign = TRUE by default
  • allow_equal_assign = FALSE by default

That enables the full generality of use cases here I believe, e.g.

  • (Tidyverse style) All assignment must be with <-
  • (e.g. data.table style) All assignment must be with =: allow_left_assign = FALSE, allow_equal_assign = TRUE
  • (don't care style) Assignment can be with either <- or = (allow_equal_assign = TRUE)

Or, we can do similar to pipe_consistency_linter() and handle this one tripartite argument:

pipe_consistency_linter <- function(pipe = c("auto", "%>%", "|>")) {

E.g. assignment_operator = c("<-", "=", "any")

@MichaelChirico
Collaborator

Another major wrinkle about =-assignment style is that authors still may want <- to be allowed for in-expression assignments:

a = 1
b <- 2
if (any(idx <- x > 0)) foo()

b <- 2 should lint, but idx <- x > 0 should not, because that expression is not possible with =.
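
The reason `idx <- x > 0` cannot simply be rewritten with `=` shows up in the parse data: inside a call, `=` is tokenized as an argument-name separator (`EQ_SUB`), not as an assignment. A minimal sketch using only base R's `getParseData()`; the helper name `tokens_of` is made up for illustration and is not part of lintr:

```r
# Hypothetical helper: return the terminal parser tokens for a snippet of R code.
tokens_of <- function(code) {
  pd <- utils::getParseData(parse(text = code, keep.source = TRUE))
  pd$token[pd$terminal]
}

top_level_eq  <- tokens_of("a = 1")
top_level_arr <- tokens_of("b <- 2")
inline_arr    <- tokens_of("if (any(idx <- x > 0)) foo()")
inline_eq     <- tokens_of("if (any(idx = x > 0)) foo()")

"EQ_ASSIGN" %in% top_level_eq    # `=` at top level is an assignment
"LEFT_ASSIGN" %in% top_level_arr # `<-` at top level is an assignment
"LEFT_ASSIGN" %in% inline_arr    # `<-` inside a call is still an assignment
"EQ_SUB" %in% inline_eq          # `=` inside a call names an argument instead
```

So swapping the operator inside `any(...)` changes the meaning of the code rather than just its style.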

@J-Moravec
Author

Thank you for taking time to review this.

* `allow_left_assign = TRUE` by default

* `allow_equal_assign = FALSE` by default

That enables the full generality of use cases here I believe, e.g.

* (Tidyverse style) All assignment must be with `<-`

* (e.g. data.table style) All assignment must be with `=`: `allow_left_assign = FALSE`, `allow_equal_assign = TRUE`

* (don't care style) Assignment can be with either `<-` or `=` (`allow_equal_assign = TRUE`)

But is there a use case? Aside from people who just like to spread chaos.

In any case, if we want to allow something like that, I would go for your second solution.

E.g. assignment_operator = c("<-", "=", "any")

I will make the assumption that the most common operation will be switching from <- to =. With two parameters, this would mean switching two long arguments. Simple assignment_operator = "=" is simpler to write and more readable IMHO. Translating this to EQ_ASSIGN etc. could be done with a switch statement.
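
A rough sketch of what that translation could look like, assuming the single `assignment_operator` argument discussed above. The function name `tokens_to_lint` is hypothetical and this is not lintr's implementation:

```r
# Hypothetical helper (not part of lintr): map the user-facing choice to the
# parser token that should be reported as a lint.
tokens_to_lint <- function(assignment_operator = c("<-", "=", "any")) {
  assignment_operator <- match.arg(assignment_operator)
  switch(assignment_operator,
    "<-" = "EQ_ASSIGN",    # `<-` required, so `=` assignments are linted
    "=" = "LEFT_ASSIGN",   # `=` required, so `<-` assignments are linted
    "any" = character(0)   # both allowed, nothing to lint
  )
}

tokens_to_lint()       # default: the first choice, "<-"
tokens_to_lint("=")
tokens_to_lint("any")
```

Note that `LEFT_ASSIGN` is also the token for `<<-` and `:=`, so a real XPath would need to narrow the match by operator text rather than use the bare token name.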

Just modifying the logic will be quite a bit more complicated (compared to the current change) and I will need to learn how to write linter. So this will take some time.

b <- 2 should lint, but idx <- x > 0 should not, because that expression is not possible with =.

That's a feature of =, not a bug. {idx = x} > 0 is IMHO more explicit in what is going on.

@@ -88,7 +91,7 @@ assignment_linter <- function(allow_cascading_assign = TRUE,

   xpath <- paste(collapse = " | ", c(
     # always block = (NB: the parser differentiates EQ_ASSIGN, EQ_SUB, and EQ_FORMALS)
-    "//EQ_ASSIGN",
+    if (allow_equal_assign) "//LEFT_ASSIGN" else "//EQ_ASSIGN",
Collaborator

Won't this also lint data.table code if allow_equal_assign is TRUE? e.g. dt[, x := 42L].

Author

That's a great catch that made me understand what [text() = '...'] does. The latest commit should fix it. I included a test as well.
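
For readers following along, the underlying issue can be reproduced with base R alone: the parser gives `:=`, `<-` and `<<-` the same `LEFT_ASSIGN` token, so a bare `//LEFT_ASSIGN` match cannot tell them apart and the operator text has to be checked as well. A small sketch (the example code string is made up):

```r
code <- "dt[, x := 42L]; y <- 1; z <<- 2"
pd <- utils::getParseData(parse(text = code, keep.source = TRUE))

# Keep only the rows the parser labels LEFT_ASSIGN and look at their text.
ops <- pd[pd$token == "LEFT_ASSIGN", c("token", "text")]
ops$text  # expected to contain ":=", "<-" and "<<-"
```

Filtering on the text, as `[text() = '<-']` does in XPath, is what keeps `:=` out of the lint results.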

@MichaelChirico
Collaborator

MichaelChirico commented Feb 24, 2024

That's a feature of =, not a bug. {idx = x} > 0 is IMHO more explicit in what is going on.

I'm not sure, I think it's a matter of opinion. So we could support both approaches.

A really cursory GitHub search suggests this approach to inline assignment is very rare, e.g. 50x less common than assigning inline with <-:

https://github.com/search?q=lang%3AR+%2F%5B%28%5D%5B+%5D*%5Ba-z0-9._%5D%2B%5Cs*%3C-%5B+%5D%2B%5B%5E%3B%5D%2B%5B%7D%5D%2F+-path%3A.Rd&type=code

https://github.com/search?q=lang%3AR+%2F%5B%28%5D%5B%7B%5D%5B+%5D*%5Ba-z0-9._%5D%2B%5Cs*%3D%5B+%5D%2B%5B%5E%3B%5D%2B%5B%7D%5D%2F+-path%3A.Rd&type=code

I'm happy to defer that to follow-up work we can take on if you're not keen. Either way I'd like this to simmer a while before CRAN release, so I'd defer merging this until after "ongoing" CRAN release.

@J-Moravec
Author

J-Moravec commented Feb 25, 2024

A really cursory GitHub search suggests this approach to inline assignment is very rare, e.g. 50x less common than assigning inline with <-:

... links ...

Interesting. If you ask me, I would consider inline <- as just another usage of <-, not functionally different. So if lintr is set to lint <- in favour of =, that would mean linting the inline usage of <- in favour of { ... = ... } as well. And I would have dismissed your claim by saying that people are likely just not aware that they can use { ... = ... } in place of ... <- ... for inline usage. But your data clearly shows otherwise. Both knitr and data.table use <- for inline assignment while preferring = for standard assignment. That is sufficient evidence for me that <- for inline assignments can be preferred even though = is used otherwise.

I am happy with this as long as there is a toggle for it.

I'm happy to defer that to follow-up work we can take on if you're not keen. Either way I'd like this to simmer a while before CRAN release, so I'd defer merging this until after "ongoing" CRAN release.

I am more than happy to do so, and in no rush to get it on CRAN ASAP. Let's get the desired behaviour and implementation right. In fact, I am pleasantly surprised by the attitude to allow parametrization of lints to allow any desired style, since it brings hope that I could implement rules for banner-style indentation.

Since the behaviours we want now are getting quite complex, I will first implement tests according to desired behaviour and then tweak the code.

Let me reiterate. We have assignment_operator = c("<-", "=", "any") to allow the usage of any assignment operator, and identically we could have inline_assignment_operator = c("<-", "=", "any"). Is there any other behaviour that we would like to enforce?

@MichaelChirico
Collaborator

we could have inline_assignment_operator

I think in practice calling if ({x = foo()} > 2) { ... } an "inline assignment" will be a bit harder to implement, since the parser makes no distinction between that and

if ({
  x = 2
} > 2) { ... }

i.e., on the AST it looks exactly like any other = assignment, except for the whitespace, and coming up with a "good" rule on the whitespace for an inline assignment might prove tricky. e.g. we might say "an assignment inside { that only has one expression and starts+ends on the same line", but my spidey-sense says something like {x=1; y=2} will cause headaches.
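
A quick way to see this with base R only: compare the terminal tokens of the one-line and multi-line forms. This is an illustrative sketch, and the helper name `terminal_tokens` is made up:

```r
# Hypothetical helper: terminal parser tokens of a code snippet, in source order.
terminal_tokens <- function(code) {
  pd <- utils::getParseData(parse(text = code, keep.source = TRUE))
  pd$token[pd$terminal]
}

one_line   <- terminal_tokens("if ({x = 2} > 2) { y }")
multi_line <- terminal_tokens("if ({\n  x = 2\n} > 2) { y }")

# Same token sequence; only the line/column columns of the parse data differ.
identical(one_line, multi_line)
```

Any rule that treats the first form as "inline" therefore has to rely on the position columns (`line1`, `line2`, ...) rather than on the tree structure.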

Since the behaviours we want now are getting quite complex, I will first implement tests according to desired behaviour and then tweak the code.

Perfect.

IMO there are four styles we should accommodate for now. Any further customization will have to come with future FRs.

  1. Tidyverse (and base R) style: <- for all assignments. FWIW, tidy style discourages implicit assignments (hence, implicit_assignment_linter()).
  2. data.table/knitr style: = for "top-level" assignments, <- for implicit assignments to avoid {
  3. Jiří style (sorry, I haven't actually seen this used, so I'm pinning it on you 😃 -- most of the {=} assignments in the search above come in scripts that also use <-, although it appears there are ~400 results that don't use <-): = for all assignments. Implicit assignments must be {-wrapped.
  4. No style. This is basically for users that want the other features of assignment_linter(), like blocking <<-, ->, but aren't picky between <- and =.

Maybe that fits under allow_equal_assign = c("never", "top", "required", "always") (respectively)? "top" could also be called "explicit" (as opposed to implicit)?

cc @AshesITR / @IndrajeetPatil on design here.

it brings hope that I could implement rules for banner-style indentation.

Feel free to file an issue! Most likely, you'll need to implement it yourself, but I can't see why we wouldn't accept such a PR.

@AshesITR
Collaborator

AshesITR commented Feb 26, 2024

Another case against inline assignments using braces is local({x = 42}), which will not bind x to any value in the outer scope.

Maybe what we want is to block <- in inline context (how to detect that?) and parametrize allow_inline_left_assign = TRUE in addition to assignment_operator = c("<-", "=")?

@J-Moravec
Author

Another case against inline assignments using braces is local({x = 42}), which will not bind x to any value in the outer scope.

Sorry, but I don't understand this point. What is the expected and what is the desired behaviour of local({x = 42})?

@J-Moravec
Author

Jiří style (sorry, I haven't actually seen this used, so I'm pinning it on you 😃 -- most of the {=} assignments in the search above come in scripts that also use <-, although it appears there are ~400 results that don't use <-): = for all assignments. Implicit assignments must be {-wrapped

There are dozens of us, dozens!

Maybe that fits under allow_equal_assign = c("never", "top", "required", "always") (respectively)? "top" could also be called "explicit" (as opposed to implicit)?

I am not sure I understand the difference between "required" and "always".

allow_equal_assign = c("never", "explicit", "optional", "always") would make more sense to me. But I think the difficulty of coming up with the right words to describe it stems from the fact that this switch is trying to do too much.

And as @AshesITR suggested, splitting it into two switches, one for = and one for inline assignment would be better. If we can implement that. In any case, this would mean rewriting the current logic.

@MichaelChirico
Collaborator

Another case against inline assignments using braces is local({x = 42}), which will not bind x to any value in the outer scope.

Sorry, but I don't understand this point. What is the expected and what is the desired behaviour of local({x = 42})?

compare local(x <- 42) and local({x = 42})

@J-Moravec
Author

Another case against inline assignments using braces is local({x = 42}), which will not bind x to any value in the outer scope.

Sorry, but I don't understand this point. What is the expected and what is the desired behaviour of local({x = 42})?

compare local(x <- 42) and local({x = 42})

I am comparing and comparing and they produce identical results.

a = local(x <- 42)
exists("x") # false
b = local({x = 42})
exists("x") # false
a == b # TRUE

For a moment I was afraid that local(x <- 42) would assign x to the parent frame, but this isn't true.

@MichaelChirico
Collaborator

Oh, that's somewhat surprising to me as well! Thanks for double-checking.

OTOH, that could be specific to local(), though I checked with()/within() work the same. All we need is a function taking an expr as an argument that is not careful about lazy evaluation, and it could make the issue hinted above. I'm less sure about whether such behavior is ever desirable, or just always a bug.
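
To make the lazy-evaluation point concrete, here is a small base-R sketch (all names are made up for illustration). Whether an inline assignment takes effect, and where, depends on how the receiving function treats its argument, not on whether `<-` or `{ ... = ... }` is used:

```r
demo <- function() {
  here <- environment()

  # identity() forces its argument, and the promise is evaluated in the
  # caller's frame, so both spellings create a variable here.
  identity(p <- 1)
  identity({q = 2})

  # A function that never touches its argument: the promise is never forced,
  # so the assignment silently does not happen.
  ignore_arg <- function(arg) NULL
  ignore_arg(r <- 3)

  # local() evaluates in a fresh environment, so nothing leaks out.
  local(s <- 4)

  # Which of the four names now exist in this function's own frame?
  vapply(c("p", "q", "r", "s"), exists, logical(1),
         envir = here, inherits = FALSE)
}

demo()
```

In this sketch `p` and `q` end up defined in the calling frame while `r` and `s` do not, which is consistent with the `local()` comparison above.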

@AshesITR
Collaborator

I'm also surprised by the discovery.
So, is there a good way to detect inline assignments at all?

I see two options: 1. not treat "inline" differently or 2. make up some best-effort definition and use that.

For 2. we could for example declare all assignments inline unless they are top-level, top-level within a function body or control statement body, or top-level within a multi-line brace expr.

Examples:

a <- 1 # top-level
function(x) { c <- x } # top-level within function
local({ e <- 2 }) # inline
local({
  f <- 1 # top-level within multi-line brace expr
})
g(h <- 1) # inline
if (i <- 1) {} # inline
if (TRUE) { i <- 1 } # top-level within control statement body
if (TRUE) i <- 1 # top-level within control statement body

@MichaelChirico
Collaborator

So, is there a good way to detect inline assignments at all?

implicit_assignment_linter() has a lot of logic to do so already

@J-Moravec
Author

J-Moravec commented Feb 29, 2024

So, is there a good way to detect inline assignments at all?

implicit_assignment_linter() has a lot of logic to do so already

Since you are a co-author on that one, do you think that the logic would work for = as well (if it is added to assignments)? Or is {x = 1} indistinguishable from any other call (compared to x <- 1)? Also, how does lintr handle {x <- 1} and (x = 1)?

Link so it's easier: https://github.com/r-lib/lintr/blob/HEAD/R/implicit_assignment_linter.R

I see that if we go from the simple "assignment is either <- or =" to the more complex (as you suggested) "assignment can be <- and/or = depending on your choice of allow_equal_assign = c("never", "explicit", "required", "always")", then there will be some shared code between implicit_assignment_linter() and assignment_linter().

Perhaps, if {x = 1} cannot be linted and it cannot be distinguished from any top-level calls, then all we need is: assignment_symbol = c("<-", "=", "any") and leave the implicit assignments to the implicit assignment linter.

But otherwise, we are essentially including the functionality of implicit_assignment_linter() in the assignment_linter() which feels counter-intuitive to a certain extent and I am not sure how permitted is the duplication of functionality in here.

Is it even worth it? Wouldn't assignment_symbol = c("<-", "=", "any") be sufficient for this PR and the rest left as big TODO if there is requested functionality? People who use = with implicit <- are likely not currently using lintr.

@AshesITR
Collaborator

We've merged linters in the past already where it made sense, so code duplication shouldn't be a huge concern.

@J-Moravec J-Moravec force-pushed the equal_assignment_lintr branch from dc3ab81 to a4fb906 Compare September 29, 2024 19:36
@J-Moravec J-Moravec marked this pull request as draft September 29, 2024 19:38
@J-Moravec J-Moravec changed the title Allows the use of = instead of <- (WIP) Allows the use of = instead of <- Sep 29, 2024
@J-Moravec
Author

Rebased to latest main.

Converted to draft.

@J-Moravec
Author

With life finally getting out of the way, I am back working on this.

Rereading the whole discussion after some time off, I think the most important part is here (aside from implicit = assignment being hard/impossible):

  1. Tidyverse (and base R) style: <- for all assignments. FWIW, tidy style discourages implicit assignments (hence, implicit_assignment_linter()).
  2. data.table/knitr style: = for "top-level" assignments, <- for implicit assignments to avoid {
  3. Jiří style (sorry, I haven't actually seen this used, so I'm pinning it on you 😃 -- most of the {=} assignments in the search above come in scripts that also use <-, although it appears there are ~400 results that don't use <-): = for all assignments. Implicit assignments must be {-wrapped.
  4. No style. This is basically for users that want the other features of assignment_linter(), like blocking <<-, ->, but aren't picky between <- and =.

In fewer words:

  1. explicit <- and implicit <-
  2. explicit = and implicit <-
  3. explicit =, no implicit <-
  4. any

This looks like we have explicit <- and =, with a possibility of implicit <-.
Since we have implicit_assignment_linter to handle the implicit <-, we only have explicit <- and = to care about.

So if we handle the explicit case only and leave <- for implicit assignment, we can handle everything with a simpler user interface that won't try to handle too much all at once:

Something like allow_explicit_equal_assignment = c("never", "allowed", "required").

The only issue would be writing the XPath statement so that it does not touch implicit <- when linting for it. I am still not familiar enough with XPath to be confident in this; https://github.com/r-lib/xmlparsedata#readme and removing the line information noise were some help.

@math-mcshane

Hello just wanted to comment -- good work all! I have been trying to incorporate lintr into my course's autograder but want to allow both = and <-. I'm currently throwing the baby out with the bath water with assignment_linter = NULL. Excited for this feature to be included.

Discussion for those interested as well as a competing style guide (MLR3 style guide) that contradicts tidyverse style guide on the <- and =.

@MichaelChirico
Collaborator

Finally coming back here myself.

@J-Moravec OK yes, I agree about implicit assignments being a bit of a distraction for the main feature here (top-level assignment style). I think we should focus on getting a good rule in place for top-level assignments, and then move on to fine-tuning the rules in further follow-ups later.

For the implementation, I would start by writing good test cases. Usually I find the proper XPath falls out from there.

Once you have test cases written, push them to this branch and if you're still stuck, please ping me and I'll pitch in.

Thank you!

@J-Moravec
Author

@MichaelChirico Yeah, the complexity started to increase when I tried to consider implicit assignments as well.

Sadly, my free time has disappeared since my baby grew into a toddler and the little guy likes to go sleep at 10pm.

@MichaelChirico
Collaborator

that sounds all too familiar 😭

5 participants