what *is* a vector? #1955

Closed
JosiahParry opened this issue Oct 22, 2024 · 5 comments

@JosiahParry
Contributor

This is somewhat of a philosophical question, but one with real consequences, so I apologize if it winds and curves!

TL;DR

The nb class is a list with attributes and no explicit list class.

This is how each package sees it:

package   list   vector
rlang     yes    yes
base      yes    no
vctrs     no     no

Background

One thing that has been bothering me since 2021 is that the nb and listw classes from the spdep package cannot be easily integrated into the tidyverse.

The nb class object is a ragged array stored in a list. A list is a vector and thus can work with vctrs and the tidyverse in general. However, the nb class object does not have the list class explicitly added. There is disagreement across base R, rlang, and vctrs about what constitutes a vector and a list.
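To make the storage-versus-class distinction concrete, here is a base R sketch of an object shaped like an nb object: ragged integer vectors in list storage, with a custom class and no explicit "list" class. The object nb_like and its region.id attribute are illustrative stand-ins, not output from spdep's own constructors.

```r
# Hypothetical stand-in for an spdep "nb" object: a ragged list of
# neighbour indices with a custom class and an extra attribute.
nb_like <- structure(
  list(2L, c(1L, 3L), 2L),
  class = "nb",
  region.id = c("1", "2", "3")
)

typeof(nb_like)            # "list": the storage type is a list
is.list(nb_like)           # TRUE: is.list() only looks at storage
inherits(nb_like, "list")  # FALSE: "list" is not in the class vector
lengths(nb_like)           # 1 2 1: the array is ragged
```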

Motivation

The rcrd class from vctrs provides a nice opportunity to be able to embed the listw class into the tidyverse workflow in a much more seamless way than has been possible in the past.

I am quite interested in thinking through how I can make spatial statistics more accessible to the R ecosystem, and this is a big part of it. I have a package, sfdep, which provides tidyverse compatibility by storing the two component lists, neighbours and weights, as two separate columns in a data frame. Ideally they would live in a single column, because two separate columns can fall out of sync.

Question

What constitutes a list and a vector in vctrs, and should rlang and vctrs agree on this?

Additionally, do you all have guidance on how to address this? FWIW, I am not the author or maintainer of {spdep}, and adding the list subclass is out of the question, as discussed in r-spatial/spdep#59.

Reprex

library(spdep)
library(vctrs)

# create listw object 
nb <- cell2nb(10, 10)
listw <- nb2listw(nb)

# try and create a record
x <- new_rcrd(listw, class = "swm_rcrd")
#> Error in `df_list()`:
#> ! `neighbours` must be a vector, not a <nb> object.
# according to {rlang} the nb object is a list and a vector
rlang::is_list(listw$neighbours)
#> [1] TRUE
rlang::is_vector(listw$neighbours)
#> [1] TRUE
# according to vctrs it is not a list
vctrs::obj_is_list(listw$neighbours)
#> [1] FALSE
# according to vctrs it is not a vector either
vctrs::obj_is_vector(listw$neighbours)
#> [1] FALSE
# base R says its storage type is list
typeof(listw$neighbours)
#> [1] "list"
# but base R also says it is not a vector
# is this because it is missing the explicit class??
is.vector(nb)
#> [1] FALSE
# whereas a bare list _is_ a vector according to base R
is.vector(list())
#> [1] TRUE
# add the explicit list class
class(listw$neighbours) <- c("nb", "list")

# now creating the record works
x <- new_rcrd(listw, class = "swm_rcrd")

format.swm_rcrd <- function(x, ...) {
  nbs <- field(x, "neighbours")
  card <- spdep::card(nbs)
  out <- paste("(", vapply(nbs, toString, character(1)), ")", sep = "")
  out[which(card == 0)] <- NA
  out
}

tibble::tibble(swm = x)
#> # A tibble: 100 × 1
#>            swm
#>     <swm_rcrd>
#>  1     (2, 11)
#>  2  (1, 3, 12)
#>  3  (2, 4, 13)
#>  4  (3, 5, 14)
#>  5  (4, 6, 15)
#>  6  (5, 7, 16)
#>  7  (6, 8, 17)
#>  8  (7, 9, 18)
#>  9 (8, 10, 19)
#> 10     (9, 20)
#> # ℹ 90 more rows

Created on 2024-10-22 with reprex v2.1.0

@JosiahParry JosiahParry changed the title what _is_ a vector? what *is* a vector? Oct 22, 2024
@lionel-
Member

lionel- commented Oct 23, 2024

There is the storage type and there is the semantic type (a combination of interface and semantics). rlang is about the storage type; vctrs is about semantics.

We've decided that S3 subclasses must explicitly inherit from a base vector/list class to be considered as such, even if they have vector/list storage. For instance, in the vctrs worldview an S3 model is a scalar and not a list, even though it has list storage.
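A small sketch of the storage-versus-semantics split, using a fitted lm() model as the S3 model example. It assumes rlang and a recent vctrs that exports obj_is_list() and obj_is_vector() are installed; mtcars is the dataset bundled with base R.

```r
library(vctrs)

fit <- lm(mpg ~ wt, data = mtcars)

# Storage: a fitted model is a list under the hood.
typeof(fit)                # "list"
rlang::is_list(fit)        # TRUE: rlang checks the storage type

# Semantics: vctrs treats it as a scalar, because its class is just "lm".
class(fit)                 # "lm"
obj_is_list(fit)           # FALSE
obj_is_vector(fit)         # FALSE
```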

@DavisVaughan
Member

DavisVaughan commented Oct 23, 2024

FWIW, is.vector() is very low-level and is probably not a useful reference point for this conversation:

is.vector(x) returns TRUE if x is a vector of the specified mode having no attributes other than names.
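Because of this rule, is.vector() returns FALSE for any object carrying an attribute other than names, so a custom class alone is enough to make it fail, whether or not "list" is in the class. A base R sketch; the class name "nb" is reused here purely for illustration.

```r
x <- c(a = 1, b = 2)
is.vector(x)            # TRUE: names are the only attribute

attr(x, "units") <- "m"
is.vector(x)            # FALSE: any extra attribute is enough

l <- structure(list(1, 2), class = "nb")
is.vector(l)            # FALSE: the class attribute disqualifies it
is.vector(unclass(l))   # TRUE once the class attribute is removed
```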

@DavisVaughan
Member

DavisVaughan commented Oct 23, 2024

?vctrs::obj_is_list() does a good job explaining the two rules that allow an object to be treated as a list in vctrs. x is a list if:

  • x is a bare list with no class.
  • x is a list explicitly inheriting from "list".

As Lionel said, this distinction allows us to say that output from lm() is considered a scalar object rather than a vector-like list object, because its class is just "lm".

But a vctrs::list_of() is considered a vector-like list, because its class structure is c("vctrs_list_of", "vctrs_vctr", "list").
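The two rules can be checked directly. A minimal sketch assuming a recent vctrs; "my_obj" is a hypothetical class name used only for illustration.

```r
library(vctrs)

# Rule 1: a bare list with no class is a list.
obj_is_list(list(1, "a"))                                          # TRUE

# List storage plus a custom class, but no "list" class: not a list.
obj_is_list(structure(list(1, "a"), class = "my_obj"))             # FALSE

# Rule 2: a list that explicitly inherits from "list" is a list.
obj_is_list(structure(list(1, "a"), class = c("my_obj", "list")))  # TRUE

# list_of() objects carry "list" at the end of their class vector.
lo <- list_of(1, 2)
class(lo)
obj_is_list(lo)                                                    # TRUE
```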


This rule about what an explicit "list" class means runs very deep. If you have a "list" class on your object, we are going to try and index into it with VECTOR_ELT() or VECTOR_PTR_RO() at the C level, so it sure better be backed by a VECSXP.

@DavisVaughan
Member

DavisVaughan commented Oct 23, 2024

?vctrs::obj_is_vector() similarly does a good job of describing what makes an object a vector in vctrs:
https://vctrs.r-lib.org/reference/vector-checks.html#vectors-and-scalars

In particular, a good example here is the vctrs_rcrd type.

  • It uses list storage to hold n vectors of equal size
  • It is not considered a list, because the class structure is c("vctrs_rcrd", "vctrs_vctr")
  • It is considered a vector, because we provide a vec_proxy() method that returns a data frame
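These three properties can be observed on a small record. A minimal sketch assuming a recent vctrs; "my_rcrd" is a hypothetical class name, and the record is never printed, so no format() method is needed.

```r
library(vctrs)

r <- new_rcrd(list(x = 1:3, y = c("a", "b", "c")), class = "my_rcrd")

typeof(r)                     # "list": list storage holding 2 fields
inherits(r, "list")           # FALSE: no explicit "list" class
obj_is_list(r)                # FALSE
obj_is_vector(r)              # TRUE, thanks to the vec_proxy() method
is.data.frame(vec_proxy(r))   # TRUE: the proxy is a data frame
vec_size(r)                   # 3: one element per row of the proxy
```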

Other good examples are the Duration, Interval, and Period S4 classes from lubridate:

  • They use S4 to hold n vectors of equal size
  • They are not considered lists, because their class structure is just the class name (for example, "Period")
  • They are considered vectors, because we provide vec_proxy() methods that return a data frame

@JosiahParry
Contributor Author

JosiahParry commented Oct 23, 2024

Thank you all for the very clear and thoughtful responses! Following the details section in Vector Checks (which should be more discoverable; IMO it's really great writing!), this issue can be addressed by simply adding a vec_proxy() method for the nb class.

My overall takeaway is that vctrs should be compared with rlang's _bare_ predicates (is_bare_list(), is_bare_vector()). vctrs also lets an object obtain vector "status" through S3 generics, most notably vec_proxy().
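The same fix can be tried without spdep installed. A minimal sketch assuming only a recent vctrs: toy_nb and toy_swm are hypothetical class names standing in for spdep's nb and a neighbours/weights record, and the method is defined at top level, as in the reprex.

```r
library(vctrs)

# Hypothetical stand-in for spdep's "nb": list storage, custom class,
# no explicit "list" class.
toy_nb <- structure(list(2L, c(1L, 3L), 2L), class = "toy_nb")
was_vector <- obj_is_vector(toy_nb)   # FALSE before any method exists

# Give the class vector semantics: the proxy is the underlying bare list.
vec_proxy.toy_nb <- function(x, ...) {
  unclass(x)
}

obj_is_vector(toy_nb)   # TRUE now
obj_is_list(toy_nb)     # still FALSE: obj_is_list() only checks the class

# The object can now be used as a record field without changing its class.
rec <- new_rcrd(
  list(neighbours = toy_nb, weights = list(1, c(0.5, 0.5), 1)),
  class = "toy_swm"
)
vec_size(rec)           # 3
```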

library(spdep)
library(vctrs)

# create listw object 
nb <- cell2nb(10, 10)
listw <- nb2listw(nb)

# rlang's bare checks and vctrs' checks should agree
rlang::is_bare_list(nb)
#> [1] FALSE

rlang::is_bare_vector(nb)
#> [1] FALSE

vctrs::obj_is_list(nb)
#> [1] FALSE

vctrs::obj_is_vector(nb)
#> [1] FALSE

# tell {vctrs} that nb _is_ a vector 
vec_proxy.nb <- function(x, ...) {
  unclass(x)
}

# do these tests with {vctrs} again and see it is now vector
# but still not list
vctrs::obj_is_list(nb)
#> [1] FALSE

vctrs::obj_is_vector(nb)
#> [1] TRUE

# give a format method for the record
format.swm_rcrd <- function(x, ...) {
  nbs <- field(x, "neighbours")
  card <- spdep::card(nbs)
  out <- paste("(", vapply(nbs, toString, character(1)), ")", sep = "")
  out[which(card == 0)] <- NA
  out
}

# create the record; nb's class no longer needs modifying
x <- new_rcrd(listw, class = "swm_rcrd")

head(x)
#> <swm_rcrd[6]>
#> [1] (2, 11)    (1, 3, 12) (2, 4, 13) (3, 5, 14) (4, 6, 15) (5, 7, 16)

Created on 2024-10-23 with reprex v2.1.0


3 participants