get/find_transformation with linear transformations #584

Open
mattansb opened this issue Jun 27, 2022 · 10 comments
Labels
Enhancement 💥 Implemented features can be improved or revised

Comments

@mattansb
Member

  1. get/find_transformation should not return "identity" if an unsupported transformation is present.
  2. Should linear transformations be supported?
m <- lm(I(2 * mpg + 3) ~ hp, mtcars)
insight::find_transformation(m)
#> [1] "identity"

Created on 2022-06-27 by the reprex package (v2.0.1)

@bwiernik
Contributor

bwiernik commented Jun 27, 2022

Is there a context where these concerns arise other than I()?

Generally, we should extract the contents of I() and evaluate them as a function, using numerical derivatives.
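A minimal sketch of that idea, assuming a two-sided formula with a single response variable whose left-hand side may be wrapped in I(). lhs_function() and num_deriv() are hypothetical helpers, not insight functions, and the derivative is a plain central finite difference.

```r
# Hypothetical helper (not part of insight): turn the left-hand side of
# `formula`, with any I() wrapper removed, into a function of the response `var`.
lhs_function <- function(formula, var) {
  lhs <- formula[[2]]
  if (is.call(lhs) && identical(lhs[[1]], as.name("I"))) {
    lhs <- lhs[[2]]  # keep only the expression inside I()
  }
  function(v) {
    data <- setNames(list(v), var)
    eval(lhs, envir = data, enclos = environment(formula))
  }
}

# Central finite-difference approximation of f'(x)
num_deriv <- function(f, x, h = 1e-6) {
  (f(x + h) - f(x - h)) / (2 * h)
}

trans <- lhs_function(I(2 * mpg + 3) ~ hp, "mpg")
trans(10)             # 23
num_deriv(trans, 10)  # approximately 2
```

For a linear transformation the derivative is constant, so its value together with the intercept trans(0) would be enough to build the inverse.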

strengejacke added a commit that referenced this issue Jun 28, 2022
@strengejacke strengejacke added the Enhancement 💥 Implemented features can be improved or revised label Jul 2, 2022
@mattansb
Member Author

mattansb commented Jul 5, 2022

For (1), a user could use some other unsupported function, e.g. foo(y) ~ 1 or datawizard::ranktransform(y) ~ ...

And (2) should also work for functions that apply linear transformations:

@strengejacke
Member

strengejacke commented Jul 5, 2022

The problem is: how do we detect foo()? cbind(x - y) should return "identity", but foo() should return "unknown". Are there any other exceptions?

@mattansb
Member Author

mattansb commented Jul 5, 2022

> The problem is: how do we detect foo()? cbind(x - y) should return "identity", but foo() should return "unknown". Are there any other exceptions?

Perhaps we should just return "identity" if no function or manipulation is detected? All others can be NULL?
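A rough sketch of that rule, for illustration only: classify_lhs() is a hypothetical function (not insight's implementation), and the set of recognized transformations is a small assumed subset. A bare response gives "identity", cbind() is treated as the exception discussed above, and anything else (foo(), I(), datawizard::ranktransform()) gives NULL.

```r
# Hypothetical classifier for the response side of a model formula
classify_lhs <- function(formula) {
  if (length(formula) < 3) return(NULL)  # one-sided formula: no response
  lhs <- formula[[2]]

  # No function or manipulation applied to the response
  if (is.name(lhs)) return("identity")

  fun <- paste(deparse(lhs[[1]]), collapse = "")
  known <- c("log", "log1p", "exp", "sqrt")  # assumed subset, for illustration

  if (fun %in% known) return(fun)
  if (fun == "cbind") return("identity")  # combines responses, no transformation
  NULL  # unsupported or user-defined function, e.g. foo(y)
}

classify_lhs(log(mpg) ~ hp)     # "log"
classify_lhs(cbind(x - y) ~ 1)  # "identity"
classify_lhs(foo(y) ~ 1)        # NULL
```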


Here is some working code that builds transformation and inverse-transformation functions from the linear transformation functions above:

Define functions
as_linear_transform <- function(x, ...) {
  UseMethod("as_linear_transform")
}

as_linear_inverse <- function(x, ...) {
  UseMethod("as_linear_inverse")
}


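# Forward transformation: raw scale -> transformed scale
as_linear_transform.numeric <- function(x, ...) {
  coefs <- .get_ab(x)
  function(x) {
    (x - coefs[["a"]]) / coefs[["b"]]
  }
}


# Inverse transformation: transformed scale -> raw scale
as_linear_inverse.numeric <- function(x, ...) {
  coefs <- .get_ab(x)
  function(x) {
    x * coefs[["b"]] + coefs[["a"]]
  }
}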



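# Recover a and b (so that transformed = (raw - a) / b) from the
# attributes attached by base R and datawizard
.get_ab <- function(x) {
  atts <- attributes(x)
  att_names <- names(atts)

  if (all(c("center", "scale") %in% att_names)) {
    # datawizard::standardize() / datawizard::center()
    a <- atts[["center"]]
    b <- atts[["scale"]]
  } else if (all(c("scaled:center", "scaled:scale") %in% att_names)) {
    # base::scale()
    a <- atts[["scaled:center"]]
    b <- atts[["scaled:scale"]]
  } else if (all(c("min_value", "range_difference") %in% att_names)) {
    # datawizard::normalize() / datawizard::change_scale()
    a <- atts[["min_value"]]
    b <- atts[["range_difference"]]

    if ("to_range" %in% att_names) {
      # change_scale() maps onto `to_range` instead of [0, 1]
      to_range <- atts[["to_range"]]
      b <- b / diff(to_range)
      a <- a - b * to_range[1]
    }
  } else {
    stop("No supported linear-transformation attributes found on `x`.")
  }

  c(a = unname(a), b = unname(b))
}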
library(datawizard)
x <- rnorm(4, 40, 13)

Build transformation functions from linear transformation functions in datawizard

foo <- as_linear_transform(standardize(x))
foo(x)
#> [1] -1.39325099  0.09865699  0.32238805  0.97220595
standardize(x)
#> [1] -1.39325099  0.09865699  0.32238805  0.97220595
#> attr(,"center")
#> [1] 40.5878
#> attr(,"scale")
#> [1] 11.42871
#> attr(,"robust")
#> [1] FALSE


foo <- as_linear_transform(scale(x))
foo(x)
#> [1] -1.39325099  0.09865699  0.32238805  0.97220595
scale(x)
#>             [,1]
#> [1,] -1.39325099
#> [2,]  0.09865699
#> [3,]  0.32238805
#> [4,]  0.97220595
#> attr(,"scaled:center")
#> [1] 40.5878
#> attr(,"scaled:scale")
#> [1] 11.42871


foo <- as_linear_transform(change_scale(x, to = c(3, 14.5), range = c(-30, 200)))
foo(x)
#> [1] 5.733237 6.585766 6.713614 7.084943
change_scale(x, to = c(3, 14.5), range = c(-30, 200))
#> [1] 5.733237 6.585766 6.713614 7.084943
#> attr(,"min_value")
#> [1] -30
#> attr(,"range_difference")
#> [1] 230
#> attr(,"to_range")
#> [1]  3.0 14.5

Build inverse transformation functions from linear transformation functions in datawizard

goo <- as_linear_inverse(center(x))
x
#> [1] 24.66474 41.71532 44.27228 51.69886
goo(center(x))
#> [1] 24.66474 41.71532 44.27228 51.69886
#> attr(,"center")
#> [1] 40.5878
#> attr(,"scale")
#> [1] 1
#> attr(,"robust")
#> [1] FALSE


goo <- as_linear_inverse(normalize(x))
x
#> [1] 24.66474 41.71532 44.27228 51.69886
goo(normalize(x))
#> [1] 24.66474 41.71532 44.27228 51.69886
#> attr(,"include_bounds")
#> [1] TRUE
#> attr(,"min_value")
#> [1] 24.66474
#> attr(,"range_difference")
#> [1] 27.03411


goo <- as_linear_inverse(scale(x))
x
#> [1] 24.66474 41.71532 44.27228 51.69886
goo(scale(x))
#>          [,1]
#> [1,] 24.66474
#> [2,] 41.71532
#> [3,] 44.27228
#> [4,] 51.69886
#> attr(,"scaled:center")
#> [1] 40.5878
#> attr(,"scaled:scale")
#> [1] 11.42871

Created on 2022-07-05 by the reprex package (v2.0.1)

@strengejacke
Member

> Perhaps we should just return "identity" if no function or manipulation is detected? All others can be NULL?

But cbind() is a function and should not return "unknown".

@bwiernik
Contributor

bwiernik commented Jul 5, 2022

I'm not following either of your last comments, @mattansb.

@mattansb
Member Author

mattansb commented Jul 6, 2022

@bwiernik I gave examples of functions that perform simple linear transformations (scale, center, standardize, normalize and change_scale) and that could be used in a formula (e.g., scale(y) ~ x), and showed how to obtain their transformation and inverse functions (which is what get_transformation() would potentially return).
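To illustrate the shape of such a result, here is a sketch that builds a transformation/inverse pair from the attributes base R's scale() stores. linear_pair() is a hypothetical name, the list with transformation and inverse elements is an assumption modeled on get_transformation()'s two-function output, and only the "scaled:center"/"scaled:scale" case is handled.

```r
# Hypothetical helper: forward and inverse functions from scale() output
linear_pair <- function(x) {
  ctr <- attr(x, "scaled:center")
  s <- attr(x, "scaled:scale")
  if (is.null(ctr) || is.null(s)) {
    stop("`x` lacks 'scaled:center'/'scaled:scale' attributes")
  }
  list(
    transformation = function(v) (v - ctr) / s,  # raw -> standardized
    inverse = function(v) v * s + ctr            # standardized -> raw
  )
}

y <- c(24.7, 41.7, 44.3, 51.7)
p <- linear_pair(scale(y))
p$inverse(p$transformation(y))  # recovers y, up to floating-point error
```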

@strengejacke
Member

I thought that when we talk about "transformation" in the sense of this function, we mean a change of scale, like normal -> log or normal -> exp, not standardizing/centering. So you suggest including those as well?
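One way to make that distinction concrete, sketched here as an assumption rather than something proposed in the thread: a linear transformation (centering, standardizing, rescaling, 2 * y + 3) lies on a straight line, whereas log or exp do not. is_linear_transform() is a hypothetical helper that checks this numerically at a few positive evaluation points (positive so that log() and sqrt() are defined).

```r
# Hypothetical helper: does f lie on the straight line through its endpoints?
is_linear_transform <- function(f, points = c(1, 2, 5, 10), tol = 1e-8) {
  y <- f(points)
  n <- length(points)
  slope <- (y[n] - y[1]) / (points[n] - points[1])
  predicted <- y[1] + slope * (points - points[1])
  all(abs(y - predicted) <= tol * max(1, abs(y)))
}

is_linear_transform(function(y) 2 * y + 3)          # TRUE
is_linear_transform(function(y) (y - 40.6) / 11.4)  # TRUE
is_linear_transform(log)                            # FALSE
is_linear_transform(exp)                            # FALSE
```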

@mattansb
Member Author

mattansb commented Jul 6, 2022

Hmmm, I think it might be useful; having scale(y) ~ x return a transformation of "identity" might be a little misleading?
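A small illustration of why "identity" undersells scale(y) ~ x: coefficients estimated on the standardized response map back to the raw scale only through the stored center and scale. This uses base R's lm() and the built-in mtcars data; it sketches the reasoning, not how insight would handle it.

```r
m_raw <- lm(mpg ~ hp, data = mtcars)
m_std <- lm(scale(mpg) ~ hp, data = mtcars)

mu <- mean(mtcars$mpg)  # what scale() subtracts
s <- sd(mtcars$mpg)     # what scale() divides by

b_std <- unname(coef(m_std))
b_raw <- unname(coef(m_raw))

# Slope: multiply by the scale; intercept: multiply by the scale, then add the center
all.equal(b_std[2] * s, b_raw[2])       # TRUE
all.equal(mu + b_std[1] * s, b_raw[1])  # TRUE
```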

But if this would be too much work / break some stuff, we can save this issue for a later date (:

@DominiqueMakowski
Member

Perhaps we could add a custom output first and then refine?
