Expand check_heterogeneity_bias()'s output #812
Okay, I like this a lot:
library(performance)
mlmRev::egsingle |>
  check_group_variation(select = c("lowinc", "female", "math"),
                        by = c("schoolid", "childid"))
#> Check group variation
#>
#> group | variable | type
#> -----------------------------
#> schoolid | lowinc | between
#> schoolid | female | within
#> schoolid | math | both
#> childid | lowinc | between
#> childid | female | between
#> childid | math | both
mlmRev::egsingle |>
  check_group_variation(select = c("lowinc", "female", "math"),
                        by = c("schoolid", "childid"),
                        include_by = TRUE)
#> Check group variation
#>
#> group | variable | type
#> -----------------------------
#> schoolid | childid | nested
#> schoolid | lowinc | between
#> schoolid | female | within
#> schoolid | math | both
#> childid | schoolid | between
#> childid | lowinc | between
#> childid | female | between
#> childid | math | both
dat <- data.frame(
  id = rep(letters, each = 2),
  constant = "a",
  between_num = rep(rnorm(26), each = 2),
  within_num = rep(rnorm(2), times = 26),
  both_num = rnorm(52),
  between_fac = rep(LETTERS, each = 2),
  within_fac = rep(LETTERS[1:2], times = 26),
  both_fac = sample(LETTERS[1:5], size = 52, replace = TRUE)
)
dat |>
  check_group_variation(by = "id")
#> Check id variation
#>
#> variable | type
#> ---------------------
#> constant |
#> between_num | between
#> within_num | within
#> both_num | both
#> between_fac | between
#> within_fac | within
#> both_fac | both
Created on 2025-05-07 with reprex v2.1.1
I rewrote a lot of the docs to explain what is going on.
|
Great! Printing should be fixed. You need to provide a list of data frames, so I just added a code to I don't think this function will supersede |
Yes, getting group means is harder for complex nested designs, but isn't
library(performance)
library(dplyr)
egsingle <- mlmRev::egsingle |>
  group_by(childid) |>
  filter(n() == 6L)
egsingle |>
  check_group_variation(select = c("lowinc", "female", "year", "math"),
                        by = c("schoolid", "childid"))
#> Check group variation
#>
#> group | variable | type
#> -----------------------------
#> schoolid | lowinc | between
#> schoolid | female | both
#> schoolid | year | within
#> schoolid | math | both
#> childid | lowinc | between
#> childid | female | between
#> childid | year | within
#> childid | math | both
egsingle |>
  check_heterogeneity_bias(select = c("lowinc", "female", "year", "math"),
                           by = c("schoolid", "childid"), nested = TRUE)
#> Possible heterogeneity bias due to following predictors: year, math
Created on 2025-05-08 with reprex v2.1.1 |
No, only for cross-classified designs, not for nested ones; see https://github.com/easystats/datawizard/blob/4d78084f0676e1df60dc9eaf20298a2d810cb645/R/demean.R#L445-L477 and the docs and references in https://easystats.github.io/datawizard/reference/demean.html |
# Nested
demean1 <- mlmRev::egsingle |>
  datawizard::demean("math", by = c("schoolid/childid"), append = FALSE)
head(demean1)
#>   math_schoolid_between math_childid_between math_within
#> 1             0.1984639             1.328203  -0.3806667
#> 2             0.1984639             1.328203  -0.3926667
#> 3             0.1984639             1.328203   0.7733333
#> 4             0.1984639             1.340136  -2.8416000
#> 5             0.1984639             1.340136  -1.0996000
#> 6             0.1984639             1.340136   0.8914000
# Only by the lower-level grouping variable
demean2 <- mlmRev::egsingle |>
  datawizard::demean("math", by = c("childid"), append = FALSE)
head(demean2)
#>   math_between math_within
#> 1     1.526667  -0.3806667
#> 2     1.526667  -0.3926667
#> 3     1.526667   0.7733333
#> 4     1.538600  -2.8416000
#> 5     1.538600  -1.0996000
#> 6     1.538600   0.8914000
# These are the same
all(demean1$math_within == demean2$math_within)
#> [1] TRUE
Created on 2025-05-08 with reprex v2.1.1
In other words, |
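As a rough base R sketch of why the two within components above coincide: the within part is the deviation from the lowest-level group mean, so adding a higher-level nesting group only splits the between part further. The toy data and variable names below are made up for illustration and this is not datawizard's implementation. The split of the between part is an assumption that matches the output shown above, where 0.1984639 + 1.328203 ≈ 1.526667.

```r
# Toy nested data (hypothetical): children c1, c2 in school s1; c3, c4 in s2
toy <- data.frame(
  school = rep(c("s1", "s2"), each = 6),
  child  = rep(c("c1", "c2", "c3", "c4"), each = 3),
  math   = c(1, 2, 3, 2, 4, 6, 0, 1, 5, 3, 3, 9)
)

# Group means, repeated for every row of the group
school_mean <- ave(toy$math, toy$school)
child_mean  <- ave(toy$math, toy$child)

# Demeaning by the lower-level group only
between_child <- child_mean
within_child  <- toy$math - child_mean

# Nested decomposition (assumed): school mean, child deviation from
# the school mean, and the same within-child deviation as before
between_school      <- school_mean
between_child_dev   <- child_mean - school_mean
within_child_nested <- toy$math - between_school - between_child_dev

# The within components are identical, and the two between parts
# of the nested version add up to the single child mean
isTRUE(all.equal(within_child, within_child_nested))
isTRUE(all.equal(between_school + between_child_dev, between_child))
```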
I propose soft deprecating |
@strengejacke I'm reading Bell and Jones (2015) and Lee (2011) re: heterogeneity bias, and they define heterogeneity bias as cases where a within-group variable also varies between groups (or: x is correlated with the random intercepts of the groups). But (The opposite is not true - any variable that might lead to heterogeneity bias will be flagged. So what I'm saying is that |
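To make the definition quoted above concrete, here is a minimal base R sketch for numeric variables. The helper `classify_variation()` is hypothetical and is not how performance implements `check_group_variation()`; it assumes a variable varies "between" when its group means differ and "within" when values deviate from their group mean. Under the quoted definition, only a variable classified as "both" could lead to heterogeneity bias.

```r
# Hypothetical helper for illustration only (numeric x, no missing values)
classify_variation <- function(x, group, tol = 1e-8) {
  x <- as.numeric(x)
  group_means <- ave(x, group)  # group mean repeated for each row
  varies_between <- diff(range(group_means)) > tol
  varies_within  <- any(abs(x - group_means) > tol)
  if (varies_between && varies_within) {
    "both"
  } else if (varies_between) {
    "between"
  } else if (varies_within) {
    "within"
  } else {
    NA_character_  # constant: varies neither within nor between
  }
}

# Made-up data: four groups with two observations each
id          <- rep(letters[1:4], each = 2)
between_num <- rep(c(1, 2, 3, 4), each = 2)  # fixed within each group
within_num  <- rep(c(0, 1), times = 4)       # same group mean everywhere
both_num    <- c(1, 2, 2, 4, 3, 3, 5, 9)     # means and deviations vary

sapply(
  list(between_num = between_num, within_num = within_num, both_num = both_num),
  classify_variation,
  group = id
)
```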
P.S.: which publication is Lee 2011? |
Sorry, Li 2011, here>>.
Yeah. Should be easy to replace all existing uses of it in easystats. |
…performance into strengejacke/issue810
Should we flag those variables with |
@mattansb wdyt about the "new" print / type-column? (indicating an extra nested, because "nested" can be between or both)
egsingle <- data.frame(
  schoolid = factor(rep(c("2020", "2820"), times = c(18, 6))),
  lowinc = rep(c(TRUE, FALSE), times = c(18, 6)),
  childid = factor(rep(
    c("288643371", "292020281", "292020361", "295341521"),
    each = 6
  )),
  female = rep(c(TRUE, FALSE), each = 12),
  year = rep(1:6, times = 4),
  math = c(
    -3.068, -1.13, -0.921, 0.463, 0.021, 2.035,
    -2.732, -2.097, -0.988, 0.227, 0.403, 1.623,
    -2.732, -1.898, -0.921, 0.587, 1.578, 2.3,
    -2.288, -2.162, -1.631, -1.555, -0.725, 0.097
  )
)
performance::check_group_variation(egsingle, by = c("schoolid", "childid"))
#> Check schoolid variation
#>
#> variable | type
#> ---------------------------
#> lowinc | between (nested)
#> female | both
#> year | within
#> math | both
#>
#> Check childid variation
#>
#> variable | type
#> ------------------
#> lowinc | between
#> female | between
#> year | within
#> math | both
Created on 2025-05-08 with reprex v2.1.1 |
Not sure what you mean here - nested is defined differently than between (between is fixed). |
Oh, I see. This is a matter of perspective - nested variables also vary within each group (they are not fixed) but they also vary between groups (levels are not crossed), so maybe it is something between "between" and "both", which was why I chose to give it a separate label. |
But we can have "nested both" and "nested between", that's why I thought this information is useful. See your example:
data.frame(group, variable1, variable2, variable3) |>
  performance::check_group_variation(by = "group")
#> Check group variation
#>
#> variable | type
#> -------------------
#> variable1 | between
#> variable2 | within
#> variable3 | both
c(
  variable1 = lme4::isNested(variable1, group),
  variable2 = lme4::isNested(variable2, group),
  variable3 = lme4::isNested(variable3, group)
)
#> variable1 variable2 variable3
#> TRUE FALSE TRUE |
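For readers without lme4 at hand, the nestedness check used above can be approximated in base R. This is a sketch under the assumption that "x is nested in group" means every observed level of x occurs in exactly one group. `is_nested_sketch()` is a hypothetical name and the toy vectors are made up; they are not the `variable1` to `variable3` from the example above.

```r
# Hypothetical helper: TRUE if each observed level of x appears in one group only
is_nested_sketch <- function(x, group) {
  # as.character() avoids counting unused factor levels as empty rows
  tab <- table(as.character(x), as.character(group))
  all(rowSums(tab > 0) == 1)
}

group <- rep(1:3, each = 2)
v_between <- c("a", "a", "b", "b", "c", "c")  # fixed within each group
v_within  <- rep(c("x", "y"), times = 3)      # same levels in every group
v_both    <- c("a", "b", "c", "d", "e", "f")  # varies within, levels not shared

c(
  v_between = is_nested_sketch(v_between, group),
  v_within  = is_nested_sketch(v_within, group),
  v_both    = is_nested_sketch(v_both, group)
)
```

In this toy case both the "between" and the "both" variable count as nested, which is the situation the comment above describes as "nested between" versus "nested both".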
Do you have examples for the use of |
@strengejacke I think this might still need some work. You can make two types of decisions:
But also:
So we have the following combinations (with "--" marking impossible situations):
|
Currently in the code, you check if the variable is crossed (and possibly balanced) and if it is also nested - but this is an impossible situation. I'm reverting your change, sorry. |
Fixes #810