allow.empty.cell option produces error with empty cells #407

homfunc · 2024-12-24T10:32:47Z

I have some very skew ordinal data which has some gaps in some of the frequency tables for each of two groups and I received an error when using the allow.empty.cell = TRUE option which is required due to the following error:

Error : lavaan->lav_samplestats_step1():  
   some categories of variable `x3' are empty in group 1; frequencies are 
   [53, 13, 12, 4, 7, 6, 2, 1, 2, 0]
In addition: There were 25 warnings (use warnings() to see them)

Here is a reprex

library(tidyverse)
library(lavaan)

# simulate some data
sim_test_model <- "
  f1 =~ x1 + x2 + x3 + x4
  f2 =~ x5 + x6 + x7 + x8
"

variable_stats <- tribble(
  ~varname, ~mean, ~variance, ~skewness, ~kurtosis,
  "x1", 4.354913, 10.21996, 0.5958355, -0.9449583,
  "x2", 1.821965, 3.787711, 2.819906, 7.535725,
  "x3", 1.618497, 2.287157, 3.110957, 10.47878,
  "x4", 2.872832, 6.212976, 1.435709, 1.169378,
  "x5", 1.304046, 1.105366, 4.837008, 27.79657,
  "x6", 1.914451, 3.013506, 2.436427, 6.188786,
  "x7", 3.600000, 7.596759, 0.9209243, -0.2308455,
  "x8", 2.182659, 5.052245, 2.141474, 3.877923
)

simulated_data <-
  simulateData(
    model = sim_test_model,
    model.type = "cfa",
    sample.nobs = c(100, 40),
    ov.var = variable_stats$variance,
    skewness = variable_stats$skewness,
    kurtosis = variable_stats$kurtosis,
    return.type = "data.frame",
    seed = 99242
  ) %>%
  tibble()

# now adjust the means
simulated_data <- simulated_data %>%
  mutate(across(starts_with("x"),
    .fn = ~ (.x + (variable_stats %>% filter(varname == cur_column()) %>%
                     pull(mean)) - mean(.x))
  ))

# and integerfy the data using thresholds of c(-Inf, 0.5, 1.5, ..., 9.5, Inf)
simulated_data <- simulated_data %>%
  mutate(across(starts_with("x"),
    .fn = ~ cut(.x, breaks = c(-Inf, seq(0.5, 9.5), Inf), labels = FALSE)
  ))

fit_test_wlsmv_all_with_groups_allow_empty_cell_is_true <-
  cfa(
    model = sim_test_model,
    data = simulated_data,
    group = "group",
    estimator = "WLSMV",
    verbose = TRUE,
    ordered = TRUE,
    allow.empty.cell = TRUE,
  )

yields error

Error in A21[pstar.idx, th.idx_i] <- lav_matrix_crossprod(SC.COR[, pstar.idx],  : 
  number of items to replace is not a multiple of replacement length

That particular error is easy to fix...

index 7a2ef81..d6a97c6 100644
--- a/R/lav_muthen1984.R
+++ b/R/lav_muthen1984.R
@@ -331,12 +331,14 @@ muthen1984 <- function(Data = NULL,
         }
 
         # TH
-        A21[pstar.idx, th.idx_i] <-
+        th.idx_i_local <- th.idx_i[seq_len(dim(SC.COR.UNI$dx.th.y1)[2])]
+        th.idx_j_local <- th.idx_j[seq_len(dim(SC.COR.UNI$dx.th.y2)[2])]
+        A21[pstar.idx, th.idx_i_local] <-
           lav_matrix_crossprod(
             SC.COR[, pstar.idx],
             SC.COR.UNI$dx.th.y1
           )
-        A21[pstar.idx, th.idx_j] <-
+        A21[pstar.idx, th.idx_j_local] <-
           lav_matrix_crossprod(
             SC.COR[, pstar.idx],
             SC.COR.UNI$dx.th.y2

but that is not the end of the story as we now get NaN gradients when running

fit_test_wlsmv_group_1_allow_empty_cell_is_true <-
  cfa(
    model = sim_test_model,
    data = simulated_data %>% filter(group == 1),
    estimator = "WLSMV",
    verbose = TRUE,
    ordered = TRUE,
    allow.empty.cell = TRUE,
  )

The text was updated successfully, but these errors were encountered:

ecmerkle · 2024-12-24T22:28:42Z

I recently added this argument for use with blavaan (see #377), because the priors in a Bayesian model can help to handle empty cells (using bcfa() in place of cfa()).

I think an NaN gradient is the correct behavior for lavaan, because some of the threshold parameters cannot be informed by the data. But I can add something to the docs along with a more informative message about the use of allow.empty.cell.

homfunc · 2024-12-28T01:30:39Z

Thanks for the information about blavaan and the reference to the PR.

I have run the example using blavaan and it does survive so that is positive. However, I would have thought (somewhat naiively I guess) that the allow.empty.cell would have simply filled in the empty cells with 0.5 (or whatever is specified in zero.add[2]) and returned something.

In particular, it seems strange to me that I can get a fit back when allow.empty.cell = FALSE for group 1:

fit_test_wlsmv_group_1_allow_empty_cell_is_false <-
  cfa(
    model = sim_test_model,
    data = simulated_data %>% filter(group == 1),
    estimator = "WLSMV",
    verbose = TRUE,
    ordered = TRUE,
    allow.empty.cell = FALSE,
  )

I would like to see some attempt by lavaan to use, in effect, a simple prior to 'fill in the blanks' when requested.

What are your thoughts?

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

allow.empty.cell option produces error with empty cells #407

allow.empty.cell option produces error with empty cells #407

homfunc commented Dec 24, 2024 •

edited

Loading

ecmerkle commented Dec 24, 2024

homfunc commented Dec 28, 2024

allow.empty.cell option produces error with empty cells #407

allow.empty.cell option produces error with empty cells #407

Comments

homfunc commented Dec 24, 2024 • edited Loading

ecmerkle commented Dec 24, 2024

homfunc commented Dec 28, 2024

homfunc commented Dec 24, 2024 •

edited

Loading