lavPredictY issue with multigroup models (different structures) #369

Open
cicadawing opened this issue Jul 12, 2024 · 2 comments

@cicadawing

I was attempting to run lavPredictY() on my multigroup model and ran into errors about "missing" variable names.

I have different regressions for my different groups.

The variables were not actually missing from my data; the problem persisted even when I did not supply a data frame to lavPredictY().

I ran a similar example with the HolzingerSwineford1939 data, where x5 only appears in the Grant-White group, and ran into the same issue. Is this a bug, or is this approach intractable and therefore not supported?

library(lavaan)

HS.model <- "
group: Grant-White
x4 ~ x5
visual =~ x1 + x2 + x3 + x7
group: Pasteur
x1 ~ x3
visual =~ x1 + x2 + x3 + x4
"

fit <- cfa(
HS.model,
data = HolzingerSwineford1939,
group = "school"
)

lavaan 0.6.17 ended normally after 56 iterations

  Estimator                                         ML
  Optimization method                           NLMINB
  Number of model parameters                        28

  Number of observations per group:
    Pasteur                                        156
    Grant-White                                    145

Model Test User Model:

  Test statistic                                51.261
  Degrees of freedom                                11
  P-value (Chi-square)                           0.000
  Test statistic for each group:
    Pasteur                                     50.404
    Grant-White                                  0.857

lavPredictY(
fit,
ynames = lavNames(fit, "ov.y"),
xnames = lavNames(fit, "ov.x")
)

Error in lavPredictY(fit, ynames = lavNames(fit, "ov.y"), xnames = lavNames(fit, : lavaan ERROR: some variable names in xnames do not appear in the dataset: x5

Second attempt (passing the data frame explicitly):

lavPredictY(
fit,
HolzingerSwineford1939,
ynames = lavNames(fit, "ov.y"),
xnames = lavNames(fit, "ov.x")
)
Error in lavPredictY(fit, HolzingerSwineford1939, ynames = lavNames(fit, : lavaan ERROR: some variable names in xnames do not appear in the dataset: x5

@yrosseel
Owner

yrosseel commented Aug 1, 2024

As long as there are no equality (or other) constraints across the groups, I would recommend fitting the model (and doing the prediction) for each group separately.

I am not sure if we should 'fix' this. The xnames= argument simply expects (at the moment) that the predictor variables are present in all groups. What should we do if this is not the case? Pick the ones that we can find? I find this a bit strange. What is the use case for this? What do you think is the 'right' behavior in this case?
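A minimal sketch of the per-group approach recommended above, assuming no cross-group constraints are needed: the multigroup syntax is split into two single-group models, each is fitted to its own subset of the data, and lavPredictY() is called per group with only that group's variables. The object names (`model.GW`, `fit.P`, `pred.GW`, etc.) are illustrative, and the `ynames`/`xnames` choices simply mirror each group's regression; both are assumptions to adapt to whatever you actually want to predict.

```r
library(lavaan)

# Hypothetical split of HS.model into one single-group model per school
model.GW <- "
  x4 ~ x5
  visual =~ x1 + x2 + x3 + x7
"
model.P <- "
  x1 ~ x3
  visual =~ x1 + x2 + x3 + x4
"

# Subset the data by the grouping variable used in the multigroup fit
HS.GW <- subset(HolzingerSwineford1939, school == "Grant-White")
HS.P  <- subset(HolzingerSwineford1939, school == "Pasteur")

# Fit each group on its own (no cross-group equality constraints)
fit.GW <- cfa(model.GW, data = HS.GW)
fit.P  <- cfa(model.P,  data = HS.P)

# Predict each group's outcome from its own predictor; x5 is only part of
# the Grant-White model, so it is never looked up in the Pasteur data
pred.GW <- lavPredictY(fit.GW, ynames = "x4", xnames = "x5")
pred.P  <- lavPredictY(fit.P,  ynames = "x1", xnames = "x3")
```

The trade-off is that separate fits cannot share parameters, which is why this only makes sense when the multigroup model has no cross-group constraints.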

@TDJorgensen
Contributor

By explicitly passing the original data to newdata=, the reprex essentially just mimics the default behavior.
I agree the default behavior should not change, which is to generate predicted values for the entire set of original data.

For a specialized model that has different variables in different groups, it should be up to the user to provide newdata= for each group, so that the xnames= and ynames= can be specified per group. However, even that is not possible in the current implementation:

HS1 <- HolzingerSwineford1939[HolzingerSwineford1939$school == "Pasteur", ]
HS2 <- HolzingerSwineford1939[HolzingerSwineford1939$school == "Grant-White", ]

## Both of these yield:
## Error: lavaan->lav_data_full():  
##   model syntax defines multiple groups; data suggests a single group

lavPredictY(fit, newdata = HS1, 
            ynames = lavNames(fit, "ov.y", group = 1), 
            xnames = lavNames(fit, "ov.x", group = 1))

lavPredictY(fit, newdata = HS2, 
            ynames = lavNames(fit, "ov.y", group = 2), 
            xnames = lavNames(fit, "ov.x", group = 2))

I think this is due to the use of lavData() to check that newdata= has properties matching the original data. I don't know whether it would be a simple task to update how lavData() works (e.g., to selectively return a @Data slot with one group or a subset of groups, which could be checked against the newdata[group] vector), or whether there is a different way to validate newdata= only for the group(s) present in newdata= (e.g., just checking lavNames(object, group=) for each group for which predictions are requested).
