The input checks seem too strict #10
Thanks for the suggestions. Here is a point by point response.
Since dealing with missing data is a whole field of research on its own, I did not want to include an arbitrary method to do this. Methods as
O2PLS-DA can be understood in (at least) two ways. In the first case, the Y matrix consists of a few outcomes that are to be predicted with X. Here, specific variation is assumed to be only in X, hence O2PLS-DA is actually OPLS-DA. In the second case, X and Y are both general data matrices, but the inner relation U = TB + H is relaxed to a non-square B. I did not consider this case, as there are probably some identifiability issues that arise with non-diagonal B. In CFA, this is possible (I think), but there you have to specify which/how u_i are associated with which t_j.
The number of components cannot be larger than the number of columns, because you can maximally "fill up" a space of dimension number of columns (i.e. the
An additional restriction is that the number of components cannot exceed the number of rows. This has more to do with full-rank score matrices; it can be relaxed somehow by using
Note however that the functions without
I don't have direct plans to extend O2PLS. However, what I do find useful is that I'll leave this issue open; perhaps others can contribute to this discussion.
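The dimension argument in the reply above (components are bounded by the number of columns and by the number of rows) can be checked numerically. The following is a minimal sketch on simulated data; the matrix `X`, its size, and the tolerance are illustrative assumptions and are not taken from the package.

```r
# Illustrative only: X is simulated, not data from this thread.
set.seed(1)
X <- matrix(rnorm(5 * 3), nrow = 5, ncol = 3)  # 5 rows, 3 columns

# svd() returns min(nrow, ncol) singular values.
d <- svd(X)$d

# Numerical rank: number of singular values that are not negligible.
rank_X <- sum(d > 1e-10 * d[1])

# No more than min(nrow, ncol) components can carry variation.
max_components <- min(nrow(X), ncol(X))
```

Asking for more than `max_components` components would require directions that contain no variation at all, which is why some upper bound on the number of components is needed even if the current checks are relaxed.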
Thank you for the detailed responses. For missing values, I did not mean imputing them. As far as I understand, O(2)PLS is known to tolerate moderate amounts of missing values, and this is one of the huge benefits of the method in omics data integration. This of course assumes that the underlying implementation uses NIPALS, and since NIPALS is implemented in the package, I was hoping to get this benefit. Or did you mean that the way NIPALS ignores NaNs is an arbitrary way of handling NaNs and could be substituted by an imputation step?
This is what I was trying to highlight, apologies for the vague description.
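To make the point about NIPALS and missing values concrete, here is a minimal sketch of a single NIPALS principal component in which missing cells are skipped in every regression step. It is not the package's implementation: the function name `nipals_pc1`, its arguments, and the demo data are hypothetical, and it assumes every row and every column has at least one observed value.

```r
# Hypothetical helper, not part of the package: first NIPALS component
# of X, leaving missing cells out of each least-squares step.
nipals_pc1 <- function(X, max_iter = 500, tol = 1e-12) {
  W <- ifelse(is.na(X), 0, 1)   # 1 where observed, 0 where missing
  X0 <- X
  X0[is.na(X)] <- 0             # zeros add nothing to the sums below
  sc <- X0[, which.max(colSums(X0^2))]  # starting scores
  for (iter in seq_len(max_iter)) {
    # Loadings: regress each column on the scores, observed rows only.
    ld <- crossprod(X0, sc) / crossprod(W, sc^2)
    ld <- ld / sqrt(sum(ld^2))
    # Scores: regress each row on the loadings, observed columns only.
    sc_new <- drop((X0 %*% ld) / (W %*% ld^2))
    converged <- sum((sc_new - sc)^2) < tol * sum(sc_new^2)
    sc <- sc_new
    if (converged) break
  }
  list(scores = sc, loadings = drop(ld))
}

# Simulated example with one missing cell.
set.seed(1)
X_demo <- outer(rnorm(8), c(1, 2, -1)) + matrix(rnorm(24, sd = 0.1), 8, 3)
X_demo[2, 3] <- NA
fit <- nipals_pc1(X_demo)
```

With complete data this reduces to the usual power iteration and agrees with the first singular vector up to sign. Whether skipping cells in this way is statistically preferable to an explicit imputation step is exactly the question raised in this comment.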
I will have to think about it more, I am not quite convinced about these checks yet; I fully understand that the orthogonal and joint components should both be
Edit: here is the error I get:
Here is what could be used instead:
if (max(ncol(X), ncol(Y)) < n)
  stop("n = ", n, " exceeds # columns in X or Y")
if (ncol(X) < nx || ncol(Y) < ny)
  stop("nx = ", nx, " or ny = ", ny, " exceeds # columns in X or Y, respectively")
Firstly
Thanks again for the comments. I finally managed to deal with some limitations. See the new blog post on selbouhaddani.eu and the
For 1, I added a function
Secondly
I didn't change the
One can do the initial SVD with just
I haven't looked at the statistical implications of that. If someone tries that out (as an MSc project or...), I'm happy to support him/her! Maybe we can add support for supervised (O)PLS-DA and prediction where Y is treated as outcome, which addresses the O2PLS limitation... So the code in
I've got a couple of cases where I wished to run o2m but could not, because the input checks failed: data with NaN is not accepted; it is impossible to perform O2PLS-DA (strict "less than" check of the number of components vs the number of columns in the data; granted, it is a less common thing to do than OPLS-DA); in the cross-validation checks, the sum of requested components is checked against the number of columns, which of course will work for omics but not for many other datasets; etc.
I understand that some limitations may arise from implementation details (e.g. the use of SVD for PCA), but I wonder if it would be possible to relax some of the checks. Do you plan to support the cases I mentioned above in this package?
Or would it be reasonable to provide a "force" argument that ignores the checks and lets the user take the risk of failing miserably (when the algorithm indeed does not support a specific case)?
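One way the "force" idea could look is sketched below. This is a hypothetical helper, not code from the package: the name `check_components`, its arguments, and the return values are assumptions made for illustration, and the two dimension conditions mirror the relaxed checks proposed elsewhere in this thread. With `force = TRUE`, failed checks are downgraded from errors to a warning.

```r
# Hypothetical sketch: collect failed input checks, then either stop
# (default) or only warn when the caller sets force = TRUE.
check_components <- function(X, Y, n, nx = 0, ny = 0, force = FALSE) {
  problems <- character(0)
  if (anyNA(X) || anyNA(Y))
    problems <- c(problems, "X or Y contains missing values")
  # Conditions as proposed in this thread (non-strict comparisons).
  if (max(ncol(X), ncol(Y)) < n)
    problems <- c(problems, paste0("n = ", n, " exceeds # columns in X or Y"))
  if (ncol(X) < nx || ncol(Y) < ny)
    problems <- c(problems, paste0("nx = ", nx, " or ny = ", ny,
                                   " exceeds # columns in X or Y, respectively"))
  if (length(problems) == 0) return(invisible(TRUE))
  msg <- paste(problems, collapse = "; ")
  if (force) {
    warning(msg, call. = FALSE)  # the user accepts the risk
    return(invisible(FALSE))
  }
  stop(msg, call. = FALSE)
}
```

A caller would then write, for example, `check_components(X, Y, n = 2, nx = 1, ny = 1, force = TRUE)` before fitting. Returning `FALSE` rather than stopping leaves it to the fitting code to fail later if the algorithm really cannot handle the input.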