extension to calc_rdm to ignore within-run similarities completely from any chosen distance measure #327
Comments
Hi @andreifoldes, This does seem like a reasonable idea that we might want to implement soon. Because of the normalisation implied in the correlation calculation, a scheme that first calculates correlations and then averages over the right run combinations will yield different results from one that averages first and calculates the correlation afterwards, but this would be fine, I think.
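To make the difference between the two orders concrete, here is a small NumPy sketch. The function names and the toy data are made up for illustration and are not part of rsatoolbox; `a` and `b` hold one pattern per run for two conditions.

```python
import numpy as np

def corr(x, y):
    """Pearson correlation between two 1-D patterns."""
    return float(np.corrcoef(x, y)[0, 1])

def correlate_then_average(a, b):
    """Correlate every between-run pair (run i of a, run j of b, i != j),
    then average the resulting correlations."""
    n_runs = a.shape[0]
    vals = [corr(a[i], b[j])
            for i in range(n_runs) for j in range(n_runs) if i != j]
    return float(np.mean(vals))

def average_then_correlate(a, b):
    """Average the patterns over runs first, then correlate once.
    Note that this mixes within-run and between-run information."""
    return corr(a.mean(axis=0), b.mean(axis=0))

# Toy data: 2 runs x 3 channels per condition (hypothetical values).
a = np.array([[1.0, 0.0, -1.0], [10.0, 0.0, -10.0]])
b = np.array([[2.0, 0.0, -2.0], [-1.0, 2.0, -1.0]])

r_first = correlate_then_average(a, b)   # 0.5
r_last = average_then_correlate(a, b)    # 2 / sqrt(7), about 0.756
```

The per-pair normalisation gives each run pair equal weight, whereas averaging first lets the high-amplitude run dominate, so the two numbers differ.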
Just to make sure I understand correctly: does cross-validation in the context of calc_rdm mean "not using within-run distances", or are there additional/different steps involved? Assuming that is the case, having additional xxx_crossval variants of the already existing measures would indeed be a cool feature! To your second point: yes, I hadn't thought about that. I don't have a strong intuition about it; I lean towards the former in my own work (+ permutation testing).
Hi @andreifoldes, One of the problems with using cross-validated estimates for correlations (or correlation distances) is that the estimates can become invalid or at least very unstable (high variance). This happens when signal-to-noise is low and the cross-validated variance estimate approaches zero. Explicitly:
r_a,b = cov_a,b / sqrt(var_a * var_b)
The covariance estimate can easily be replaced by the (unbiased) cross-block estimate, but if you do the same to the variance estimates you can get very large correlation estimates >>1, or imaginary numbers (when a variance estimate turns negative). The common problem is described in more detail here:
So this is the main reason this has not been implemented so far. Technically it is not difficult, and you can use some regularization to make these estimates behave better, but the subsequent issues for inference are currently not addressed. best, Joern
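The failure mode described above can be reproduced with a few lines of NumPy. This is only a sketch under assumptions: the helper names are hypothetical, the cross-block estimate is taken to be the average product over all pairs of distinct runs, and none of this is rsatoolbox code.

```python
import numpy as np

def crossblock_cov(x, y):
    """Cross-block (between-run) covariance estimate.

    x, y: arrays of shape (n_runs, n_channels), already mean-centred
    across channels. Averages x_i . y_j over all run pairs with i != j.
    """
    n_runs, n_channels = x.shape
    total = x.sum(axis=0) @ y.sum(axis=0)   # all pairs, including i == j
    within = np.sum(x * y)                  # the i == j pairs only
    return (total - within) / (n_runs * (n_runs - 1) * n_channels)

def type1_correlation(a, b):
    """Correlation with cross-block covariance AND cross-block variances."""
    a = a - a.mean(axis=1, keepdims=True)
    b = b - b.mean(axis=1, keepdims=True)
    cov_ab = crossblock_cov(a, b)
    var_a = crossblock_cov(a, a)            # can be <= 0 at low SNR
    var_b = crossblock_cov(b, b)
    with np.errstate(invalid="ignore", divide="ignore"):
        return cov_ab / np.sqrt(var_a * var_b)

# Pure "noise" for condition a: the pattern flips sign between runs,
# so the cross-block variance estimate is negative and r is undefined.
a_bad = np.array([[1.0, -1.0], [-1.0, 1.0]])
b_ok = np.array([[1.0, -1.0], [1.0, -1.0]])
r_bad = type1_correlation(a_bad, b_ok)      # nan (square root of a negative number)
```

With noise-free, identical runs the same function reduces to the ordinary Pearson correlation; the problems only appear once the cross-block variance estimates get close to or below zero.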
Dear Joern,
Hi Andrei, Yes, you can define a "correlation" measure by using the cv (or more precisely "cross-block") estimate of the covariance and the naive estimate of the variances. In the @rsagroup, we refer to this (if I remember correctly) as the Type II estimate (as opposed to the Type I estimate, where both variances and covariances are cross-block).
Type II estimates (and even naive ones, if you assume within-run covariances are unbiased) are totally fine for testing the hypothesis r>0. For that you can get away with most measures, I think (even classification accuracy will do :-)). The same holds for the difference between two pairs of conditions within the same region (assuming the measurement error is the same across conditions). Type II correlations would be a good cautionary step to avoid any within-run dependencies in the measurement process of the different conditions.
For 3: Happy to hear your thoughts on what is useful. I will present a methods poster at OHBM on correlation estimation and inference, so it would be good to check what you think is useful here. We could provide Type I estimates with the appropriate warning label. Usually we want something published on the issue before we put it in the toolbox, so hopefully over the summer I will find time to turn the blog into a proper paper. Joern
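A rough NumPy sketch of what such a Type II estimate could look like is given below. It rests on assumptions that the thread does not spell out: the "naive" variance is taken here to be the variance of the run-averaged pattern, and the function name is hypothetical rather than an rsatoolbox function.

```python
import numpy as np

def type2_correlation(a, b):
    """Type II style estimate: cross-block covariance, naive variances.

    a, b: arrays of shape (n_runs, n_channels), one pattern per run.
    Assumption: the naive variance is the variance of the run-averaged
    pattern, which is never negative.
    """
    a = a - a.mean(axis=1, keepdims=True)
    b = b - b.mean(axis=1, keepdims=True)
    n_runs, n_channels = a.shape

    # Cross-block covariance: average a_i . b_j over run pairs with i != j.
    total = a.sum(axis=0) @ b.sum(axis=0)
    within = np.sum(a * b)
    cov_ab = (total - within) / (n_runs * (n_runs - 1) * n_channels)

    # Naive variances from the run-averaged patterns.
    var_a = np.mean(a.mean(axis=0) ** 2)
    var_b = np.mean(b.mean(axis=0) ** 2)
    return cov_ab / np.sqrt(var_a * var_b)

# Example with simulated data (4 runs x 20 channels, arbitrary seed).
rng = np.random.default_rng(0)
signal = rng.normal(size=20)
a = signal + rng.normal(size=(4, 20))
b = signal + rng.normal(size=(4, 20))
r_type2 = type2_correlation(a, b)
```

Because the denominator is a product of non-negative naive variances, the square root is always real, so the imaginary values that Type I estimates can produce do not occur; only the numerator is free of within-run products.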
Awesome! Looking forward to the blogpost + poster!
Is your feature request related to a problem? Please describe.
I'm planning on using "correlation" distance for my neuroimaging analysis and it is my understanding that it is good practice to remove the within-run correlations (I know about crossnobis). I find it complicated to remove within-run similarities from the analysis.
Describe the solution you'd like
I think it would be useful for calc_rdm to have an optional argument so that, if runs have been provided for the dataset, within-run similarities are ignored.
Describe alternatives you've considered
I haven't yet figured out a way to "ignore within-run similarities" in rsatoolbox :( Perhaps the RDM could be subset after calc_rdm runs? In that case it would be nice to have a separate function that removes within-run similarities...
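For illustration, the "subset afterwards" idea could look roughly like the following in plain NumPy. Everything here is a hypothetical sketch, not an rsatoolbox function: it assumes one pattern per (condition, run) combination, computes all pairwise correlation distances, and then averages only the pairs that come from different runs.

```python
import numpy as np

def between_run_correlation_rdm(patterns, conditions, runs):
    """Correlation-distance RDM that uses between-run pairs only.

    patterns:   (n_patterns, n_channels), one row per condition and run
    conditions: (n_patterns,) condition label of each row
    runs:       (n_patterns,) run label of each row
    Returns the sorted unique condition labels and the RDM.
    """
    patterns = np.asarray(patterns, dtype=float)
    conditions = np.asarray(conditions)
    runs = np.asarray(runs)

    dist = 1.0 - np.corrcoef(patterns)           # all pattern-by-pattern distances
    between = runs[:, None] != runs[None, :]     # True where the two runs differ

    labels = np.unique(conditions)
    rdm = np.full((labels.size, labels.size), np.nan)
    for i, ci in enumerate(labels):
        for j, cj in enumerate(labels):
            if i == j:
                continue
            pair = (conditions[:, None] == ci) & (conditions[None, :] == cj)
            mask = pair & between
            if mask.any():                       # stays nan if no between-run pair exists
                rdm[i, j] = dist[mask].mean()
    np.fill_diagonal(rdm, 0.0)                   # RDM convention: zero self-distance
    return labels, rdm
```

Note that this averages correlation distances across run pairs, which, as discussed in the comments, is not the same as correlating run-averaged patterns.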