
extension to calc_rdm to ignore within-run similarities completely from any chosen distance measure #327

Open · andreifoldes opened this issue May 25, 2023 · 6 comments


andreifoldes commented May 25, 2023

Is your feature request related to a problem? Please describe.
I'm planning to use "correlation" distance for my neuroimaging analysis, and my understanding is that it is good practice to remove the within-run correlations (I know about crossnobis). However, I find it complicated to remove within-run similarities from the analysis.

Describe the solution you'd like
I think it would be useful for calc_rdm to have an optional argument so that, if runs have been provided for the dataset, within-run similarities are ignored.

Describe alternatives you've considered
I haven't yet figured out a way to "ignore within-run similarities" in rsatoolbox :( Perhaps the RDM could be subset after calc_rdm runs? In that case it would be nice to have a separate function that removes within-run similarities...
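
Something along these lines is what I have in mind - a plain numpy sketch of the post-hoc subsetting idea, not rsatoolbox code (the data and all names are made up):

```python
import numpy as np

# Made-up data: 4 conditions x 3 runs x 50 voxels
rng = np.random.default_rng(0)
n_conds, n_runs, n_vox = 4, 3, 50
patterns = rng.standard_normal((n_conds * n_runs, n_vox))
conds = np.repeat(np.arange(n_conds), n_runs)   # condition label per pattern
runs = np.tile(np.arange(n_runs), n_conds)      # run label per pattern

full = 1 - np.corrcoef(patterns)                # correlation distance, all pairs
between = runs[:, None] != runs[None, :]        # True where the two runs differ

# Condition-level RDM that averages only the between-run cells
rdm = np.zeros((n_conds, n_conds))
for a in range(n_conds):
    for b in range(a + 1, n_conds):
        mask = between & (conds[:, None] == a) & (conds[None, :] == b)
        rdm[a, b] = rdm[b, a] = full[mask].mean()
```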

HeikoSchuett (Contributor) commented

Hi @andreifoldes,
for the cross-validated distances we have so far, we implemented each as a separate distance measure, because they mostly run different code. It would thus be easier to implement a "corr_crossval" option for calc_rdm.

This does seem like a reasonable idea that we might want to implement soon. One caveat: due to the normalisation implied in the correlation calculation, a scheme that first calculates correlations and then averages over the right run combinations will yield different results from one that averages the patterns first and calculates the correlation afterwards - but I think this would be fine.
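
To illustrate what I mean about the normalisation, a minimal numpy sketch (two runs, made-up data):

```python
import numpy as np

rng = np.random.default_rng(1)
a1, a2 = rng.standard_normal(20), rng.standard_normal(20)  # condition A, runs 1 & 2
b1, b2 = rng.standard_normal(20), rng.standard_normal(20)  # condition B, runs 1 & 2

r = lambda x, y: np.corrcoef(x, y)[0, 1]

# Scheme 1: correlate across run pairs, then average
corr_then_avg = np.mean([r(a1, b2), r(a2, b1)])
# Scheme 2: average the patterns across runs, then correlate
avg_then_corr = r((a1 + a2) / 2, (b1 + b2) / 2)

print(corr_then_avg, avg_then_corr)  # generally not equal
```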


andreifoldes commented May 30, 2023

Just to make sure I understand correctly: does cross-validation in the context of calc_rdm mean "not using within-run distances", or are there additional/different steps involved?

Assuming that to be the case, having an additional xxx_crossval option for the already existing measures would indeed be a cool feature!

To your second point: I hadn't thought about that, and I don't have a strong intuition, but I lean towards the former in my own work (+ permutation testing).
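
For reference, this is how I currently understand the existing cross-validated measure is invoked (my reading of the rsatoolbox docs; the data and descriptor names here are made up):

```python
import numpy as np
import rsatoolbox

# Toy dataset: 4 conditions x 2 runs, 30 voxels of made-up data;
# each condition appears in every run, as cross-validation requires
rng = np.random.default_rng(0)
data = rsatoolbox.data.Dataset(
    measurements=rng.standard_normal((8, 30)),
    obs_descriptors={'conds': np.repeat(np.arange(4), 2),
                     'runs': np.tile(np.arange(2), 4)},
)
rdm = rsatoolbox.rdm.calc_rdm(data, method='crossnobis',
                              descriptor='conds', cv_descriptor='runs')
```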

jdiedrichsen (Contributor) commented

Hi @andreifoldes,

One of the problems with using cross-validated estimates for correlations (or correlation distances) is that the estimates can become invalid, or at least very unstable (high variance). This happens when the signal-to-noise ratio is low and the cross-validated variance estimate approaches zero.

Explicitly: r_ab = cov_ab / sqrt(var_a * var_b)

The covariance estimate can easily be replaced by the (unbiased) cross-block estimate, but if you do the same to the variance estimates, you can get very large correlation estimates (>> 1) or imaginary numbers.
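
A toy demonstration of that failure mode (made-up numbers; two measurement blocks per pattern, weak signal, strong noise):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 50
sig_a = 0.1 * rng.standard_normal(n)                 # weak true signal
sig_b = sig_a + 0.1 * rng.standard_normal(n)
a1, a2 = sig_a + rng.standard_normal(n), sig_a + rng.standard_normal(n)
b1, b2 = sig_b + rng.standard_normal(n), sig_b + rng.standard_normal(n)

c = lambda x: x - x.mean()
cov_ab = (c(a1) @ c(b2) + c(a2) @ c(b1)) / 2 / (n - 1)  # cross-block covariance
var_a = c(a1) @ c(a2) / (n - 1)   # cross-block variance: can be near zero or < 0
var_b = c(b1) @ c(b2) / (n - 1)

r = cov_ab / np.sqrt(complex(var_a * var_b))  # may be >> 1 or imaginary
print(var_a, var_b, r)
```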

This common problem is described in more detail here:
https://www.diedrichsenlab.org/BrainDataScience/noisy_correlation/index.htm

So this is the main reason this has not been implemented so far. Technically it is not difficult - and you can use some regularization to make these estimates behave better - but the subsequent issues for inference are currently not addressed.

best,

Joern

andreifoldes (Author) commented

Dear Joern,

  1. Thank you, that is a very informative read! I'm currently in the "want to test the hypothesis that the true correlation is larger than zero" camp (or, more specifically, the "test that the difference in correlations between two conditions is larger than zero" camp). Do I understand correctly that in that case this is less of a problem?

  2. I probably haven't given this much thought, but what happens if one replaces the covariance estimate with the cross-validated one and leaves the variance estimates alone? I'm guessing that's not a correlation anymore, but is it better than just using the naive correlation?

  3. I think - for better or worse - correlation will stay around as a measure (at least for memory-RSA it still seems widespread), and the question would be whether, with all the caveats you write about, it is still not better than running a simple correlation. In your figures it appears to me that both the solid and dashed lines are closer to the true value than the green one at any SNR.


jdiedrichsen commented Jun 1, 2023

Hi Andrei,

Yes, you can define a "correlation" measure by using the cross-validated (or, more precisely, "cross-block") estimate of the covariance and the naive estimate of the variances. In the @rsagroup, we refer to this (if I remember correctly) as the Type II estimate (as opposed to the Type I estimate, where both variances and covariances are cross-block).

Type II estimates (and even naive ones, if you assume within-run covariances are unbiased) are totally fine for testing the hypothesis of r > 0 - for that you can get away with most measures, I think (even classification accuracy will do :-)). The same holds for the difference between two pairs of conditions within the same region (assuming the measurement error is the same across conditions). Type II correlations would be a good cautionary step to avoid any within-run dependencies in the measurement process of the different conditions.
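
In code, the distinction would look roughly like this (a sketch of the description above, not toolbox code):

```python
import numpy as np

def cross_corr(a1, a2, b1, b2, naive_var=True):
    # Cross-block covariance in both cases; naive_var=True gives the
    # Type II estimate, naive_var=False the Type I estimate.
    c = lambda x: x - x.mean()
    cov = (c(a1) @ c(b2) + c(a2) @ c(b1)) / 2
    if naive_var:                                    # Type II: naive variances
        var_a = (c(a1) @ c(a1) + c(a2) @ c(a2)) / 2
        var_b = (c(b1) @ c(b1) + c(b2) @ c(b2)) / 2
    else:                                            # Type I: cross-block variances
        var_a, var_b = c(a1) @ c(a2), c(b1) @ c(b2)
    return cov / np.sqrt(complex(var_a * var_b))

rng = np.random.default_rng(3)
a1, a2, b1, b2 = rng.standard_normal((4, 50))
print(cross_corr(a1, a2, b1, b2))                   # Type II: typically well-behaved
print(cross_corr(a1, a2, b1, b2, naive_var=False))  # Type I: can exceed 1 or go complex
```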

For 3: happy to hear your thoughts on what is useful. I will present a methods poster at OHBM on correlation estimation and inference, so it would be good to check what you find useful here - we could provide Type I estimates with an appropriate warning label. Usually we want something published on the issue before we put it in the toolbox, so hopefully over the summer I will find time to turn the blog post into a proper paper.

Joern

andreifoldes (Author) commented

Awesome! Looking forward to the blogpost + poster!
