
extension to calc_rdm to ignore within-run similarities completely from any chosen distance measure #327

Open · andreifoldes opened this issue May 25, 2023 · 6 comments


andreifoldes commented May 25, 2023

Is your feature request related to a problem? Please describe.
I'm planning to use "correlation" distance for my neuroimaging analysis, and my understanding is that it is good practice to remove the within-run correlations (I know about crossnobis). However, I find it complicated to remove within-run similarities from the analysis.

Describe the solution you'd like
I think it would be useful for calc_rdm to have an optional argument so that, if runs have been provided for the dataset, within-run similarities are ignored.

Describe alternatives you've considered
I haven't yet figured out a way to "ignore within-run similarities" in rsatoolbox :( Perhaps the RDM could be subset after calc_rdm runs? In that case it would be nice to have a separate function that removes within-run similarities...
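
Something along these lines is what I have in mind - a plain numpy sketch of the post-hoc subsetting idea, not rsatoolbox code (the data and all names are made up):

```python
import numpy as np

# Made-up data: 4 conditions x 3 runs x 50 voxels
rng = np.random.default_rng(0)
n_conds, n_runs, n_vox = 4, 3, 50
patterns = rng.standard_normal((n_conds * n_runs, n_vox))
conds = np.repeat(np.arange(n_conds), n_runs)   # condition label per pattern
runs = np.tile(np.arange(n_runs), n_conds)      # run label per pattern

full = 1 - np.corrcoef(patterns)                # correlation distance, all pairs
between = runs[:, None] != runs[None, :]        # True where the two runs differ

# Condition-level RDM that averages only the between-run cells
rdm = np.zeros((n_conds, n_conds))
for a in range(n_conds):
    for b in range(a + 1, n_conds):
        mask = between & (conds[:, None] == a) & (conds[None, :] == b)
        rdm[a, b] = rdm[b, a] = full[mask].mean()
```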

HeikoSchuett (Contributor) commented

Hi @andreifoldes,
for the cross-validated distances we have so far, we implemented each as a separate distance measure, because they mostly run different code. It would thus be easier to implement a "corr_crossval" option for calc_rdm.

This does seem like a reasonable idea that we might want to implement soon. One caveat: due to the normalisation implied in the correlation calculation, a scheme that first calculates correlations and then averages over the right run combinations will yield different results from one that averages the patterns first and calculates the correlation afterwards - but I think this would be fine.
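
To illustrate what I mean about the normalisation, a minimal numpy sketch (two runs, made-up data):

```python
import numpy as np

rng = np.random.default_rng(1)
a1, a2 = rng.standard_normal(20), rng.standard_normal(20)  # condition A, runs 1 & 2
b1, b2 = rng.standard_normal(20), rng.standard_normal(20)  # condition B, runs 1 & 2

r = lambda x, y: np.corrcoef(x, y)[0, 1]

# Scheme 1: correlate across run pairs, then average
corr_then_avg = np.mean([r(a1, b2), r(a2, b1)])
# Scheme 2: average the patterns across runs, then correlate
avg_then_corr = r((a1 + a2) / 2, (b1 + b2) / 2)

print(corr_then_avg, avg_then_corr)  # generally not equal
```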


andreifoldes commented May 30, 2023

Just to make sure I understand correctly: does cross-validation in the context of calc_rdm mean "not using within-run distances", or are there additional/different steps involved?

Assuming that to be the case, having an additional xxx_crossval option for the already existing measures would indeed be a cool feature!

To your second point: I hadn't thought about that, and I don't have a strong intuition, but I lean towards the former in my own work (+ permutation testing).
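
For reference, this is how I currently understand the existing cross-validated measure is invoked (my reading of the rsatoolbox docs; the data and descriptor names here are made up):

```python
import numpy as np
import rsatoolbox

# Toy dataset: 4 conditions x 2 runs, 30 voxels of made-up data;
# each condition appears in every run, as cross-validation requires
rng = np.random.default_rng(0)
data = rsatoolbox.data.Dataset(
    measurements=rng.standard_normal((8, 30)),
    obs_descriptors={'conds': np.repeat(np.arange(4), 2),
                     'runs': np.tile(np.arange(2), 4)},
)
rdm = rsatoolbox.rdm.calc_rdm(data, method='crossnobis',
                              descriptor='conds', cv_descriptor='runs')
```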

jdiedrichsen (Contributor) commented

Hi @andreifoldes,

One of the problems with using cross-validated estimates for correlations (or correlation distances) is that the estimates can become invalid, or at least very unstable (high variance). This happens when the signal-to-noise ratio is low and the cross-validated variance estimate approaches zero.

Explicitly: r_ab = cov_ab / sqrt(var_a * var_b)

The covariance estimate can easily be replaced by the (unbiased) cross-block estimate, but if you do the same to the variance estimates, you can get very large correlation estimates (>> 1) or imaginary numbers.
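
A toy demonstration of that failure mode (made-up numbers; two measurement blocks per pattern, weak signal, strong noise):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 50
sig_a = 0.1 * rng.standard_normal(n)                 # weak true signal
sig_b = sig_a + 0.1 * rng.standard_normal(n)
a1, a2 = sig_a + rng.standard_normal(n), sig_a + rng.standard_normal(n)
b1, b2 = sig_b + rng.standard_normal(n), sig_b + rng.standard_normal(n)

c = lambda x: x - x.mean()
cov_ab = (c(a1) @ c(b2) + c(a2) @ c(b1)) / 2 / (n - 1)  # cross-block covariance
var_a = c(a1) @ c(a2) / (n - 1)   # cross-block variance: can be near zero or < 0
var_b = c(b1) @ c(b2) / (n - 1)

r = cov_ab / np.sqrt(complex(var_a * var_b))  # may be >> 1 or imaginary
print(var_a, var_b, r)
```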

This common problem is described in more detail here:
https://www.diedrichsenlab.org/BrainDataScience/noisy_correlation/index.htm

So this is the main reason this has not been implemented so far. Technically it is not difficult - and you can use some regularization to make these estimates behave better - but the subsequent issues for inference are currently not addressed.

best,

Joern

andreifoldes (Author) commented

Dear Joern,

  1. Thank you, that is a very informative read! I'm currently in the "want to test the hypothesis that the true correlation is larger than zero" camp (or, more specifically, the "test that the difference in correlations between two conditions is larger than zero" camp). Do I understand correctly that in that case this is less of a problem?

  2. I probably haven't given this much thought, but what happens if one replaces the covariance estimate with the cross-validated one and leaves the variance estimates alone? I'm guessing that's not a correlation anymore, but is it better than just using the naive correlation?

  3. I think - for better or worse - correlation will stay around as a measure (at least for memory-RSA it still seems widespread), and the question would be whether, with all the caveats you write about, it is still not better than running a simple correlation. In your figures it appears to me that both the solid and dashed lines are closer to the true value than the green one at any SNR.


jdiedrichsen commented Jun 1, 2023

Hi Andrei,

Yes, you can define a "correlation" measure by using the cross-validated (or, more precisely, "cross-block") estimate of the covariance and the naive estimate of the variances. In the @rsagroup, we refer to this (if I remember correctly) as the Type II estimate (as opposed to the Type I estimate, where both variances and covariances are cross-block).

Type II estimates (and even naive ones, if you assume within-run covariances are unbiased) are totally fine for testing the hypothesis of r > 0 - for that you can get away with most measures, I think (even classification accuracy will do :-)). The same holds for the difference between two pairs of conditions within the same region (assuming the measurement error is the same across conditions). Type II correlations would be a good cautionary step to avoid any within-run dependencies in the measurement process of the different conditions.
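
In code, the distinction would look roughly like this (a sketch of the description above, not toolbox code):

```python
import numpy as np

def cross_corr(a1, a2, b1, b2, naive_var=True):
    # Cross-block covariance in both cases; naive_var=True gives the
    # Type II estimate, naive_var=False the Type I estimate.
    c = lambda x: x - x.mean()
    cov = (c(a1) @ c(b2) + c(a2) @ c(b1)) / 2
    if naive_var:                                    # Type II: naive variances
        var_a = (c(a1) @ c(a1) + c(a2) @ c(a2)) / 2
        var_b = (c(b1) @ c(b1) + c(b2) @ c(b2)) / 2
    else:                                            # Type I: cross-block variances
        var_a, var_b = c(a1) @ c(a2), c(b1) @ c(b2)
    return cov / np.sqrt(complex(var_a * var_b))

rng = np.random.default_rng(3)
a1, a2, b1, b2 = rng.standard_normal((4, 50))
print(cross_corr(a1, a2, b1, b2))                   # Type II: typically well-behaved
print(cross_corr(a1, a2, b1, b2, naive_var=False))  # Type I: can exceed 1 or go complex
```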

For 3: happy to hear your thoughts on what is useful. I will present a methods poster at OHBM on correlation estimation and inference, so it would be good to check what you find useful here - we could provide Type I estimates with an appropriate warning label. Usually we want something published on the issue before we put it in the toolbox, so hopefully over the summer I will find time to turn the blog post into a proper paper.

Joern

andreifoldes (Author) commented

Awesome! Looking forward to the blogpost + poster!
