Ransac Comparison Example #53

yjmantilla · 2021-02-14T03:52:14Z

PR Description

Basically a PR for the ransac comparison between autoreject's and pyprep's implementation.
This is a first version so is a WIP. All ideas welcome for the comparison.

Currently for the real EEG comparison they differ in two channels:

AR: ['C3', 'F7', 'F8', 'FT8', 'T9', 'P7']
PYPREP: ['C3', 'C1', 'F8', 'FT8', 'P7', 'P4']

The reason why is not entirely clear to me. We should maybe investigate further how autoreject's ransac works and whether the input parameters I give him here really do make sense for the comparison.

I did notice autoreject use an specific way to generate the channels subsets (adds a bias to the seed --> https://github.com/autoreject/autoreject/blob/dd0942438440070876e8ea4796b83ffdd4981600/autoreject/ransac.py#L198 )

We have the correlation matrices of both methods available here for comparison but I have not really used them yet.

Also Autoreject's Ransac is faster at least for these examples. Im not sure yet how is the memory consumption in autoreject.

The doc's building is failing because autoreject is not installed in whatever executes the example. I don't know where that is configured.

In the final version of this PR it should close #36

Merge Checklist

the PR has been reviewed and all comments are resolved
all CI checks pass
(if applicable): the PR description includes the phrase closes #<issue-number> to automatically close an issue
(if applicable): bug fixes, new features, or API changes are documented in whats_new.rst

needed for example building

sappelhoff

looks like a good start to me :)

I recently found out that in sphinx gallery you can replace the "bar" of # with # %% signs. I like that second option better, because then editors like VS Code can pick up on that and let you run the example similar to a notebook in the editor.

see here for more info: https://sphinx-gallery.github.io/stable/syntax.html#embed-rst-in-your-example-python-files

regarding the autoreject import, I added it to requirements-dev.txt --> that's what contains all dependencies that are needed for testing, building the docs, and development in general.

The dependencies for pyprep are in setup.cfg

examples/ransac_compare.py

sappelhoff · 2021-02-14T11:30:29Z

link to example as built by the CI service: https://pyprep--53.org.readthedocs.build/en/53/auto_examples/ransac_compare.html#sphx-glr-auto-examples-ransac-compare-py

examples/ransac_compare.py

vscode style blocks Co-authored-by: Stefan Appelhoff <[email protected]>

yjmantilla · 2021-02-24T04:11:28Z

@sappelhoff

So been delving a bit into the workings of both ransac's.

pyprep calculates 1 epoch less than AR, i fixed that, was just a limit in a np.arangue
AR selects particular subset of channels for each (epoch,sample,channel), pyprep just does for (sample,channel)
AR will iterate through epochs finding the correlations for all channels in each iteration
Pyprep will iterate through channels finding the correlations for all epochs in each iteration

As of now depending on the data pyprep and AR will either consume more memory (I think...,my speculation):

If there are more channels than epochs --> AR consumes more
If there are more epochs than channels --> Pyprep consumes more

Memory wise :

AR will use a (n_points_1epoch,n_channels,n_samples) array
pyprep will use (n_points_all_epochs,1 channel,n_samples) array

If we wanted to optimize memory consumption it would be something of the sort of iterating through both epochs and channels. Or intelligently select the iteration through the greatest between n_epochs and n_channels
In that sense it may be easier to refactor AR to include the channel iteration because they already made the matlab validation.

Pyprep Ransac results wont exactly match AR until we manage to work also epoch wise for the prediction, given that the subsets of channels change epoch-wise in Autoreject. (edit: not only epoch-wise but also sample-wise which the original MATLAB RANSAC also did --> #41 (comment))

sappelhoff · 2021-02-24T15:32:25Z

great work! Thanks, that makes a lot of sense!

pyprep calculates 1 epoch less than AR, i fixed that, was just a limit in a np.arangue

did you push that commit yet? I can't see it in the changes

As of now depending on the data pyprep and AR will either consume more memory (I think...,my speculation):

your speculation sounds reasonable 👍

If we wanted to optimize memory consumption it would be something of the sort of iterating through both epochs and channels. Or intelligently select the iteration through the greatest between n_epochs and n_channels

That might be a bit too much optimization, wdyt? the autoreject way already uses quite little memory compared to pyprep's current one.

In that sense it may be easier to refactor AR to include the channel iteration because they already made the matlab validation.

mh yes, it would be totally fine to have one working RANSAC in python, we don't need different implementations in different packages. But the autoreject RANSAC can only be fit on epochs, it would be nice to be able to fit it on continuous data (even if then under the hood, epochs are being calculated).

codecov-io · 2021-02-25T03:07:34Z

Codecov Report

Merging #53 (0314f94) into master (8476ea5) will increase coverage by 0.03%.
The diff coverage is n/a.

@@            Coverage Diff             @@
##           master      #53      +/-   ##
==========================================
+ Coverage   96.80%   96.84%   +0.03%     
==========================================
  Files           6        7       +1     
  Lines         532      538       +6     
==========================================
+ Hits          515      521       +6     
  Misses         17       17

Impacted Files	Coverage Δ
pyprep/__init__.py	`100.00% <0.00%> (ø)`

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 8476ea5...57043ac. Read the comment docs.

yjmantilla · 2021-02-25T03:26:28Z

did you push that commit yet? I can't see it in the changes

Just did, had to clean up a bit the files since I had experiments in the branch.

Thinking about it now, I wrote it wrongly. It is not giving "1 less epoch compared to AR", it is giving 1 less epoch compared to mne.make_fixed_length_epochs and I think that just happens when corr_frames divides the data evenly which was the case in the simulated example.

For reference: https://github.com/mne-tools/mne-python/blob/5909e1bb0044e558536e62d441f0d694be65377a/mne/event.py#L906

That might be a bit too much optimization, wdyt? the autoreject way already uses quite little memory compared to pyprep's current one.

I still think it is worthwhile to do. I mean being either of:

AR (n_points_1epoch,n_channels,n_samples)
PYPREP (n_points_all_epochs,1 channel,n_samples)

Both will consume a lot of ram and for some users it will matter. Guess one could say is not an urgent feature but it is important nonetheless.

mh yes, it would be totally fine to have one working RANSAC in python, we don't need different implementations in different packages. But the autoreject RANSAC can only be fit on epochs, it would be nice to be able to fit it on continuous data (even if then under the hood, epochs are being calculated).

Yes that is fixed under the hood; was solved by #36 (comment) , mainly:

        ############AUTO REJECT####################
        if mode == 'autoreject':
            ransac = Ransac(picks=good_idx,n_resample=n_samples, min_channels=fraction_good, min_corr=corr_thresh, unbroken_time=fraction_bad, n_jobs=njobs, random_state=435656, verbose= 'tqdm')
            info = mne.create_info(self.ch_names_new.tolist(), self.sample_rate, ch_types='eeg', verbose=None)
            # Option 1: Single Epoch
            #EEGData_epoch = self.EEGData[np.newaxis, :, :]
            #epochs = mne.EpochsArray(EEGData_epoch,info)
            
            # Option 2: Epochs of corr_window_secs seconds as windows
            raw = mne.io.RawArray(self.EEGData,info)
            epochs = mne.make_fixed_length_epochs(raw, duration=corr_window_secs, preload=True,reject_by_annotation=False, verbose=None)

            # Either way...
            epochs.set_montage(self.montage)
            epochs_clean = ransac.fit(epochs)
            #print(epochs_clean.bad_chs_)
            self.bad_by_ransac = [i for i in epochs_clean.bad_chs_]
            return None
        ############AUTO REJECT####################

Disclaimer: Not sure what behaviour it has if corr_window_secs doesnt evenly divide the data, thats worth checking. My guess is that it cuts off the tail.

Im not sure of how Autoreject does the memory checking but I dont think they do one. So for example it may be nice to introduce a verify free ram for them too.

Anyway, what approach do we take? Kill pyprep ransac (so long cowboy...) or keep both and just give the option to run either by a parameter? Do direct pull request to AR if we need changes?

I restate here the advantage of AR being already matlab-validated, that is, we could even offer a parameter for it to wrongly calculate the median if the user is a matlab believer.

sappelhoff · 2021-02-25T08:48:22Z

Both will consume a lot of ram and for some users it will matter. Guess one could say is not an urgent feature but it is important nonetheless.

I think I'd be convinced when I see the first reasonable memory issue for AR ransac :) ... so far I haven't seen any.

Anyway, what approach do we take? Kill pyprep ransac (so long cowboy...) or keep both and just give the option to run either by a parameter? Do direct pull request to AR if we need changes?

it would probably be more sustainable to kill pyprep ransac, and stay with the wrapped AR ransac within pyprep. The wrapping then allows us to build some functionality on top of AR ransac that would be out of scope in the AR package (such as support for continuous, as opposed to epochs). -- of course, core "enhancements" could then be done as PRs to the AR package.

I restate here the advantage of AR being already matlab-validated, that is, we could even offer a parameter for it to wrongly calculate the median if the user is a matlab believer.

yes, but is there an open comparison of AR-ransac and PREP-ransac? Or is this all buried in Mainak's PhD work? :-)

yjmantilla · 2021-02-26T04:12:21Z

I think I'd be convinced when I see the first reasonable memory issue for AR ransac :) ... so far I haven't seen any.

Lets see how it goes 😬

yes, but is there an open comparison of AR-ransac and PREP-ransac? Or is this all buried in Mainak's PhD work? :-)

Yeah, i guess thats buried somewhere, although Mainak has left hints along the comments he has made in pyprep's issues.

So what i take from this is that this PR will mutate to:

1 - Implement AR ransac inside pyprep
2 - Example will now be between MATLAB RANSAC and AR RANSAC rather than pyprep's RANSAC.

Is this correct?

sappelhoff · 2021-02-26T08:53:18Z

Implement AR ransac inside pyprep

yes (=adding autoreject as a dependency), and let's keep pyprep ransac around for some more time --- once we are sure that the AR ransac works fine for our needs, we can remove pyprep ransac.

Example will now be between MATLAB RANSAC and AR RANSAC rather than pyprep's RANSAC.

yes, either that, or comparing all three --> but in the long term then rather only AR ransac 👍

codecov-commenter · 2021-06-23T09:30:32Z

Codecov Report

Merging #53 (1d7e03f) into master (4614ca0) will not change coverage.
The diff coverage is n/a.

❗ Current head 1d7e03f differs from pull request most recent head 3cbfcf0. Consider uploading reports for the commit 3cbfcf0 to get more accurate results

@@           Coverage Diff           @@
##           master      #53   +/-   ##
=======================================
  Coverage   98.51%   98.51%           
=======================================
  Files           7        7           
  Lines         675      675           
=======================================
  Hits          665      665           
  Misses         10       10

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 4614ca0...3cbfcf0. Read the comment docs.

sappelhoff · 2021-06-23T10:59:42Z

~~tests are failing here because I wanted to pass a np.random.RandomState object to autoreject's RANSAC. Opened an issue about that here: autoreject/autoreject#218~~ has been merged and fixed :)

On a different note: This PR is now very old and A LOT of stuff has happened in the meantime, so my opinion from #53 (comment) is no longer up to date.

Back then I thought it'd be good to start depending on autoreject's RANSAC, modify it to our needs, and part by part remove pyprep's RANSAC.

Now I think we should keep our ransac with all of @a-hurst's fixes.

Still I think it would be nice to have this example of comparing pyprep's and autoreject's RANSAC. So I'd say, let's review / improve this a bit and then merge it. What do you think @a-hurst @yjmantilla ?

One last important note: I merged master into this branch in debef8c and that "removed" a fix that @yjmantilla added and described in #53 (comment)

should we add that back? Could you have a look how that fits in with your recent changes @a-hurst ?

a-hurst · 2021-06-23T14:42:07Z

Wow, looking at the commit that syncs this up with the current version you can really see the extent of the work that's been done on pyprep in the past few months. Nice!

So I'd say, let's review / improve this a bit and then merge it. What do you think @a-hurst @yjmantilla?

Yep, this definitely seems useful! I still need to read this PR over properly so it may have this already, but it would be great to have a section in the example that compares/contrasts the usage of PyPREP vs AutoReject (i.e., types of noise detection, full file vs per-epoch) so that understanding when/where to use each is clearer to newcomers (I remember getting confused by this myself last year when first setting up my EEG pipeline).

I can contribute a bit to this since I've stared into the depths of PyPREP long enough to roughly know how it works/behaves and I don't mind writing docs, I'd just need to get a handle on how AutoReject works enough to compare/contrast. I also wonder about whether AutoReject and PyPREP might be usable together once PyPREP's final interpolation of bad channels is made optional, i.e. using PyPREP for adaptive line noise removal and robust re-referencing, but using AutoReject for final bad channel detection/interpolation post-reference to avoid unnecessary interpolation of channels that are only bad during a certain period of time.

One last important note: I merged master into this branch in debef8c and that "removed" a fix that @yjmantilla added and described in #53 (comment)

should we add that back? Could you have a look how that fits in with your recent changes @a-hurst ?

Are you saying that this PR also had an implementation of window-wise chunking for RANSAC, that got reverted with my current implementation? I looked through the commit you linked but couldn't find it: ransac.py looks functionally identical to how it did before I started my RANSAC rewriting/refactoring. Can you point me to the lines in question?

This reverts commit baac068.

sappelhoff · 2021-06-23T15:00:47Z

I meant specifically this commit: 57043ac

note the correlation_frames + 1

which we don't have in master

pyprep/pyprep/ransac.py

Lines 168 to 175 in 4614ca0

    
           # Calculate the size (in frames) and count of correlation windows 
        
           correlation_frames = corr_window_secs * sample_rate 
        
           signal_frames = data.shape[1] 
        
           correlation_offsets = np.arange( 
        
               0, (signal_frames - correlation_frames), correlation_frames 
        
           ) 
        
           win_size = int(correlation_frames) 
        
           win_count = correlation_offsets.shape[0]

sappelhoff · 2021-06-23T15:37:27Z

I can contribute a bit to this since I've stared into the depths of PyPREP long enough to roughly know how it works/behaves and I don't mind writing docs, I'd just need to get a handle on how AutoReject works enough to compare/contrast.

pretty sure that @yjmantilla would welcome your help on this - as would I :-)

I also wonder about whether AutoReject and PyPREP might be usable together once PyPREP's final interpolation of bad channels is made optional, i.e. using PyPREP for adaptive line noise removal and robust re-referencing, but using AutoReject for final bad channel detection/interpolation post-reference to avoid unnecessary interpolation of channels that are only bad during a certain period of time.

could be a nice future example / documentation (separate from this thread / PR)

sappelhoff · 2021-06-23T15:42:47Z

examples/ransac_comparison_autoreject.py

+print("--- %s seconds ---" % (perf_counter() - start_time))
+
+# Check channels that go bad together by RANSAC
+print("pyprep bad chs:", bad_by_ransac_pyprep)


we should add one more code block below to easily show:

chs bad that are equal between pyprep and AR

chs bad only AR

chs bad only pyprep

a-hurst · 2021-06-23T18:09:43Z

I meant specifically this commit: 57043ac

note the correlation_frames + 1

which we don't have in master

Ahh, I see. Presumably that matches up PyPREP's RANSAC with AutoReject's, but that change would diverge from MATLAB PREP so we'd need to wrap it in matlab_strict if we wanted to add that change.

My understanding was that PREP/PyPREP don't use that last correlation window because it's at the end of the file and would thus be shorter than the rest of the windows. In AR, making use of that last window makes sense since you want all the data per epoch that you can get, but for PyPREP I think it makes sense to ignore it (e.g. the last window might only be 100 ms long, but would be weighted the same as a full 5 second window in determining whether a channel is bad or not). Does that make sense?

sappelhoff · 2021-06-23T18:58:52Z

ah yes, that makes sense. A feature for -non-matlab-strict pyprep may be to include the final window if it is only X% shorter than the others, where X could be set to something like 20% .... and if it's more than 20% shorter, don't calculate anything based on it.

it would be good to add that explanation within the codebase in some way, so that future devs can easily know the reasoning.

yjmantilla · 2021-06-23T21:22:47Z

@sappelhoff @a-hurst

Sorry for the late reply. You guys have been really working hard on pyprep and currently I cant keep up with your fast work hehe.

Now I think we should keep our ransac with all of @a-hurst's fixes.

I agree, after all that was the initial idea.

Still I think it would be nice to have this example of comparing pyprep's and autoreject's RANSAC. So I'd say, let's review / improve this a bit and then merge it. What do you think @a-hurst @yjmantilla ?

I agree, although it wouldnt be high priority imo.

One last important note: I merged master into this branch in debef8c and that "removed" a fix that @yjmantilla added and described in #53 (comment)
Ahh, I see. Presumably that matches up PyPREP's RANSAC with AutoReject's, but that change would diverge from MATLAB PREP so we'd need to wrap it in matlab_strict if we wanted to add that change.

Yes,as @a-hurst said, this was a "matching" I did so that the matrix of correlations had the same shape (and presumably the windows in both implementations of pyprep and ar ransac were the same). That way I could numerically compare the matrix of correlations (or whatever matrix got outside of ransac, I dont quite remember).

My understanding was that PREP/PyPREP don't use that last correlation window because it's at the end of the file and would thus be shorter than the rest of the windows. In AR, making use of that last window makes sense since you want all the data per epoch that you can get, but for PyPREP I think it makes sense to ignore it (e.g. the last window might only be 100 ms long, but would be weighted the same as a full 5 second window in determining whether a channel is bad or not). Does that make sense?

Nice logic. Maybe for the comparison of the correlation matrix between Pyprep's and AR's RANSAC one can just use every window but the last on the AR's Ransac side. I wouldnt know at this moment if the windows align exactly u.u , but I think at the time of this PR they did. After all is a matter of checking the window chunking logic and maybe numerically checking they align if they both have that info in an array like for example the 'correlation offsets' one.

it would be good to add that explanation within the codebase in some way, so that future devs can easily know the reasoning.

I agree.

first version of the comparison example

354e2bc

yjmantilla added the WIP Work In Progress label Feb 14, 2021

yjmantilla marked this pull request as draft February 14, 2021 03:56

yjmantilla requested a review from sappelhoff February 14, 2021 03:57

sappelhoff added 2 commits February 14, 2021 12:15

add autoreject+reqs to requirements-dev

000612a

needed for example building

add tqdm for autoreject

d9f35ef

sappelhoff reviewed Feb 14, 2021

View reviewed changes

examples/ransac_compare.py Outdated Show resolved Hide resolved

yjmantilla and others added 2 commits February 23, 2021 18:13

Apply suggestions from code review

bcbae67

vscode style blocks Co-authored-by: Stefan Appelhoff <[email protected]>

correlation confusion corrected

0314f94

1 epoch less bug corrected

57043ac

formatting fixes

6cbe0fb

yjmantilla mentioned this pull request Mar 19, 2021

Variability in interpolated channels between RANSAC runs? #41

Closed

sappelhoff mentioned this pull request Apr 12, 2021

RANSAC correlations (and logic) differ between PyPREP and MATLAB PREP #65

Closed

Merge branch 'master' into ransac_compare

debef8c

sappelhoff added 3 commits June 23, 2021 11:41

try to adjust with master branch

1d7e03f

do some cleanup and add docs

ccd1e51

install autoreject from master

3cbfcf0

sappelhoff force-pushed the ransac_compare branch from 0041c4d to 3cbfcf0 Compare June 23, 2021 10:35

sappelhoff marked this pull request as ready for review June 23, 2021 10:59

sappelhoff added documentation/example and removed WIP Work In Progress labels Jun 23, 2021

pass into to AR ransac until passing RandomState is possible

baac068

sappelhoff added this to the 0.4.0 milestone Jun 23, 2021

Revert "pass into to AR ransac until passing RandomState is possible"

0600524

This reverts commit baac068.

fix docs link, bump cache version

eff3d26

sappelhoff reviewed Jun 23, 2021

View reviewed changes

sappelhoff modified the milestones: 0.4.0, 0.5.0 Oct 22, 2021

sappelhoff force-pushed the master branch 2 times, most recently from d2c7dc0 to 16c25eb Compare October 22, 2021 07:25

sappelhoff force-pushed the master branch from e900f6c to acf0d4f Compare February 2, 2022 17:44

sappelhoff force-pushed the master branch from e89ab72 to 0ca073e Compare November 20, 2022 12:32

sappelhoff changed the base branch from master to main October 27, 2023 09:55

sappelhoff mentioned this pull request Aug 31, 2024

Make ransac.find_bad_by_ransac private #159

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Ransac Comparison Example #53

Ransac Comparison Example #53

yjmantilla commented Feb 14, 2021 •

edited

Loading

sappelhoff left a comment

sappelhoff commented Feb 14, 2021

yjmantilla commented Feb 24, 2021 •

edited

Loading

sappelhoff commented Feb 24, 2021

codecov-io commented Feb 25, 2021

yjmantilla commented Feb 25, 2021 •

edited

Loading

sappelhoff commented Feb 25, 2021

yjmantilla commented Feb 26, 2021 •

edited

Loading

sappelhoff commented Feb 26, 2021

codecov-commenter commented Jun 23, 2021 •

edited

Loading

sappelhoff commented Jun 23, 2021 •

edited

Loading

a-hurst commented Jun 23, 2021

sappelhoff commented Jun 23, 2021

sappelhoff commented Jun 23, 2021

sappelhoff Jun 23, 2021

a-hurst commented Jun 23, 2021

sappelhoff commented Jun 23, 2021

yjmantilla commented Jun 23, 2021

Ransac Comparison Example #53

Are you sure you want to change the base?

Ransac Comparison Example #53

Conversation

yjmantilla commented Feb 14, 2021 • edited Loading

PR Description

Merge Checklist

sappelhoff left a comment

Choose a reason for hiding this comment

sappelhoff commented Feb 14, 2021

yjmantilla commented Feb 24, 2021 • edited Loading

sappelhoff commented Feb 24, 2021

codecov-io commented Feb 25, 2021

Codecov Report

yjmantilla commented Feb 25, 2021 • edited Loading

sappelhoff commented Feb 25, 2021

yjmantilla commented Feb 26, 2021 • edited Loading

sappelhoff commented Feb 26, 2021

codecov-commenter commented Jun 23, 2021 • edited Loading

Codecov Report

sappelhoff commented Jun 23, 2021 • edited Loading

a-hurst commented Jun 23, 2021

sappelhoff commented Jun 23, 2021

sappelhoff commented Jun 23, 2021

sappelhoff Jun 23, 2021

Choose a reason for hiding this comment

a-hurst commented Jun 23, 2021

sappelhoff commented Jun 23, 2021

yjmantilla commented Jun 23, 2021

yjmantilla commented Feb 14, 2021 •

edited

Loading

yjmantilla commented Feb 24, 2021 •

edited

Loading

yjmantilla commented Feb 25, 2021 •

edited

Loading

yjmantilla commented Feb 26, 2021 •

edited

Loading

codecov-commenter commented Jun 23, 2021 •

edited

Loading

sappelhoff commented Jun 23, 2021 •

edited

Loading