Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make noisy channel exclusion during Reference compatible with MATLAB PREP #93

Merged
merged 16 commits into from
Jun 27, 2021

Conversation

a-hurst
Copy link
Collaborator

@a-hurst a-hurst commented Jun 24, 2021

PR Description

Closes #91. This PR does three main things:

  • Changes Reference.robust_reference to stop permanently excluding channels that are flagged as bad-by-SNR prior to initial re-referencing.
  • Changes Reference.robust_reference to exclude bad-by-dropout channels from the full set of bad channels whenmatlab_strict is True, matching MatPREP's (likely accidental) behaviour.
  • Adds an as_dict argument to NoisyChannels.get_bads that returns bad channels by type as a dictionary in order to facilitate the above changes. In addition to being generally useful, it also saves a lot of dict construction in reference.py.

I still need to update the whats_new.rst and matlab_differences.rst here, but I figured given the time-zone difference that it would be better to get the code up for review before I went to bed. Thanks in advance for checking this over!

Merge Checklist

  • the PR has been reviewed and all comments are resolved
  • all CI checks pass
  • (if applicable): the PR description includes the phrase closes #<issue-number> to automatically close an issue
  • (if applicable): bug fixes, new features, or API changes are documented in whats_new.rst

@codecov-commenter
Copy link

codecov-commenter commented Jun 24, 2021

Codecov Report

Merging #93 (92a10e7) into master (4614ca0) will increase coverage by 0.31%.
The diff coverage is 97.29%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master      #93      +/-   ##
==========================================
+ Coverage   98.51%   98.83%   +0.31%     
==========================================
  Files           7        7              
  Lines         675      687      +12     
==========================================
+ Hits          665      679      +14     
+ Misses         10        8       -2     
Impacted Files Coverage Δ
pyprep/reference.py 98.31% <96.00%> (+1.76%) ⬆️
pyprep/find_noisy_channels.py 100.00% <100.00%> (ø)
pyprep/prep_pipeline.py 100.00% <100.00%> (ø)

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 4614ca0...92a10e7. Read the comment docs.

@sappelhoff sappelhoff added this to the 0.4.0 milestone Jun 24, 2021
Copy link
Owner

@sappelhoff sappelhoff left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! Only minor comments for docs

pyprep/reference.py Show resolved Hide resolved
pyprep/reference.py Outdated Show resolved Hide resolved
pyprep/reference.py Show resolved Hide resolved
@a-hurst
Copy link
Collaborator Author

a-hurst commented Jun 26, 2021

@sappelhoff Okay, I've updated the docs and whats-new, and have addressed your comments. A couple questions before this gets merged:

  • Any thoughts on whether there should be an API for configuring max_iterations at the PrepPipeline level? There's currently no API for configuring NoisyChannels thresholds from PrepPipeline either, so I wasn't sure what the expectation was for Reference settings. This might be a broader API question that goes beyond the scope of this PR.
  • While writing out the matlab_differences explanation, I realized two weird things about the current logic we might want to change:
    • With the current changes, channels flagged as bad-by-SNR before initial average referencing are no longer permanently flagged as "bad" after initial average referencing, the same as bad-by-deviation/noise/correlation/dropout/RANSAC channels. However, because they're included as part of the self.unusable_channels set, they're still always excluded from the calculated average reference signal! This means that initial bad-by-SNR channels are considered good enough to use when interpolating flagged bad channels, but not good enough to include under any circumstances (even if interpolated) in the average reference. This seems wrong, but is exactly what MATLAB PREP is doing. Any thoughts on whether we want to change this?
    • Currently, bad-by-dropout channels are flagged and removed if detected after initial average referencing (provided matlab_strict is False). However, if the signal isn't average-referenced to begin with, wouldn't the initial average referencing make any intermittent flat regions in the data no longer flat? As such, wouldn't we want bad-by-dropout channels to be flagged as unusable prior to initial referencing, the same as with bad-by-NaN and bad-by-flat channels?

@sappelhoff
Copy link
Owner

Any thoughts on whether there should be an API for configuring max_iterations at the PrepPipeline level? There's currently no API for configuring NoisyChannels thresholds from PrepPipeline either, so I wasn't sure what the expectation was for Reference settings

I think that if a parameter is configurable in MatPREP, and it's reasonably easy to make it configurable in PyPrep, then we SHOULD make it possible. These params should always have a reasonable default (like in MatPrep) though. On the other hand, let's not expose more params than MatPrep does (for now), unless we judge it very very important.

This means that initial bad-by-SNR channels are considered good enough to use when interpolating flagged bad channels, but not good enough to include under any circumstances (even if interpolated) in the average reference

🤔 yeah, that sounds off. +1 for an issue about this 1. on pyprep and 2. on matprep. The one for pyprep can be marked to be resolved in version 0.5 --> version 0.4 being targeted more at feature completeness with matprep. If nobody ever reacts to this on matprep, we can do our own, non-matprep-strict thing in pyprep starting with 0.5.

However, if the signal isn't average-referenced to begin with, wouldn't the initial average referencing make any intermittent flat regions in the data no longer flat? As such, wouldn't we want bad-by-dropout channels to be flagged as unusable prior to initial referencing, the same as with bad-by-NaN and bad-by-flat channels?

Yes, that's true, good point. What does matprep do here?

Copy link
Owner

@sappelhoff sappelhoff left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

there are three lines in this diff that are not being covered --> is it possible to reasonably do so with little changes/effort? If not, no problem.

@a-hurst
Copy link
Collaborator Author

a-hurst commented Jun 26, 2021

I think that if a parameter is configurable in MatPREP, and it's reasonably easy to make it configurable in PyPrep, then we SHOULD make it possible.

Okay, I'll add it to the params dict for now!

+1 for an issue about this 1. on pyprep and 2. on matprep. The one for pyprep can be marked to be resolved in version 0.5 --> version 0.4 being targeted more at feature completeness with matprep. If nobody ever reacts to this on matprep, we can do our own, non-matprep-strict thing in pyprep starting with 0.5.

Sounds good, will do! Maybe this is a place for some experimentation, but my gut instinct would be to default to only marking initial bad-by-flat and bad-by-NaN channels as fully "unusable". I'd imagine that the original intent was to removing channels that contribute more noise than signal from contaminating the average reference, but given how sensitive the 'bad-by-correlation' default settings are (if over 1% of all correlation windows are below threshold, the whole channel is considered bad) and how that can change after average referencing, I don't think initial bad-by-SNR channels should be treated as permanently bad. Anyway, that can all be discussed in the new issue.

Yes, that's true, good point. What does matprep do here?

It just ignores dropout channels completely during referencing: only bad-by-flat/NaN/SNR channels are permanently flagged and removed from the average reference at the onset, and due to the likely bug we've reported already they won't get identified or excluded on subsequent iterations either.

there are three lines in this diff that are not being covered --> is it possible to reasonably do so with little changes/effort? If not, no problem.

The matlab_strict line will get covered in the EEGLAB interpolation PR when I can add the MatPREP comparison tests, but I'll try adding a test case to cover the other main ones!

@a-hurst
Copy link
Collaborator Author

a-hurst commented Jun 26, 2021

Okay, I've made codecov a bit happier now: I covered a line that only gets run if the input signal is completely clean, fixed a bug that prevented a ValueError from getting covered (it was throwing a ValueError earlier, so the test passed but the coverage failed), and worked around a known coverage.py bug where a break immediately after an if statement doesn't get counted as being run.

@sappelhoff
Copy link
Owner

Thanks @a-hurst

@sappelhoff sappelhoff merged commit 3625b26 into sappelhoff:master Jun 27, 2021
@a-hurst a-hurst mentioned this pull request Jun 27, 2021
23 tasks
sappelhoff pushed a commit that referenced this pull request Jul 7, 2021
…PREP (#93)

* Add "as_dict" option for get_bads

* Don't permanently exclude initial bad-by-SNRs

* Match PREP's noisy channel updating logic

* Minor variable name cleanup

* use as_dict throughout reference.py

* Fix 'ignore' logic

* Add back "bad_all" (whoops)

* Update matlab_differences.rst

* Add max_iterations args, link to dropout issue

* Fix diffs mistake, update whats_new.rst

* Add PrepPipeline API for max_iterations

* Improve test coverage for Reference

* Add whats_new entry for SNR changes

* Fix quotes to make black happy

* Improve Reference test coverage some more

* remove unused import
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

PyPREP interpolates more channels than MATLAB PREP during re-referencing
3 participants