Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add matlab_strict API for matching MATLAB PREP's correlations and medians during RANSAC #70

Merged
merged 15 commits into from
Apr 26, 2021
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
20 changes: 14 additions & 6 deletions docs/api.rst
Original file line number Diff line number Diff line change
Expand Up @@ -8,8 +8,8 @@ API Documentation

Here we list the Application Programming Interface (API) for pyprep.

The :class:`NoisyChannels` class
--------------------------------
The :class:`~pyprep.NoisyChannels` class
sappelhoff marked this conversation as resolved.
Show resolved Hide resolved
----------------------------------------

.. automodule:: pyprep
:no-members:
Expand All @@ -22,16 +22,24 @@ The :class:`NoisyChannels` class

NoisyChannels

The :class:`PrepPipeline` class
-------------------------------
The :class:`~pyprep.Reference` class
------------------------------------

.. autosummary::
:toctree: generated/

Reference

The :class:`~pyprep.PrepPipeline` class
---------------------------------------
Comment on lines +25 to +34
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not sure whether or not this should be made public (including the import in init.py) --> for NoisyChannels, I can see why people may want to use it separately from PrepPipeline --> but do such usecases also exist for a barebones Reference?

A general rule is to expose as little API as possible ... and only as much as necessary :) That helps tremendously with development (backward incompatible changes) and also helps users to get a clear picture of what can be done and the reasoning behind it (not everybody is a power user).

WDYT?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fully understood about wanting to keep the public API minimal. My personal use-case for Reference being used separately from PrepPipreline (apart from my MATLAB comparison efforts where I've pre-cleanlined the data) is that I want to be able to plot the results of Cleanline separately from the results of full PREP (cleanline + re-reference) in order to sanity-check the process at each step for each file (see this chunk of my script here).

Apart from that use-case, the other main one I can think of would be if someone wanted to use a different line noise filtering method but still wanted to use PyPREP's robust re-referencing. That said, I guess you could get all the same functionality by restructuring PrepPipeline a bit to allow for separate CleanLine and/or re-referencing from that class instead of making Reference public. If that sounds better to you (or if you'd just prefer to keep the public API as-is), I'll just remove Reference from the init along with my references to it in any public docstrings.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

okay cool, that sounds like a fair use. +1 to making it part of the public API then, as proposed here. In the future we should then add an example of how to make use of that (like you do in your script).


.. autosummary::
:toctree: generated/

PrepPipeline

The :mod:`ransac` module
===============================
The :mod:`~pyprep.ransac` module
================================

.. automodule:: pyprep.ransac
:no-members:
Expand Down
3 changes: 2 additions & 1 deletion docs/conf.py
Original file line number Diff line number Diff line change
Expand Up @@ -91,6 +91,7 @@
("Examples", "auto_examples/index"),
("API", "api"),
("What's new", "whats_new"),
("Differences from PREP", "matlab_differences"),
("GitHub", "https://github.com/sappelhoff/pyprep", True),
],
}
sappelhoff marked this conversation as resolved.
Show resolved Hide resolved
Expand All @@ -100,7 +101,7 @@
intersphinx_mapping = {
"python": ("https://docs.python.org/3", None),
"mne": ("https://mne.tools/dev", None),
"numpy": ("https://www.numpy.org/devdocs", None),
"numpy": ("https://numpy.org/devdocs", None),
"scipy": ("https://scipy.github.io/devdocs", None),
"matplotlib": ("https://matplotlib.org", None),
}
Expand Down
105 changes: 105 additions & 0 deletions docs/matlab_differences.rst
Original file line number Diff line number Diff line change
@@ -0,0 +1,105 @@
:orphan:

.. _matlab-diffs:
sappelhoff marked this conversation as resolved.
Show resolved Hide resolved

Deliberate Differences from MATLAB PREP
=======================================

Although PyPREP aims to be a faithful reimplementaion of the original MATLAB
version of PREP, there are a few places where PyPREP has deliberately chosen
to use different defaults than the MATLAB PREP.

To override these differerences, you can set the ``matlab_strict`` argument to
:class:`~pyprep.PrepPipeline`, :class:`~pyprep.Reference`, or
:class:`~pyprep.NoisyChannels` as ``True`` to match the original PREP's
internal math.

.. contents:: Table of Contents
:depth: 3


Differences in RANSAC
---------------------

During the "find-bad-by-RANSAC" step of noisy channel detection (see
:func:`~pyprep.ransac.find_bad_by_ransac`), PREP does the follwing steps to
identify channels that aren't well-predicted by the signals of other channels:

1) Generates a bunch of random subsets of currently-good channels from the data
(50 samples by default, each containing 25% of the total EEG channels in the
dataset).

2) Uses the signals and spatial locations of those channels to predict what the
signals will be at the spatial locations of all the other channels, with each
random subset of channels generating a different prediction for each channel
(i.e., 50 predicted signals per channel by default).

3) For each channel, calculates the median predicted signal from the full set of
predictions.

4) Splits the full data into small non-overlapping windows (5 seconds by
default) and calculates the correlation between the median predicted signal
and the actual signal for each channel within each window.

5) Compares the correlations for each channel against a threshold value (0.75
by default), flags all windows that fall below that threshold as 'bad', and
calculates the proportions of 'bad' windows for each channel.

6) Flags all channels with an excessively high proportion of 'bad' windows
(minimum 0.4 by default) as being 'bad-by-RANSAC'.

With that in mind, here are the areas where PyPREP's defaults deliberately
differ from the original PREP implementation:


Calculation of median estimated signal
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

In MATLAB PREP, the median signal in step 3 is calculated by sorting the
sappelhoff marked this conversation as resolved.
Show resolved Hide resolved
different predictions for each EEG sample/channel from low to high and then
taking the value at the middle index for each. The relevant lines of MATLAB
PREP's ``findNoisyChannels.m`` are reproduced below:

.. code-block:: matlab

function rX = calculateRansacWindow(XX, P, n, m, p)
YY = sort(reshape(XX*P, n, m, p),3);
YY = YY(:, :, round(end/2));
rX = sum(XX.*YY)./(sqrt(sum(XX.^2)).*sqrt(sum(YY.^2)));

The first line of the function generates the full set of predicted signals for
each RANSAC sample, and then sorts the predicted values for each channel /
timepoint from low to high. The second line calculates the index of the middle
value (``round(end/2)``) and then uses it to take the middle slice of the
sorted array to get the median predicted signal for each channel.

Because this logic only returns the correct result for odd numbers of samples,
the current function will instead return the true median signal across
predictions unless strict MATLAB equivalence is requested.


Correlation of predicted vs. actual signals
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

In MATLAB PREP, RANSAC channel predictions are correlated with actual data
sappelhoff marked this conversation as resolved.
Show resolved Hide resolved
in step 4 using a non-standard method: essentialy, it uses the standard Pearson
correlation formula but without subtracting the channel means from each channel
before calculating sums of squares. This is done in the last line of the
``calculateRansacWindow`` function reproduced above:

.. code-block:: matlab

rX = sum(XX.*YY)./(sqrt(sum(XX.^2)).*sqrt(sum(YY.^2)));

For readablility, here's the same formula written in Python code::

SSxx = np.sum(xx ** 2)
SSyy = np.sum(yy ** 2)
rX = np.sum(xx * yy) / (np.sqrt(SSxx) * np.sqrt(SSyy))

Because the EEG data will have already been filtered to remove slow drifts in
baseline before RANSAC, the signals correlated by this method will already be
roughly mean-centered. and will thus produce similar values to normal Pearson
correlation. However, to avoid making any assumptions about the signal for any
given channel / window, PyPREP defaults to normal earson correlation unless
strict MATLAB equivalence is requested.
38 changes: 20 additions & 18 deletions docs/whats_new.rst
Original file line number Diff line number Diff line change
Expand Up @@ -7,11 +7,11 @@ People who contributed to this software across releases (in **alphabetical order

* `Aamna Lawrence`_
* `Adam Li`_
* `Austin Hurst`_
* `Christian Oreilly`_
* `Stefan Appelhoff`_
* `Victor Xiang`_
* `Yorguin Mantilla`_
* `Austin Hurst`_

.. _whats_new:

Expand All @@ -33,12 +33,13 @@ Current

Changelog
~~~~~~~~~
- Created a new module named :mod:`ransac` which contains :func:`find_bad_by_ransac <ransac.find_bad_by_ransac>`, a standalone function mirroring the previous ransac method from the :class:`NoisyChannels` class, by `Yorguin Mantilla`_ (:gh:`51`)
- Added two attributes :attr:`PrepPipeline.noisy_channels_before_interpolation <prep_pipeline.PrepPipeline>` and :attr:`PrepPipeline.noisy_channels_after_interpolation <prep_pipeline.PrepPipeline>` which have the detailed output of each noisy criteria, by `Yorguin Mantilla`_ (:gh:`45`)
- Added two keys to the :attr:`PrepPipeline.noisy_channels_original <prep_pipeline.PrepPipeline>` dictionary: ``bad_by_dropout`` and ``bad_by_SNR``, by `Yorguin Mantilla`_ (:gh:`45`)
- Created a new module named :mod:`pyprep.ransac` which contains :func:`find_bad_by_ransac <ransac.find_bad_by_ransac>`, a standalone function mirroring the previous ransac method from the :class:`NoisyChannels` class, by `Yorguin Mantilla`_ (:gh:`51`)
- Added two attributes :attr:`PrepPipeline.noisy_channels_before_interpolation <pyprep.PrepPipeline>` and :attr:`PrepPipeline.noisy_channels_after_interpolation <pyprep.PrepPipeline>` which have the detailed output of each noisy criteria, by `Yorguin Mantilla`_ (:gh:`45`)
- Added two keys to the :attr:`PrepPipeline.noisy_channels_original <pyprep.PrepPipeline>` dictionary: ``bad_by_dropout`` and ``bad_by_SNR``, by `Yorguin Mantilla`_ (:gh:`45`)
- Changed RANSAC chunking logic to reduce max memory use and prefer equal chunk sizes where possible, by `Austin Hurst`_ (:gh:`44`)
- Changed RANSAC's random channel sampling code to produce the same results as MATLAB PREP for the same random seed, additionally changing the default RANSAC sample size from 25% of all *good* channels (e.g. 15 for a 64-channel dataset with 4 bad channels) to 25% of *all* channels (e.g. 16 for the same dataset), by `Austin Hurst`_ (:gh:`62`)
- Changed RANSAC so that "bad by high-frequency noise" channels are retained when making channel predictions (provided they aren't flagged as bad by any other metric), matching MATLAB PREP behaviour, by `Austin Hurst`_ (:gh:`64`)
- Added a new flag ``matlab_strict`` to :class:`~pyprep.PrepPipeline`, :class:`~pyprep.Reference`, :class:`~pyprep.NoisyChannels`, and :func:`~pyprep.ransac.find_bad_by_ransac` for optionally matching MATLAB PREP's internal math as closely as possible, overriding areas where PyPREP attempts to improve on the original, by `Austin Hurst`_ (:gh:`70`)

Bug
~~~
Expand All @@ -49,9 +50,10 @@ Bug

API
~~~
- The permissible parameters for the following methods were removed and/or reordered: :func:`ransac.ransac_correlations`, :func:`ransac.run_ransac`, and :func:`ransac.get_ransac_pred` methods, by `Yorguin Mantilla`_ (:gh:`51`)
- The following methods have been moved to a new module named :mod:`ransac` and are now private: :meth:`NoisyChannels.ransac_correlations`, :meth:`NoisyChannels.run_ransac <find_noisy_channels.NoisyChannels.run_ransac>`, and :meth:`NoisyChannels.get_ransac_pred <find_noisy_channels.NoisyChannels.get_ransac_pred>` methods, by `Yorguin Mantilla`_ (:gh:`51`)
- The permissible parameters for the following methods were removed and/or reordered: :meth:`NoisyChannels.ransac_correlations <find_noisy_channels.NoisyChannels.ransac_correlations>`, :meth:`NoisyChannels.run_ransac`, and :meth:`NoisyChannels.get_ransac_pred <find_noisy_channels.NoisyChannels.get_ransac_pred>` methods, by `Austin Hurst`_ and `Yorguin Mantilla`_ (:gh:`43`)
- The permissible parameters for the following methods were removed and/or reordered: `ransac._ransac_correlations`, `ransac._run_ransac`, and `ransac._get_ransac_pred` methods, by `Yorguin Mantilla`_ (:gh:`51`)
- The following methods have been moved to a new module named :mod:`~pyprep.ransac` and are now private: `NoisyChannels.ransac_correlations`, `NoisyChannels.run_ransac`, and `NoisyChannels.get_ransac_pred` methods, by `Yorguin Mantilla`_ (:gh:`51`)
- The permissible parameters for the following methods were removed and/or reordered: `NoisyChannels.ransac_correlations`, `NoisyChannels.run_ransac`, and `NoisyChannels.get_ransac_pred` methods, by `Austin Hurst`_ and `Yorguin Mantilla`_ (:gh:`43`)


.. _changes_0_3_1:

Expand All @@ -60,22 +62,22 @@ Version 0.3.1

Changelog
~~~~~~~~~
- It's now possible to pass keyword arguments to the notch filter inside :class:`PrepPipeline <prep_pipeline.PrepPipeline>`; see the ``filter_kwargs`` parameter by `Yorguin Mantilla`_ (:gh:`40`)
- It's now possible to pass keyword arguments to the notch filter inside :class:`PrepPipeline <pyprep.PrepPipeline>`; see the ``filter_kwargs`` parameter by `Yorguin Mantilla`_ (:gh:`40`)
- The default filter length for the spectrum_fit method will be '10s' to fix memory issues, by `Yorguin Mantilla`_ (:gh:`40`)
- Channel types are now available from a new ``ch_types_all`` attribute, and non-EEG channel names are now available from a new ``ch_names_non_eeg`` attribute from :class:`PrepPipeline <prep_pipeline.PrepPipeline>`, by `Yorguin Mantilla`_ (:gh:`34`)
- Renaming of ``ch_names`` attribute of :class:`PrepPipeline <prep_pipeline.PrepPipeline>` to ``ch_names_all``, by `Yorguin Mantilla`_ (:gh:`34`)
- It's now possible to pass ``'eeg'`` to ``ref_chs`` and ``reref_chs`` keywords to the ``prep_params`` parameter of :class:`PrepPipeline <prep_pipeline.PrepPipeline>` to select only eeg channels for referencing, by `Yorguin Mantilla`_ (:gh:`34`)
- :class:`PrepPipeline <prep_pipeline.PrepPipeline>` will retain the non eeg channels through the ``raw`` attribute. The eeg-only and non-eeg parts will be in raw_eeg and raw_non_eeg respectively. See the ``raw`` attribute, by `Christian Oreilly`_ (:gh:`34`)
- Channel types are now available from a new ``ch_types_all`` attribute, and non-EEG channel names are now available from a new ``ch_names_non_eeg`` attribute from :class:`PrepPipeline <pyprep.PrepPipeline>`, by `Yorguin Mantilla`_ (:gh:`34`)
- Renaming of ``ch_names`` attribute of :class:`PrepPipeline <pyprep.PrepPipeline>` to ``ch_names_all``, by `Yorguin Mantilla`_ (:gh:`34`)
- It's now possible to pass ``'eeg'`` to ``ref_chs`` and ``reref_chs`` keywords to the ``prep_params`` parameter of :class:`PrepPipeline <pyprep.PrepPipeline>` to select only eeg channels for referencing, by `Yorguin Mantilla`_ (:gh:`34`)
- :class:`PrepPipeline <pyprep.PrepPipeline>` will retain the non eeg channels through the ``raw`` attribute. The eeg-only and non-eeg parts will be in raw_eeg and raw_non_eeg respectively. See the ``raw`` attribute, by `Christian Oreilly`_ (:gh:`34`)
- When a ransac call needs more memory than available, pyprep will now automatically switch to a slower but less memory-consuming version of ransac, by `Yorguin Mantilla`_ (:gh:`32`)
- It's now possible to pass an empty list for the ``line_freqs`` param in :class:`PrepPipeline <prep_pipeline.PrepPipeline>` to skip the line noise removal, by `Yorguin Mantilla`_ (:gh:`29`)
- The three main classes :class:`PrepPipeline <prep_pipeline.PrepPipeline>`, :class:`NoisyChannels <find_noisy_channels.NoisyChannels>`, and :class:`Reference <reference.Reference>` now have a ``random_state`` parameter to set a seed that gets passed on to all their internal methods and class calls, by `Stefan Appelhoff`_ (:gh:`31`)
- It's now possible to pass an empty list for the ``line_freqs`` param in :class:`PrepPipeline <pyprep.PrepPipeline>` to skip the line noise removal, by `Yorguin Mantilla`_ (:gh:`29`)
- The three main classes :class:`~pyprep.PrepPipeline`, :class:`~pyprep.NoisyChannels`, and :class:`pyprep.Reference` now have a ``random_state`` parameter to set a seed that gets passed on to all their internal methods and class calls, by `Stefan Appelhoff`_ (:gh:`31`)


Bug
~~~

- Corrected inconsistency of :mod:`reference` module with the matlab version (:gh:`19`), by `Yorguin Mantilla`_ (:gh:`32`)
- Prevented an over detrending in :mod:`reference` module, by `Yorguin Mantilla`_ (:gh:`32`)
- Corrected inconsistency of :class:`~pyprep.Reference` with the matlab version (:gh:`19`), by `Yorguin Mantilla`_ (:gh:`32`)
- Prevented an over detrending in :class:`~pyprep.Reference`, by `Yorguin Mantilla`_ (:gh:`32`)


API
Expand All @@ -91,7 +93,7 @@ Version 0.3.0
Changelog
~~~~~~~~~

- Include a boolean ``do_detrend`` in :meth:`Reference.robust_reference <reference.Reference>` to indicate whether detrend should be done internally or not for the use with :mod:`find_noisy_channels` module, by `Yorguin Mantilla`_ (:gh:`9`)
- Include a boolean ``do_detrend`` in :meth:`~pyprep.Reference.robust_reference` to indicate whether detrend should be done internally or not for the use with :class:`~pyprep.NoisyChannels`, by `Yorguin Mantilla`_ (:gh:`9`)
- Robust average referencing + tests, by `Victor Xiang`_ (:gh:`6`)
- Removing trend in the EEG data by high pass filtering and local linear regression + tests, by `Aamna Lawrence`_ (:gh:`6`)
- Finding noisy channels with comparable output to Matlab + tests-including test for ransac, by `Aamna Lawrence`_ (:gh:`6`)
Expand All @@ -103,7 +105,7 @@ Changelog
Bug
~~~

- Prevent an undoing of the detrending in :mod:`find_noisy_channels` module, by `Yorguin Mantilla`_ (:gh:`9`)
- Prevent an undoing of the detrending in :class:`~pyprep.NoisyChannels`, by `Yorguin Mantilla`_ (:gh:`9`)

API
~~~
Expand Down
2 changes: 1 addition & 1 deletion examples/run_full_prep.py
Original file line number Diff line number Diff line change
Expand Up @@ -5,7 +5,7 @@


In this example we show how to run PREP with ``pyprep``. We also compare
:class:`prep_pipeline.PrepPipeline` with PREP's results in Matlab.
:class:`~pyprep.PrepPipeline` with PREP's results in Matlab.

We use sample EEG data from Physionet EEG Motor Movement/Imagery Dataset:
`https://physionet.org/content/eegmmidb/1.0.0/ <https://physionet.org/content/eegmmidb/1.0.0/>`_
Expand Down
1 change: 1 addition & 0 deletions pyprep/__init__.py
Original file line number Diff line number Diff line change
@@ -1,6 +1,7 @@
"""initialize pyprep."""
import pyprep.ransac as ransac # noqa: F401
from pyprep.find_noisy_channels import NoisyChannels # noqa: F401
from pyprep.reference import Reference # noqa: F401
from pyprep.prep_pipeline import PrepPipeline # noqa: F401

from ._version import get_versions
Expand Down
18 changes: 12 additions & 6 deletions pyprep/find_noisy_channels.py
Original file line number Diff line number Diff line change
Expand Up @@ -24,21 +24,25 @@ class NoisyChannels:

"""

def __init__(self, raw, do_detrend=True, random_state=None):
def __init__(self, raw, do_detrend=True, random_state=None, matlab_strict=False):
"""Initialize the class.

Parameters
----------
raw : mne.io.Raw
The MNE raw object.
do_detrend : bool
do_detrend : bool, optional
Whether or not to remove a trend from the data upon initializing the
`NoisyChannels` object. Defaults to True.
`NoisyChannels` object. Defaults to ``True``.
random_state : {int, None, np.random.RandomState}, optional
The random seed at which to initialize the class. If random_state
is an int, it will be used as a seed for RandomState.
If None, the seed will be obtained from the operating system
(see RandomState for details). Default is None.
If ``None``, the seed will be obtained from the operating system
(see RandomState for details). Default is ``None``.
matlab_strict : bool, optional
Whether or not PyPREP should strictly follow MATLAB PREP's internal
math, ignoring any improvements made in PyPREP over the original code
(see :ref:`matlab-diffs` for more details). Defaults to ``False``.

"""
# Make sure that we got an MNE object
Expand All @@ -50,6 +54,7 @@ def __init__(self, raw, do_detrend=True, random_state=None):
self.raw_mne._data = removeTrend(
self.raw_mne.get_data(), sample_rate=self.sample_rate
)
self.matlab_strict = matlab_strict

self.EEGData = self.raw_mne.get_data(picks="eeg")
self.EEGData_beforeFilt = self.EEGData
Expand Down Expand Up @@ -407,7 +412,7 @@ def find_bad_by_ransac(
):
"""Detect channels that are not predicted well by other channels.

This method is a wrapper for the :func:`pyprep.ransac.find_bad_by_ransac`
This method is a wrapper for the :func:`ransac.find_bad_by_ransac`
function.

Here, a ransac approach (see [1]_, and a short discussion in [2]_) is
Expand Down Expand Up @@ -475,6 +480,7 @@ def find_bad_by_ransac(
corr_window_secs,
channel_wise,
self.random_state,
self.matlab_strict,
)
self._extra_info['bad_by_ransac'] = {
'ransac_correlations': ch_correlations,
Expand Down
Loading