This simple package provides code for estimating the directed spectrum, a measure of directed communication between timeseries.
A primary use case of the directed spectrum is the estimation of latent brain networks from recordings of neural activity.
To do so, use the ds function provided in this package to estimate directed spectrum values for your data, then apply a linear factor model, such as non-negative matrix factorization (NMF), to estimate latent networks from the data:
from dir_spec.ds import ds
from sklearn.decomposition import NMF
directed_spectrum = ds(neural_data, sampling_frequency, channel_names)
# format ds array for NMF
directed_spectrum.normalize()
X = directed_spectrum.ds_array
X = X.reshape((X.shape[0], -1))
network_model = NMF(50, solver='mu', beta_loss=0, init='nndsvdar')
network_activations = network_model.fit_transform(X)
latent_networks = network_model.components_The directed spectrum is compatible with the underlying assumptions of linear latent factor models for identifying latent brain networks from neural recordings, unlike other measures of directed communication (eg Granger causality).
These properties of the directed spectrum also extend to other applications where the goal is latent network discovery, as long as the medium through which the signal is recorded does not cause nonlinear distortions relative to the underlying signal generated by the unobserved networks.
The ds function within the ds module takes 2 required parameters, in addition to several optional parameters, and returns an object of the DirectedSpectrum class.
For more information on the directed spectrum please see the associated publication:
Gallagher, N., Dzirasa, K. & Carlson, D. (2021). Directed Spectral Measures Improve Latent Network Models Of Neural Populations. Advances in Neural Information Processing Systems 35.
If you end up using the directed spectrum in your research, please cite that reference.
The supplementary code for the publication listed above is included in this repository as 'DS_supplemental_code.zip'.
This package has two public functions: ds and combine_ds
ds(X, f_samp, groups=None, pairwise=False, f_res=None,
return_onesided=False, estimator='Wilson',
order='multi-aic', max_ord=50, ord_est_epochs=30, n_jobs=None,
max_iter=1000, tol=1e-6, window='hann', nperseg=None, noverlap=None):X : numpy.ndarray, shape (n_windows, n_channels, n_timepoints)
Timeseries data from multiple channels. It is assumed that the
data for each channel are approximately stationary within a window.
If a 2-D array is input, it is assumed that n_windows is 1.
f_samp : float
Sampling rate associated with X.
groups : list of strings, shape (n_channels), optional
To calculate the Directed Spectrum between groups of channels, this
should list the group associated with each channel. Otherwise, to
calculate the directed spectrum between each pair of channels, this
should be a list of channel names. If 'None' (default), then each
channel will be given a unique integer index.
pairwise : bool, optional
If 'True', calculate the pairwise directed spectrum
(i.e. calculate seperately for each pair of groups/channels).
Otherwise, the non-pairwise directed spectrum will be calculated.
Note the pairwise directed spectrum is not defined for elements
where the source and target are the same.
f_res : float, optional
Frequency resolution of the calculated spectrum. For example, if
set to 1, then the directed spectrum will be calculated for
integer frequency values. If set to 'None' (default), then
the frequency resolution will be f_samp/nperseg if estimator is
'Wilson' or f_samp/order if estimator is 'AR' and pairwise is
False. f_res must be set if estimator is 'AR' and pairwise is
True.
return_onesided : bool, optional
If True, return a one-sided spectrum. If False return a
two-sided spectrum. Must be False if the input timeseries is
complex. Defaults to False.
estimator : {'Wilson', 'AR'}, optional
Method to use for estimating the directed spectrum. 'Wilson' is
Wilson's spectral factorization of the data cross-spectral
density matrix. 'AR' fits an autoregressive model to the data.
Defaults to 'Wilson'.
order : int or 'aic', optional
Autoregressive model order. If 'aic', uses Akaike Information
Criterion (AIC) to automatically determine one model order for all
epochs. If 'multi-aic', uses AIC to deterimine different model
orders for each epoch individually. Used only when estimator is
'AR'. Defaults to 'multi-aic'.
max_ord : int, optional
Maximum autoregressive model order. Only used when estimator is
'AR' and order is 'aic'. Default is 50.
ord_est_epochs : int, optional
Number of epochs to sample from full dataset for estimating
model order. Only used when estimator is 'AR' and order is
'aic'. The highest AIC value from all sampled epochs is used
to select the model order. Default is 20.
n_jobs : int, optional
Maximum number of jobs to use for parallel calculation of AIC.
Only used when order is 'aic' or 'multi-aic'. If set to 1,
parallel computing is not used. Default is None, which is
interpreted as 1 unless the call is performed under a
parallel_backend context manager that sets another value for
n_jobs.
max_iter : int, optional
Max number of Wilson factorization iterations. If factorization
does not converge before reaching this value, directed spectrum
estimates may be inaccurate. Used only when estimator is
'Wilson'. No Defaults to 1000.
tol : float, optional
Wilson factorization convergence tolerance value. Used only when
estimator is 'Wilson'. Defaults to 1e-6.
window : str or tuple or array_like, optional
Desired window to use. If window is a string or tuple, it is
passed to get_window to generate the window values, which are
DFT-even by default. See get_window for a list of windows and
required parameters. If window is array_like it will be used
directly as the window and its length must be nperseg. Defaults
to a Hann window.
nperseg : int, optional
Length of each segment. Defaults to None, but if window is str or
tuple, is set to 256, and if window is array_like, is set to the
length of the window.
noverlap: int, optional
Number of points to overlap between segments. If None,
noverlap = nperseg // 2. Defaults to None.
[Documentation for the window, nperseg, and noverlap variables was
copied directly from the scipy.signal.spectral module. These
variables are used for calculating the cross power spectral density
matrix as an intermediate step in calculating the directed spectrum
when estimator is 'Wilson' or to calculate power spectral density
when estimator is 'AR' and pairwise is True, and are not used
otherwise.]
ds returns a DirectedSpectrum object.
DirectedSpectrum(ds_array, f, cgroups, params)ds_array : ndarray, shape (n_windows, n_frequencies, n_groups, n_groups)
Directed spectrum values between each pair of channels/groups
for each frequency and window. Axis 2 corresponds to the source
channel/group and axis 3 corresponds to the target
channel/group. For 'full' models, elements for which the target
and source are the same correspond to the self-directed
spectrum, representing all signal in that region that is not
explained by any of the other sources in the model. For pairwise
directed spectrum, elements for which target and source are the
same are not defined and these entries are populated with the
power spectrum for the associated region instead.
f : ndarray, shape (n_frequencies)
Frequencies associated with axis 1 of ds_array.
groups : ndarray of strings, shape (n_groups)
Names the channels/groups associated with ds_array.
params : dictionary
Contains the parameters that were used to calculate the values in
ds_array.
normalize(norm_type=('channels', 'diagonals', 'frequency'),
fnorm_method='smooth', filter_sd=6.)Normalize values in ds_array for various use cases.
norm_type : one of {'channels', 'diagonals', 'frequency'} or tuple
containing a combination of those strings
default = ('channels', 'diagonals', 'frequency')
if 'channels':
Normalizes values within each target channel separately.
This is typically only appropriate when you expect that
channels have different amplitudes.
if 'diagonals':
Normalizes values with the same source and target channel
separately from those with different source and target. If
you are using pairwise DS, it is suggested to use this
form of normalization, as the 'diagonal' term are
populated with power spectrum values instead of DS and
this normalization is required in order for variances to
be comparable between the two data types.
if 'frequency':
Normalizes each values at each frequency separately. This
is appropriate for something like electrophysiology data,
where you expect higher frequencies to have lower
amplitudes, but want them to be weighted equally.
fnorm_method : one of {None, 'smooth', 'f-inv'}
Chooses the method to account for high correlation
of nearby frequencies. Only used if norm_type
contains 'frequency'. Default is 'smooth'.
If 'smooth':
A Gaussian smoothing filter is applied to the frequency
dimension only for the purposes of calculating
normalization constants. Other than this the smoothed data
is not used for generating the final result.
If 'f-inv':
Directed spectrum values are scaled by the corresponding
frequency before all other forms of normalization are
applied. This assumes that the power spectra for the data
initially display '1/f' scaling.
If None:
No special correction is applied.
filter_sd : float
Standard deviation of Gaussian filter applied to frequency
dimension, in Hz. Only used if norm_type contains 'frequency'
and fnorm_method is 'smooth'. Default is 6.
The combine_ds function is used to combine DirectedSpectrum objects, typically before normalization.
combine_ds(ds_list)ds_list : list of DirectedSpectrum objects
List of DirectedSpectrum objects to combine.