19 changes: 9 additions & 10 deletions NMFDemo.ipynb
Original file line number Diff line number Diff line change
@@ -15,13 +15,6 @@
"##### 6. Interpret results \\[Python & MATLAB\\]"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
},
{
"cell_type": "markdown",
"metadata": {},
@@ -53,6 +46,11 @@
" That folder should contain *.mat* files corresponding to each file in the *Data* folder with the following naming scheme: _**mouse-name**\\_**date**\\_**experiment-identifier**\\_TIME.mat_ (example: *Mouse798_021519_CUS_TIME.mat*).\n",
" Each of those files will contain a single vector variable named **INT_TIME** that should have *2N* entries, where *N* is the total number of intervals you wish to use.\n",
" The odd entries (1,3,5,...) contain the starting time (in seconds) of each interval, and the even entries (2,4,6...) contain the duration of each interval in seconds.\n",
" \n",
" \n",
" If the optional *useCenteredWindows* parameter is set to **true**, then the project folder must also contain a subfolder named *CENTER_TIME*. The *createCENTERTIME* script in this repository fills an empty *CENTER_TIME* folder in your project folder with one file per file in *Data*, based on your answers to the prompts shown when you run the script. Each window is created around a center timestamp. The first timestamp is placed so that the largest window still fits around it: features generated with this method can use different window sizes, and if the largest of those is 1 second, the first timestamp falls at 0.5 seconds into the LFP. The script then asks for the distance between center timestamps, that is, how far apart the centers of consecutive windows should be. Starting at 0.5 seconds with a distance of 1 second gives timestamps at 0.5, 1.5, 2.5, and so on. There is also an option to skip a window every X seconds, for instance when several recordings of X seconds or minutes each have been stitched together into one LFP file. If you enter the length of each recording (they all have to be the same length), timestamps that fall exactly on the boundary between two recordings are skipped. This assumes that a window overlaps two recordings only when its timestamp lies exactly on the boundary between them.\n",
" \n",
" \n",
"\n",
" To format recorded data into windows, run *formatWindows* in MATLAB. *formatWindows* takes one parameter, which names the file that you will save the formatted data to. \n",
"\n",
@@ -106,7 +104,8 @@
"* featureList: cell array of features to calculate. Options are 'power', 'coherence', and/or 'granger'\n",
"* mvgcFolder: (Only needed if you want to calculate granger features) String giving path to MVGC toolbox. There is a copy of this toolbox inside the lpne-data-analysis repository, so you should be able to use that location\n",
"* parCores: integer indicating number of cores to use for parallel computing. If 0 (default), all tasks executed in serial.\n",
"* windowOpts: a binary vector with length=number of windows, determining which windows to analyze for features."
"* windowOpts: a binary vector with length=number of windows, determining which windows to analyze for features.\n",
"* featureSizes: (Only needed if calculating multiple feature sizes) cell array of size of each window for each feature"
]
},
{
@@ -506,7 +505,7 @@
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
@@ -520,7 +519,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.7.4"
"version": "3.9.7"
}
},
"nbformat": 4,
25 changes: 23 additions & 2 deletions README.md
@@ -1,6 +1,27 @@
# lpne-data-analysis
# lpne-data-analysis, centered_windows_functionality branch

If 'true' is passed for the input useCenteredWindows, the project folder must contain a 'CENTER_TIME' folder holding files that list the timestamps to create windows around. Each file has the same name as its LFP file, but ends in CENTER.mat instead of LFP.mat. Each center file contains three variables:

1. T, which contains the timestamps
2. trial, the trial in which each timestamp occurs
3. percProgTraveledPath, the percent progress through the task that the mouse has reached at each timestamp

If 2 and 3 do not mean anything to you, you can fill them with dummy values.
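
As a rough illustration of what a center file holds, the Python sketch below builds the three variables for one recording. It assumes the timestamps are sample indices spaced evenly through the recording, with the first one half of the largest window in from the start. `center_timestamps` and its parameters are hypothetical names used only for this example, and writing the result to a CENTER.mat file is not shown.

```python
def center_timestamps(n_samples, fs, largest_window_s, spacing_s, rec_len_s=None):
    """Return sample indices to centre windows on (hypothetical helper).

    Assumption: timestamps are sample indices, not seconds.
    """
    half = int(round(largest_window_s * fs)) // 2   # half of the largest window, in samples
    step = int(round(spacing_s * fs))               # distance between centres, in samples
    rec = int(round(rec_len_s * fs)) if rec_len_s else None

    stamps = []
    t = half
    # Stop early enough that a full window still fits after the last centre.
    while t < n_samples - half:
        # Skip centres that sit exactly on a boundary between stitched recordings.
        if rec is None or t % rec != 0:
            stamps.append(t)
        t += step
    return stamps


# One 5-second recording sampled at 1000 Hz, 1-second windows, 1 second apart.
T = center_timestamps(n_samples=5000, fs=1000, largest_window_s=1, spacing_s=1)

# Dummy values for the two variables that may not apply to your experiment.
trial = [0] * len(T)
percProgTraveledPath = [0] * len(T)

# Two 3-second recordings stitched together: the centre on the boundary is skipped.
T_stitched = center_timestamps(6000, 1000, largest_window_s=2, spacing_s=1, rec_len_s=3)
```

With these inputs `T` holds one centre per second starting at sample 500, and `T_stitched` omits sample 3000, the boundary between the two stitched recordings.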

A special note for formatWindows: it asks "Enter the largest window length used in your analysis (s)". It asks this because this branch can compute features over windows of different lengths (e.g. 1 second power features and 2 second coherence features). So if your features use windows of 1 second, 2 seconds, and 2 seconds, you would enter 2. You will specify the length for each feature later.
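
To make the prompt concrete, here is a small Python sketch of how the largest window length relates to the data cut around each timestamp. It assumes a 1000 Hz sampling rate and 0-based sample indices. `centered_window` is a hypothetical helper written for this example, not a function from this repository.

```python
feature_sizes_s = [1, 2, 2]   # window length of each feature, in seconds
fs = 1000                     # sampling rate in Hz (assumed for this example)

# The value to enter at the prompt is the largest of the feature window lengths.
largest_window_s = max(feature_sizes_s)
points_per_window = int(largest_window_s * fs)


def centered_window(signal, center, points_per_window):
    """Cut points_per_window samples centred on index `center` (hypothetical helper)."""
    half = points_per_window // 2
    return signal[center - half:center + half]


signal = list(range(10000))   # stand-in for one channel of LFP data
window = centered_window(signal, 5000, points_per_window)
```

Every window is cut at the largest length, so a shorter feature can later be computed from a portion of the same window.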

When using saveFeatures, the struct called 'options' must contain information about the window length of each feature. The field in options should be called featureSizes.

featureSizes: (Optional) Requires featureList. Cell array of doubles indicating the window size of each feature in featureList.
The doubles should be in the same order as the features specified in featureList. Only use this if you created centered windows earlier. If you don't know what this means, don't include it.

featureSizes must be used if windows are centered.
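
The ordering requirement can be pictured with the following Python sketch, which pairs each feature with its window size by position. The check against the largest window length is an assumption made for this example, and `pair_feature_sizes` is a hypothetical helper, not part of saveFeatures.

```python
def pair_feature_sizes(feature_list, feature_sizes, largest_window_s):
    """Match each feature to its window size by position (hypothetical helper)."""
    if len(feature_list) != len(feature_sizes):
        raise ValueError("featureSizes needs one entry per feature in featureList")
    # Assumption: no feature can use a window longer than the one that was saved.
    if max(feature_sizes) > largest_window_s:
        raise ValueError("a feature window is longer than the largest window length")
    return dict(zip(feature_list, feature_sizes))


# Mirrors the fields of the 'options' struct described above.
options = {
    "featureList": ["power", "coherence", "granger"],
    "featureSizes": [1.0, 2.0, 2.0],
}

sizes = pair_feature_sizes(options["featureList"], options["featureSizes"], 2.0)
```

Here `sizes` maps 'power' to 1.0 and both 'coherence' and 'granger' to 2.0, which is why the two lists must be written in the same order.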






Generic pre-processing and analysis code for LPNE data science. Takes in data from individual recordings, preprocesses and extracts features, then creates predictive models for tasks and tests performance. Main files are formatWindows.m, preprocessData.m, saveFeatures.m, data_tools.py and validation_tools.py.

### A detailed description of how to use the full pipeline with control over individual steps is given in *NMFDemo.ipynb*

120 changes: 120 additions & 0 deletions createCENTERTIME.m
@@ -0,0 +1,120 @@
prompt = 'How long is the largest window of features you are going to create (in seconds)?' ;
largest_window_size = input(prompt);


prompt = 'How far apart do you want each timepoint to be (in seconds)? ';
timepointSepLen = input(prompt);

prompt = 'What is the sampling frequency?';
sampleFreq = input(prompt);

prompt = 'true or false: you have recordings stitched together that are all the same length?';
stitched = input(prompt);

if stitched
prompt = 'How long are the recordings of each file you stitched together?';
recLen = input(prompt);

%read in LFP file
projectFolder = uigetdir('.', 'Select folder containing Data & CHANS subfolders ');
dataFolder = [projectFolder '/Data/'];
centerFolder = [projectFolder '/CENTER_TIME/'];

dataList = dir([dataFolder '*_LFP.mat']);

nSessions = length(dataList);
for k = 1:nSessions
thisFile = dataList(k);
if thisFile.isdir, continue, end

% clear channel data from last file and load it from the next file
clear('-regexp','_\d\d')
filename = thisFile.name;
dataPath = [dataFolder filename];
dataFile = matfile(dataPath);
varsList = who(dataFile);
variableName = char(varsList(1));
variable = load(dataPath, variableName);
varSize = size(variable.(variableName));
lenLFP = varSize(2);

windowSample = largest_window_size * sampleFreq;
startPoint = windowSample/2;

sepSample = timepointSepLen * sampleFreq;

timepoints = [];
currentPoint = startPoint;
recSample = recLen*sampleFreq;


while currentPoint < lenLFP - startPoint
if rem(currentPoint, recSample) ~= 0
timepoints = [timepoints ; currentPoint];
currentPoint = currentPoint + sepSample;
else
disp(currentPoint)
currentPoint = currentPoint + sepSample;
end
end

T = timepoints;
percProgTraveledPath = zeros(length(T),1);
trial = zeros(length(T),1);

newFilename = strrep(filename, 'LFP', 'CENTER');
save(strcat(centerFolder,newFilename), 'T', 'percProgTraveledPath', 'trial')

end
disp('Done')
clear all

else

%read in LFP file
projectFolder = uigetdir('.', 'Select folder containing Data & CHANS subfolders ');
dataFolder = [projectFolder '/Data/'];
centerFolder = [projectFolder '/CENTER_TIME/'];

dataList = dir([dataFolder '*_LFP.mat']);

nSessions = length(dataList);
for k = 1:nSessions
thisFile = dataList(k);
if thisFile.isdir, continue, end

% clear channel data from last file and load it from the next file
clear('-regexp','_\d\d')
filename = thisFile.name;
dataPath = [dataFolder filename];
dataFile = matfile(dataPath);
varsList = who(dataFile);
variableName = char(varsList(1));
variable = load(dataPath, variableName);
varSize = size(variable.(variableName));
lenLFP = varSize(2);

windowSample = largest_window_size * sampleFreq;
startPoint = windowSample/2;

sepSample = timepointSepLen * sampleFreq;

timepoints = [];
currentPoint = startPoint;
while currentPoint < lenLFP - startPoint
timepoints = [timepoints ; currentPoint];
currentPoint = currentPoint + sepSample;
end

T = timepoints;
percProgTraveledPath = zeros(length(T),1);
trial = zeros(length(T),1);

newFilename = strrep(filename, 'LFP', 'CENTER');
save(strcat(centerFolder,newFilename), 'T', 'percProgTraveledPath', 'trial')


end
disp('Done')
clear all
end
80 changes: 69 additions & 11 deletions formatWindows.m
@@ -1,22 +1,26 @@
function formatWindows(saveFile, useIntervals, projectFolder, chanFile, fs, ...
function formatWindows(saveFile, useIntervals, centeredWindows, projectFolder, chanFile, fs, ...
windowLength)
% formatWindows
% Formats data and labels for use in lpne pipeline
% INPUTS
% saveFile: name of '.mat' file where you would like to save the
% formatted data.
% useIntervals: (optional) boolean indicating whether to only extract
% data from specific intervals. If 'true' there must be an 'INT_TIME'
% data from specific intervals. If 'true', there must be an 'INT_TIME'
% folder in the project folder containing time interval files for
% each LFP file.
% centeredWindows: (optional) boolean indicating whether to create
% centered windows around given timepoints. If 'true', there must be
% a 'CENTER_TIME' folder in the project folder containing files with
% a list of timestamps, given as sample indices, to create windows around.
% projectFolder: (optional) where the LFP data is stored. Should have 'Data'
% and `CHANS` subfolders.
% chanFile: (optional) filepath of the excel file input containing channel
% naming information. See `NMFDemo.ipynb`.
% fs: (optional) sampling rate, in Hz
% windowLength: (optional) length of one window, in seconds.
% windowLength: (optional) length of largest window size used, in seconds.
% SAVED VARIABLES
% data: MxNXP array of the data for each delay length. M is the #
% data: MxNXP array of the data. M is the #
% of time points. N is the # of channels. P is the # of
% windows. All elements corresponding to data that was not
% saved (i.e. missing channel) should be marked with NaNs.
@@ -36,24 +40,29 @@ function formatWindows(saveFile, useIntervals, projectFolder, chanFile, fs, ...
if nargin < 2
useIntervals = false;
end

if nargin < 3
centeredWindows = false; %default is not centering windows
end

if nargin < 4
projectFolder = uigetdir('.', 'Select folder containing Data & CHANS subfolders')
dummy = input(['Make sure areas in channel info file match other ' ...
'datasets you plan to combine with this one!!\n', 'ENTER to continue']);
end
if nargin < 4
if nargin < 5
[chanFile, chanPath] = uigetfile([projectFolder '/*.xls*'], 'Select channel info file');
chanFile = [chanPath chanFile];
end
if nargin < 5
if nargin < 6
inputs = inputdlg({'Enter sampling rate (Hz):'});
fs = str2double(inputs{1}); % sampling rate, Hz
end
if nargin < 6
inputs = inputdlg({'Enter window length (s)'});
if nargin < 7
inputs = inputdlg({'Enter the largest window length used in your analysis (s)'});
windowLength = str2double(inputs{1}); % length of one window (s)
end
pointsPerWindow = fs*windowLength;
pointsPerWindow = fs*windowLength; % number of samples in one window

% load channel info (and strip of unwanted ' symbols)
chanData = readtable(chanFile, 'ReadVariableNames', false);
@@ -71,6 +80,9 @@ function formatWindows(saveFile, useIntervals, projectFolder, chanFile, fs, ...
if useIntervals
intFolder = [projectFolder '/INT_TIME/'];
end
if centeredWindows
centerFolder = [projectFolder '/CENTER_TIME/']; %get folder that contains the information about the centered windows
end

% initialize variables to be used in loops below
dataCells = {};
@@ -85,6 +97,7 @@ function formatWindows(saveFile, useIntervals, projectFolder, chanFile, fs, ...
filename = thisFile.name;
dataPath = [dataFolder filename];
load(dataPath)


% load channel info; save channel names if needed
clear('CHANNAMES', 'CHANACTIVE')
@@ -101,6 +114,16 @@ function formatWindows(saveFile, useIntervals, projectFolder, chanFile, fs, ...
intEnd = (intStart -1) + numIntWindows*pointsPerWindow;
end

if centeredWindows
% load timestamp, trial number, and percent progress
centerFile = [centerFolder regexprep(filename, 'LFP.mat', 'CENTER.mat')];
load(centerFile)
timestamps = T;
trialNumber = trial;
progress = percProgTraveledPath;
end


% extract mouse name and experiment data from filename
nameParts = split(filename,'_');
mousename = nameParts{1};
@@ -119,6 +142,8 @@ function formatWindows(saveFile, useIntervals, projectFolder, chanFile, fs, ...
if useIntervals
I = length(intStart);
nWindows = sum(numIntWindows);
elseif centeredWindows
nWindows = length(timestamps); %the number of timestamps is the number of windows
else
nWindows = floor(length(thisChannel)/pointsPerWindow);
end
@@ -131,6 +156,7 @@ function formatWindows(saveFile, useIntervals, projectFolder, chanFile, fs, ...
labels.allWindows.expDate(fileIdx) = {date};
labels.allWindows.time(fileIdx) = 1:nWindows;


if useIntervals
% save interval labels
intLabels = zeros(1, nWindows);
@@ -142,7 +168,18 @@ function formatWindows(saveFile, useIntervals, projectFolder, chanFile, fs, ...
end
labels.allWindows.interval(fileIdx) = intLabels;
end
end

if centeredWindows
% save centered window labels
j = 1;
for i = fileIdx
labels.allWindows.timestamps(i) = {timestamps(j)};
labels.allWindows.trialNumber(i) = {trialNumber(j)};
labels.allWindows.progress(i) = {progress(j)};
j = j+1; % j restarts at 1 for each file and indexes its timestamps, while i keeps increasing across files
end
end
end

% skip inactive or unused channels, leaving them as nans
activeIdx = strcmp(CHANNAMES,channel{c});
@@ -153,11 +190,28 @@ function formatWindows(saveFile, useIntervals, projectFolder, chanFile, fs, ...
if useIntervals
% extract intervals, slice data for each interval into windows and concatenate
usableData = zeros(pointsPerWindow, nWindows);
for i = 1:I

for i = 1:I
thisInterval = thisChannel(intStart(i):intEnd(i));
thisInterval = reshape(thisInterval, pointsPerWindow, numIntWindows(i));
usableData(:,wStart(i):wEnd(i)) = thisInterval;
end
elseif centeredWindows
% for each timestamp, create a centered window around it of
% the largest window length and concatenate all windows
% together
usableData = zeros(pointsPerWindow, nWindows); %initialize where windows will go

halfWindow = pointsPerWindow/2; %size of half the window

for i = 1:nWindows
%create all windows for a channel
thisCentered = thisChannel(timestamps(i)+(1-halfWindow):timestamps(i)+halfWindow);
thisCentered = reshape(thisCentered, pointsPerWindow, 1);
usableData(1:pointsPerWindow, i) = thisCentered;
end


else
% set whole recording as single interval
usableData = thisChannel(1:(nWindows*pointsPerWindow));
@@ -185,3 +239,7 @@ function formatWindows(saveFile, useIntervals, projectFolder, chanFile, fs, ...
% save data
save(saveFile,'data','labels','-v7.3')
end




6 changes: 4 additions & 2 deletions runFeaturePipeline.m
@@ -60,7 +60,9 @@ function runFeaturePipeline(saveFile, dataOpts, featureOpts, trainFile)
end
end

formatWindows(saveFile)
intervals = input('true or false: Are you using an INT_TIME folder to slice intervals? ');
center = input('true or false: Are you using a CENTER_TIME folder to make centered windows? ');
formatWindows(saveFile, intervals, center) % the prompts above let the user choose intervals or centered windows

if trainFile
train = load(trainFile, 'dataOpts');
@@ -77,4 +79,4 @@ function runFeaturePipeline(saveFile, dataOpts, featureOpts, trainFile)

preprocessData(saveFile, dataOpts)

saveFeatures(saveFile, featureOpts)
saveFeatures(saveFile, featureOpts)