Merge pull request #3 from DouweHorsthuis/update

Finalizing scripts; updating the function that deletes bad continuous data; updating the 160-channel location file; updating PCA before ICA.
CognitiveNeuroLab · Dec 7, 2021 · 3746d70
2 parents a834c20 + a967736
commit 3746d70
Showing 15 changed files with 364 additions and 324 deletions.
diff --git a/README.md b/README.md
@@ -32,8 +32,9 @@ This is still a work in progress. This Repo will contain the full pipeline to an
     - [Pre processing](#pre-processing) 
 3. [Power Frequency Analysis](#power-frequency-analysis)  
 3. [Microstates](#microstates)  
-    - [I_Mictrostates_groups](#i_mictrostates_groups)  
-    - [I_Mictrostates_all](#i_mictrostates_all)    
+    - [I_Mictrostates_groups](#i-mictrostates-groups)  
+    - [I_Mictrostates_all](#i-mictrostates-all)
+3. [Extra functions](#extra-functions)
 3. [License](#license)  
 3. [Contact](#contact)  
 3. [Acknowledgement](#acknowledgement)  
@@ -88,25 +89,26 @@ One of the issues we encountered was that some participants had their data colle
 The data is down-sampled from 512Hz to 256 Hz.  
 Externals are all deleted since not everyone has externals. So we cannot use them as a reference.  
 We apply a 1Hz (filter order 1690) and 50Hz (filter order 136) filter.
-We add channel info to all the channel. For this we use the following 3 files: standard-10-5-cap385, Cap160_fromBESAWebpage, BioSemi64. The first 2 are from BESA and have the correct layout. The 3rd is needed for the MoBI data. You can find these in the Functions and files folder (inside the src folder).  
-Lastly this script uses eeglab's clean_artifacts function deletes the bad channels. Channels will get deleted by the standard noise criteria, if they are flat over 4 seconds and the function checks if channels are overly correlated with each other. **double check this last statement**
+We add channel info to all the channels. For this we use the following 3 files: standard-10-5-cap385, BioSemi160, BioSemi64. The first 2 are from BESA and have the correct layout. The 3rd is needed for the MoBI data. You can find these in the Functions and files folder (inside the src folder).  
+Lastly, this script uses EEGLAB's clean_artifacts function to delete the bad channels. Channels are deleted by the standard noise criteria, if they are flat for more than 5 seconds, or if they correlate too poorly with the other channels (the ChannelCriterion is set to 0.8). The function also deletes stretches of continuous data that are bad. It goes through the data in 0.5 second steps; data is bad if it is off by 5 standard deviations, which according to the function is the default and a "quite conservative" value.
 
 #### C_manual_check
 This script plots all the data in EEGLAB as continuous data and allows you to delete channels manually. 
 
 #### D_preprocces2
-This script will double check and fix any potential trigger issue we encountered. It saves a Matrix with the information for each individual participant. **This script can be skipped** It is only useful for documenting triggers. We added the [pop_rejcont](https://github.com/wojzaremba/active-delays/blob/master/external_tools/eeglab11_0_4_3b/functions/popfunc/pop_rejcont.m) in the next script and this deletes triggers sometimes, so we need to double check triggers again (see [G_preprocces5](#g_preprocces5)).
+This script will double check and fix any potential trigger issue we encountered. It saves a Matrix with the information for each individual participant. **This script can be skipped** It is only useful for documenting triggers. We added the ~~ [pop_rejcont](https://github.com/wojzaremba/active-delays/blob/master/external_tools/eeglab11_0_4_3b/functions/popfunc/pop_rejcont.m) in the next script and this deletes triggers sometimes~~, so we need to double check triggers again (see [G_preprocces5](#g_preprocces5)).
+Since the 12/6/2021 update this is no longer the case: clean_artifacts takes care of noisy continuous data. It might still be good to run either script, since they are quick. If you want to save time, skip this one.
 
 #### E_preprocces3
 This script will do an average reference.  
 This is followed by an [Independent Component Analysis](https://eeglab.org/tutorials/06_RejectArtifacts/RunICA.html). We use the pca option to prevent rank-deficiencies.
-After his we delete only eye components by using [IClabel](https://github.com/sccn/ICLabel). IClabel will only delete the component if it has more than 80% eye data and less then 10% brain data. We arrived at this criteria after comparing (for a different dataset) how many components we (Ana, Douwe and Filip) would delete manually and what threshold would get the closesed to that.
-After that we use [pop_rejcont](https://github.com/wojzaremba/active-delays/blob/master/external_tools/eeglab11_0_4_3b/functions/popfunc/pop_rejcont.m). This function epochs the data temporatly and deletes the epochs that are noisy. We set this to a threshold of 8, because this would delete between 0-20% of the data. We save a matlab structure with how much data of each participant get's deleted. 
+After this we delete only eye components by using [ICLabel](https://github.com/sccn/ICLabel). ICLabel will only delete a component if it has more than 80% eye data and less than 10% brain data. We arrived at these criteria after comparing (for a different dataset) how many components we (Ana, Douwe and Filip) would delete manually and what threshold would get the closest to that.
+After that we use [pop_rejcont](https://github.com/wojzaremba/active-delays/blob/master/external_tools/eeglab11_0_4_3b/functions/popfunc/pop_rejcont.m). This function epochs the data temporarily and deletes the epochs that are noisy. We set this to a threshold of 8, because this would delete between 0-20% of the data. We save a MATLAB structure with how much data of each participant gets deleted. 
 
 **note** for the Aging group, we use the [pop_rejcont](https://github.com/wojzaremba/active-delays/blob/master/external_tools/eeglab11_0_4_3b/functions/popfunc/pop_rejcont.m) function also right before the ICA. This is because the data was too noisy for more than 50% of the participants to find eye components. 
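
The ICLabel criterion described in this section (more than 80% eye and less than 10% brain) can be illustrated with a minimal sketch. The probability matrix below is made up for illustration; the column order follows ICLabel's class order (Brain, Muscle, Eye, Heart, Line Noise, Channel Noise, Other). This is not the code from E_preprocces3, only a sketch of the thresholding logic.

```matlab
% Hypothetical ICLabel-style probabilities: one row per component, columns in
% ICLabel's class order (Brain, Muscle, Eye, Heart, Line Noise, Channel Noise, Other).
probs = [0.95 0.01 0.01 0.01 0.00 0.01 0.01;   % clearly brain
         0.03 0.02 0.90 0.01 0.01 0.01 0.02;   % clearly eye
         0.15 0.01 0.82 0.00 0.00 0.01 0.01;   % eye, but too much brain
         0.05 0.05 0.60 0.10 0.05 0.05 0.10];  % eye probability below threshold

brain_col = 1;   % column holding the brain probability
eye_col   = 3;   % column holding the eye probability

% Flag a component only if it is more than 80% eye AND less than 10% brain
is_eye  = probs(:, eye_col) > 0.80 & probs(:, brain_col) < 0.10;
bad_ics = find(is_eye)';

disp(bad_ics)    % indices of the components that would be removed
```

In a real dataset the probabilities would presumably come from `EEG.etc.ic_classification.ICLabel.classifications` after running ICLabel, and the flagged indices could then be passed to a component-removal function such as `pop_subcomp`.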
 
 #### F_preprocces4
-This script loads a file with all the original channels, deletes the externals and uses these file locations to interpolate the channels of the corresponding's subjects data.  
+This script loads a file with all the original channels, deletes the externals and uses these file locations to interpolate the channels of the corresponding subject's data.  
 In the case of 160 channel data, it uses the [transform_n_channels](https://github.com/CognitiveNeuroLab/Interpolating_160ch_to_64ch_eeglab) function to interpolate the remaining channels not to the original 160, but to 64 channel data so that it is the same as all the other data. For this to work Matlab needs to know the location of 2 things, the transform_n_channels.m file and the EEG files called 64.set and 64.fdt.
 
 #### G_preprocces5
@@ -115,13 +117,13 @@ In this script, we first make sure that the triggers are still in the right plac
 ### Power Frequency Analysis
 
 
-After that we use the the [pwelch function of Matlab](https://www.mathworks.com/help/signal/ref/pwelch.html) and a log tranformation of the results to get the power frequency results.  
+After that we use the [pwelch function of Matlab](https://www.mathworks.com/help/signal/ref/pwelch.html) and a log transformation of the results to get the power frequency results.  
 
-# add here what channels we use, for now it's just indivual but we will change this to groups and averages of those groups
+For now we only use pre-selected channels, but it is also possible to take averages over channels or to do this for every channel instead.
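
As a rough illustration of the pwelch step plus log transformation, here is a minimal sketch on synthetic data. The window length, overlap, signal, and the choice of `10*log10` as the log transformation are assumptions for the example and are not taken from the pipeline scripts. It needs the Signal Processing Toolbox.

```matlab
fs = 256;                                   % sampling rate after down-sampling (Hz)
t  = (0:fs*60-1) / fs;                      % 60 s of synthetic data (assumption)
rng(0);                                     % reproducible noise
x  = sin(2*pi*10*t) + 0.5*randn(size(t));   % 10 Hz oscillation plus noise

win      = 2*fs;                            % hypothetical 2 s window
noverlap = win/2;                           % hypothetical 50% overlap
[pxx, f] = pwelch(x, win, noverlap, win, fs);

log_power = 10*log10(pxx);                  % log transformation (here in dB, an assumption)
[~, idx]  = max(log_power);
peak_freq = f(idx);                         % frequency with the highest power
fprintf('Peak at %.1f Hz\n', peak_freq);
```

For real data, `x` would be one channel of `EEG.data` (or an average over a group of channels), and the resulting `log_power` values per frequency are what go into the statistics.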
 
 
 ### Microstates
-this script follows the code as descibed in Poulsen, A. T., Pedroni, A., Langer, N., & Hansen, L. K. (2018). Microstate EEGlab toolbox: An introductory guide. [See their guide in bioRxiv for more information.](https://www.biorxiv.org/content/10.1101/289850v1)
+This script follows the code as described in Poulsen, A. T., Pedroni, A., Langer, N., & Hansen, L. K. (2018). Microstate EEGlab toolbox: An introductory guide. [See their guide in bioRxiv for more information.](https://www.biorxiv.org/content/10.1101/289850v1)
 
 #### I_Mictrostates_groups
 We first focus on the group level. Since we use both eyes open and eyes closed data, we want to check how many microstates are suggested for both, so we can choose the best (same) amount for both. In the case of patient/control group we would need to compare all 4 suggestions. Running the whole script would take a lot of time that wasn't needed.
@@ -130,7 +132,9 @@ In the We first focuses on the group level. Since we use both eyes open and eyes
 
 The second script will backfit the microstates on the individual EEGs (since now you know how many microstates you want). It gives plots per subject and adds data to the EEG structure to do stats on.  
 
-
+#### Extra functions  
+
+Because this script uses both 160 and 64 channel data collected with BioSemi caps, we needed to get them to the same format. To do this we are using the transform_n_channels function [documented here](https://github.com/CognitiveNeuroLab/Interpolating-channels-between-different-cap-sizes). In this case it turns all the 160 channel data into 64 channels, but it keeps the original channels that are closest to the locations of the corresponding 64ch cap.
 
 ## License
 

diff --git a/src/A_XDF_merge_sets.m b/src/A_XDF_merge_sets.m
@@ -1,5 +1,8 @@
-% Testing the scr code 6/21/2021
-% ----------------------stopped working at 12376
+% Restingstate pipeline (2021)
+% Final version of SRC code 12/6/2021
+% Merging script to create a .set file from a MoBI participant.
+% Base file is .XDF, not .BDF; this needs an extra EEGLAB plugin
+% ----------------------
 subject_list = {'12851'};%{'12856' '12857' '12859' '12871' '12872' '12892'};%'12022' '12023' '12031' '12081' '12094' '12188' '12255' '12335' '12339' '12362' '12364' '12372' '12376' '12390' '12398' '12407' '12408' '12451' '12454' '12457' '12458' '12459' '12468' '12478' '12498' '12510' '12517' '12532' '12564' '12631' '12633' '12634' '12636' '12665' '12670' '12696' '12719' '12724' '12751' '12763' '12769' '12776' '12790' '12806' '12814' '12823' '12830' '12847' '12851' '12855' '12856' '12857' '12859' '12871' '12872' '12892'   }; %all the IDs for the indivual particpants
 filename     = 'Resting_State'; % if your bdf file has a name besides the ID of the participant (e.g. oddball_paradigm)
 home_path    = '\\data.einsteinmed.org\users\Filip Ana Douwe\Resting state data\MoBI\'; %place data is (something like 'C:\data\')
@@ -11,7 +14,6 @@
     data_path  = [home_path subject_list{s} '\'];
     disp([data_path  filename '.xdf'])
 
-
     EEG  = pop_loadxdf([data_path filename  '.xdf'] , 'streamtype', 'EEG', 'exclude_markerstreams', {});
     temp      = EEG.data;
     EEG.data  = temp([2:65],:);   % This should work for a 64 cap

diff --git a/src/A_bdf_merge_sets.m b/src/A_bdf_merge_sets.m
@@ -1,8 +1,8 @@
-% Testing the scr code 6/21/2021
-%% extra controls
-subject_list = {'10297' '10331' '10385' '10497' '10553' '10590' '10640' '10867' '10906' '12004' '12006' '12139' '12177' '12188' '12197' '12203' '12206' '12230' '12272' '12415' '12449' '12474' '12482' '12516' '12534' '12549' '12588' '12632' '12735' '12746' '12755' '12770' '12852' '12870'};
-%subject_list = {'10033' '10130' '10131' '10257' '10281' '10293' '10360' '10369' '10385' '10394' '10438' '10446' '10463' '10476' '10526' '10545' '10561' '10562' '10581' '10585' '10616' '10748' '10780' '10784' '10822' '10858' '10906' '10915' '10929' '10935' '12005' '12006' '12007' '12010' '12215' '12328' '12360' '12413' '12512' '12648' '12651' '12707' '12727' '12739' '12750' '12815' '12898' '12899'};% ------------------------------------------------
+% Restingstate pipeline (2021)
+% Final version of SRC code 12/6/2021
+% Merging script to create .set file
 
+subject_list = {'10297' '10331' '10385' '10497' '10553' '10590' '10640' '10867' '10906' '12004' '12006' '12139' '12177' '12188' '12197' '12203' '12206' '12230' '12272' '12415' '12449' '12474' '12482' '12516' '12534' '12549' '12588' '12632' '12735' '12746' '12755' '12770' '12852' '12870' '10033' '10130' '10131' '10257' '10281' '10293' '10360' '10369' '10385' '10394' '10438' '10446' '10463' '10476' '10526' '10545' '10561' '10562' '10581' '10585' '10616' '10748' '10780' '10784' '10822' '10858' '10906' '10915' '10929' '10935' '12005' '12006' '12007' '12010' '12215' '12328' '12360' '12413' '12512' '12648' '12651' '12707' '12727' '12739' '12750' '12815' '12898' '12899'};% ------------------------------------------------
 %subject_list = {'1106' '1108' '1132' '1134' '1154' '1160' '1173' '1174' '1179' '1190' '1838' '1839' '1874' '11013' '11051' '11056' '11098' '11106' '11198' '11220' '11244' '11293' '11325' '11354' '11369' '11375' '11515' '11560' '11580' '11667' '11721' '11723' '11750' '11852' '11896' '11898' '11913' '11927' '11958' '11965'}; %all the IDs for the indivual particpants
 filename     = 'restingstate'; % if your bdf file has a name besides the ID of the participant (e.g. oddball_paradigm)
 home_path    = 'C:\Users\dohorsth\Desktop\Testing restingstate\Remaining_controls\'; %place data is (something like 'C:\data\')
@@ -14,21 +14,19 @@
     data_path  = [home_path subject_list{s} '\'];
     disp([data_path  subject_list{s} '_' filename '.bdf'])
 
-    %if blocks == 1
-    %if participants have only 1 block, load only this one file
-    EEG = pop_biosig([data_path  subject_list{s} '_' filename '.bdf']);
-
-
-    %     else
-    %         for bdf_bl = 1:blocks
-    %             %if participants have more than one block, load the blocks in a row
-    %             %your files need to have the same name, except for a increasing number at the end (e.g. id#_file_1.bdf id#_file_2)
-    %             EEG = pop_biosig([data_path  subject_list{s} '_' filename '_' num2str(bdf_bl) '.bdf']);
-    %             [ALLEEG, ~] = eeg_store(ALLEEG, EEG, CURRENTSET);
-    %         end
-    %         %since there are more than 1 files, they need to be merged to one big .set file.
-    %         EEG = pop_mergeset( ALLEEG, 1:blocks, 0);
-    %     end
+    if blocks == 1
+        %if participants have only 1 block, load only this one file
+        EEG = pop_biosig([data_path  subject_list{s} '_' filename '.bdf']); 
+    else
+        for bdf_bl = 1:blocks
+            %if participants have more than one block, load the blocks in a row
+            %your files need to have the same name, except for an increasing number at the end (e.g. id#_file_1.bdf id#_file_2.bdf)
+            EEG = pop_biosig([data_path  subject_list{s} '_' filename '_' num2str(bdf_bl) '.bdf']);
+            [ALLEEG, ~] = eeg_store(ALLEEG, EEG, CURRENTSET);
+        end
+        %since there is more than 1 file, they need to be merged into one big .set file.
+        EEG = pop_mergeset( ALLEEG, 1:blocks, 0);
+    end
     [ALLEEG EEG CURRENTSET] = pop_newset(ALLEEG, EEG, 0,'setname', [subject_list{s} ' restingstate paradigm'],'gui','off');   %adds a name to the internal .set file
     %save the bdf as a .set file
     EEG = pop_saveset( EEG, 'filename',[subject_list{s} '.set'],'filepath',data_path);

diff --git a/src/A_bdf_non_paradigm_merge_sets.m b/src/A_bdf_non_paradigm_merge_sets.m
@@ -1,11 +1,11 @@
-% Testing the scr code 6/21/2021
+% Restingstate pipeline (2021)
+% Final version of SRC code 12/6/2021
+% Merging script to create .set file for files that were collected without
+% using a paradigm (participant was told to close eyes or stare at the center
+% of a black screen)
 % ------------------------------------------------
-%% extra controls
 clear variables
-%_closed.bdf
-%_open.bdf
-subject_list = {'10399' '12002' '12122'};
-%subject_list = {'10158' '10165' '10384' '10407' '10451' '10467' '10501' '10534' '10615' '10620' '10639' '10844' '10956'};
+subject_list = {'10158' '10165' '10384' '10407' '10451' '10467' '10501' '10534' '10615' '10620' '10639' '10844' '10956' '10399' '12002' '12122'};
 %subject_list = {'1101' '1164' '1808' '1852' '1855' '11014' '11094' '11151' '11170' '11275' '11349' '11516' '11558' '11583' '11647' '11729' '11735' '11768' '11783' '11820' '11912'};
 filename     = 'restingstate'; % if your bdf file has a name besides the ID of the participant (e.g. oddball_paradigm)
 home_path    = 'C:\Users\dohorsth\Desktop\Testing restingstate\Remaining_controls\'; %place data is (something like 'C:\data\')

diff --git a/src/B_preprocess1.m b/src/B_preprocess1.m
@@ -1,10 +1,12 @@
-% Restingstate pipeline 8/24/2021 DH AF PS
+% Restingstate pipeline (2021)
+% Final version of SRC code 12/6/2021
 % fixing channel names for people with 160 config file with only 64 channels
 % downsample
 % exclude externals
 % 1hz and 50hz filter
 % channel info
 % exclude channels
+% excluding bad bursts of data (added 12/6/2021; everything run before that date does not use this)
 % ------------------------------------------------
 clear variables
 eeglab
@@ -24,6 +26,7 @@
         home_path  = 'C:\Users\dohorsth\Desktop\Testing restingstate\Remaining_controls\';
     end
     deleted_channels=zeros(length(subject_list),2);
+    deleted_data=zeros(length(subject_list),2);
     wrongconfig_type2 = zeros(1,length(subject_list));
     % Loop through all subjects
     for s=1:length(subject_list)
@@ -70,19 +73,28 @@
             EEG=pop_chanedit(EEG, 'lookup',[home_path 'standard-10-5-cap385.elp']); %make sure you put here the location of this file for your computer
         elseif EEG.nbchan >159 && EEG.nbchan < 191 %160chan cap
             if isempty(EEG.chanlocs) && EEG.nbchan==160
-                EEG = pop_editset(EEG, 'chanlocs', [home_path 'Cap160_fromBESAWebpage.sfp']); %need to first load any sort of sfp file with the correct channels (the locations will be overwritten to the correct ones later)
+                EEG = pop_editset(EEG, 'chanlocs', [home_path 'BioSemi160.sfp']); %need to first load any sort of sfp file with the correct channels (the locations will be overwritten to the correct ones later)
             else
-                EEG=pop_chanedit(EEG, 'lookup',[home_path 'Cap160_fromBESAWebpage.sfp']); %make sure you put here the location of this file for your computer
+                EEG=pop_chanedit(EEG, 'lookup',[home_path 'BioSemi160.sfp']); %make sure you put here the location of this file for your computer
             end
         end
         EEG = pop_saveset( EEG, 'filename',[subject_list{s} '_info.set'],'filepath', data_path);
         old_n_chan = EEG.nbchan;
-        EEG = clean_artifacts(EEG, 'FlatlineCriterion',5,'ChannelCriterion',0.8,'LineNoiseCriterion',4,'Highpass','off','BurstCriterion','off','WindowCriterion','off','BurstRejection','on','Distance','Euclidian');
+        old_samples=EEG.pnts;
+        %old way, only channel rejection - used for Aging and ASD: 
+        %EEG = clean_artifacts(EEG, 'FlatlineCriterion',5,'ChannelCriterion',0.8,'LineNoiseCriterion',4,'Highpass','off','BurstCriterion','off','WindowCriterion','off','BurstRejection','on','Distance','Euclidian');
+        %new way, also bad data (bursts) rejection:
+        % the only thing to double check is if it will still reject eye
+        % components in ICA or if these are pre-deleted now (which they shouldn't)
+        EEG = pop_clean_rawdata(EEG, 'FlatlineCriterion',5,'ChannelCriterion',0.8,'LineNoiseCriterion',4,'Highpass','off','WindowCriterion',0.25,'BurstRejection','on','Distance','Euclidian','WindowCriterionTolerances',[-Inf 7] );
         new_n_chan = EEG.nbchan;
+        new_samples=EEG.pnts;
         deleted_channels(s,:) = [string(subject_list{s}), old_n_chan-new_n_chan] ;
+        deleted_data(s,:) = [string(subject_list{s}), (old_samples-new_samples)/old_samples*100] ; % percentage of samples removed
         EEG = pop_saveset( EEG, 'filename',[subject_list{s} '_exchn.set'],'filepath', data_path);
     end
     %saving matrixes for quality control
     save([home_path 'wrongconfig_type2'], 'wrongconfig_type2');
-    save([home_path '_deleted_channels'], 'deleted_channels')
+    save([home_path '_deleted_channels'], 'deleted_channels');
+    save([home_path '_deleted_data']    , 'deleted_data');
 end
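
The `deleted_data` matrix is presumably meant to hold, per participant, the percentage of continuous data that the cleaning step removed. A minimal sketch of that calculation, using made-up sample counts rather than values from the dataset:

```matlab
% Hypothetical sample counts before and after cleaning (not from the dataset)
old_samples = 76800;   % e.g. 5 minutes at 256 Hz
new_samples = 69120;   % samples left after bad segments were removed

% Percentage of samples removed, computed from sample counts (EEG.pnts),
% not from channel counts (EEG.nbchan)
pct_removed = (old_samples - new_samples) / old_samples * 100;
fprintf('%.1f%% of the continuous data was removed\n', pct_removed);
```

Note that both terms need to be sample counts; mixing a channel count into this ratio gives a number that looks plausible but does not describe how much data was deleted.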