Hi everyone,
The goal of the current sprint session is to improve the tsvtools, which are still very Alzheimer-oriented. For example, we definitely want to avoid hard-coded default values tied to this particular context (see Issue #253).
Moreover, the structure created by the tsvtools is quite heavy, and it would be nice to have everything concatenated in a single TSV file. In the following sections, we detail the issues with each tsvtool independently.
This version has been updated after the meeting of the ClinicaDL team.
Independent functionalities
getlabels
Context generalization
getlabels is very Alzheimer-oriented, as it seeks the labels corresponding to stable AD, stable CN, pMCI, sMCI and non-regressive MCI, which are only relevant in the context of Alzheimer's disease.
We can choose to leave the command as is and tell users that, if they want labels linked to another context, they have to create them themselves.
Otherwise, we can let the user define a progression pattern (e.g. CN --> MCI --> AD) and then output general labels composed of two parts. The first part highlights the stability of the label:
s (stable) if it remains identical across all sessions within a given time,
p (progressive) if it progresses to the following state within a given time (e.g. MCI --> AD),
r (regressive) if it regresses to the previous state within a given time (e.g. MCI --> CN),
uk (unknown) if there are not enough sessions to assess the reliability of the label but no changes were spotted,
us (unstable) otherwise (multiple conversions / regressions).
The second part is the initial diagnosis. For example, rAD corresponds to Alzheimer's disease regressing to MCI or CN, whereas ukCN corresponds to CN participants who always remained CN but without enough sessions to assess their stability. This framework is a bit complex and non-exhaustive, so maybe we can just keep things as they are.
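To make the two-part labels more concrete, here is a minimal sketch of how such a label could be computed for one participant. It assumes that visits are given as chronologically sorted (months since baseline, diagnosis) pairs, that only visits within the time horizon are considered, that "enough sessions" means the follow-up reaches the horizon, and that the sign of a single change decides between p and r. The function name `stability_label` and these rules are hypothetical, not existing ClinicaDL behavior.

```python
from typing import Sequence, Tuple


def stability_label(
    visits: Sequence[Tuple[int, str]],
    pattern: Sequence[str],
    horizon: int,
) -> str:
    """Return a label such as 'sCN', 'pMCI', 'rAD', 'ukCN' or 'usMCI'.

    visits: (months since baseline, diagnosis) pairs, sorted by time.
    pattern: user-defined ordered stages, e.g. ["CN", "MCI", "AD"].
    horizon: time window (in months) in which stability is assessed.
    """
    # Hypothetical rule: only visits inside the horizon are taken into account.
    window = [(month, dx) for month, dx in visits if month <= horizon]
    if not window:
        raise ValueError("no visit within the horizon")
    baseline = window[0][1]
    stages = [pattern.index(dx) for _, dx in window]
    # Non-zero differences between consecutive visits are conversions / regressions.
    changes = [b - a for a, b in zip(stages, stages[1:]) if b != a]

    if not changes:
        # No change spotted: stable only if the follow-up covers the whole horizon.
        prefix = "s" if window[-1][0] >= horizon else "uk"
    elif len(changes) == 1:
        # Assumption: the direction of the single change decides p or r.
        prefix = "p" if changes[0] > 0 else "r"
    else:
        prefix = "us"  # multiple conversions / regressions
    return prefix + baseline
```

For instance, `stability_label([(0, "MCI"), (12, "AD")], ["CN", "MCI", "AD"], 36)` returns `"pMCI"`, while a CN participant seen only at months 0 and 12 with a 36-month horizon gets `"ukCN"`.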
Structure simplification
getlabels outputs one TSV file per label. Instead, we could have a single TSV file with columns containing the value of each label.
Example of a TSV file produced by getlabels with this structure:
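| participant_id | session_id | group | subgroup | age | sex | ... |
|---|---|---|---|---|---|---|
| sub-CLNC0001 | ses-M00 | MCI | sMCI | 72 | M | ... |
| sub-CLNC0002 | ses-M00 | MCI | pMCI | 65 | F | ... |
| sub-CLNC0002 | ses-M06 | MCI | pMCI | 66 | F | ... |
| sub-CLNC0003 | ses-M00 | AD | AD | 89 | F | ... |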
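With a single TSV file, each of the per-label files that getlabels currently writes becomes a simple filter on a column. The sketch below uses only the standard `csv` module and reproduces the example table (without the "..." columns); the helper names `read_tsv` and `select` are hypothetical.

```python
import csv
import io
from typing import Dict, List

Row = Dict[str, str]


def read_tsv(text: str) -> List[Row]:
    """Parse TSV content into a list of row dictionaries."""
    return list(csv.DictReader(io.StringIO(text), delimiter="\t"))


def select(rows: List[Row], column: str, value: str) -> List[Row]:
    """Return the rows whose `column` equals `value` (one former per-label file)."""
    return [row for row in rows if row[column] == value]


TSV = (
    "participant_id\tsession_id\tgroup\tsubgroup\tage\tsex\n"
    "sub-CLNC0001\tses-M00\tMCI\tsMCI\t72\tM\n"
    "sub-CLNC0002\tses-M00\tMCI\tpMCI\t65\tF\n"
    "sub-CLNC0002\tses-M06\tMCI\tpMCI\t66\tF\n"
    "sub-CLNC0003\tses-M00\tAD\tAD\t89\tF\n"
)

rows = read_tsv(TSV)
mci = select(rows, "group", "MCI")       # replaces the MCI TSV file
pmci = select(rows, "subgroup", "pMCI")  # replaces the pMCI TSV file
print(len(mci), len(pmci))  # 3 2
```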
split and kfold
Context generalization
split integrates a command to avoid data leakage between subgroups. Indeed, getlabels creates a TSV file for MCI participants and two other TSV files for sMCI and pMCI participants, which are subsets of the MCI group.
With the new system, the group and subgroup columns would take into account the inclusion of one group in another, and the procedure would no longer be MCI-dependent.
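Since group and subgroup are columns of the same rows, leakage can be avoided by splitting participants rather than rows: all sessions of a participant, and therefore both its group and subgroup labels, land in the same set. A minimal sketch, where `split_participants` and the random-fraction rule are hypothetical:

```python
import random
from typing import Dict, List, Tuple

Row = Dict[str, str]


def split_participants(
    rows: List[Row], test_fraction: float, seed: int = 0
) -> Tuple[List[Row], List[Row]]:
    """Split rows into (train, test) so that no participant appears in both sets."""
    participants = sorted({row["participant_id"] for row in rows})
    random.Random(seed).shuffle(participants)
    n_test = round(len(participants) * test_fraction)
    test_ids = set(participants[:n_test])
    train = [row for row in rows if row["participant_id"] not in test_ids]
    test = [row for row in rows if row["participant_id"] in test_ids]
    return train, test


rows = [
    {"participant_id": "sub-CLNC0001", "session_id": "ses-M00", "group": "MCI", "subgroup": "sMCI"},
    {"participant_id": "sub-CLNC0002", "session_id": "ses-M00", "group": "MCI", "subgroup": "pMCI"},
    {"participant_id": "sub-CLNC0002", "session_id": "ses-M06", "group": "MCI", "subgroup": "pMCI"},
    {"participant_id": "sub-CLNC0003", "session_id": "ses-M00", "group": "AD", "subgroup": "AD"},
]
train, test = split_participants(rows, test_fraction=0.34)
```

Whatever the seed, a pMCI participant can no longer end up in the training set of the MCI group and in the test set of the pMCI subgroup at the same time.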
Structure simplification
If we work with a single TSV file, as suggested in the getlabels structure simplification, we would have to stratify the splits by label to ensure that the labels are correctly distributed.
The procedure would then no longer depend on the nature of the split (SingleSplit, Kfold), but on the nature of the set extracted (test or validation).
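A stratified test extraction could look like the sketch below: participants are grouped by their label at baseline, the same fraction is drawn from each stratum, and the keys (participant_id, session_id) of the selected participants are written in a test.tsv-like layout. The function names, the `ses-M00` baseline convention and the choice of keeping all sessions of a test participant are assumptions.

```python
import csv
import io
import random
from collections import defaultdict
from typing import Dict, List, Set, Tuple

Row = Dict[str, str]
Key = Tuple[str, str]


def stratified_test_keys(
    rows: List[Row],
    label: str,
    test_fraction: float,
    baseline: str = "ses-M00",
    seed: int = 0,
) -> List[Key]:
    """Pick test participants per label stratum and return their keys."""
    strata: Dict[str, List[str]] = defaultdict(list)
    for row in rows:
        if row["session_id"] == baseline:
            strata[row[label]].append(row["participant_id"])
    rng = random.Random(seed)
    test_ids: Set[str] = set()
    for value in sorted(strata):
        ids = sorted(strata[value])
        rng.shuffle(ids)
        # Same fraction in every stratum, so label proportions are preserved.
        test_ids.update(ids[: round(len(ids) * test_fraction)])
    return [(r["participant_id"], r["session_id"]) for r in rows if r["participant_id"] in test_ids]


def keys_to_tsv(keys: List[Key]) -> str:
    """Serialize keys in the layout proposed for test.tsv."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
    writer.writerow(["participant_id", "session_id"])
    writer.writerows(keys)
    return buffer.getvalue()


rows = [
    {"participant_id": f"sub-{i:02d}", "session_id": "ses-M00", "group": "CN" if i < 4 else "AD"}
    for i in range(8)
]
keys = stratified_test_keys(rows, label="group", test_fraction=0.5)
print(keys_to_tsv(keys))
```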
tsvtool test creates a new TSV file, test.tsv, containing the keys (participant_id, session_id) included in the test set.
tsvtool validation creates a new TSV file named after the procedure performed (for example kfold-3).
It excludes the participants already selected in test.tsv, if such a TSV file exists.
For each key, it states which set the key belongs to in each split, according to the following structure (example for a 2-fold validation):
| participant_id | session_id | split_index | split_type |
|---|---|---|---|
| sub-CLNC0001 | ses-M00 | 0 | train |
| sub-CLNC0001 | ses-M00 | 1 | validation |
| sub-CLNC0002 | ses-M00 | 0 | train |
| sub-CLNC0002 | ses-M00 | 1 | validation |
| sub-CLNC0002 | ses-M06 | 0 | train |
| sub-CLNC0002 | ses-M06 | 1 | N/A |
| sub-CLNC0003 | ses-M00 | 0 | validation |
| sub-CLNC0003 | ses-M00 | 1 | train |
Note that we only want baseline sessions in the validation set, so some sessions may not belong to any set (N/A).
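The long-format table for the validation procedure could be generated as in the following sketch. Participants are dealt across folds stratum by stratum (a deterministic round-robin, where a real tool would probably shuffle first), each baseline session is in validation for its own fold and in train otherwise, and non-baseline sessions are marked N/A in the fold where their participant is validated. The function name `kfold_table`, the `excluded` parameter (standing for the participants of test.tsv) and the `ses-M00` baseline are assumptions.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Row = Dict[str, str]


def kfold_table(
    rows: List[Row],
    label: str,
    n_splits: int,
    baseline: str = "ses-M00",
    excluded: Iterable[str] = (),
) -> List[Tuple[str, str, int, str]]:
    """Return (participant_id, session_id, split_index, split_type) rows."""
    excluded_ids = set(excluded)
    kept = [r for r in rows if r["participant_id"] not in excluded_ids]
    # Stratify: deal the participants of each baseline label across folds in turn.
    strata: Dict[str, List[str]] = defaultdict(list)
    for r in kept:
        if r["session_id"] == baseline:
            strata[r[label]].append(r["participant_id"])
    fold_of: Dict[str, int] = {}
    position = 0
    for value in sorted(strata):
        for participant in sorted(strata[value]):
            fold_of[participant] = position % n_splits
            position += 1
    table = []
    for r in kept:
        pid, sid = r["participant_id"], r["session_id"]
        if pid not in fold_of:
            continue  # no baseline session: the participant cannot be assigned
        for split in range(n_splits):
            if fold_of[pid] != split:
                split_type = "train"
            elif sid == baseline:
                split_type = "validation"
            else:
                split_type = "N/A"  # only baseline sessions go to validation
            table.append((pid, sid, split, split_type))
    return table


rows = [
    {"participant_id": "sub-CLNC0001", "session_id": "ses-M00", "group": "MCI"},
    {"participant_id": "sub-CLNC0002", "session_id": "ses-M00", "group": "MCI"},
    {"participant_id": "sub-CLNC0002", "session_id": "ses-M06", "group": "MCI"},
    {"participant_id": "sub-CLNC0003", "session_id": "ses-M00", "group": "AD"},
]
table = kfold_table(rows, label="group", n_splits=2)
```

The fold assignment here differs from the example table, but the structure (one row per key and split, N/A for non-baseline sessions in validation) is the same.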
For the predict function, it would be possible to give a directory, and ClinicaDL would automatically know that it has to use the keys of test.tsv and the metadata of the getlabels TSV file.
To ease the use of this new structure, a new tool should be implemented to reconstruct the TSV file with all the metadata corresponding to one split or to the test set.
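Such a reconstruction tool essentially joins the keys of one set with the metadata of the getlabels TSV file. A minimal sketch, where the helper names `split_keys` and `attach_metadata` are hypothetical and the split rows are the 2-fold example table read as strings:

```python
from typing import Dict, Iterable, List, Set, Tuple

Row = Dict[str, str]
Key = Tuple[str, str]


def split_keys(split_rows: Iterable[Row], split_index: int, split_type: str) -> Set[Key]:
    """Keys (participant_id, session_id) of one set of one split."""
    return {
        (r["participant_id"], r["session_id"])
        for r in split_rows
        if int(r["split_index"]) == split_index and r["split_type"] == split_type
    }


def attach_metadata(metadata: Iterable[Row], keys: Set[Key]) -> List[Row]:
    """Return the metadata rows (from the getlabels TSV) whose key is in `keys`."""
    return [m for m in metadata if (m["participant_id"], m["session_id"]) in keys]


columns = ("participant_id", "session_id", "split_index", "split_type")
split_rows = [dict(zip(columns, values)) for values in [
    ("sub-CLNC0001", "ses-M00", "0", "train"),
    ("sub-CLNC0001", "ses-M00", "1", "validation"),
    ("sub-CLNC0002", "ses-M00", "0", "train"),
    ("sub-CLNC0002", "ses-M00", "1", "validation"),
    ("sub-CLNC0002", "ses-M06", "0", "train"),
    ("sub-CLNC0002", "ses-M06", "1", "N/A"),
    ("sub-CLNC0003", "ses-M00", "0", "validation"),
    ("sub-CLNC0003", "ses-M00", "1", "train"),
]]
metadata = [
    {"participant_id": "sub-CLNC0001", "session_id": "ses-M00", "group": "MCI", "subgroup": "sMCI"},
    {"participant_id": "sub-CLNC0002", "session_id": "ses-M00", "group": "MCI", "subgroup": "pMCI"},
    {"participant_id": "sub-CLNC0002", "session_id": "ses-M06", "group": "MCI", "subgroup": "pMCI"},
    {"participant_id": "sub-CLNC0003", "session_id": "ses-M00", "group": "AD", "subgroup": "AD"},
]
validation_1 = attach_metadata(metadata, split_keys(split_rows, 1, "validation"))
# For the test set, the keys would simply come from test.tsv.
test_set = attach_metadata(metadata, {("sub-CLNC0003", "ses-M00")})
```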
analysis
Context generalization
It must become diagnosis-independent (it is currently meant to work on AD, CN, MCI...).
It was also designed to work on specific dementia scores (MMSE and CDR), which are handled specifically as they can be either continuous or discrete.
Instead, the user could give the set of discrete or continuous variables they want to include in the analysis.
We could also rely on pandas to automatically detect which columns contain continuous or discrete values.
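One possible heuristic with pandas is sketched below: a column is continuous if it is numeric and has more distinct values than a threshold, and discrete otherwise, with a user override for ambiguous scores such as MMSE. The function name `column_kinds`, the `max_levels` threshold and the override parameter are assumptions, not an existing ClinicaDL interface.

```python
from typing import Dict, Iterable

import pandas as pd


def column_kinds(
    df: pd.DataFrame, max_levels: int = 10, discrete: Iterable[str] = ()
) -> Dict[str, str]:
    """Classify each column as 'continuous' or 'discrete'."""
    forced = set(discrete)  # columns the user explicitly declares discrete
    kinds = {}
    for name in df.columns:
        series = df[name]
        numeric = pd.api.types.is_numeric_dtype(series)
        if name in forced or not numeric or series.nunique(dropna=True) <= max_levels:
            kinds[name] = "discrete"
        else:
            kinds[name] = "continuous"
    return kinds


df = pd.DataFrame({
    "age": [72, 65, 66, 89, 70, 81, 77, 60, 58, 90, 63],
    "sex": ["M", "F", "F", "F", "M", "M", "F", "M", "F", "M", "F"],
    "cdr": [0.5, 0.5, 0.5, 1.0, 0.0, 0.5, 1.0, 0.0, 0.0, 2.0, 0.5],
})
print(column_kinds(df))
```

A fixed threshold cannot be right for every score, which is an argument for letting the user override the automatic detection.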
BONUS: the table_to_latex function could automatically generate your population table, ready to be copy-pasted into your LaTeX article!
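As an illustration of what such a population table could contain, the sketch below summarizes continuous variables as mean ± standard deviation and discrete variables as counts, one column per group. It is not the actual table_to_latex function; the name `population_table_latex` and the layout are hypothetical, and column names containing LaTeX special characters (such as `_`) would still need escaping.

```python
import statistics
from typing import Dict, List, Sequence

Row = Dict[str, object]


def population_table_latex(
    rows: List[Row], group: str, continuous: Sequence[str], discrete: Sequence[str]
) -> str:
    """Summarize rows per group as a LaTeX tabular."""
    groups = sorted({str(r[group]) for r in rows})
    lines = [
        "\\begin{tabular}{l" + "c" * len(groups) + "}",
        "\\hline",
        " & ".join([""] + groups) + " \\\\",
        "\\hline",
    ]
    for col in continuous:  # mean +/- sample standard deviation per group
        cells = []
        for g in groups:
            values = [float(r[col]) for r in rows if str(r[group]) == g]
            sd = statistics.stdev(values) if len(values) > 1 else 0.0
            cells.append(f"{statistics.mean(values):.1f} $\\pm$ {sd:.1f}")
        lines.append(" & ".join([col] + cells) + " \\\\")
    for col in discrete:  # one line per level, with counts per group
        for level in sorted({str(r[col]) for r in rows}):
            counts = [
                sum(1 for r in rows if str(r[group]) == g and str(r[col]) == level)
                for g in groups
            ]
            lines.append(" & ".join([f"{col} = {level}"] + [str(c) for c in counts]) + " \\\\")
    lines += ["\\hline", "\\end{tabular}"]
    return "\n".join(lines)


rows = [
    {"participant_id": "sub-CLNC0001", "group": "MCI", "age": 72, "sex": "M"},
    {"participant_id": "sub-CLNC0002", "group": "MCI", "age": 65, "sex": "F"},
    {"participant_id": "sub-CLNC0003", "group": "AD", "age": 89, "sex": "F"},
]
table = population_table_latex(rows, group="group", continuous=["age"], discrete=["sex"])
print(table)
```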
Structure simplification
N/A
restrict
Specific to the AD-DL analysis. Could definitely be removed.
Global structure
Before beginning this procedure, the user must have a BIDS dataset.
Current version
The steps to perform before clinicadl train are currently the following:
1. clinica iotools merge-tsv summarizes the BIDS in one TSV file.
   input: BIDS directory
   output: TSV file
2. clinica iotools missing-mods lists which modalities are available in the BIDS.
   input: BIDS directory
   output: directory
3. clinicadl tsvtool getlabels extracts Alzheimer's disease labels according to a modality in the BIDS (optional, as users may have their own labels).
   input: merge-tsv TSV file + missing-mods directory
   output: directory containing a series of TSV files
4. clinicadl tsvtool split | kfold separates the test set, then defines the train / validation scheme.
   input: getlabels directory
   output: directory containing a series of TSV files
5. Independently, the extract command must be run to define the mode (image, patch, slice or roi) and its corresponding parameters, as well as the parameters of the desired preprocessing pipeline (t1-linear, pet-linear or custom).
Proposition
The steps depending on Clinica should already be integrated in the CAPS, since the preprocessing pipelines have been run; steps (1) and (2) could then disappear.
Step (3) may or may not be performed depending on the needs of the user (if not, the user must provide an equivalent TSV file).
Step (4) could be renamed prepare-experiment and would make explicit how the data are distributed between the train / validation and test sets.
Step (5) could be renamed prepare-data and would prepare the neuroimaging data.
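The proposed command layout could be prototyped with `argparse` subcommands, as in the sketch below. Only the command names prepare-experiment and prepare-data and the mode and preprocessing choices come from the proposal; the argument names (`labels_tsv`, `--n-splits`, `caps_directory`, `preprocessing`, `mode`) are hypothetical.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical CLI layout for the two proposed commands."""
    parser = argparse.ArgumentParser(prog="clinicadl")
    commands = parser.add_subparsers(dest="command", required=True)

    # Step (4): data repartition between train / validation and test sets.
    experiment = commands.add_parser(
        "prepare-experiment", help="define the train / validation and test sets"
    )
    experiment.add_argument("labels_tsv", help="TSV file produced by getlabels or by the user")
    experiment.add_argument("--n-splits", type=int, default=1)

    # Step (5): preparation of the neuroimaging data from the CAPS.
    data = commands.add_parser("prepare-data", help="prepare neuroimaging data")
    data.add_argument("caps_directory")
    data.add_argument("preprocessing", choices=["t1-linear", "pet-linear", "custom"])
    data.add_argument("mode", choices=["image", "patch", "slice", "roi"])
    return parser


args = build_parser().parse_args(["prepare-data", "caps", "t1-linear", "roi"])
print(args.command, args.preprocessing, args.mode)
```

Keeping the two steps as separate subcommands mirrors the idea that defining the experiment and preparing the images are independent of each other.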