Corrects some typos and adds clarifications (#37)

thibaultdvx · web-flow · commit caab842be7d6 · 2024-03-19T17:55:06.000+01:00
diff --git a/src/generate.py b/src/generate.py
@@ -55,10 +55,9 @@
 # <img src="../images/generate_trivial.png" alt="generate trivial" style="height: 350px; margin: 10px; text-align: center;">
 
 # ```{warning}
-# You need to execute the `clinica run` and `clinicadl prepare-data` pipelines
-# before running this task.  Moreover, the trivial option can synthesize at
-# most $n$ images per label, where $n$ is the total number of images in the 
-# input CAPS.
+# You need to execute the `clinica run` pipeline before running this task.  
+# Moreover, the trivial option can synthesize at most n images per label, 
+# where n is the total number of images in the input CAPS.
 # ```
 # ### Running the task
 #
@@ -67,7 +66,7 @@
 # ```
 # where:
 
-# - `caps_directory` is the output folder containing the results in a
+# - `caps_directory` is the output folder containing the results of `clinica run` in a
 # [CAPS](https://aramislab.paris.inria.fr/clinica/docs/public/latest/CAPS/Introduction/) hierarchy,
 # - `output_directory` is the folder where the synthetic CAPS is stored,
 # - `n_subjects` is the number of subjects per label in the synthetic dataset.
@@ -85,49 +84,55 @@
 
 # In order to train a network, meta data must be organized in a file system
 # generated by `clinicadl tsvtools`. For more information on the following
-# commands, please refer to the section ["Define your
-# population"](./label_extraction.ipynb).
+# commands, please refer to the section [Define your
+# population](./label_extraction.ipynb).
 # %% [markdown]
-# #### Get the labels AD and CN.
-# This command needs a BIDS folder as an argument in order to create the
-# `missing_mods_directory` and the `merged.tsv` file, but if you already 
+# #### Get the labels AD and CN
+# The `get-labels` command needs a BIDS folder as an argument in order to create the
+# `missing_mods` directory and the `merged_tsv` file, but if you already 
 # have these, you can give an empty folder as argument and provide the paths 
 # to the required files separately as keyword arguments.
 
-# Be careful, the output of the command (`labels.tsv`) is saved in the same
-# folder as the BIDS folder.
 # %%
 !mkdir data/fake_bids
-!clinicadl tsvtools get-labels data/fake_bids --missing_mods data/synthetic/missing_mods --merged_tsv data/synthetic/data.tsv --modality synthetic
+!clinicadl tsvtools get-labels data/fake_bids data --missing_mods data/synthetic/missing_mods --merged_tsv data/synthetic/data.tsv --modality synthetic
 # %%
 # Split train and test data
 !clinicadl tsvtools split data/labels.tsv --n_test 0.25 --subset_name test
 # %%
-# Split train and validation data in a 5-fold cross-validation
+# Split train and validation data in a 3-fold cross-validation
 !clinicadl tsvtools kfold data/split/train.tsv --n_splits 3
 # %% [markdown]
 # ## Train a model on synthetic data
 
-# Once data was generated and split it is possible to train a model using
+# Once data was generated and split, it is possible to train a model using
 # `clinicadl train` and evaluate its performance with `clinicadl interpret`. For
 # more information on the following command lines please read the sections
 # [Classification with a CNN on 2D slice](./training_classification.ipynb) and
 # [Regression with 3D images](./training_regression.ipynb).
 #
-# The following command uses a pre-build architecture of ClinicaDL `Conv4_FC3`.
+# The following `clinicadl train` command uses `Conv4_FC3`, a pre-built architecture of ClinicaDL.
 # You can also implement your own models by following the instructions of [this
 # section](./training_custom.ipynb).
 #
-# If you failed to generate a trivial dataset, please uncomment the next cell.
-# %%
-# !curl -k https://aramislab.paris.inria.fr/clinicadl/files/handbook_2023/data/synthetic.tar.gz -o synthetic.tar.gz
-# !tar xf synthetic.tar.gz
-# %%
-# Prepare data (extraction of image tensors)
+# First, we need to run `prepare-data` to extract the tensors from the images:
+# %% 
 !clinicadl prepare-data image data/synthetic t1-linear --extract_json extract_T1linear_image
+# %% [markdown]
+# Then, we will train the network with the synthetic data. If you failed to generate a trivial dataset, 
+# please uncomment the next cell.
+# %%
+# # !curl -k https://aramislab.paris.inria.fr/clinicadl/files/handbook_2023/data/synthetic.tar.gz -o synthetic.tar.gz
+# # !mkdir data
+# # !tar xf synthetic.tar.gz -C data
+# # !mkdir data/fake_bids
+# # !clinicadl tsvtools get-labels data/fake_bids data --missing_mods data/synthetic/missing_mods --merged_tsv data/synthetic/data.tsv --modality synthetic
+# # !clinicadl tsvtools split data/labels.tsv --n_test 0.25 --subset_name test
+# # !clinicadl tsvtools kfold data/split/train.tsv --n_splits 3
+# # no need to run prepare-data
 # %%
-# Train a network with synthetic data
-!clinicadl train classification data/synthetic extract_T1linear_image data/split/3_fold data/synthetic_maps --architecture Conv4_FC3 --n_splits 3 --split 0 
+# Train a network with synthetic data (remove the --no-gpu option if you have access to a GPU)
+!clinicadl train classification data/synthetic extract_T1linear_image data/split/3_fold data/synthetic_maps --architecture Conv4_FC3 --n_splits 3 --split 0 --no-gpu
 # %% [markdown]
 # As the number of images is very small (4 per class), we do not rely on the
 # accuracy to select the model. Instead we evaluate the model which obtained the
@@ -160,18 +165,19 @@
 # <img src="../images/generate_random.png" alt="generate random" style="height: 350px; margin: 10px; text-align: center;">
 
 # ```{warning}
-# You need to execute the `clinica run` and `clinicadl prepare-data` pipelines
-# prior to running this task.  Moreover, the random option can synthesize as
+# You need to execute the `clinica run` pipeline prior to running this task.  
+# Moreover, the random option can synthesize as
 # many images as wanted with only one input image.
 # ```
-# %% [markdown]
-# ###Running the task
+# ### Running the task
+#
 # ```bash
 # clinicadl generate random <caps_directory> <generated_caps_directory> 
 # ```
 # where:
 
-# - `caps_directory` is the output folder containing the results in a [CAPS](http://www.clinica.run/doc/CAPS/) hierarchy.
+# - `caps_directory` is the output folder containing the results of `clinica run` in a
+# [CAPS](https://aramislab.paris.inria.fr/clinica/docs/public/latest/CAPS/Introduction/) hierarchy,
 # - `generated_caps_directory` is the folder where the synthetic CAPS is stored.
 
 
@@ -196,7 +202,7 @@
 # - **subtype 1**: Top region has its maximum size but Bottom is atrophied, 
 # - **subtype 2**: Bottom region has its maximum size but Top is atrophied.
 
-# <img src="../images/generate_shepplogan.png" alt="generate shepplogan" style="height: 350px; margin: 10px; text-align: center;">
+# <img src="../images/generate_shepplogan.png" alt="generate shepplogan" style="height: 250px; margin: 5px; text-align: center;">
 
 # These three subtypes are spread between two labels which mimic the binary
 # classification between Alzheimer's disease patients (AD) with heterogeneous