From 8c95f017eedbda9e45371bc48913cacec9134d02 Mon Sep 17 00:00:00 2001
From: kostas-kou
Date: Wed, 8 Jan 2025 12:13:36 +0100
Subject: [PATCH 1/4] Update the download instructions

---
 datasets/download/downloading-data.qmd | 48 +++++++++++++++++++-------
 1 file changed, 35 insertions(+), 13 deletions(-)

diff --git a/datasets/download/downloading-data.qmd b/datasets/download/downloading-data.qmd
index a466d81..cd93d01 100644
--- a/datasets/download/downloading-data.qmd
+++ b/datasets/download/downloading-data.qmd
@@ -18,7 +18,7 @@ Follow the dialogue to get authenticated and then click on `Download inbox s3cmd
 ### Install the sda-cli tool
 Follow the guidelines [here](../submission/submission-guide.qmd#install-the-sda-cli-tool) to install the sda-cli tool.
 
-### Generate the public and secret key (**ONLY FOR ENCRYPTED DATA**)
+### Generate the public and secret key
 The initial step involves creating a crypt4gh keypair using the sda-cli:
 
 ```bash
@@ -26,28 +26,50 @@ The initial step involves creating a crypt4gh keypair using the sda-cli:
 ```
 
 where `` is the base name of the key files. The above command will create two key files named
-`keypair_name.pub.pem` and `keypair_name.sec.pem`. The public key (`pub`) will be sent to the admins
+`keypair_name.pub.pem` and `keypair_name.sec.pem`. The public key (`pub`) will be used alongside with `sda-cli`
 and will be used by the system for encryption of the files before downloading, while the private one
 (`sec`) will be used by the requester for decrypting the files after downloading.
 
 ## Check access
 
 After the user has been granted access to the dataset, the user can check access to the dataset by listing the datasets and their files using the `sda-cli`.
-This can be done by following the instructions in the `List datasets and their files` section of the `sda-cli` documentation [here](https://github.com/NBISweden/sda-cli#list-datasets-and-their-files).
+For listing the datasets that the user has access to in production environment, the user needs to run:
+
+```bash
+./sda-cli list -config s3cmd.conf --datasets --url https://download.bp.nbis.se (--bytes)
+```
+
+For listing the files of a specific dataset in production environment, the user needs to run:
+
+```bash
+./sda-cli list -config s3cmd.conf -dataset --url https://download.bp.nbis.se (--bytes)
+```
+
+where `` is the ID of the dataset that the user wants to list the files of. The dataset ID can be found by running the first command.
 
 ## Download data
 
-After having acquired access to the datasets, the configuration file and the sda-cli tool, the user can download either decrypted data (in several ways) or encrypted data:
+After having acquired access to the datasets, the configuration file and the sda-cli tool, the user can download the encrypted data.
+The user needs to provide the public key that was generated earlier and the configuration file to download the data:
 
-- Decrypted files:
- - using filepaths
- - using file IDs
- - recursively downloading all files in a folder
- - downloading files by providing a text file with the paths of the files to download
- - download all the files of the dataset
+```bash
+./sda-cli download -config s3cmd.conf -pubkey -dataset-id --url https://download.bp.nbis.se -outdir ...
+```
+
+where:
+- `` is the public key file that was generated earlier (.pub.pem)
+- `` is the ID of the dataset that the user wants to download the files of
+- `` is the path to the directory where the files will be downloaded
+- `` are the file paths of the files which can be found by listing the files of the dataset as described above
+
+## Decrypt the data
+
+After downloading the encrypted data, the user can decrypt the files using the private key that was generated earlier by running:
+
+```bash
+./sda-cli decrypt -key .sec.pem
+```
 
-- Encrypted files:
- - download specific encrypted files
+where `` is the path to the encrypted file that the user wants to decrypt and `.sec.pem` is the private key file that was generated earlier.
 
-All the information for downloading files can be found in the sda-cli documentation in the `Download` section [here](https://github.com/NBISweden/sda-cli#download).

From ca1d652fc467cebb9271a20cdda4c22c9836f9ad Mon Sep 17 00:00:00 2001
From: Kostas Koumpouras <47719735+kostas-kou@users.noreply.github.com>
Date: Wed, 8 Jan 2025 13:50:28 +0100
Subject: [PATCH 2/4] Apply suggestions from code review

Co-authored-by: Joakim Bygdell
---
 datasets/download/downloading-data.qmd | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/datasets/download/downloading-data.qmd b/datasets/download/downloading-data.qmd
index cd93d01..31d920f 100644
--- a/datasets/download/downloading-data.qmd
+++ b/datasets/download/downloading-data.qmd
@@ -33,13 +33,13 @@ and will be used by the system for encryption of the files before downloading, w
 ## Check access
 
 After the user has been granted access to the dataset, the user can check access to the dataset by listing the datasets and their files using the `sda-cli`.
-For listing the datasets that the user has access to in production environment, the user needs to run:
+For listing the datasets that the user has access to, the user needs to run:
 
 ```bash
 ./sda-cli list -config s3cmd.conf --datasets --url https://download.bp.nbis.se (--bytes)
 ```
 
-For listing the files of a specific dataset in production environment, the user needs to run:
+For listing the files of a specific dataset, the user needs to run:
 
 ```bash
 ./sda-cli list -config s3cmd.conf -dataset --url https://download.bp.nbis.se (--bytes)

From 0c71b20eb3050d0a9a5130713b8e4d126a131070 Mon Sep 17 00:00:00 2001
From: Kostas Koumpouras <47719735+kostas-kou@users.noreply.github.com>
Date: Wed, 8 Jan 2025 13:55:14 +0100
Subject: [PATCH 3/4] Apply suggestions from code review

Co-authored-by: Nanjiang Shu
---
 datasets/download/downloading-data.qmd | 8 +++++---
 1 file changed, 5 insertions(+), 3 deletions(-)

diff --git a/datasets/download/downloading-data.qmd b/datasets/download/downloading-data.qmd
index 31d920f..25ba52a 100644
--- a/datasets/download/downloading-data.qmd
+++ b/datasets/download/downloading-data.qmd
@@ -45,12 +45,14 @@ For listing the files of a specific dataset, the user needs to run:
 ./sda-cli list -config s3cmd.conf -dataset --url https://download.bp.nbis.se (--bytes)
 ```
 
-where `` is the ID of the dataset that the user wants to list the files of. The dataset ID can be found by running the first command.
+where `` is the ID of the dataset for which the user wants to list the files. The dataset ID can be found by running the previous command.
 
 ## Download data
 
 After having acquired access to the datasets, the configuration file and the sda-cli tool, the user can download the encrypted data.
-The user needs to provide the public key that was generated earlier and the configuration file to download the data:
+The user needs to provide the public key that was generated earlier, as well as the configuration file.
+
+To download the data:
 
 ```bash
 ./sda-cli download -config s3cmd.conf -pubkey -dataset-id --url https://download.bp.nbis.se -outdir ...
@@ -58,7 +60,7 @@ The user needs to provide the public key that was generated earlier and the conf
 
 where:
 - `` is the public key file that was generated earlier (.pub.pem)
-- `` is the ID of the dataset that the user wants to download the files of
+- `` is the ID of the dataset for which the user wants to download the files
 - `` is the path to the directory where the files will be downloaded
 - `` are the file paths of the files which can be found by listing the files of the dataset as described above

From d8ce65df57e1055e79731fbc7abdc7d467f2eac2 Mon Sep 17 00:00:00 2001
From: Kostas Koumpouras <47719735+kostas-kou@users.noreply.github.com>
Date: Wed, 8 Jan 2025 13:55:34 +0100
Subject: [PATCH 4/4] Update datasets/download/downloading-data.qmd

Co-authored-by: Malin Klang
---
 datasets/download/downloading-data.qmd | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/datasets/download/downloading-data.qmd b/datasets/download/downloading-data.qmd
index 25ba52a..32ac2ab 100644
--- a/datasets/download/downloading-data.qmd
+++ b/datasets/download/downloading-data.qmd
@@ -59,7 +59,8 @@ To download the data:
 ```
 
 where:
-- `` is the public key file that was generated earlier (.pub.pem)
+
+- `` is the public key file that was generated earlier (`.pub.pem`)
 - `` is the ID of the dataset for which the user wants to download the files
 - `` is the path to the directory where the files will be downloaded
 - `` are the file paths of the files which can be found by listing the files of the dataset as described above
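
Reviewer note: the list, download, and decrypt steps documented by this series can be sketched as a dry-run shell script that only prints the commands and never calls `sda-cli`. This is a sketch under assumptions, not part of the patch. The helper names (`list_files_cmd`, `download_cmd`, `decrypt_cmd`) and all example values are hypothetical. The position of each argument after its flag is inferred from the `where:` descriptions in the diff, because the placeholders did not survive in the patch text. Only flags that appear in the patch are used.

```bash
#!/usr/bin/env bash
# Dry-run sketch: prints the sda-cli commands described in the patch series.
# Nothing here executes sda-cli; all helper names and values are hypothetical.
set -euo pipefail

CONFIG="s3cmd.conf"                  # configuration file name used in the patch
URL="https://download.bp.nbis.se"    # download URL used in the patch

# Print the command that lists the files of one dataset.
# $1 = dataset ID (assumed to follow the -dataset flag)
list_files_cmd() {
  printf './sda-cli list -config %s -dataset %s --url %s\n' "$CONFIG" "$1" "$URL"
}

# Print the command that downloads encrypted files.
# $1 = public key file, $2 = dataset ID, $3 = output directory, rest = file paths
download_cmd() {
  local pubkey="$1" dataset="$2" outdir="$3"
  shift 3
  printf './sda-cli download -config %s -pubkey %s -dataset-id %s --url %s -outdir %s %s\n' \
    "$CONFIG" "$pubkey" "$dataset" "$URL" "$outdir" "$*"
}

# Print the command that decrypts one downloaded file.
# $1 = secret key file, $2 = encrypted file (assumed to follow the key)
decrypt_cmd() {
  printf './sda-cli decrypt -key %s %s\n' "$1" "$2"
}

# Example with hypothetical values: print the three commands for review.
list_files_cmd "EXAMPLE-DATASET-ID"
download_cmd "mykey.pub.pem" "EXAMPLE-DATASET-ID" "downloads" "path/to/file1" "path/to/file2"
decrypt_cmd "mykey.sec.pem" "encrypted-file"
```

Printing the commands first lets a reader compare them against the updated instructions before running anything against the real service.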