diff --git a/README.md b/README.md index 628beb3..3b6d82c 100644 --- a/README.md +++ b/README.md @@ -1,196 +1,289 @@ -dx-streaming-upload -========= - -[![Build Status](https://travis-ci.org/dnanexus-rnd/dx-streaming-upload.svg?branch=master)](https://travis-ci.org/dnanexus-rnd/dx-streaming-upload) - -The dx-streaming-upload Ansible role packages the streaming upload module for increamentally uploading a RUN directory from an Illumina sequencer onto the DNAnexus platform. - -Instruments that this module support include the Illumina MiSeq, NextSeq, HiSeq-2500, HiSeq-4000 and HiSeq-X. - -Role Variables --------------- -- `mode`: `{deploy, debug}` In the *debug* mode, monitoring cron job is triggered every minute; in *deploy mode*, monitoring cron job is triggered every hour. -- `upload_project`: ID of the DNAnexus project that the RUN folders should be uploaded to. The ID is of the form `project-BpyQyjj0Y7V0Gbg7g52Pqf8q` -- `dx_token`: API token for the DNAnexus user to be used for data upload. The API token should give minimally UPLOAD access to the `{{ upload project }}`, or CONTRIBUTE access if `downstream_applet` is specified. Instructions for generating a API token can be found at [DNAnexus wiki](https://wiki.dnanexus.com/UI/API-Tokens). This value is overriden by `dx_user_token` in `monitored_users`. -- `monitored_users`: This is a list of objects, each representing a remote user, with its set of incremental upload parameters. For each `monitored_user`, the following values are accepted - - `username`: (Required) username of the remote user - - `monitored_directories`: (Required) Path to the local directory that should be monitored for RUN folders. Multiple directories can be listed. 
Suppose that the folder `20160101_M000001_0001_000000000-ABCDE` is the RUN directory, then the folder structure assumed is `{{monitored_dir}}/20160101_M000001_0001_000000000-ABCDE` - - `local_tar_directory`: (Optional) Path to a local folder where tarballs of RUN directory is temporarily stored. User specified in `username` need to have **WRITE** access to this folder. There should be sufficient disk space to accomodate a RUN directory in this location. This overwrites the default found in `templates/monitor_run_config.template`. - - `local_log_directory`: (Optional) Path to a local folder where logs of streaming upload is stored, persistently. User specified in `username` need to have **WRITE** access to this folder. User should not manually manipulate files found in this folder, as the streaming upload code make assumptions that the files in this folder are not manually manipulated. This overwites the default found in `templates/monitor_run_config.template`. - - `run_length`: (Optional) Expected duration of a sequencing run, corresponds to the -D paramter in incremental upload (For example, 24h). Acceptable suffix: s, m, h, d, w, M, y. - - `n_seq_intervals`: (Optional) Number of intervals to wait for run to complete. If the sequencing run has not completed within `n_seq_intervals` * `run_length`, it will be deemed as aborted and the program will not attempt to upload it. Corresponds to the -I parameter in incremental upload. - - `n_upload_threads`: (Optional) Number of upload threads used by Upload Agent. For sites with severe upload bandwidth limitations (<100kb/s), it is advised to reduce this to 1, to increase robustness of upload in face of possible network disruptions. Default=8. - - `script`: (Optional) File path to an executable script to be triggered after successful upload for the RUN directory. The script must be executable by the user specified by `username`. 
The script will be triggered in the with a single command line argument, correpsonding to the filepath of the RUN directory (see section *Example Script*). **If the file path to the script given does not point to a file, or if the file is not executable by the user, then the upload process will not commence.** - - `dx_user_token`: (Optional) API token associated with the specific `monitored_user`. This overrides the value `dx_token`. If `dx_user_token` is not specified, defaults to `dx_token`. - - `applet`: (Optional) ID of a DNAnexus applet to be triggered after successful upload of the RUN directory. This applet's I/O contract should accept a DNAnexus record with the name `upload_sentinel_record` as input. This applet will be triggered with only the `upload_sentinel_record` input. Additional input can be specified using the variable `downstream_input`. **Note that if the specified applet is not located, the upload process will not commence. Mutually exclusive with `workflow`. The role will raise an error and fail if both are specified.** - - `workflow`: (Optional) ID of a DNAnexus workflow to be triggered after successful upload of the RUN directory. This workflow's I/O contract should accept a DNAnexus record with the name `upload_sentinel_record` in the 1st stage (stage 0) of the workflow as input. Additional input can be specified using the variable `downstream_input`. **Note that if the specified workflow is not located, the upload process will not commence. Mutually exclusive with `applet`. The role will raise an error and fail if both are specified.** - - `downstream_input`: (Optional) A JSON string, parsable as a python `dict` of `str`:``str`, where the **key** is the input_name recognized by a DNAnexus applet/workflow and the **value** is the corresponding input. For examples and detailed explanation, see section titled `Downstream analysis`. 
**Note that the role will raise an error and fail if this string is not JSON-parsable as a dict of the expected format** - -**Note** DNAnexus login is persistent and the login environment is stored on disk in the the Ansible user's home directory. User of this playbook responsibility to make sure that every Ansible user (`monitored_user`) with a streaming upload job assigned has been logged into DNAnexus by either specifying a `dx_token` or `dx_user_token`. - -Dependencies ------------- -Python 2.7 is needed. This program is not compatible with Python 3.X. - -Minimal Ansible version: 2.0. - -This program is intended for Ubuntu 14.04 (Trusty) and has been tested on the 15.10 (Wily) release. Most features should work on a Ubuntu 12.04 (Precise) system, but this has not been tested to date. - - -Requirements ------------- -Users of this module needs a DNAnexus account and its accompanying authentication. To register for a trial account, visit the [DNAnexus homepage](https://dnanexus.com). - -More information and tutorials about the DNAnexus platform can be found at the [DNAnexus wiki page](https://wiki.dnanexus.com). - -The `remote-user` that the role is run against must possess **READ** access to `monitored_folder` and **WRITE** access to disk for logging and temporary storage of tar files. These are typically stored under the `remote-user's` home directory, and is specified in the file `monitor_run_config.template` or as given explicitly by the variables `local_tar_directory` and `local_log_directory`. - -The machine that this role is deployed to should have at least 500Mb of free RAM available for allocation by the upload module during the time of upload. 
- -Example Playbook ----------------- -`dx-upload-play.yml` -```YAML ---- -- hosts: localhost - vars: - monitored_users: - - username: travis - local_tar_directory: ~/new_location/upload/TMP - local_log_directory: ~/another_location/upload/LOG - monitored_directories: - - ~/runs - applet: applet-Bq2Kkgj08FqbjV3J8xJ0K3gG - downstream_input: '{"sequencing_center": "CENTER_A"}' - - username: root - monitored_directories: - - ~/home/root/runs - workflow: workflow-BvFz31j0Y7V5QPf09x9y91pF - downstream_input: '{"0.sequencing_center: "CENTER_A"}' - mode: debug - upload_project: project-BpyQyjj0Y7V0Gbg7g52Pqf8q - - roles: - - dx-streaming-upload - -``` - -**Note**: For security reasons, you should refrain from storing the DNAnexus authentication token in a playbook that is open-access. One might trigger the playbook on the command line with extra-vars to supply the necessary authentication token, or store them in a closed-source yaml variable file. - -ie. `ansible-playbook dx-upload-play.yml -i inventory --extra-vars "dx_token="` - -We recommend that the token given is limited in scope to the upload project, and has no higher than **CONTRIBUTE** privileges. - -Example Script --------------- -The following is an example script that writes a flat file to the RUN directory once a RUN directory has been successfully streamed. - -Recall that the script will be triggered with a single command line parameter, where `$1` is the path to the local RUN directory that has been successfully streamed to DNAnexus. - -``` -#!/bin/bash - -set -e -x -o pipefail - -rundir="$1" -echo "Completed streaming run directory: $rundir" > "$rundir/COMPLETE.txt" -``` - -Actions performed by Role -------------------------- -The dx-streaming-upload role perform, broadly, the following: - -1. Installs the DNAnexus tools [dx-toolkit](https://wiki.dnanexus.com/Downloads#DNAnexus-Platform-SDK) and [upload agent](https://wiki.dnanexus.com/Downloads#Upload-Agent) on the remote machine. -2. 
Set up a CRON job that monitors a given directory for RUN directories periodically, and streams the RUN directory into a DNAnexus project, triggering an app(let)/workflow upon successful upload of the directory and a local script (when specified by user) - -Downstream analysis -------------------- -The dx-streaming-upload role can optionally trigger a DNAnexus applet/workflow upon completion of incremental upload. The desired DNAnexus applet or workflow can be specified (at a per `monitored_user` basis) using the Ansible variables `applet` or `workflow` respectively (mutually exclusive, see explanantion of variables for general explanations). - -More information about DNAnexus workflows can be found at the [DNAnexus wiki page](https://wiki.dnanexus.com/API-Specification-v1.0.0/Running-Analyses) - -### Authorization -The downstream analysis (applet or workflow) will be launched in the project into which the RUN directory is uploaded to (`project`). The DNAnexus user / associated `dx_token` or `dx_user_token` must have at least `CONTRIBUTE` access to the aforementioned project for the analysis to be launched successfully. Computational resources are billable and will be billed to the bill-to of the corresponding project. - -### Input and Options -The specified applet/workflow will be triggered using the `run` [API](http://autodoc.dnanexus.com/bindings/python/current/dxpy_apps.html?highlight=applet%20run#dxpy.bindings.dxapplet.DXExecutable.run) in the dxpy tool suite. - -For an applet, the `executable_input` hash to the `run` command will be prepopulated with the key-value pair {"`upload_sentinel_record`": `$record_id`} where `$record_id` is the DNAnexus file-id of the sentinel record generated for the uploaded RUN directory (see section titled **Files generated**). 
- -For a workflow the `executable_input` hash will be prepoluated with the key-value pair {"`0.upload_sentinel_record`": `$record_id`} where `$record_id` is the DNAnexus file-id of the sentinel record generated for the uploaded RUN directory (see section titled **Files generated**). - -**It is the user's responsibility to ensure that the specified applet/workflow has an appropriate input contract which accepts a DNAnexus record with the input name of `upload_sentinel_record`** - -Additional input/options can be specified, statically using the Ansible variable `downstream_input`. This should be provided as a JSON string, parsable, at the top level, as a Python dict of `str` to `str`. - -Example of a properly formatted `downstream_input` for an `applet` -- ```{"input_name1": "value1", "input_name2": "value2"}``` - -Example of a properly formatted `downstream_input` for a `workflow` -- ```{"0.step0_input": "value1", "1.step2_input": "value2"})``` - -*Note the numerical index prefix necessary when specifying input for an `workflow`, which disambiguates which step in the workflow an input is targeted to* - -Files generated ----------------- -We use a hypothetical example of a local RUN folder named `20160101_M000001_0001_000000000-ABCDE`, that was placed into the `monitored_directory`, after the `dx-streaming-upload` role has been set up. 
- -**Local Files Generated** -``` -path/to/LOG/directory -(specified in monitor_run_config.template file) -- 20160101_M000001_0001_000000000-ABCDE.lane.all.log - -path/to/TMP/directory -(specified in monitor_run_config.template file) -- no persistent files (tar files stored transiently, deleted upon successful upload to DNAnexus) -``` - -**Files Streamed to DNAnexus project** -``` -project - └───20160101_M000001_0001_000000000-ABCDE - │───runs - │ │ RunInfo.xml - │ │ SampleSheet.csv - │ │ run.20160101_M000001_0001_000000000-ABCDE.lane.all.log - │ │ run.20160101_M000001_0001_000000000-ABCDE.lane.all.upload_sentinel - │ │ run.20160101_M000001_0001_000000000-ABCDE.lane.all_000.tar.gz - │ │ run.20160101_M000001_0001_000000000-ABCDE.lane.all_001.tar.gz - │ │ ... - │ - └───reads (or analyses) - │ output files from downstream applet (e.g. demx) - │ "reads" folder will be created if an applet is triggered - │ "analyses" folder will be created if a workflow is triggered - │ ... -``` - -The `reads` folder (and subfolders) will only be created if `applet` is specified. -The `analyses` folder (and subfolder) will only be created if `workflow` is specified. - -`RunInfo.xml` and `SampleSheet.csv` will only be upladed if they can be located within the root of the local RUN directory. - -Logging, Notification and Error Handling ------------------------------------------- -**Uploading** - -A log of the CRON command (executed with `bash -e`) is written to the user's home folder `~/dx-stream_cron.log` and can be used to check the top level command triggered. - -The verbose log of the upload process (generated by the top-level `monitor_runs.py`) is written to the user's home folder `~/monitor.log`. - -These logs can be used to diagnose failures of upload from the local machine to DNAnexus. - -**Downstream applet** - -The downstream applet will be run in the project that the RUN directory is uploaded to (as specified in role variable `upload_project`). 
Users can log in to their DNAnexus account (corresponding to the `dx_token` or `dx_user_token`) and navigate to the upload project to monitor the progress of the applet triggered. Typically, on failure of a DNAnexus job, the user will receive a notification email, which will direct the user to check the log of the failed job for further diagnosis and debugging.
-
-License
--------
-
-Apache
-
-Author Information
------------------
-
-DNAnexus (email: support@dnanexus.com)
+
+[![Build Status](https://travis-ci.org/dnanexus-rnd/dx-streaming-upload.svg?branch=master)](https://travis-ci.org/dnanexus-rnd/dx-streaming-upload)
+
+dx-streaming-upload
+===================
+
+The dx-streaming-upload Ansible role packages the streaming upload module for incrementally uploading a RUN directory from an Illumina sequencer onto the DNAnexus platform.
+
+Instruments that this module supports include the Illumina MiSeq, NextSeq, HiSeq-2500, HiSeq-4000, HiSeq-X and NovaSeq.
+
+
+## Table of Contents
+1. [Dependencies](#dependencies)
+2. [Requirements](#requirements)
+3. [Installation](#installation)
+4. [Examples](#examples)
+5. [Troubleshooting](#troubleshooting)
+
+## Dependencies
+
+Python 2.7 is needed. This program is not compatible with Python 3.X.
+
+Minimal Ansible version: 2.0.
+
+This program is intended for Ubuntu 14.04 and 16.04, and has been tested on Red Hat 7.4/7.5 and OEL (Oracle Enterprise Linux) 7. It has not been tested on other releases, but it should work on most Linux distributions.
+
+## Requirements
+
+Users of this module need a DNAnexus account and its accompanying authentication. To register for a trial account, visit the [DNAnexus homepage](https://platform.dnanexus.com/register).
+
+More information and tutorials about the DNAnexus platform can be found at the [DNAnexus wiki page](https://wiki.dnanexus.com/Home).
+
+The local user utilizing this package should possess READ access to `monitored_folder` and WRITE access to disk for logging and temporary storage of tar files. These are typically stored under the local user's home directory, and are specified in the file `monitor_run_config.template` or given explicitly by the variables `local_tar_directory` and `local_log_directory`.
+
+The machine that this role is deployed to should have sufficient free memory depending on the throughput of the sequencing instrument. For NovaSeq and HiSeq instruments we recommend a machine with at least 8 cores, 32 GB of RAM, and 500 GB to 1 TB of storage.
+
+## Installation
+#### Using Ubuntu (tested on 14.04/16.04)
+Create a working directory (in our case we use `~/dx`)
+```
+mkdir ~/dx
+cd ~/dx
+```
+Install prerequisites
+```
+sudo apt-get install git
+sudo apt-get install wget
+sudo apt-get install curl
+```
+Enable universe repositories
+```
+sudo apt-get install software-properties-common
+sudo apt-add-repository universe
+sudo apt-get update
+```
+Install pip and some essential packages
+```
+curl https://bootstrap.pypa.io/get-pip.py -o get-pip.py
+sudo apt install python2.7
+sudo cp /usr/bin/python2.7 /usr/bin/python
+sudo python get-pip.py
+sudo pip install -U setuptools
+sudo pip install packaging
+sudo apt-get install build-essential -y
+```
+Install ansible
+```
+git clone https://github.com/ansible/ansible.git
+cd ansible/
+make
+sudo make install
+```
+Download or move some test sequencing data into the `/opt/seq` folder
+
+Clone streaming repo
+```
+cd ~/dx
+git clone https://github.com/dnanexus-rnd/dx-streaming-upload.git
+```
+Create a `dx-upload-play.yml` file inside the `dx-streaming-upload` folder.
+#### dx-upload-play.yml
+```YAML
+---
+- hosts: localhost
+  vars:
+    monitored_users:
+      - username: root
+        local_tar_directory: /opt/tmp
+        local_log_directory: /opt/log
+        monitored_directories:
+          - /opt/seq
+        dx_user_token:
+    mode: debug
+    upload_project: project-id
+  roles:
+    - dx-streaming-upload
+```
+Here are the [instructions](https://wiki.dnanexus.com/Command-Line-Client/Login-and-Logout#Generating-an-authentication-token) for token generation.
+Launch the ansible-playbook
+```
+sudo ansible-playbook dx-streaming-upload/dx-upload-play.yml
+```
+Start the cron daemon if it is not already running
+```
+sudo cron
+```
+#### Using RedHat (tested on 7.4 and 7.5)
+Create a working directory (in our case we use `~/dx`)
+```
+mkdir ~/dx
+cd ~/dx
+```
+Install prerequisites
+```
+sudo yum install git -y
+sudo yum install wget -y
+```
+Enable EPEL Repository for RH 7.*
+```
+wget http://dl.fedoraproject.org/pub/epel/epel-release-latest-7.noarch.rpm
+sudo rpm -ivh epel-release-latest-7.noarch.rpm
+```
+Install pip and some essential packages
+```
+sudo yum install python-pip -y
+sudo cp /usr/bin/python2.7 /usr/bin/python
+sudo pip install -U setuptools
+sudo pip install packaging
+sudo yum install gcc gcc-c++ kernel-devel -y
+```
+Install ansible
+```
+git clone https://github.com/ansible/ansible.git
+cd ansible/
+make
+sudo make install
+```
+Download or move some test sequencing data into the `/opt/seq` folder
+
+Clone streaming repository
+```
+cd ~/dx
+git clone https://github.com/dnanexus-rnd/dx-streaming-upload.git
+```
+Create a `dx-upload-play.yml` file inside the `dx-streaming-upload` folder. Example given [here](#dx-upload-play.yml)
+Here are the [instructions](https://wiki.dnanexus.com/Command-Line-Client/Login-and-Logout#Generating-an-authentication-token) for token generation.
+Launch the ansible-playbook
+```
+sudo ansible-playbook dx-streaming-upload/dx-upload-play.yml
+```
+## Examples
+#### Role Variables
+- `mode`: `{deploy, debug}` In *debug* mode, the monitoring cron job is triggered every minute; in *deploy* mode, the monitoring cron job is triggered every hour.
+- `upload_project`: ID of the DNAnexus project that the RUN folders should be uploaded to. The ID is of the form `project-BpyQyjj0Y7V0Gbg7g52Pqf8q`
+- `dx_token`: API token for the DNAnexus user to be used for data upload. The API token should give minimally UPLOAD access to the `{{ upload project }}`, or CONTRIBUTE access if `downstream_applet` is specified. Instructions for generating an API token can be found at the [DNAnexus wiki](https://wiki.dnanexus.com/UI/API-Tokens). This value is overridden by `dx_user_token` in `monitored_users`.
+- `monitored_users`: This is a list of objects, each representing a local user with its set of incremental upload parameters. For each `monitored_user`, the following values are accepted
+  - `username`: (Required) username of the local user
+  - `monitored_directories`: (Required) Path to the local directory that should be monitored for RUN folders. Multiple directories can be listed. Suppose that the folder `20160101_M000001_0001_000000000-ABCDE` is the RUN directory; the folder structure assumed is then `{{monitored_dir}}/20160101_M000001_0001_000000000-ABCDE`
+  - `local_tar_directory`: (Optional) Path to a local folder where tarballs of the RUN directory are temporarily stored. The user specified in `username` needs to have **WRITE** access to this folder. There should be sufficient disk space to accommodate a RUN directory in this location. This overwrites the default found in `templates/monitor_run_config.template`.
+  - `local_log_directory`: (Optional) Path to a local folder where logs of the streaming upload are stored persistently. The user specified in `username` needs to have **WRITE** access to this folder.
The user should not manually manipulate files found in this folder; the streaming upload code assumes they are left untouched. This overwrites the default found in `templates/monitor_run_config.template`.
+  - `run_length`: (Optional) Expected duration of a sequencing run; corresponds to the -D parameter in incremental upload (for example, `24h`). Acceptable suffixes: s, m, h, d, w, M, y.
+  - `n_seq_intervals`: (Optional) Number of intervals to wait for a run to complete. If the sequencing run has not completed within `n_seq_intervals` * `run_length`, it will be deemed aborted and the program will not attempt to upload it. Corresponds to the -I parameter in incremental upload.
+  - `n_upload_threads`: (Optional) Number of upload threads used by the Upload Agent. For sites with severe upload bandwidth limitations (<100kb/s), it is advised to reduce this to 1 to increase robustness of upload in the face of possible network disruptions. Default=8.
+  - `script`: (Optional) File path to an executable script to be triggered after successful upload of the RUN directory. The script must be executable by the user specified by `username`. The script will be triggered with a single command-line argument, corresponding to the filepath of the RUN directory (see section *Example Script*). **If the file path given does not point to a file, or if the file is not executable by the user, then the upload process will not commence.**
+  - `dx_user_token`: (Optional) API token associated with the specific `monitored_user`. This overrides the value `dx_token`. If `dx_user_token` is not specified, it defaults to `dx_token`.
+  - `applet`: (Optional) ID of a DNAnexus applet to be triggered after successful upload of the RUN directory. This applet's I/O contract should accept a DNAnexus record with the name `upload_sentinel_record` as input. This applet will be triggered with only the `upload_sentinel_record` input.
Additional input can be specified using the variable `downstream_input`. **Note that if the specified applet is not located, the upload process will not commence. Mutually exclusive with `workflow`. The role will raise an error and fail if both are specified.**
+  - `workflow`: (Optional) ID of a DNAnexus workflow to be triggered after successful upload of the RUN directory. This workflow's I/O contract should accept a DNAnexus record with the name `upload_sentinel_record` in the 1st stage (stage 0) of the workflow as input. Additional input can be specified using the variable `downstream_input`. **Note that if the specified workflow is not located, the upload process will not commence. Mutually exclusive with `applet`. The role will raise an error and fail if both are specified.**
+  - `downstream_input`: (Optional) A JSON string, parsable as a Python `dict` of `str`:`str`, where the **key** is the input name recognized by a DNAnexus applet/workflow and the **value** is the corresponding input. For examples and a detailed explanation, see the section titled `Downstream analysis`. **Note that the role will raise an error and fail if this string is not JSON-parsable as a dict of the expected format.**
+
+**Note**: DNAnexus login is persistent and the login environment is stored on disk in the Ansible user's home directory. It is the responsibility of the user of this playbook to make sure that every Ansible user (`monitored_user`) with a streaming upload job assigned has been logged into DNAnexus, by specifying either a `dx_token` or a `dx_user_token`.
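The format requirement on `downstream_input` can be sketched as follows. This is a minimal illustration in Python 3 syntax — the helper name is hypothetical and this is not the role's actual code (which targets Python 2.7) — showing the kind of check that would reject a string that is not JSON-parsable as a flat dict of `str` to `str`:

```python
import json

def parse_downstream_input(downstream_input):
    """Validate a downstream_input JSON string as a flat dict of str -> str."""
    try:
        parsed = json.loads(downstream_input)
    except ValueError:
        # json.JSONDecodeError subclasses ValueError
        raise ValueError("downstream_input is not valid JSON")
    if not isinstance(parsed, dict) or not all(
            isinstance(k, str) and isinstance(v, str) for k, v in parsed.items()):
        raise ValueError("downstream_input must be a JSON object of str -> str")
    return parsed
```

For example, `parse_downstream_input('{"sequencing_center": "CENTER_A"}')` returns the expected dict, while a malformed string such as `'{"0.sequencing_center: "CENTER_A"}'` (missing a closing quote on the key) raises `ValueError`.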
+#### Example Playbook
+`dx-upload-play.yml`
+```YAML
+---
+- hosts: localhost
+  vars:
+    monitored_users:
+      - username: travis
+        local_tar_directory: ~/new_location/upload/TMP
+        local_log_directory: ~/another_location/upload/LOG
+        monitored_directories:
+          - ~/runs
+        applet: applet-Bq2Kkgj08FqbjV3J8xJ0K3gG
+        downstream_input: '{"sequencing_center": "CENTER_A"}'
+      - username: root
+        monitored_directories:
+          - /home/root/runs
+        workflow: workflow-BvFz31j0Y7V5QPf09x9y91pF
+        downstream_input: '{"0.sequencing_center": "CENTER_A"}'
+    mode: debug
+    upload_project: project-BpyQyjj0Y7V0Gbg7g52Pqf8q
+  roles:
+    - dx-streaming-upload
+```
+**Note**: For security reasons, you should refrain from storing the DNAnexus authentication token in a playbook that is open-access. One might trigger the playbook on the command line with extra-vars to supply the necessary authentication token, or store it in a closed-source YAML variable file.
+
+e.g. `ansible-playbook dx-upload-play.yml -i inventory --extra-vars "dx_token="`
+
+We recommend that the token given is limited in scope to the upload project, and has no higher than **CONTRIBUTE** privileges.
+#### Example Script
+The following is an example script that writes a flat file to the RUN directory once a RUN directory has been successfully streamed.
+
+Recall that the script will be triggered with a single command-line parameter, where `$1` is the path to the local RUN directory that has been successfully streamed to DNAnexus.
+```
+#!/bin/bash
+
+set -e -x -o pipefail
+
+rundir="$1"
+echo "Completed streaming run directory: $rundir" > "$rundir/COMPLETE.txt"
+```
+#### Actions performed by Role
+The dx-streaming-upload role performs, broadly, the following:
+1. Installs the DNAnexus tools [dx-toolkit](https://wiki.dnanexus.com/Downloads#DNAnexus-Platform-SDK) and [upload agent](https://wiki.dnanexus.com/Downloads#Upload-Agent) on the local machine.
+2. Sets up a cron job that periodically monitors the given directories for RUN folders, streams each RUN directory into a DNAnexus project, and triggers an app(let)/workflow and a local script (when specified by the user) upon successful upload of the directory.
+#### Downstream analysis
+The dx-streaming-upload role can optionally trigger a DNAnexus applet/workflow upon completion of incremental upload. The desired DNAnexus applet or workflow can be specified (on a per-`monitored_user` basis) using the Ansible variables `applet` or `workflow` respectively (mutually exclusive; see the variable explanations above).
+
+More information about DNAnexus workflows can be found at the [DNAnexus wiki page](https://wiki.dnanexus.com/API-Specification-v1.0.0/Running-Analyses)
+### Authorization
+The downstream analysis (applet or workflow) will be launched in the project into which the RUN directory is uploaded (`project`). The DNAnexus user / associated `dx_token` or `dx_user_token` must have at least `CONTRIBUTE` access to the aforementioned project for the analysis to be launched successfully. Computational resources are billable and will be billed to the bill-to of the corresponding project.
+### Input and Options
+The specified applet/workflow will be triggered using the `run` [API](http://autodoc.dnanexus.com/bindings/python/current/dxpy_apps.html?highlight=applet%20run#dxpy.bindings.dxapplet.DXExecutable.run) in the dxpy tool suite.
+
+For an applet, the `executable_input` hash to the `run` command will be prepopulated with the key-value pair {"`upload_sentinel_record`": `$record_id`}, where `$record_id` is the DNAnexus file-id of the sentinel record generated for the uploaded RUN directory (see section titled **Files generated**).
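The prepopulation of the `executable_input` hash described above — applets receive the sentinel record directly, workflows at stage 0 via a `0.` prefix — can be sketched as below. The helper is illustrative only, not the role's actual code; the `{"$dnanexus_link": ...}` form is the standard DNAnexus way of passing a data object by ID:

```python
def build_executable_input(record_id, workflow=False, extra_input=None):
    """Build the input hash passed to the run API.

    record_id   -- ID of the upload sentinel record
    workflow    -- if True, target stage 0 of a workflow ("0." prefix)
    extra_input -- optional dict parsed from downstream_input
    """
    key = "0.upload_sentinel_record" if workflow else "upload_sentinel_record"
    executable_input = {key: {"$dnanexus_link": record_id}}
    # Any statically configured downstream_input is merged on top
    executable_input.update(extra_input or {})
    return executable_input
```

With no `downstream_input`, the hash contains only the sentinel link; with a workflow and `'{"0.sequencing_center": "CENTER_A"}'`, both stage-0 keys are present.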
+
+For a workflow, the `executable_input` hash will be prepopulated with the key-value pair {"`0.upload_sentinel_record`": `$record_id`}, where `$record_id` is the DNAnexus file-id of the sentinel record generated for the uploaded RUN directory (see section titled **Files generated**).
+
+**It is the user's responsibility to ensure that the specified applet/workflow has an appropriate input contract which accepts a DNAnexus record with the input name of `upload_sentinel_record`.**
+
+Additional input/options can be specified statically using the Ansible variable `downstream_input`. This should be provided as a JSON string, parsable, at the top level, as a Python dict of `str` to `str`.
+
+Example of a properly formatted `downstream_input` for an `applet`
+- ```{"input_name1": "value1", "input_name2": "value2"}```
+
+Example of a properly formatted `downstream_input` for a `workflow`
+- ```{"0.step0_input": "value1", "1.step2_input": "value2"}```
+
+*Note the numerical index prefix necessary when specifying input for a `workflow`, which disambiguates which step in the workflow an input is targeted to.*
+#### Files generated
+We use a hypothetical example of a local RUN folder named `20160101_M000001_0001_000000000-ABCDE`, placed into the `monitored_directory` after the `dx-streaming-upload` role has been set up.
+
+**Local Files Generated**
+```
+path/to/LOG/directory
+(specified in monitor_run_config.template file)
+- 20160101_M000001_0001_000000000-ABCDE.lane.all.log
+
+path/to/TMP/directory
+(specified in monitor_run_config.template file)
+- no persistent files (tar files stored transiently, deleted upon successful upload to DNAnexus)
+```
+**Files Streamed to DNAnexus project**
+```
+project
+ └───20160101_M000001_0001_000000000-ABCDE
+     ├───runs
+     │     RunInfo.xml
+     │     SampleSheet.csv
+     │     run.20160101_M000001_0001_000000000-ABCDE.lane.all.log
+     │     run.20160101_M000001_0001_000000000-ABCDE.lane.all.upload_sentinel
+     │     run.20160101_M000001_0001_000000000-ABCDE.lane.all_000.tar.gz
+     │     run.20160101_M000001_0001_000000000-ABCDE.lane.all_001.tar.gz
+     │     ...
+     │
+     └───reads (or analyses)
+           output files from downstream applet (e.g. demux)
+           "reads" folder will be created if an applet is triggered
+           "analyses" folder will be created if a workflow is triggered
+           ...
+```
+The `reads` folder (and subfolders) will only be created if `applet` is specified.
+The `analyses` folder (and subfolders) will only be created if `workflow` is specified.
+
+`RunInfo.xml` and `SampleSheet.csv` will only be uploaded if they can be located within the root of the local RUN directory.
+#### Logging, Notification and Error Handling
+**Uploading**
+
+A log of the CRON command (executed with `bash -e`) is written to the user's home folder `~/dx-stream_cron.log` and can be used to check the top-level command triggered.
+
+The verbose log of the upload process (generated by the top-level `monitor_runs.py`) is written to the user's home folder `~/monitor.log`.
+
+These logs can be used to diagnose failures of upload from the local machine to DNAnexus.
+
+**Downstream applet**
+
+The downstream applet will be run in the project that the RUN directory is uploaded to (as specified in the role variable `upload_project`).
Users can log in to their DNAnexus account (corresponding to the `dx_token` or `dx_user_token`) and navigate to the upload project to monitor the progress of the triggered applet. Typically, on failure of a DNAnexus job, the user will receive a notification email directing them to check the log of the failed job for further diagnosis and debugging.
+## Troubleshooting
+#### Quick upload test
+You can run a quick test to see if you have the right configuration set up. Please install our Upload Agent from [here](https://wiki.dnanexus.com/Downloads#Upload-Agent) and run the command:
+```
+./ua --test
+```
+For more information, please refer to our documentation [here](https://wiki.dnanexus.com/Upload-Agent#Running-a-simple-diagnostic-test).
+#### Check if cron job has initialized
+You can check the status of the cron job with the following command:
+```
+crontab -l
+```
+This should produce output such as:
+```
+#Ansible: DNAnexus monitor runs (debug)
+* * * * * flock -w 5 /var/lock/dnanexus_uploader.lock bash -ex -c 'source /opt/dx-toolkit/environment; PATH=/opt/dnanexus-upload-agent:$PATH; python /opt/dnanexus/scripts/monitor_runs.py -c ~/dnanexus/config/monitor_runs.config -p project-XXXXX -d /PROD/NGS_DATA/MY_ILLUMINA_MACHINE/DATE/NVSQ-RUN_ID -v > ~/monitor.log 2>&1' > ~/dx-stream_cron.log 2>&1
+```
+#### Check status of the upload
+The upload process is logged in the following files:
+- `~/dx-stream_cron.log` is the first log file to monitor to see if the appropriate scripts are being launched
+- `~/monitor.log` contains additional information about the upload process
+## License
+Apache
+## Author Information
+DNAnexus (email: support@dnanexus.com)
\ No newline at end of file