Add init script for papi backend #5342

tcibinan · 2020-01-09T15:44:50Z

The Cromwell engine with the Google Cloud backend supports a so-called monitoring script that can be used to monitor virtual machine / container stats while a Cromwell task command is being executed. The monitoring script is launched asynchronously right before the task command and ends right after the command has finished.

The monitoring script helps a lot with monitoring, but because it is launched asynchronously it cannot be used to add common initialization for all Cromwell tasks.

Nevertheless, support for common initialization logic shared by all Cromwell tasks can be helpful. For example, if most of the workflow tasks use filesystem mounts, the mounts can either be set up at the beginning of each task or in a single place, a so-called initialization script.

Support for the initialization script is directly inspired by the monitoring script, and the implementation is nearly the same.
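The difference between the two kinds of script can be illustrated with a small, self-contained shell sketch. It is not Cromwell's actual implementation: the function names `init_script`, `monitoring_script` and `task_command` are hypothetical stand-ins, and the relative order of the init step and the monitor launch is an assumption. The point it shows is that a synchronous step is guaranteed to finish before the command starts, while a background step gives no such guarantee.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of the ordering described above; not Cromwell's code.
set -euo pipefail

log="$(mktemp)"

# Stand-in for the init script: it must finish before the command starts.
init_script() {
    echo "init: start" >> "$log"
    sleep 0.2
    echo "init: done" >> "$log"
}

# Stand-in for the monitoring script: loops until it is stopped.
monitoring_script() {
    while true; do sleep 0.1; done
}

# Stand-in for the task command.
task_command() {
    echo "command: start" >> "$log"
}

init_script              # synchronous: blocks until initialization is complete
monitoring_script &      # asynchronous: runs alongside the command
monitor_pid=$!
task_command
kill "$monitor_pid"      # the monitor ends once the command has finished
wait "$monitor_pid" 2>/dev/null || true

cat "$log"
```

Running the sketch prints `init: start`, `init: done` and `command: start`, in that order, because the command is not launched until the synchronous step has returned.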

The initialization script can be specified using the init_script workflow option.

Along with the existing monitoring script, this commit adds support for a so-called init script. The init script can be used to add common logic for all tasks in the submitted workflow.

tcibinan · 2020-01-20T09:22:15Z

@cjllanwarne Have you had a chance to look at the pull request? You probably have some thoughts on the matter, and I would be very glad to hear them.

ruchim · 2020-03-10T13:30:50Z

Hello @tcibinan -- can you explain the type of initialization tasks that you'd invoke using an init_script? It seems in order to use the enable_fuse flag -- some common initialization needs to take place? I'm trying to understand if this is necessary for all tasks that rely on Fuse.

tcibinan · 2020-03-10T20:41:29Z

Hello @ruchim! Thanks for looking into the issue.

The idea behind the init script is to reduce, as much as possible, code duplication between all Cromwell tasks that use the recently added enable_fuse flag. Otherwise, mounts have to be configured manually for each and every Cromwell task in order to take advantage of the fuse capabilities.

Whether or not to use such an init script certainly depends on the workflow. From my point of view, if some of the workflow's tasks use fuse capabilities, then most of them probably do the same. So the init script is not required, but it can be helpful in such cases.

As an example, let's look at the following workflow. It simply counts the files in some of the mounted directories.

count.wdl

version 1.0

workflow count {
    output {
        Int REFERENCES = references.NUMBER
        Int SAMPLES = samples.NUMBER
    }

    call references { }
    call samples { }
}

task references {
    output {
        Int NUMBER = read_int("number")
    }

    command <<<
        mkdir -p /mount-point
        mount 8.8.8.8:/data /mount-point
        ls -1 /mount-point/references/ | wc -l > number
    >>>
}

task samples {
    output {
        Int NUMBER = read_int("number")
    }

    command <<<
        mkdir -p /mount-point
        mount 8.8.8.8:/data /mount-point
        ls -1 /mount-point/samples/ | wc -l > number
    >>>
}

Since both tasks share the same initialization, we can extract it into the init script, which is executed right before each task command.

If we do that, we have to upload init_script.sh to Google Cloud Storage and enable it in the workflow options.

init_script.sh

mkdir -p /mount-point
mount 8.8.8.8:/data /mount-point

workflow_options.json

{
    "init_script": "gs://storage/init_script.sh"
}
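The upload and submission steps described above could look roughly like the sketch below. It assumes a Cromwell build that includes this pull request (the init_script option is the one proposed here), a jar named `cromwell.jar` in the current directory, and write access to the example bucket `gs://storage`. The `run` helper and the `DRY_RUN` variable are hypothetical conveniences: by default the script only prints the commands, so it is safe to try without a Google Cloud project.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Print commands instead of executing them unless DRY_RUN=0 is set.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

workdir="$(mktemp -d)"

# The shared initialization extracted from the task commands.
cat > "$workdir/init_script.sh" <<'EOF'
mkdir -p /mount-point
mount 8.8.8.8:/data /mount-point
EOF

# Workflow options pointing at the uploaded script.
cat > "$workdir/workflow_options.json" <<'EOF'
{
    "init_script": "gs://storage/init_script.sh"
}
EOF

# Upload the script, then submit the workflow with the options file.
run gsutil cp "$workdir/init_script.sh" gs://storage/init_script.sh
run java -jar cromwell.jar run count.wdl --options "$workdir/workflow_options.json"
```

Setting `DRY_RUN=0` executes the two commands for real, which requires `gsutil`, Java and the Cromwell jar to be available.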

count.wdl

version 1.0

workflow count {
    output {
        Int REFERENCES = references.NUMBER
        Int SAMPLES = samples.NUMBER
    }

    call references { }
    call samples { }
}

task references {
    output {
        Int NUMBER = read_int("number")
    }

    command <<<
        ls -1 /mount-point/references/ | wc -l > number
    >>>
}

task samples {
    output {
        Int NUMBER = read_int("number")
    }

    command <<<
        ls -1 /mount-point/samples/ | wc -l > number
    >>>
}

tcibinan added 2 commits January 9, 2020 18:41

Add init script for papi backend.

7dffd21

Along with the existing monitoring script, this commit adds support for a so-called init script. The init script can be used to add common logic for all tasks in the submitted workflow.

Add documentation for init script feature.

faa76d4

cjllanwarne added the "Community Contribution" label (a pull request from the Cromwell open-source community) Jan 10, 2020
