Add init script for papi backend #5342

Open
wants to merge 2 commits into
base: develop

Contributor

@tcibinan tcibinan commented Jan 9, 2020

The Cromwell engine with the Google Cloud backend supports a so-called monitoring script, which can be used to monitor virtual machine / container stats while the Cromwell task command is executing. The monitoring script is launched asynchronously right before the task command and ends right after the command has finished.

The monitoring script helps a lot with monitoring, but because it is launched asynchronously it cannot be used to add common initialization for all Cromwell tasks.

Nevertheless, support for common initialization logic shared by all Cromwell tasks can be helpful. For example, if most of a workflow's tasks use filesystem mounts, their initialization can either be repeated at the beginning of each task or specified in a single place, the so-called initialization script.

Support for the initialization script is directly inspired by the monitoring script, and the implementation is nearly the same.

The initialization script can be specified using the init_script workflow option.
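To illustrate why an asynchronously launched script is a poor fit for initialization, here is a small local shell sketch. It is not Cromwell code, and the names `slow_setup`, `task_command`, and the marker file are hypothetical. The setup step is started once in the background, the way the monitoring script is launched, and once in the foreground, the way the init script is meant to run.

```shell
#!/bin/sh
# Local illustration only; nothing here is Cromwell code.
WORK_DIR=$(mktemp -d)
MARKER="$WORK_DIR/ready"

# Stand-in for initialization that takes a moment (e.g. a mount).
slow_setup() { sleep 1; touch "$MARKER"; }

# Stand-in for the task command: reports whether setup had finished.
task_command() {
    if [ -e "$MARKER" ]; then echo "ready"; else echo "not ready"; fi
}

# Monitoring-style launch: background, so the command does not wait for it.
slow_setup &
SETUP_PID=$!
ASYNC_RESULT=$(task_command)
wait "$SETUP_PID"
rm -f "$MARKER"

# Init-style launch: foreground, so the command starts only after setup ends.
slow_setup
SYNC_RESULT=$(task_command)

echo "async: $ASYNC_RESULT"
echo "sync:  $SYNC_RESULT"
rm -rf "$WORK_DIR"
```

On a typical run the background variant reports that setup was not ready when the command started, while the foreground variant reports that it was.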

Along with the existing monitoring script, the commit brings support
for a so-called init script. The init script can be used to add common
logic for all the tasks in the submitted workflow.
@cjllanwarne cjllanwarne added the Community Contribution A pull request from the Cromwell open-source community label Jan 10, 2020
@tcibinan
Contributor Author

@cjllanwarne Have you had a chance to look at the pull request? If you have any thoughts on the matter, I would be very glad to hear them.

@ruchim
Contributor

ruchim commented Mar 10, 2020

Hello @tcibinan -- can you explain the type of initialization tasks that you'd invoke using an init_script? It seems in order to use the enable_fuse flag -- some common initialization needs to take place? I'm trying to understand if this is necessary for all tasks that rely on Fuse.

@tcibinan
Contributor Author

Hello @ruchim! Thanks for looking into the issue.

The idea behind the init script was to reduce, as much as possible, code duplication between all Cromwell tasks that use the recently added enable_fuse flag. Otherwise, mounts have to be configured manually in each and every Cromwell task in order to take advantage of the fuse capabilities.

Whether to use such an init script certainly depends on the workflow. From my point of view, if some of a workflow's tasks use fuse capabilities, then most of them probably do the same. So the init script is not required, but it can be helpful in such cases.

As an example, let's look at the following workflow. It just counts the number of files in some of the mounted directories.

count.wdl

version 1.0

workflow count {
    output {
        Int REFERENCES = references.NUMBER
        Int SAMPLES = samples.NUMBER
    }

    call references { }
    call samples { }
}

task references {
    output {
        Int NUMBER = read_int("number")
    }

    command <<<
        mkdir -p /mount-point
        mount 8.8.8.8:/data /mount-point
        # Plain ls: the long format adds a "total" line that would inflate the count.
        ls /mount-point/references/ | wc -l > number
    >>>
}

task samples {
    output {
        Int NUMBER = read_int("number")
    }

    command <<<
        mkdir -p /mount-point
        mount 8.8.8.8:/data /mount-point
        # Plain ls: the long format adds a "total" line that would inflate the count.
        ls /mount-point/samples/ | wc -l > number
    >>>
}

Since both tasks share some common initialization, we can extract it into the init script, which will be executed right before each task command.

If we do so, we have to upload init_script.sh to Google Cloud Storage and enable it in the workflow options.

init_script.sh

mkdir -p /mount-point
mount 8.8.8.8:/data /mount-point
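A slightly more defensive variant of the init script might skip the mount when the directory is already a mount point and report failure through its exit status. The sketch below only defines a helper function. The names `ensure_mounted`, `MOUNT_POINT`, and `NFS_EXPORT` are hypothetical, it assumes the `mountpoint` utility is available in the task image, and how Cromwell reacts to a failing init script is not stated in this thread.

```shell
#!/bin/sh
# Sketch only: variable and function names are illustrative, not part of the PR.
# Defaults are the values used in the example above.
MOUNT_POINT="${MOUNT_POINT:-/mount-point}"
NFS_EXPORT="${NFS_EXPORT:-8.8.8.8:/data}"

# Create the mount point and mount the export unless it is already mounted.
# Returns non-zero if the directory cannot be created or the mount fails.
ensure_mounted() {
    mkdir -p "$MOUNT_POINT" || return 1
    # Assumes the mountpoint utility exists in the image.
    if mountpoint -q "$MOUNT_POINT"; then
        return 0
    fi
    mount "$NFS_EXPORT" "$MOUNT_POINT"
}
```

An init script built on this helper would end with a call such as `ensure_mounted || exit 1`.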

workflow_options.json

{
    "init_script": "gs://storage/init_script.sh"
}
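The upload and submission steps could look roughly like the shell helpers below. This is a sketch under several assumptions: `gsutil` is installed and authenticated, a Cromwell server is reachable at `CROMWELL_URL` (a hypothetical variable defaulting to a local server), and that server is built with the init_script option from this pull request. The function names are illustrative.

```shell
#!/bin/sh
# Assumptions: gsutil is authenticated, a Cromwell server listens on
# CROMWELL_URL, and the server includes the init_script option from this PR.
CROMWELL_URL="${CROMWELL_URL:-http://localhost:8000}"
INIT_SCRIPT_URI="gs://storage/init_script.sh"

# Copy the local init script to the bucket path referenced by the options file.
upload_init_script() {
    gsutil cp init_script.sh "$INIT_SCRIPT_URI"
}

# Write the workflow options file that points Cromwell at the uploaded script.
write_options() {
    printf '{\n    "init_script": "%s"\n}\n' "$INIT_SCRIPT_URI" > "$1"
}

# Submit the workflow together with its options through Cromwell's REST API.
submit_workflow() {
    curl -sS -X POST "$CROMWELL_URL/api/workflows/v1" \
        -F "workflowSource=@count.wdl" \
        -F "workflowOptions=@$1"
}
```

A full run would then be `upload_init_script && write_options workflow_options.json && submit_workflow workflow_options.json`.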

count.wdl

version 1.0

workflow count {
    output {
        Int REFERENCES = references.NUMBER
        Int SAMPLES = samples.NUMBER
    }

    call references { }
    call samples { }
}

task references {
    output {
        Int NUMBER = read_int("number")
    }

    command <<<
        # Plain ls: the long format adds a "total" line that would inflate the count.
        ls /mount-point/references/ | wc -l > number
    >>>
}

task samples {
    output {
        Int NUMBER = read_int("number")
    }

    command <<<
        # Plain ls: the long format adds a "total" line that would inflate the count.
        ls /mount-point/samples/ | wc -l > number
    >>>
}
