Add init script for papi backend #5342
Along with the existing monitoring script, this commit adds support for a so-called init script. An init script can be used to add common logic to all the tasks in the submitted workflow.
@cjllanwarne Have you had a chance to look at the pull request? If you have any thoughts on the matter, I would be very glad to hear them.
Hello @tcibinan -- can you explain the type of initialization tasks that you'd invoke using an
Hello @ruchim! Thanks for looking into the issue.

The idea behind the init script was to reduce code duplication between all Cromwell tasks that use recently added

Definitely, the decision whether or not to use such an init script depends heavily on the workflow. From my point of view, if some of the workflow tasks use fuse capabilities, then most of them probably do the same. Therefore an init script is not required, but it can be helpful in such cases.

As an example, let's look at the following workflow. It just calculates the number of files in some of the mounted directories.

count.wdl
version 1.0
workflow count {
output {
Int REFERENCES = references.NUMBER
Int SAMPLES = samples.NUMBER
}
call references { }
call samples { }
}
task references {
output {
Int NUMBER = read_int("number")
}
command <<<
mkdir -p /mount-point
mount 8.8.8.8:/data /mount-point
ls -lh /mount-point/references/ | wc -l > number
>>>
}
task samples {
output {
Int NUMBER = read_int("number")
}
command <<<
mkdir -p /mount-point
mount 8.8.8.8:/data /mount-point
ls -lh /mount-point/samples/ | wc -l > number
>>>
}

As long as both tasks share some common initialization, we can extract it into an init script, which will be executed right before each task command. If we perform such an optimization, then we have to upload

init_script.sh
mkdir -p /mount-point
mount 8.8.8.8:/data /mount-point

workflow_options.json
{
"init_script": "gs://storage/init_script.sh"
}

count.wdl
version 1.0
workflow count {
output {
Int REFERENCES = references.NUMBER
Int SAMPLES = samples.NUMBER
}
call references { }
call samples { }
}
task references {
output {
Int NUMBER = read_int("number")
}
command <<<
ls -lh /mount-point/references/ | wc -l > number
>>>
}
task samples {
output {
Int NUMBER = read_int("number")
}
command <<<
ls -lh /mount-point/samples/ | wc -l > number
>>>
}
The Cromwell engine with the Google Cloud backend supports a so-called monitoring script that can be used to monitor virtual machine / container stats while a Cromwell task command is being executed. The monitoring script is launched asynchronously right before the task command and ends right after the command has finished.
The monitoring script helps a lot with monitoring, but because it is launched asynchronously it cannot be used to add common initialization for all Cromwell tasks.
Nevertheless, support for common initialization logic shared by all Cromwell tasks can be helpful. For example, if most of the workflow tasks use filesystem mounts, their initialization can either be specified at the beginning of each task or in a single place, a so-called initialization script.
The support for the initialization script is entirely inspired by the monitoring script, and the implementation is much the same.
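To make the distinction concrete, here is a rough shell sketch of the two launch modes: the monitoring script runs in the background, while the init script runs in the foreground and must finish before the task command starts. This is a conceptual illustration only and not how Cromwell or the papi backend actually wires things up. The function name `run_task`, its arguments, the demo scripts, and the choice to skip the command when the init script fails are all assumptions made for the example.

```shell
#!/usr/bin/env bash
# Conceptual sketch only; NOT Cromwell's actual implementation.
# run_task and its three arguments are hypothetical names.

run_task() {
  local monitoring_script="$1" init_script="$2" command_script="$3"

  # Monitoring: started in the background, so nothing waits for it.
  bash "$monitoring_script" &
  local monitor_pid=$!

  # Init: runs in the foreground; the command starts only after it succeeds.
  # (Skipping the command on init failure is an assumption of this sketch.)
  local rc=0
  if bash "$init_script"; then
    bash "$command_script" || rc=$?
  else
    rc=$?
  fi

  # Stop monitoring once the command has finished.
  kill "$monitor_pid" 2>/dev/null || true
  wait "$monitor_pid" 2>/dev/null || true
  return "$rc"
}

# Small demo with throwaway scripts standing in for the real ones.
demo_dir="$(mktemp -d)"
echo 'while true; do sleep 1; done' > "$demo_dir/monitor.sh"
echo 'echo "init done"' > "$demo_dir/init.sh"
echo 'echo "command runs"' > "$demo_dir/command.sh"
run_task "$demo_dir/monitor.sh" "$demo_dir/init.sh" "$demo_dir/command.sh"
```

In the demo, the init output always appears before the command output, which is the ordering guarantee a background monitoring script cannot give.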
The initialization script can be specified using the init_script workflow option.
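As a minimal sketch of how that option might be used, the snippet below writes a workflow options file that points at an uploaded init script. The bucket path `gs://storage/init_script.sh` is taken from the example in the conversation; the helper name `write_options`, the local file names, and the `cromwell.jar` placeholder are assumptions. The upload and submission steps need cloud credentials and a Cromwell jar, so they are shown only as comments.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: write a workflow options file that references the
# init script. $1 is the gs:// URL of the script, $2 the output file path.
write_options() {
  local init_script_url="$1" options_file="$2"
  printf '{\n  "init_script": "%s"\n}\n' "$init_script_url" > "$options_file"
}

write_options "gs://storage/init_script.sh" workflow_options.json
cat workflow_options.json

# Remaining steps, not executed here (they require gsutil, credentials,
# and a Cromwell jar; cromwell.jar is a placeholder file name):
#   gsutil cp init_script.sh gs://storage/init_script.sh
#   java -jar cromwell.jar run count.wdl --options workflow_options.json
```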