-
Notifications
You must be signed in to change notification settings - Fork 360
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
FR: Localization - Add flag for minimizing chance of GCP cross-region network egress charges being incurred [CROM-6764] #6332
base: develop
Are you sure you want to change the base?
Conversation
…nges being incurred
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just curious, was the possibility of checking regionality within the Cromwell server rather than on the VM considered? This might not work in all cases if the target zones for the VM is broad, but in community Terra the default list of zones
are all within one region.
centaur/src/main/resources/standardTestCases/exhaustive_delete.test
Outdated
Show resolved
Hide resolved
docs/wf_options/Google.md
Outdated
@@ -17,6 +17,8 @@ These workflow options provide Google-specific information for workflows running | |||
| `enable_ssh_access` | `boolean` | If set to true, will enable SSH access to the Google Genomics worker machines. Please note that this is a community contribution and is not officially supported by the Cromwell development team. | | |||
| `delete_intermediate_output_files` | `boolean` | **Experimental:** Any `File` variables referenced in call `output` sections that are not found in the workflow `output` section will be considered an intermediate `File`. When the workflow finishes and this option is set to `true`, all intermediate `File` objects will be deleted from GCS. Cromwell must be run with the configuration value `system.delete-workflow-files` set to `true`. The default for both values is `false`. NOTE: The behavior of this option on other backends is unspecified. | | |||
| `enable_fuse` | `boolean` | Specifies if workflow tasks should be submitted to Google Pipelines with an additional `ENABLE_FUSE` flag. It causes container to be executed with `CAP_SYS_ADMIN`. Use it only for trusted containers. Please note that this is a community contribution and is not officially supported by the Cromwell development team. | | |||
| `localization_egress` | `string` | Either `local`, `continental` , or `global`. Default `local`. Network egress charges can be incurred if the source bucket of input files are in a different cloud region than the VM. If set to `local`, fail if a source bucket is in a different region from the VM. If set to `continental`, fail if a source bucket is in a different continent from the VM. If set to `global`, allow localization from any region. | |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The default should be global
so this is an opt-in feature that wouldn't break existing submissions in Terra when it is released.
@@ -421,6 +421,14 @@ class PipelinesApiAsyncBackendJobExecutionActor(override val standardParams: Sta | |||
descriptor.workflowOptions.getBoolean(WorkflowOptionKeys.UseDockerImageCache).getOrElse(false) | |||
} | |||
|
|||
protected def localizationEgress(descriptor: BackendWorkflowDescriptor): String = { | |||
descriptor.workflowOptions.getOrElse(WorkflowOptionKeys.LocalizationEgress, "local") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
descriptor.workflowOptions.getOrElse(WorkflowOptionKeys.LocalizationEgress, "local") | |
descriptor.workflowOptions.getOrElse(WorkflowOptionKeys.LocalizationEgress, "global") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could / should this fail fast here without spinning up a VM if the value is not one of the three allowed?
...a/cromwell/backend/google/pipelines/v2alpha1/PipelinesApiAsyncBackendJobExecutionActor.scala
Show resolved
Hide resolved
transferLibraryContainerPath: Path, | ||
gcsTransferConfiguration: GcsTransferConfiguration, | ||
referenceInputsToMountedPathsOpt: Option[Map[PipelinesApiInput, String]]): Future[Unit] = Future.successful(()) | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Per the Codecov warning above, referenceInputsToMountedPathsOpt
is not being used. If the logic in uploadGcsEgressCheckScript
is going to consider reference inputs / disks it should exempt inputs which are provided via a reference disk from the regionality check. Otherwise the referenceInputsToMountedPathsOpt
parameter should be removed and the non-consideration of reference disks should be documented as a limitation of regionality checking.
val localizationEgressValue = jobDescriptor.workflowDescriptor.workflowOptions.getOrElse( | ||
WorkflowOptionKeys.LocalizationEgress, "local") | ||
val localizationEgressStrictFlag = jobDescriptor.workflowDescriptor.workflowOptions.getBoolean( | ||
WorkflowOptionKeys.LocalizationEgressStrict).getOrElse(false) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
both of these are available in the superclass cromwell.backend.google.pipelines.common.PipelinesApiAsyncBackendJobExecutionActor
...a/cromwell/backend/google/pipelines/v2alpha1/PipelinesApiAsyncBackendJobExecutionActor.scala
Show resolved
Hide resolved
...alpha1/src/main/scala/cromwell/backend/google/pipelines/v2alpha1/api/EgressCheckAction.scala
Show resolved
Hide resolved
...alpha1/src/main/scala/cromwell/backend/google/pipelines/v2alpha1/api/EgressCheckAction.scala
Show resolved
Hide resolved
@@ -128,6 +128,29 @@ class PipelinesApiAsyncBackendJobExecutionActor(standardParams: StandardAsyncExe | |||
(directoryTransferBundle :: (filesWithSameNamesTransferBundles ++ filesWithDifferentNamesTransferBundles)) mkString "\n\n" | |||
} | |||
|
|||
private def gcsEgressCheckBundle[T <: PipelinesApiInput](transferConfiguration: GcsTransferConfiguration)(bucket: String, inputs: NonEmptyList[T]): String = { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same comments here as in the alpha version 🙂
It's an interesting idea. Cromwell can act as the user's pet service account, so it would be able to check bucket locations with the same permissions as the solution here, running on the VM. It's true that the default list of zones on Terra is all in one region, but users can set their own zones, and this PR is partly intended to help catch when people have WDLs/inputs that set the zones list to all of the US zones (for example). This PR is intended to be a failsafe that generalizes (as it happens at the time everything is knowable). This PR is also intended to be a stopgap until there can be a more sophisticated on-submission or pre-submission check. With regards to the list of zones, Cromwell submits this to the PAPI and any of these can be chosen. It would be an interesting idea to try to narrow the user's submitted list and pick the "best" zone(s) based on the input locations. That would be a nice additional enhancement in the future. |
@@ -35,6 +37,8 @@ These workflow options provide Google-specific information for workflows running | |||
"custom-label": "custom-value" | |||
}, | |||
"delete_intermediate_output_files": false, | |||
"enable_fuse": false | |||
"enable_fuse": false, | |||
"localization_egress": "local", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
TOL up to you but it sounded like most users of this feature would probably opt for continental
?
@@ -173,6 +195,12 @@ class PipelinesApiAsyncBackendJobExecutionActor(standardParams: StandardAsyncExe | |||
|
|||
import mouse.all._ | |||
|
|||
private def generateGcsEgressCheckScript(inputs: List[PipelinesApiInput]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this may need to exclude inputs being provided by reference disks as generateGcsLocalizationScript
does. e.g. we have reference disks for several inputs from gs://broad-references
. This bucket is US multi-regional, but continental
regionality would exclude researchers in Australia from being able to use these inputs if the localization check doesn't realize they're being provided by the reference disk and do not require localization.
Similar to #6324 , but with localization instead of call caching.
The framing of the approach here is:
Make localization of inputs location-aware.
Add a workflow option enabling control of what egress charges can be incurred for input localization.
The new workflow option would be:
localization_egress: [none, continental, global]
where the values affect whether localization of inputs can incur egress charges:
none: only within-region copies are allowed, which generate no egress charges
continental: within content copies are allowed; within-content copies have reduced costs, such as $0.01 / GB in the US
global: copies across all regions are allowed. Cross-content egress charges can be much higher (ranging from $0.08 / GB up to $0.23 / GB)
There may also be cases where we have access to get and list objects in the bucket, but can't view the metadata of the bucket. This prevents us from figuring out the location of the bucket. To aid with this, this PR also adds the workflow option:
localization_egress_strict: [true, false]
If localization_egress_strict is true, then the workflow will fail if we are unable to determine the location of the buckets.
CURRENT STATUS OF PR:
The idea for this approach is to add an action right before localization, called egressCheck. In this action, we compare the bucket location of each input file with the location of the VM. If we determine there will be egress charge, we will exit appropriately (depending on workflow option).