Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

FR: Localization - Add flag for minimizing chance of GCP cross-region network egress charges being incurred [CROM-6764] #6332

Open
wants to merge 28 commits into
base: develop
Choose a base branch
from

Conversation

wnojopra
Copy link

@wnojopra wnojopra commented May 5, 2021

Similar to #6324 , but with localization instead of call caching.

The framing of the approach here is:
Make localization of inputs location-aware.
Add a workflow option enabling control of what egress charges can be incurred for input localization.
The new workflow option would be:
localization_egress: [none, continental, global]

where the values affect whether localization of inputs can incur egress charges:
none: only within-region copies are allowed, which generate no egress charges
continental: within content copies are allowed; within-content copies have reduced costs, such as $0.01 / GB in the US
global: copies across all regions are allowed. Cross-content egress charges can be much higher (ranging from $0.08 / GB up to $0.23 / GB)

There may also be cases where we have access to get and list objects in the bucket, but can't view the metadata of the bucket. This prevents us from figuring out the location of the bucket. To aid with this, this PR also adds the workflow option:
localization_egress_strict: [true, false]
If localization_egress_strict is true, then the workflow will fail if we are unable to determine the location of the buckets.

CURRENT STATUS OF PR:

The idea for this approach is to add an action right before localization, called egressCheck. In this action, we compare the bucket location of each input file with the location of the VM. If we determine there will be egress charge, we will exit appropriately (depending on workflow option).

@wnojopra wnojopra closed this May 26, 2021
@wnojopra wnojopra reopened this May 26, 2021
@evanbernstein evanbernstein changed the title FR: Localization - Add flag for minimizing chance of GCP cross-region network egress charges being incurred FR: Localization - Add flag for minimizing chance of GCP cross-region network egress charges being incurred [CROM-6764] Jun 10, 2021
Copy link
Contributor

@mcovarr mcovarr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just curious, was the possibility of checking regionality within the Cromwell server rather than on the VM considered? This might not work in all cases if the target zones for the VM is broad, but in community Terra the default list of zones are all within one region.

@@ -17,6 +17,8 @@ These workflow options provide Google-specific information for workflows running
| `enable_ssh_access` | `boolean` | If set to true, will enable SSH access to the Google Genomics worker machines. Please note that this is a community contribution and is not officially supported by the Cromwell development team. |
| `delete_intermediate_output_files` | `boolean` | **Experimental:** Any `File` variables referenced in call `output` sections that are not found in the workflow `output` section will be considered an intermediate `File`. When the workflow finishes and this option is set to `true`, all intermediate `File` objects will be deleted from GCS. Cromwell must be run with the configuration value `system.delete-workflow-files` set to `true`. The default for both values is `false`. NOTE: The behavior of this option on other backends is unspecified. |
| `enable_fuse` | `boolean` | Specifies if workflow tasks should be submitted to Google Pipelines with an additional `ENABLE_FUSE` flag. It causes container to be executed with `CAP_SYS_ADMIN`. Use it only for trusted containers. Please note that this is a community contribution and is not officially supported by the Cromwell development team. |
| `localization_egress` | `string` | Either `local`, `continental` , or `global`. Default `local`. Network egress charges can be incurred if the source bucket of input files are in a different cloud region than the VM. If set to `local`, fail if a source bucket is in a different region from the VM. If set to `continental`, fail if a source bucket is in a different continent from the VM. If set to `global`, allow localization from any region. |
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The default should be global so this is an opt-in feature that wouldn't break existing submissions in Terra when it is released.

@@ -421,6 +421,14 @@ class PipelinesApiAsyncBackendJobExecutionActor(override val standardParams: Sta
descriptor.workflowOptions.getBoolean(WorkflowOptionKeys.UseDockerImageCache).getOrElse(false)
}

protected def localizationEgress(descriptor: BackendWorkflowDescriptor): String = {
descriptor.workflowOptions.getOrElse(WorkflowOptionKeys.LocalizationEgress, "local")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
descriptor.workflowOptions.getOrElse(WorkflowOptionKeys.LocalizationEgress, "local")
descriptor.workflowOptions.getOrElse(WorkflowOptionKeys.LocalizationEgress, "global")

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could / should this fail fast here without spinning up a VM if the value is not one of the three allowed?

transferLibraryContainerPath: Path,
gcsTransferConfiguration: GcsTransferConfiguration,
referenceInputsToMountedPathsOpt: Option[Map[PipelinesApiInput, String]]): Future[Unit] = Future.successful(())

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Per the Codecov warning above, referenceInputsToMountedPathsOpt is not being used. If the logic in uploadGcsEgressCheckScript is going to consider reference inputs / disks it should exempt inputs which are provided via a reference disk from the regionality check. Otherwise the referenceInputsToMountedPathsOpt parameter should be removed and the non-consideration of reference disks should be documented as a limitation of regionality checking.

val localizationEgressValue = jobDescriptor.workflowDescriptor.workflowOptions.getOrElse(
WorkflowOptionKeys.LocalizationEgress, "local")
val localizationEgressStrictFlag = jobDescriptor.workflowDescriptor.workflowOptions.getBoolean(
WorkflowOptionKeys.LocalizationEgressStrict).getOrElse(false)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

both of these are available in the superclass cromwell.backend.google.pipelines.common.PipelinesApiAsyncBackendJobExecutionActor

@@ -128,6 +128,29 @@ class PipelinesApiAsyncBackendJobExecutionActor(standardParams: StandardAsyncExe
(directoryTransferBundle :: (filesWithSameNamesTransferBundles ++ filesWithDifferentNamesTransferBundles)) mkString "\n\n"
}

private def gcsEgressCheckBundle[T <: PipelinesApiInput](transferConfiguration: GcsTransferConfiguration)(bucket: String, inputs: NonEmptyList[T]): String = {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same comments here as in the alpha version 🙂

@wnojopra
Copy link
Author

Just curious, was the possibility of checking regionality within the Cromwell server rather than on the VM considered? This might not work in all cases if the target zones for the VM is broad, but in community Terra the default list of zones are all within one region.

It's an interesting idea. Cromwell can act as the user's pet service account, so it would be able to check bucket locations with the same permissions as the solution here, running on the VM.

It's true that the default list of zones on Terra is all in one region, but users can set their own zones, and this PR is partly intended to help catch when people have WDLs/inputs that set the zones list to all of the US zones (for example). This PR is intended to be a failsafe that generalizes (as it happens at the time everything is knowable). This PR is also intended to be a stopgap until there can be a more sophisticated on-submission or pre-submission check.

With regards to the list of zones, Cromwell submits this to the PAPI and any of these can be chosen. It would be an interesting idea to try to narrow the user's submitted list and pick the "best" zone(s) based on the input locations. That would be a nice additional enhancement in the future.

@wnojopra wnojopra requested a review from mcovarr June 11, 2021 17:53
@@ -35,6 +37,8 @@ These workflow options provide Google-specific information for workflows running
"custom-label": "custom-value"
},
"delete_intermediate_output_files": false,
"enable_fuse": false
"enable_fuse": false,
"localization_egress": "local",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TOL up to you but it sounded like most users of this feature would probably opt for continental?

@@ -173,6 +195,12 @@ class PipelinesApiAsyncBackendJobExecutionActor(standardParams: StandardAsyncExe

import mouse.all._

private def generateGcsEgressCheckScript(inputs: List[PipelinesApiInput])
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this may need to exclude inputs being provided by reference disks as generateGcsLocalizationScript does. e.g. we have reference disks for several inputs from gs://broad-references. This bucket is US multi-regional, but continental regionality would exclude researchers in Australia from being able to use these inputs if the localization check doesn't realize they're being provided by the reference disk and do not require localization.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants