
CROM-6718: FR: Call Caching - Add flag for minimizing chance of GCP cross-region network egress charges being incurred #6432

Open · wants to merge 16 commits into develop

Conversation


@wnojopra wnojopra commented Jul 2, 2021

Extending mcovarr's work in #6366. Big shoutout to mcovarr!!!

[Per @mbookman]
This pull request is an initial update to address:

CROM-6718: FR: Add flag for minimizing chance of GCP cross-region network egress charges being incurred

This PR specifically focuses on the risks of egress charges incurred due to call caching. The framing of the approach here, which is a bit broader than originally noted in CROM-6718, is:

  • Make call caching location-aware, prioritizing copies that minimize egress charges.
  • Add a workflow option enabling control of what egress charges can be incurred for call cache copying.
The new workflow option would be:

call_cache_egress: [none, continental, global]

where the values determine which call cache copies can incur egress charges:

  • none: only within-region copies are allowed, which incur no egress charges
  • continental: cross-region copies within a continent are also allowed; these have reduced costs, such as $0.01/GB in the US
  • global: copies across all regions are allowed; cross-continent egress charges can be much higher (ranging from $0.08/GB up to $0.23/GB)
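For illustration, a minimal workflow options file using the new flag might look like the following. This is just a sketch: the option name comes from this PR, but the surrounding file contents are only an example of the standard Cromwell workflow options JSON format.

```json
{
  "call_cache_egress": "none"
}
```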

CURRENT STATUS OF PR:

With the changes in this PR, Cromwell checks the locations of the source and destination files to be copied, compares them, and decides whether the copy should proceed based on the call_cache_egress option. If the copy is allowed, the files are copied as normal. If it is not, the cache attempt fails and the job runs instead.
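To make the decision concrete, here is a minimal sketch of the kind of policy check described above. This is illustrative only: the type and method names (EgressPolicy, copyAllowed, continentOf) and the continent-derivation heuristic are assumptions, not the PR's actual code.

```scala
// Illustrative sketch of the egress policy check; names and the continent
// heuristic are assumptions, not the PR's actual implementation.
sealed trait EgressPolicy
case object EgressNone extends EgressPolicy        // call_cache_egress: "none"
case object EgressContinental extends EgressPolicy // call_cache_egress: "continental"
case object EgressGlobal extends EgressPolicy      // call_cache_egress: "global"

object CallCacheEgressCheck {
  // Hypothetical helper: derive a coarse continent key from a GCS bucket
  // location such as "US-CENTRAL1", "EUROPE-WEST1", or the multi-region "US".
  private def continentOf(location: String): String =
    location.toUpperCase.split("-").head

  // Decide whether copying a cached output from sourceLocation to
  // destLocation is permitted under the configured policy.
  def copyAllowed(policy: EgressPolicy, sourceLocation: String, destLocation: String): Boolean =
    policy match {
      case EgressGlobal      => true
      case EgressContinental => continentOf(sourceLocation) == continentOf(destLocation)
      case EgressNone        => sourceLocation.equalsIgnoreCase(destLocation)
    }
}
```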


@mcovarr mcovarr left a comment


Overall comments:

  • Destination location should really be checked once per root workflow and not for every job.
  • Codecov has identified a lot of untested code. Some of this we'll probably want to integration test with Centaur, but there's also quite a bit that could easily be reached with unit tests (e.g. PipelinesApiBackendCacheHitCopyingActor.scala)
  • Lots of debug logging, printlns, and commented-out code to be cleaned up / removed
  • Needs docs and a CHANGELOG.md entry (which might already be in your other PR, I don't remember)

Inline comment on this diff context:

```scala
private def issueDestinationLocationCommand(sourceLocation: String, copyCommand: CopyOutputsCommand): State = {
```
mcovarr (Contributor):

This location check should happen once at the root workflow level rather than once for every job in the workflow. The destination bucket will be the same for every job in the workflow.

wnojopra (Author):

Thanks for the review, Miguel! Do you have a suggestion on how best to achieve this? It's not clear to me how to avoid doing this for every job.

wnojopra (Author):

After discussing with Miguel today, we've agreed to leave the destination location checking as-is (at the job level). A few reasons for this:

  1. At the workflow level, we don't know the backend type.
  2. We could cache locations per bucket at the job level, but adding a caching layer across all JobActors would introduce unnecessary complexity.
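For context on point 2, the per-bucket caching idea could be as simple as a memoized map shared across actors, sketched below for illustration. This is hypothetical: lookupBucketLocation stands in for the real GCS bucket metadata call, and a shared singleton like this is exactly the cross-actor complexity the discussion decided to avoid.

```scala
import java.util.concurrent.ConcurrentHashMap

// Hypothetical sketch of the idea we decided against: memoize
// bucket -> location lookups in a map shared across JobActors.
object BucketLocationCache {
  private val cache = new ConcurrentHashMap[String, String]()

  // lookupBucketLocation is a stand-in for the real GCS metadata call;
  // it runs at most once per bucket, after which the result is reused.
  def locationOf(bucket: String)(lookupBucketLocation: String => String): String =
    cache.computeIfAbsent(bucket, b => lookupBucketLocation(b))
}
```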


wnojopra commented Aug 4, 2021

I did performance testing with a WDL that has 50 outputs. On a cache hit, Cromwell attempts to copy the 50 files, so with my change it would also perform 50 × 2 = 100 location lookups (one each for the source and destination of every copy). I ran this WDL 10 times without my changes and 10 times with my changes, with call_cache_egress set to "none". For each run, I measured the time from when job hashing is initialized to when the workflow completes.

Without my changes, the difference between those timestamps was on average 9 seconds.

With my changes, it was on average 16 seconds. In other words, the extra 100 lookups added roughly 7 seconds, or about 70 ms per lookup.


wnojopra commented Aug 4, 2021

There is one test failing in Travis CI, but it looks unrelated to my change. I believe this PR is ready for another review, @mcovarr.


@kshakir kshakir left a comment


I believe the next steps for this PR and its sibling are being discussed outside of GitHub.


wnojopra commented Aug 9, 2021

> I believe the next steps for this PR and its sibling are being discussed outside of GitHub

Somewhat. We have a thread open with Kyle on high-level details of performance, complexity, and support. I'd still appreciate a review on the code submitted so far. Thanks!

@mcovarr mcovarr removed their request for review April 4, 2022 12:34