-
Notifications
You must be signed in to change notification settings - Fork 360
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
CROM-6718: FR: Call Caching - Add flag for minimizing chance of GCP cross-region network egress charges being incurred #6432
base: develop
Are you sure you want to change the base?
Conversation
…work egress charges being incurred
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall comments:
- Destination location should really be checked once per root workflow and not for every job.
- Codecov has identified a lot of untested code. Some of this we'll probably want to integration test with Centaur, but there's also quite a bit that could easily be reached with unit tests (e.g. PipelinesApiBackendCacheHitCopyingActor.scala)
- Lots of debug logging, printlns and commented out code to be cleaned up / removed
- Needs docs and a CHANGELOG.md entry (which might already be in your other PR, I don't remember)
} | ||
} | ||
|
||
private def issueDestinationLocationCommand(sourceLocation: String, copyCommand: CopyOutputsCommand): State = { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This location check should happen once at the root workflow level rather than once for every job in the workflow. The destination bucket will be the same for every job in the workflow.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the review, Miguel! Do you have a suggestion on how best to achieve this? Its not clear to me how to avoid doing this for every job.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
After discussing with Miguel today we've agreed to leave the destination location checking as-is (at the job level). A few reasons for this are:
- At the workflow level, we don't know the backend type.
- We could cache locations per bucket at the job level, but it would add unnecessary complication to add a caching layer across all JobActors.
I did performance testing by with a wdl that has 50 outputs. On a cache hit, it attempts to copy the 50 files, so with my change it would also perform 50*2 location lookups. I ran this wdl 10 times without my changes, and 10 times with my changes, with Without my changes, the difference between those timestamps was on average 9 seconds. With my changes, the difference between those timestamps was on average 16 seconds. |
There is one test failing in Travis CI, but it looks unrelated to my change. I believe this PR is ready for another review @mcovarr |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe the next steps for this PR and its sibling are being discussed outside of GitHub
Somewhat. We have a thread open with Kyle on high-level details of performance, complexity, and support. I'd still appreciate a review on the code submitted so far. Thanks! |
Extending mcovarr's work in #6366 . Big shoutout to mcovarr!!!
[Per @mbookman]
This pull request is an initial update to address:
CROM-6718: FR: Add flag for minimizing chance of GCP cross-region network egress charges being incurred
This PR specifically focuses on the risks of egress charges incurred due to call caching. The framing of the approach here, which is a bit broader than originally noted in CROM-6718, is:
Make call caching location-aware, prioritizing copies that minimize egress charges.
Add a workflow option enabling control of what egress charges can be incurred for call cache copying.
The new workflow option would be:
call_cache_egress: [none, continental, global]
where the values affect whether call cache copies can incur egress charges:
none: only within-region copies are allowed, which generate no egress charges
continental: within content copies are allowed; within-content copies have reduced costs, such as $0.01 / GB in the US
global: copies across all regions are allowed. Cross-content egress charges can be much higher (ranging from $0.08 / GB up to $0.23 / GB)
CURRENT STATUS OF PR:
With the changes in this PR, Cromwell successfully checks the location of the source and destination file to be copied, compares the location, and makes a decision of whether or not it should be copied based on the call_cache_egress option. If it should be copied, the files are copied as normal. If it should not be copied, the cache attempt fails and the workflow runs instead.