
partition pipeline so high-utilization steps can use different instance type from low-utilization steps #32

Open
BenLangmead opened this issue Feb 17, 2016 · 0 comments


@BenLangmead
Collaborator

If there is a series of Rail-RNA steps (probably the alignment-related ones) that achieves good CPU utilization and load balance, then high-CPU instances are cost-effective and well suited to it. If a series of steps lacks these properties (poor load balance, mostly I/O-bound), then high-CPU instances are probably a waste of money, since most of the CPUs sit idle.
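A rough way to see the cost argument is to divide the instance's hourly price by the number of cores actually doing work. The sketch below assumes a single mean utilization figure per step; the price, core count, and utilization values are hypothetical placeholders, not real EC2 prices or Rail-RNA measurements.

```python
# Hypothetical figures for illustration only; not real EC2 prices or Rail-RNA measurements.
def cost_per_busy_core_hour(hourly_price: float, vcpus: int, utilization: float) -> float:
    """Dollars paid per core-hour of useful work.

    utilization is the assumed mean fraction of cores busy, in (0, 1].
    """
    if not 0 < utilization <= 1:
        raise ValueError("utilization must be in (0, 1]")
    return hourly_price / (vcpus * utilization)


# A CPU-bound, well-balanced step keeps most cores busy...
busy = cost_per_busy_core_hour(hourly_price=1.60, vcpus=32, utilization=0.9)
# ...while an I/O-bound step on the same instance leaves most cores idle.
idle = cost_per_busy_core_hour(hourly_price=1.60, vcpus=32, utilization=0.15)
print(f"{busy:.3f} vs {idle:.3f} $ per busy core-hour")
```

With these made-up numbers the I/O-bound step pays about six times as much per unit of useful CPU work on the same high-CPU instance.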

If we partition the pipeline into stretches that either do or don't have these properties, then we could launch separate clusters (with different instance types) for those stretches. This could reduce costs relative to running the whole pipeline end-to-end on one big high-CPU cluster.
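One way to form the stretches is to group maximal runs of consecutive steps that fall on the same side of a utilization cutoff, keeping pipeline order. This is a minimal sketch assuming each step has a measured mean CPU utilization; the `Step` type, the step names, the utilization values, and the 0.7 threshold are all hypothetical and do not come from Rail-RNA.

```python
from dataclasses import dataclass
from itertools import groupby


@dataclass
class Step:
    name: str               # hypothetical step name
    cpu_utilization: float  # hypothetical mean fraction of cores busy


HIGH_CPU_THRESHOLD = 0.7  # assumed cutoff; would need tuning against real runs


def partition(steps, threshold=HIGH_CPU_THRESHOLD):
    """Split the pipeline into maximal runs of consecutive steps that are all
    high-utilization or all low-utilization, preserving step order."""
    stretches = []
    for is_high, group in groupby(steps, key=lambda s: s.cpu_utilization >= threshold):
        instance_class = "high-cpu" if is_high else "general"
        stretches.append((instance_class, [s.name for s in group]))
    return stretches


pipeline = [
    Step("preprocess", 0.2),
    Step("align_reads", 0.9),
    Step("realign", 0.85),
    Step("collapse", 0.3),
    Step("bigwig", 0.25),
]
for instance_class, names in partition(pipeline):
    print(instance_class, names)
```

Grouping only consecutive steps (rather than all high-utilization steps together) keeps each cluster's input equal to the previous cluster's output, so stretches can run one after another.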

Downside: we pay the cost of bootstrapping multiple times for a given dataset. But we might also be able to simplify the bootstrapping for any given cluster, since each cluster runs only a portion of the overall pipeline. E.g. if no alignment is involved, you don't have to download and install the index; if samtools isn't involved, you don't have to install samtools.
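The trade-off can be sketched as two small calculations: each cluster installs only the union of its own steps' dependencies, and total cost adds bootstrap time for every cluster launched. Everything here is hypothetical for illustration: the `STEP_DEPS` mapping, the dependency names, and all prices and hour figures are assumptions, not Rail-RNA's actual requirements or measured costs.

```python
# Hypothetical per-step dependencies; not Rail-RNA's actual requirements.
STEP_DEPS = {
    "preprocess": set(),
    "align_reads": {"alignment_index"},
    "realign": {"alignment_index"},
    "collapse": set(),
    "bigwig": {"samtools"},
}


def bootstrap_needs(stretch):
    """Union of dependencies a cluster must install to run this stretch."""
    needs = set()
    for step in stretch:
        needs |= STEP_DEPS[step]
    return needs


def total_cost(clusters):
    """Each cluster is (hourly price, bootstrap hours, run hours); bootstrap time is billed too."""
    return sum(price * (boot + run) for price, boot, run in clusters)


# One big high-CPU cluster must install everything up front (made-up numbers).
end_to_end = total_cost([(10.0, 0.25, 3.0)])
# Three clusters: only the alignment stretch pays for the slower, index-heavy bootstrap.
partitioned = total_cost([(2.0, 0.1, 1.0), (10.0, 0.2, 1.5), (2.0, 0.1, 0.5)])

for s in [["preprocess"], ["align_reads", "realign"], ["collapse", "bigwig"]]:
    print(s, sorted(bootstrap_needs(s)))
print(f"end-to-end ${end_to_end:.2f} vs partitioned ${partitioned:.2f}")
```

Whether partitioning wins depends entirely on how much cheaper the low-utilization stretches run versus how much extra bootstrap time the additional clusters add.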

Thanks to elasticity, there's no reason this would have to come at the expense of overall throughput.
