partition pipeline so high-utilization steps can use different instance type from low-utilization steps #32

BenLangmead · 2016-02-17T23:21:15Z

If there is a series of Rail-RNA steps (alignment-related ones, probably) that achieve good CPU utilization and load balance, then high-cpu instances are cost effective & well suited. If a series of steps is not like this (poor load balance, mostly I/O), then high-cpu instances are probably a waste of money, as most CPUs sit idle.

If we partition the pipeline into stretches that either do or don't have these properties, then we could launch separate clusters (with different instance types) for those stretches. This could reduce costs relative to a pipeline that runs end-to-end on a big high-cpu cluster.
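A rough sketch of what that partitioning could look like, assuming we have a per-step CPU utilization figure to go on. The step names, the utilization numbers, the 0.7 threshold and the two instance-class labels are all made up for illustration; none of them come from Rail-RNA itself.

```python
from dataclasses import dataclass
from itertools import groupby


@dataclass(frozen=True)
class Step:
    name: str               # hypothetical step name, not a real Rail-RNA step
    cpu_utilization: float  # assumed measured fraction of CPU time busy, 0..1


def partition_into_stretches(steps, threshold=0.7):
    """Group consecutive steps that fall on the same side of the threshold.

    Returns a list of (instance_class, [step names]) in pipeline order.
    The instance_class labels are placeholders, not real instance types.
    """
    stretches = []
    for is_high, group in groupby(
        steps, key=lambda s: s.cpu_utilization >= threshold
    ):
        label = "high-cpu" if is_high else "general-purpose"
        stretches.append((label, [s.name for s in group]))
    return stretches


# Illustrative pipeline; the numbers are invented.
example_steps = [
    Step("preprocess", 0.20),
    Step("align_reads", 0.90),
    Step("realign", 0.85),
    Step("collapse", 0.30),
    Step("write_output", 0.25),
]

for instance_class, names in partition_into_stretches(example_steps):
    print(instance_class, names)
```

Each resulting stretch would map to one cluster launch. Only consecutive steps are merged, so the pipeline order is preserved and a stretch never has to wait on a step from a later cluster.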

Downside: we pay the cost of bootstrapping multiple times for a given dataset. But we might also be able to simplify the bootstrapping for any given cluster, since a given cluster is running only a portion of the overall pipeline. E.g. if there's no alignment involved then you don't have to download and install the index. If samtools isn't involved, you don't have to install samtools.
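To make the bootstrapping point concrete, here is a minimal sketch that works out what a given cluster would need to install, assuming we keep a table of which steps need which dependencies. The step names and the table contents are hypothetical; only the index and samtools are taken from the examples above.

```python
def bootstrap_requirements(stretch_steps, step_requirements):
    """Return the sorted union of dependencies needed by the steps in one stretch.

    step_requirements maps a step name to the set of things it needs installed.
    Steps missing from the table are assumed to need nothing extra.
    """
    needed = set()
    for step in stretch_steps:
        needed |= step_requirements.get(step, set())
    return sorted(needed)


# Hypothetical dependency table, for illustration only.
requirements = {
    "align_reads": {"aligner_index", "samtools"},
    "realign": {"aligner_index"},
    "write_output": {"samtools"},
}

# A cluster that runs only non-alignment steps skips the index entirely.
print(bootstrap_requirements(["preprocess", "collapse"], requirements))
print(bootstrap_requirements(["align_reads", "realign"], requirements))
```

Whether the saved install time outweighs the repeated cluster start-up is something we would have to measure.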

Thanks to elasticity, there's no reason this would have to come at the expense of overall throughput.
