You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
If there is a series of Rail-RNA steps (alignment-related ones, probably) that achieve good CPU utilization and load balance, then high-cpu instances are cost effective & well suited. If a series of steps is not like this (poor load balance, mostly I/O), then high-cpu instances probably a waste of money, as most CPUs are idle.
If we partition the pipeline into stretches that either do or don't have these properties, then we could launch separate clusters (with different instance types) for those stretches. This could reduce costs relative to a pipeline that runs end-to-end on a big high-cpu cluster.
Downside: we pay the cost of bootstrapping multiple times for a given dataset. But we might also be able to simplify the bootstrapping for any given cluster, since a given cluster is running only a portion of the overall pipeline. E.g. if there's no alignment involved then you don't have to download and install the index. If samtools isn't involve, you don't have to install samtools.
Thanks to elasticity, there's no reason this would have to come at the expense of overall throughput.
The text was updated successfully, but these errors were encountered:
If there is a series of Rail-RNA steps (alignment-related ones, probably) that achieve good CPU utilization and load balance, then high-cpu instances are cost effective & well suited. If a series of steps is not like this (poor load balance, mostly I/O), then high-cpu instances probably a waste of money, as most CPUs are idle.
If we partition the pipeline into stretches that either do or don't have these properties, then we could launch separate clusters (with different instance types) for those stretches. This could reduce costs relative to a pipeline that runs end-to-end on a big high-cpu cluster.
Downside: we pay the cost of bootstrapping multiple times for a given dataset. But we might also be able to simplify the bootstrapping for any given cluster, since a given cluster is running only a portion of the overall pipeline. E.g. if there's no alignment involved then you don't have to download and install the index. If samtools isn't involve, you don't have to install samtools.
Thanks to elasticity, there's no reason this would have to come at the expense of overall throughput.
The text was updated successfully, but these errors were encountered: