0.7.166
John Major committed Mar 9, 2025
1 parent aa0e750 commit ce515b6
Showing 5 changed files with 180 additions and 46 deletions.
160 changes: 121 additions & 39 deletions README.md
@@ -1,5 +1,5 @@
# Daylily AWS Ephemeral Cluster Setup
-_(0.7.164)_
+_(0.7.166)_

**beta release**

@@ -11,70 +11,152 @@ Daylily is a framework for setting up ephemeral AWS clusters optimized for genom

## Table of Contents
## Table of Contents
- [Daylily AWS Ephemeral Cluster Setup](#daylily-aws-ephemeral-cluster-setup)
- [Table of Contents](#table-of-contents)
- [Table of Contents](#table-of-contents-1)
- [Intention](#intention)
- [Goal 1: Shift Conversation To Better Ways of Assessing Tools, Spend Less Time Finding Winners & Losers](#goal-1-shift-conversation-to-better-ways-of-assessing-tools-spend-less-time-finding-winners--losers)
- [Goal 2: Establish Higher Expectations re: What Is Considered Sufficient Supporting Data/Docs For Published Tools](#goal-2-establish-higher-expectations-re-what-is-considered-sufficient-supporting-datadocs-for-published-tools)
- [Goal 3: Move Beyond 'commonly accepted best practices'](#goal-3-move-beyond-commonly-accepted-best-practices)
- [Shift Focus](#shift-focus)
- [Raise the Bar](#raise-the-bar)
- [Escape Outdated ‘Best Practices’](#escape-outdated-best-practices)
- [Intention](#intention-1)
- [Goal 1: Shift Conversation To Better Ways of Assessing Tools, Spend Less Time Finding Winners \& Loosers](#goal-1-shift-conversation-to-better-ways-of-assessing-tools-spend-less-time-finding-winners--loosers)
- [Goal 2: Establish Higher Expectations re: What Is Considered Sufficent Supporting Data/Docs For Published Tools](#goal-2-establish-higher-expectations-re-what-is-considered-sufficent-supporting-datadocs-for-published-tools)
- [Goal 3: Move Beyond 'commonly accepted best practices'. Why? Because they have our field stuck in 2012](#goal-3-move-beyond-commonly-accepted-best-practices-why-because-they-have-our-field-stuck-in-2012)
- [Whitepaper In Progress](#whitepaper-in-progress)
- [What's It All About?](#whats-it-all-about)
- [BFAIR: Bioinformatics FAIR Principles](#bfair-bioinformatics-fair-principles)
- [Comprehensive Cost Transparency & Predictability](#comprehensive-cost-transparency--predictability)
- [Comprehensive Cost Transparency \& Predictability (wip: interactive cost calculator is available here)](#comprehensive-cost-transparency--predictability-wip-interactive-cost-calculator-is-available-here)
- [Self Funded Science](#self-funded-science)
- [Installation](#installation)
- [Quickest Start](#installation----quickest-start)
- [Detailed Installation](#installation----detailed)
- [AWS](#aws)
- [Create a `daylily-service` IAM User](#create-a-daylily-service--iam-user)
- [Attach Permissions & Policies To The `daylily-service` User](#attach-permissiong--policies-to-the-daylily-service-user)
- [Additional AWS Considerations](#additional-aws-considerations-also-will-need-admin-intervention)
- [AWS `daylily-service` User Account](#aws-daylily-service-user-account)
- [Installation -- Quickest Start](#installation----quickest-start)
- [Installation -- Detailed](#installation----detailed)
- [AWS](#aws)
- [Create a `daylily-service` IAM User](#create-a-daylily-service--iam-user)
- [Attach Permissiong \& Policies To The `daylily-service` User](#attach-permissiong--policies-to-the-daylily-service-user)
- [Permissions](#permissions)
- [Create Service Linked Role `VERY IMPORTANT`](#create-service-linked-role-very-important)
- [Inline Policy](#inline-policy)
- [Additional AWS Considerations (also will need _admin_ intervention)](#additional-aws-considerations-also-will-need-admin-intervention)
- [Quotas](#quotas)
- [Activate Cost Allocation Tags (optional, but strongly suggested)](#activate-cost-allocation-tags-optional-but-strongly-suggested)
- [A Note On Budgets](#a-note-on-budgets)
- [AWS `daylily-service` User Account](#aws-daylily-service-user-account)
- [CLI Credentials](#cli-credentials)
- [SSH Key Pair(s)](#ssh-key-pairs)
- [Default Region `us-west-2`](#default-region-us-west-2)
- [Prerequisites (On Your Local Machine)](#prerequisites-on-your-local-machine)
- [SSH Key Pair(s)](#ssh-key-pairs)
- [Place .pem File \& Set Permissions](#place-pem-file--set-permissions)
- [Default Region `us-west-2`](#default-region-us-west-2)
- [Prerequisites (On Your Local Machine)](#prerequisites-on-your-local-machine)
- [System Packages](#system-packages)
- [Check if your prereq's meet the min versions required](#check-if-your-prereqs-meet-the-min-versions-required)
- [AWS CLI Configuration](#aws-cli-configuration)
- [Opt 2](#opt-2)
- [Clone `daylily` Git Repository](#clone-daylily-git-repository)
- [Install Miniconda](#install-miniconda)
- [Install Miniconda (homebrew is not advised)](#install-miniconda-homebrew-is-not-advised)
- [Install DAYCLI Environment](#install-daycli-environment)
- [Ephemeral Cluster Creation](#ephemeral-cluster-creation)
- [Reference Bucket](#daylily-references-public-reference-bucket)
- [daylily-references-public Reference Bucket](#daylily-references-public-reference-bucket)
- [Clone `daylily-references-public` to YOURPREFIX-omics-analysis-REGION](#clone-daylily-references-public-to-yourprefix-omics-analysis-region)
- [Generate Analysis Cost Estimates per Availability Zone](#generate-analysis-cost-estimates-per-availability-zone)
- [Create An Ephemeral Cluster](#create-an-ephemeral-cluster)
- [Run Remote Slurm Tests On Headnode](#run-remote-slurm-tests-on-headnode)
- [Review Clusters](#review-clusters)
- [Confirm The Headnode Is Configured](#confirm-the-headnode-is-configured)
- [Costs](#costs)
- [Monitoring (tags and budgets)](#monitoring-tags-and-budgets)
- [Regulating Usage via Budgets](#regulating-usage-via-budgets)
- [Cost Breakdown](#cost-breakdown)
- [PCUI (ParallelCluster User Interface)](#pcui-technically-optional-but-you-will-be-missing-out)
- [OF HOT \& IDLE CLUSTER ( ~$1.68 / hr )](#of-hot--idle-cluster--168--hr-)
- [OF RUNNING CLUSTER ( \>= $1.20 / hr )](#of-running-cluster---120--hr-)
- [Spot instances ( ~$1.20 / hr per 192vcpu instance )](#spot-instances--120--hr-per-192vcpu-instance-)
- [Data transfer, during analysis ( ~$0.00 )](#data-transfer-during-analysis--000-)
- [Data transfer, staging and moving off cluster ( ~$0.00 to \> $0.00/hr )](#data-transfer-staging-and-moving-off-cluster--000-to--000hr-)
- [Storage, during analysis ( ~$0.00 )](#storage-during-analysis--000-)
- [OF DELETED CLUSTER -- compute and Fsx ( ~$0.00 )](#of-deleted-cluster----compute-and-fsx--000-)
- [OF REFERENCE DATA in S3 ( $14.50 / month )](#of-reference-data-in-s3--1450--month-)
- [OF SAMPLE / READ DATA in S3 ( $0.00 to $A LOT / month )](#of-sample--read-data-in-s3--000-to-a-lot--month-)
- [OF RESULTS DATA in S3 ( $Varies, are you storing BAM or CRAM, vcf.gz or gvcf.gz? )](#of-results-data-in-s3--varies-are-you-storing-bam-or-cram-vcfgz-or-gvcfgz-)
- [PCUI (technically optional, but you will be missing out)](#pcui-technically-optional-but-you-will-be-missing-out)
- [Install Steps](#install-steps)
- [PCUI Costs ( ~ $1.00 / month )](#pcui-costs---100--month-)
- [Working With The Ephemeral Clusters](#working-with-the-ephemeral-clusters)
- [DAYCLI & AWS Parallel Cluster CLI (pcluster)](#daycli--aws-parallel-cluster-cli-pcluster)
- [Running Workflows](#running-workflows)
- [From The Ephemeral Cluster Headnode](#from-the-ephemeral-cluster-headnode)
- [Confirm Headnode Configuration](#confirm-headnode-configuration-is-complete)
- [Run A Local Test Workflow](#run-a-local-test-workflow)
- [Run A Slurm Test Workflow](#run-a-slurm-test-workflow)
- [Create Your Own `config/analysis_manifest.csv` File](#to-create-your-own-configanalysis_manifestcsv-file-from-your-own-analysis_samples.tsv-file)
- [Slurm Monitoring](#slurm-monitoring)
- [Monitor Slurm Submitted Jobs](#monitor-slurm-submitted-jobs)
- [SSH Into Compute Nodes](#ssh-into-compute-nodes)
- [Deleting A Cluster](#delete-cluster)
- [Export `fsx` Analysis Results Back To S3](#export-fsx-analysis-results-back-to-s3)
- [Delete The Cluster](#delete-the-cluster-for-real)
- [Other Monitoring Tools](#other-monitoring-tools)
- [PCUI](#pcui)
- [DAYCLI \& AWS Parallel Cluster CLI (pcluster)](#daycli--aws-parallel-cluster-cli-pcluster)
- [Activate The DAYCLI Conda Environment](#activate-the-daycli-conda-environment)
- [`pcluster` CLI Usage](#pcluster-cli-usage)
- [List Clusters](#list-clusters)
- [Describe Cluster](#describe-cluster)
- [SSH Into Cluster Headnode](#ssh-into-cluster-headnode)
- [Basic](#basic)
- [Facilitated](#facilitated)
- [From The Epheemeral Cluster Headnode](#from-the-epheemeral-cluster-headnode)
- [Confirm Headnode Configuration Is Complete](#confirm-headnode-configuration-is-complete)
- [Headnode Confiugration Incomplete](#headnode-confiugration-incomplete)
- [Confirm Headnode /fsx/ Directory Structure](#confirm-headnode-fsx-directory-structure)
- [Run A Local Test Workflow](#run-a-local-test-workflow)
- [More On The `-j` Flag](#more-on-the--j-flag)
- [Run A Slurm Test Workflow](#run-a-slurm-test-workflow)
- [(RUN ON A FULL 30x WGS DATA SET)](#run-on-a-full-30x-wgs-data-set)
- [Specify A Single Sample Manifest](#specify-a-single-sample-manifest)
- [Specify A Multi-Sample Manifest (in this case, all 7 GIAB samples) - 2 aligners, 1 deduper, 2 snv callers](#specify-a-multi-sample-manifest-in-this-case-all-7-giab-samples---2-aligners-1-deduper-2-snv-callers)
- [The Whole Magilla (3 aligners, 1 deduper, 5 snv callers, 3 sv callers)](#the-whole-magilla-3-aligners-1-deduper-5-snv-callers-3-sv-callers)
- [To Create Your Own `config/analysis_manifest.csv` File From Your Own `analysis_samples.tsv` File](#to-create-your-own-configanalysis_manifestcsv-file-from-your-own-analysis_samplestsv-file)
- [Slurm Monitoring](#slurm-monitoring)
- [Monitor Slurm Submitted Jobs](#monitor-slurm-submitted-jobs)
- [SSH Into Compute Nodes](#ssh-into-compute-nodes)
- [Delete Cluster](#delete-cluster)
- [Export `fsx` Analysis Results Back To S3](#export-fsx-analysis-results-back-to-s3)
- [Facilitated](#facilitated-1)
- [Via `FSX` Console](#via-fsx-console)
- [Delete The Cluster, For Real](#delete-the-cluster-for-real)
- [Other Monitoring Tools](#other-monitoring-tools)
- [PCUI (Parallel Cluster User Interface)](#pcui-parallel-cluster-user-interface)
- [Quick SSH Into Headnode](#quick-ssh-into-headnode)
- [AWS Cloudwatch](#aws-cloudwatch)
- [S3 Reference Bucket & Fsx Filesystem](#s3-reference-bucket--fsx-filesystem)
- [`PREFIX-omics-analysis-REGION` Reference Bucket](#prefix-omics-analysis-region-reference-bucket)
- [Fsx Filesystem](#fsx-filesystem)
- [And There Is More](#and-there-is-more)
- [S3 Reference Bucket \& Fsx Filesystem](#s3-reference-bucket--fsx-filesystem)
- [PREFIX-omics-analysis-REGION Reference Bucket](#prefix-omics-analysis-region-reference-bucket)
- [Reference Bucket Metrics](#reference-bucket-metrics)
- [The `YOURPREFIX-omics-analysis-REGION` s3 Bucket](#the-yourprefix-omics-analysis-region-s3-bucket)
- [daylily-references-public Bucket Contents](#daylily-references-public-bucket-contents)
- [Top Level Diretories](#top-level-diretories)
- [Fsx Filesystem](#fsx-filesystem)
- [Fsx Directory Structure](#fsx-directory-structure)
- [In Progress // Future Development](#in-progress--future-development)
- [Re-enable Sentieon Workflows \& Include in Benchmarking](#re-enable-sentieon-workflows--include-in-benchmarking)
- [Add Strobe Aligner To Benchmarking](#add-strobe-aligner-to-benchmarking)
- [Using Data From Benchmarking Experiments, Complete The Comprehensive Cost Caclulator](#using-data-from-benchmarking-experiments-complete-the-comprehensive-cost-caclulator)
- [Break Daylily Into 2 Parts: 1) Ephermal Cluster Manager 2) Analysis Pipeline](#break-daylily-into-2-parts-1-ephermal-cluster-manager-2-analysis-pipeline)
- [Update Analysis Pipeline To Run With Snakemake v8.\*](#update-analysis-pipeline-to-run-with-snakemake-v8)
- [Cromwell \& WDL's](#cromwell--wdls)
- [General Components Overview](#general-components-overview)
- [Managed Genomics Analysis Services](#managed-genomics-analysis-services)
- [Bioinformatics Metrics](#metrics-required-to-make-informed-decisions-about-choosing-an-analysis-pipeline)
- [Sentieon Tools & License](#sentieon-tools--license)
- [Some Bioinformatics Bits, Big Picture](#some-bioinformatics-bits-big-picture)
- [The DAG For 1 Sample Running Through The `BWA-MEM2ert+Doppelmark+Deepvariant+Manta+TIDDIT+Dysgu+Svaba+QCforDays` Pipeline](#the-dag-for-1-sample-running-through-the-bwa-mem2ertdoppelmarkdeepvariantmantatidditdysgusvabaqcfordays-pipeline)
- [Daylily Framework, Cont.](#daylily-framework-cont)
- [Batch QC HTML Summary Report](#batch-qc-html-summary-report)
- [Consistent + Easy To Navigate Results Directory \& File Structure](#consistent--easy-to-navigate-results-directory--file-structure)
- [Automated Concordance Analysis Table](#automated-concordance-analysis-table)
- [Performance Monitoring Reports](#performance-monitoring-reports)
- [Observability w/CloudWatch Dashboard](#observability-wcloudwatch-dashboard)
- [Cost Tracking and Budget Enforcement](#cost-tracking-and-budget-enforcement)
- [Metrics Required To Make Informed Decisions About Choosing An Analysis Pipeline](#metrics-required-to-make-informed-decisions-about-choosing-an-analysis-pipeline)
- [Accuracy / Precision / Recall / Fscore](#accuracy--precision--recall--fscore)
- [User Run Time](#user-run-time)
- [Cost Of Analysis](#cost-of-analysis)
- [Init Cost](#init-cost)
- [Compute Cost](#compute-cost)
- [Storage Cost (for computation)](#storage-cost-for-computation)
- [Other Costs (ie: data transfer)](#other-costs-ie-data-transfer)
- [Cost of Storage](#cost-of-storage)
- [Reproducibility](#reproducibility)
- [Longevity of Results](#longevity-of-results)
- [Sentieon Tools \& License](#sentieon-tools--license)
- [Contributing](#contributing)
- [Versioning](#versioning)
- [Known Issues](#known-issues)
- [_Fsx Mount Times Out During Headnode Creation \& Causes Pcluster `build-cluster` To Fail_](#fsx-mount-times-out-during-headnode-creation--causes-pcluster-build-cluster-to-fail)
- [Cloudstack Formation Fails When Creating Clusters In \>1 AZ A Region (must be manually sorted ATM)](#cloudstack-formation-fails-when-creating-clusters-in-1-az-a-region-must-be-manually-sorted-atm)
- [Compliance / Data Security](#compliance--data-security)
- [Detailed Docs](#detailed-docs)
- [DAY](#day)



43 changes: 37 additions & 6 deletions bin/daylily-create-ephemeral-cluster
@@ -559,27 +559,55 @@ select arn_policy_id in "${policy_arns[@]}"; do
done
echo ""
if [[ -z "$arn_policy_id" ]]; then
echo "Error: No IAM policy ARN selected. Exiting."
exit 3
fi
echo ""
+## Function to validate cluster name
+#validate_cluster_name() {
+# if [[ ! "$1" =~ ^[a-zA-Z0-9\-]+$ ]] || [[ ${#1} -gt 25 ]]; then
+# return 1
+# else
+# return 0
+# fi
+#}
+#
+# Prompt user for cluster name
+#while true; do
+# echo -n "Enter the name for your cluster (alphanumeric and '-', max 25 chars): "
+# read cluster_name
+#
+# if validate_cluster_name "$cluster_name"; then
+# echo "Cluster name accepted: $cluster_name"
+# break
+# else
+# echo "Error: Invalid cluster name. Please ensure it is alphanumeric, may include '-', and is 25 characters or fewer."
+# fi
+#done
# Function to validate cluster name
validate_cluster_name() {
if [[ ! "$1" =~ ^[a-zA-Z0-9\-]+$ ]] || [[ ${#1} -gt 25 ]]; then
-exit 1
+return 1
else
return 0
fi
}
-# Prompt user for cluster name
+# Prompt user for cluster name in a loop until valid input is given
while true; do
-echo -n "Enter the name for your cluster (alphanumeric and '-', max 25 chars): "
-read cluster_name
+read -rp "Enter the name for your cluster (alphanumeric and '-', max 25 chars): " cluster_name
if validate_cluster_name "$cluster_name"; then
-echo "Cluster name accepted: $cluster_name"
+echo "Cluster name accepted: $cluster_name"
break
else
-echo "Error: Invalid cluster name. Please ensure it is alphanumeric, may include '-', and is 25 characters or fewer."
+echo "❌ Error: Invalid cluster name."
+echo " ➜ The name must be alphanumeric, may include '-', and be 25 characters or fewer."
fi
done
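
The substantive change in this hunk is that the validator reports failure with `return 1` rather than `exit 1`. Calling `exit` inside a function ends the whole script, so the surrounding `while true` loop could never re-prompt. The following is a minimal standalone sketch of the same rule (alphanumeric plus `-`, at most 25 characters), not the repository's code. It puts the hyphen last in the bracket expression so no backslash is needed, and the sample names in the loop are hypothetical.

```bash
#!/usr/bin/env bash
# Sketch only: same acceptance rule as the diff above, written standalone.
validate_cluster_name() {
    local name="$1"
    # Hyphen placed last in the bracket expression, so it needs no escaping.
    local pattern='^[a-zA-Z0-9-]+$'
    # The function's status is the status of this list: 0 = valid, non-zero = invalid.
    # Nothing here calls exit, so the caller keeps control and can ask again.
    [[ "$name" =~ $pattern ]] && (( ${#name} <= 25 ))
}

# Hypothetical sample inputs, used only to exercise the function.
for candidate in "day-cluster-01" "bad_name" "" "this-name-is-far-too-long-to-pass"; do
    if validate_cluster_name "$candidate"; then
        echo "accepted: '$candidate'"
    else
        echo "rejected: '$candidate'"
    fi
done
```

Because the function only sets an exit status, it can be dropped into the `while true; do ... done` prompt loop unchanged.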
@@ -830,6 +858,8 @@ done
echo "You selected the allocation strategy: $allocation_strategy"
echo ""
+git_deets=$(bin/get_git_deets.sh)
# Write variables to config
cat <<EOF > $regsub_vals
REGSUB_REGION=$region
@@ -853,6 +883,7 @@ REGSUB_SAVE_FSX=$save_fsx
REGSUB_ENFORCE_BUDGET=$enforce_budget_bool
REGSUB_AWS_ACCOUNT_ID=aws_profile-$AWS_PROFILE
REGSUB_ALLOCATION_STRATEGY=$allocation_strategy
+REGSUB_DAYLILY_GIT_DEETS=$git_deets
EOF
echo ""
19 changes: 19 additions & 0 deletions bin/get_git_deets.sh
@@ -0,0 +1,19 @@
#!/bin/bash

# Check if we're in a Git repository
if ! git rev-parse --is-inside-work-tree &>/dev/null; then
echo "Not inside a Git repository."
exit 1
fi

# Get the repository name (extract from remote URL)
repo_name=$(basename -s .git "$(git config --get remote.origin.url 2>/dev/null)")

# Get the current branch name (if on a branch)
branch_name=$(git symbolic-ref --short HEAD 2>/dev/null || echo "N/A")

# Get the latest commit hash
commit_hash=$(git rev-parse HEAD 2>/dev/null)

# Output results
echo $repo_name-$branch_name-$commit_hash
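
The caller captures this script's standard output with `git_deets=$(bin/get_git_deets.sh)`, so anything the script prints, including the "Not inside a Git repository." message, would end up in the captured value. A hedged sketch of a variant that may be safer for that use follows. It sends diagnostics to stderr, quotes the final expansion, and falls back to placeholder strings (`unknown`, `nocommit`) that are assumptions of this sketch rather than part of the commit. The function name `git_deets` is also hypothetical.

```bash
#!/usr/bin/env bash
# Sketch only: a capture-friendly variant of bin/get_git_deets.sh.
git_deets() {
    # Diagnostics go to stderr so a caller using $(...) never captures them.
    if ! git rev-parse --is-inside-work-tree &>/dev/null; then
        echo "Not inside a Git repository." >&2
        return 1
    fi

    local remote_url repo_name branch_name commit_hash
    # "unknown" and "nocommit" are placeholder fallbacks chosen for this sketch.
    remote_url=$(git config --get remote.origin.url 2>/dev/null)
    repo_name=$(basename -s .git "${remote_url:-unknown}")
    branch_name=$(git symbolic-ref --short HEAD 2>/dev/null || echo "N/A")
    commit_hash=$(git rev-parse --verify --quiet HEAD || echo "nocommit")

    # printf with quoted arguments avoids word splitting and globbing.
    printf '%s-%s-%s\n' "$repo_name" "$branch_name" "$commit_hash"
}

git_deets || echo "no git details available" >&2
```

Since the value is later used as an AWS tag value, it may also be worth checking it against the tagging limits (length and permitted characters) before writing it to the config. Branch names can contain characters that some services reject in tags.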
2 changes: 2 additions & 0 deletions config/day_cluster/prod_cluster.yaml
@@ -230,6 +230,8 @@ Tags: # TAGs necessary for per-user/project/job cost tracking
Value: REGSUB_CLUSTER_NAME
- Key: aws-parallelcluster-enforce-budget
Value: REGSUB_ENFORCE_BUDGET
+- Key: aws-parallelcluster-daylily-git-deets
+Value: REGSUB_DAYLILY_GIT_DEETS
DevSettings:
Timeouts:
HeadNodeBootstrapTimeout: 3600
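
The new tag only becomes useful once `REGSUB_DAYLILY_GIT_DEETS` in this template is replaced by the value written to the `REGSUB_*` config file. The substitution step itself is not part of this diff, so the sketch below is an illustration under a stated assumption: each `KEY=VALUE` line replaces the literal token `KEY` wherever it appears in the template. The function name `render_template`, the temporary files and the sample values are all hypothetical.

```bash
#!/usr/bin/env bash
# Sketch only: the real substitution mechanism is not shown in this commit.
# Assumption: every REGSUB_* key is replaced by plain text substitution.
render_template() {
    local vals_file="$1" template="$2"
    local rendered key value
    rendered=$(<"$template")
    # Split each line at the first '='; the rest of the line is the value.
    while IFS='=' read -r key value || [[ -n "$key" ]]; do
        [[ "$key" == REGSUB_* ]] || continue
        # Quoted pattern and replacement are both treated literally.
        rendered=${rendered//"$key"/"$value"}
    done < "$vals_file"
    printf '%s\n' "$rendered"
}

# Hypothetical demo inputs.
vals=$(mktemp)
tmpl=$(mktemp)
cat > "$vals" <<'EOF'
REGSUB_ENFORCE_BUDGET=skip
REGSUB_DAYLILY_GIT_DEETS=daylily-main-0123abc
EOF
cat > "$tmpl" <<'EOF'
- Key: aws-parallelcluster-enforce-budget
  Value: REGSUB_ENFORCE_BUDGET
- Key: aws-parallelcluster-daylily-git-deets
  Value: REGSUB_DAYLILY_GIT_DEETS
EOF
render_template "$vals" "$tmpl"
rm -f "$vals" "$tmpl"
```

Plain token replacement like this is order-sensitive when one key is a prefix of another, so a real implementation would likely need to handle longer keys first or match on token boundaries.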
2 changes: 1 addition & 1 deletion setup.py
@@ -2,7 +2,7 @@

setup(
name="daylily",
-version="0.7.164",
+version="0.7.166",
packages=find_packages(),
install_requires=[
# Add dependencies here