Feature/fix aws invoke #120

Closed
wants to merge 11 commits into from
9 changes: 0 additions & 9 deletions .bumpversion.cfg

This file was deleted.

30 changes: 30 additions & 0 deletions .github/workflows/release.yml
@@ -0,0 +1,30 @@
name: Release Packages

on:
  release:
    types: [released]

jobs:
  publish-to-npm-registry:
    runs-on: ubuntu-latest

    steps:
      - uses: actions/checkout@v3


      - uses: actions/setup-node@v4
        with:
          node-version: '20.x'

      - run: npm install

      # build and package

      - run: npm run build

      - run: npm run package

      - run: npx publib-npm
        env:
          NPM_TOKEN: ${{secrets.NPM_PUBLISH_TOKEN}}
          NPM_ACCESS_LEVEL: public
9 changes: 9 additions & 0 deletions .gitignore
@@ -17,3 +17,12 @@ TODO.*md
NOTES.md
prev/
temp/
/.idea/

# Generated by jsii - safe to delete, and ideally should be in .gitignore
tsconfig.json

dist/

.jsii
*.tsbuildinfo
20 changes: 17 additions & 3 deletions .npmignore
@@ -1,6 +1,20 @@
*.ts
!*.d.ts

# CDK asset staging directory
.cdk.staging
cdk.out

# Exclude typescript source and config
*.ts
!*.d.ts
tsconfig.json
*.tsbuildinfo

# Include javascript files and typescript declarations
!*.js
!*.d.ts

# Exclude jsii outdir
dist

# Include .jsii and .jsii.gz
!.jsii
!.jsii.gz
123 changes: 2 additions & 121 deletions README.md
@@ -1,122 +1,3 @@
# nextflow-stack
# Oncoanalyser AWS

An AWS stack for running Nextflow pipelines on Batch using shared resources.

🚧

Highlight key aspects

* Precise resource allocation requests under SPOT pricing model
* No EBS costs and local SSD discounted via SPOT pricing
* No duplication of Batch queues
* Fusion, Wave
* Improved resume experience compared to use of an ephemeral workdir disk
* Ability to highly optimise instances for individual processes

Future work

* Dynamic queue selection
* Retry on SPOT pre-emption X times
* BYO bucket
* CodePipeline for deployment
* Improve configuration and data handling

## Table of contents

🚧

## Deployment

🚧

* Detail application stacks that must be deployed plus additional setup (i.e. ECR, Docker images)

### Development

> `deployment/development-stack.ts`

### CodePipeline CI/CD

> `deployment/codepipeline-stack.ts`

## Pipelines

🚧

### oncoanalyser

🚧

### star-align-nf

🚧

> `umccr/star-align-nf`

### UMCCR post-processing

🚧

#### Design

🚧

Diagram (avoid overlap with Overview diagram) including reference data etc

Section detailing current compromises

* GDS token access
* Migrating data from GDS to S3 for execution
* Not fully optimised for speed; show timeline or similar
* Many Docker images on DockerHub, ideally would be on ECR
* Must resolve 502 errors in Wave when pulling from ghcr.io or ECR
* Only broad control over processes run currently
* Passing run configuration by CLI args is somewhat clumsy
* Alternative: JSON on remote (S3, API call); could be extended to general config

Items that need to be addressed

* **Important**: Isofox takes an expected count file that is dependent on read length
  * So we **must** be sure that we're using expected counts for the correct read length
* Application still tied to UMCCR VPC and other resources
* Repetition between pipeline stacks and associated code (run.sh, Dockerfile, etc)
* Cannot parallelise workflow stack deployment in CodePipeline with waves in the current setup
* Extra arguments for `run.sh` are ignored, error should be raised
* Job cancellation is difficult when the pipeline crashes
* I have observed a rare issue with unexpected Fusion shutdown that interrupts processing
* Staging data from GDS to S3 suffers a significant slowdown after ~one hour (bursting related?)
  * Spinning up multiple instances or a high-capacity instance could improve transfer speed
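
The Isofox item above could be guarded with a lookup that fails loudly instead of silently using counts for the wrong read length. A minimal TypeScript sketch, assuming a hypothetical table of expected count files keyed by read length; the file paths and the `selectExpectedCounts` helper are illustrative only and do not exist in this repository:

```typescript
// Hypothetical mapping from read length (bp) to an Isofox expected counts file.
// These paths are placeholders, not real reference data locations.
const EXPECTED_COUNTS_BY_READ_LENGTH: Readonly<Record<number, string>> = {
  101: "isofox/read_101_exp_counts.csv",
  151: "isofox/read_151_exp_counts.csv",
};

// Return the expected counts file for the given read length, or throw if
// no file is registered for that length.
function selectExpectedCounts(readLength: number): string {
  const path = EXPECTED_COUNTS_BY_READ_LENGTH[readLength];
  if (path === undefined) {
    const known = Object.keys(EXPECTED_COUNTS_BY_READ_LENGTH).join(", ");
    throw new Error(
      `No expected counts file for read length ${readLength}; available: ${known}`
    );
  }
  return path;
}
```

Raising an error here mirrors the intent of the `run.sh` item in the same list: an unsupported input should stop the run rather than be ignored.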

Other notes

* Nextflow config `nextflow_aws.config` could be split into processes and input
* Include other parameters in config: HMF refdata path, VBE path, genome version, workdir
* To discuss staged data location, sorting, structure, retention
* Lifecycle of data in Nextflow S3 workdir; single workdir or per run/sample/etc

#### Usage

🚧

Diagram describing common run modes with corresponding commands

Run modes (relative to CUPPA)

* WGTS
* WGS only
* WTS only
* WGTS with existing WGS
* WGTS with existing WTS
* WGTS with existing WGS and WTS
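
One way to read the run modes above is as a function of where the WGS and WTS data come from. The TypeScript sketch below is an interpretation rather than the pipeline's actual logic: the `DataSource` type and `describeRunMode` helper are hypothetical, and it assumes "existing" means results from an earlier run are reused instead of recomputed.

```typescript
// Where the data for one sequencing type comes from in a given run (assumed model).
// "new" = raw input to process, "existing" = results reused from an earlier run.
type DataSource = "none" | "new" | "existing";

// Map the WGS and WTS data sources onto the run mode names listed above.
function describeRunMode(wgs: DataSource, wts: DataSource): string {
  if (wgs === "none" && wts === "none") {
    throw new Error("At least one of WGS or WTS must be provided");
  }
  if (wts === "none") {
    if (wgs !== "new") throw new Error("WGS only mode requires new WGS input");
    return "WGS only";
  }
  if (wgs === "none") {
    if (wts !== "new") throw new Error("WTS only mode requires new WTS input");
    return "WTS only";
  }
  // Both data types are present; note which of them are reused.
  const existing: string[] = [];
  if (wgs === "existing") existing.push("WGS");
  if (wts === "existing") existing.push("WTS");
  return existing.length === 0
    ? "WGTS"
    : `WGTS with existing ${existing.join(" and ")}`;
}
```

For example, `describeRunMode("existing", "new")` yields the "WGTS with existing WGS" mode, where only WTS input is processed from scratch.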

> include run resuming and use of this as an alternative to providing existing data

> note how to run any individual process/stage with the appropriate inputs

#### Notes

🚧

Other important items to note

* Fusion usually gives much better performance but not always
A CDK package that deploys Oncoanalyser.
133 changes: 0 additions & 133 deletions application/application-stack.ts

This file was deleted.
