TSPS-183 Beagle imputation hg38 wdl and associated support wdls #1333

Open
wants to merge 94 commits into
base: develop
Changes from 90 commits
a438df8
wip add beagle imputation stuff
mmorgantaylor Feb 28, 2024
737599f
add 2 wdls to dockstore.yml
mmorgantaylor Feb 28, 2024
134150d
fix docker gar url
mmorgantaylor Feb 28, 2024
a307f93
use the right path for jars
mmorgantaylor Feb 28, 2024
658d5b6
wip on imputation wdl
mmorgantaylor Feb 28, 2024
8e479e0
oops use correct jar
mmorgantaylor Feb 28, 2024
10ab331
missing equals
mmorgantaylor Feb 28, 2024
1e1bebb
fix java call again
mmorgantaylor Feb 28, 2024
6b4d26c
fix java call
mmorgantaylor Feb 29, 2024
d96c9bb
oops match file names
mmorgantaylor Feb 29, 2024
6a2a586
update beagle jar to 01Mar24.d36
mmorgantaylor Mar 1, 2024
71b49c4
debug GatherVcfs
mmorgantaylor Mar 2, 2024
406e83e
debug GatherVcfs 2
mmorgantaylor Mar 2, 2024
0f049ba
try to resolve missing file issue
mmorgantaylor Mar 2, 2024
4c42428
don't impute over padding
mmorgantaylor Mar 4, 2024
f92f2e6
make the index again
mmorgantaylor Mar 4, 2024
3d7162c
supply vcf_index input to SelectVariantsByIds
mmorgantaylor Mar 8, 2024
9f6624e
update Imputation wdl too
mmorgantaylor Mar 15, 2024
ea6ab6d
newlines
mmorgantaylor Mar 15, 2024
49c7987
update for hg38
mmorgantaylor Mar 19, 2024
c3d1e81
Revert "update for hg38"
mmorgantaylor Mar 19, 2024
13fe9c2
update for hg38
mmorgantaylor Mar 19, 2024
5d239fe
liftover wdl
mmorgantaylor Mar 20, 2024
d13ab27
remove GCP-specific vm commands
mmorgantaylor Mar 20, 2024
55fe320
use gatk
mmorgantaylor Mar 21, 2024
29d965c
fix suffix and basename
mmorgantaylor Mar 21, 2024
fa2ca59
fix more filenames
mmorgantaylor Mar 21, 2024
ec1602b
remove missing contig stuff for now
mmorgantaylor Mar 21, 2024
6ba6d03
fix ref panel path
mmorgantaylor Mar 22, 2024
c4575a7
another chr fix
mmorgantaylor Mar 22, 2024
f206448
warn on missign contig
mmorgantaylor Mar 27, 2024
517719d
do fail if missing contig
mmorgantaylor Mar 27, 2024
0f52083
more mem
mmorgantaylor Mar 27, 2024
2c6643c
troubleshooting wld
mmorgantaylor Mar 27, 2024
554ad06
fixed plink path
mmorgantaylor Mar 28, 2024
d7d07e9
add select_first test
mmorgantaylor Mar 28, 2024
ef0c5a0
cleanup
mmorgantaylor Mar 28, 2024
60fedf5
add if block to test
mmorgantaylor Mar 28, 2024
201d0b2
create and use ref panel interval list
mmorgantaylor Apr 2, 2024
895a6c6
move interval list creation to ref panel wdl
mmorgantaylor Apr 3, 2024
ccb2f3c
give default values for optional inputs, weird
mmorgantaylor Apr 3, 2024
b3b229a
change CountVariants calls
mmorgantaylor Apr 3, 2024
e012dda
test
mmorgantaylor Apr 4, 2024
99f90e2
add output to test
mmorgantaylor Apr 4, 2024
e6a5b05
next test
mmorgantaylor Apr 4, 2024
3d44349
more test
mmorgantaylor Apr 4, 2024
3ef57a8
another test
mmorgantaylor Apr 4, 2024
56c6469
update real task
mmorgantaylor Apr 4, 2024
20fe393
TSPS-226 presplit and prechunk beagle inputs (#1272)
jsotobroad Apr 27, 2024
cd6134e
TSPS-221 remove index input and add seed to make beagle tool determin…
jsotobroad May 20, 2024
b063f2b
rename workflow
mmorgantaylor May 21, 2024
5906ce1
TSPS-241 Clean up beagle wdl (#1288)
jsotobroad Jun 3, 2024
8372d3b
add specific gatk_docker
mmorgantaylor Jun 6, 2024
827bc42
TSPS-142 updates to help creating simulated reference panel and runni…
jsotobroad Jun 10, 2024
b85a703
add maxRetries 2 to all imputation beagle tasks
mmorgantaylor Jun 28, 2024
e64be57
add prechunk wdl to dockstore
mmorgantaylor Jul 8, 2024
dede6ce
use acr for default ubuntu image
mmorgantaylor Jul 11, 2024
6d94ca7
add preemptible 3
mmorgantaylor Jul 12, 2024
8c43b1b
use acr gatk docker as default
mmorgantaylor Jul 12, 2024
c646241
don't use preemptibles on GatherVcfs
mmorgantaylor Jul 15, 2024
902969b
basename fix for imputation beagle ref panel generation (#1332)
mmorgantaylor Jul 15, 2024
4130623
TSPS-269 Speed up CountVariantsInChunksBeagle by using bedtools (#1335)
mmorgantaylor Jul 19, 2024
4b7fcfe
update pipeline version to 0.0.2
mmorgantaylor Jul 26, 2024
51adbd9
TSPS-293: Fix up streaming imputation beagle (#1347)
mmorgantaylor Aug 5, 2024
f72a46c
add array imputation quota consumed wdl (#1425)
jsotobroad Nov 13, 2024
144aed5
TSPS-239 get wdl running on 400k sample ref panel (#1373)
jsotobroad Nov 21, 2024
05fe604
add set -e -o pipefail to all relevant imputation tasks (#1434)
jsotobroad Nov 21, 2024
ae38109
TSPS-341 remove tasks for recovering variants not in the reference pa…
jsotobroad Jan 9, 2025
eb79037
Updated pipeline_versions.txt with all pipeline version information
actions-user Feb 13, 2025
39fb410
[PR to feature branch] Add testing to imputation beagle (#1503)
mmorgantaylor Feb 24, 2025
bd75c0c
Updated pipeline_versions.txt with all pipeline version information
actions-user Feb 24, 2025
8aa7f54
remove newline at end of Utilities.wdl
mmorgantaylor Feb 24, 2025
b98463a
remove LiftoverVcfs, add README for imputation_beagle
mmorgantaylor Feb 24, 2025
0d21384
oops this commit adds the README for imputation_beagle
mmorgantaylor Feb 24, 2025
38121ac
rename test inputs files to reflect contents
mmorgantaylor Feb 24, 2025
1135a08
PR comments round 1
mmorgantaylor Feb 24, 2025
bab5a2a
Updated pipeline_versions.txt with all pipeline version information
actions-user Feb 24, 2025
4d97dc4
update changelog for BroadInternalImputation
mmorgantaylor Feb 24, 2025
b30a1db
Updated pipeline_versions.txt with all pipeline version information
actions-user Feb 24, 2025
64d66ac
add back newline to Utilities.wdl with -w flag on changed file check
mmorgantaylor Feb 24, 2025
3e81fed
remove change to Minimac4 task
mmorgantaylor Feb 24, 2025
295a3ad
revert change to tool command in OptionalQCSites
mmorgantaylor Feb 24, 2025
408e9e2
fix fail task dependency, revert attempt to ignore newline in diff, o…
mmorgantaylor Feb 25, 2025
1546ad7
update README for ImputationBeagle
mmorgantaylor Feb 25, 2025
dbb059f
rename test files
mmorgantaylor Feb 26, 2025
d980d00
Updated pipeline_versions.txt with all pipeline version information
actions-user Feb 26, 2025
06b3d1f
another commit for hashes
mmorgantaylor Feb 26, 2025
0d60dec
Merge branch 'develop' into TSPS-183_mma_beagle_imputation_hg38
nikellepetrillo Feb 26, 2025
fd78fd3
Updated pipeline_versions.txt with all pipeline version information
actions-user Feb 26, 2025
e795792
dummy commit
nikellepetrillo Feb 26, 2025
52784ef
Merge branch 'develop' into TSPS-183_mma_beagle_imputation_hg38
nikellepetrillo Feb 26, 2025
1fd999f
pr comments
mmorgantaylor Feb 26, 2025
b216840
Updated pipeline_versions.txt with all pipeline version information
actions-user Feb 26, 2025
43b48ed
dummy commit
mmorgantaylor Feb 26, 2025
12 changes: 12 additions & 0 deletions .dockstore.yml
@@ -83,6 +83,14 @@ workflows:
    subclass: WDL
    primaryDescriptorPath: /pipelines/broad/arrays/imputation/Imputation.wdl

  - name: ImputationBeagle
    subclass: WDL
    primaryDescriptorPath: /pipelines/broad/arrays/imputation_beagle/ImputationBeagle.wdl

  - name: ArrayImputationQuotaConsumed
    subclass: WDL
    primaryDescriptorPath: /pipelines/broad/arrays/imputation_beagle/ArrayImputationQuotaConsumed.wdl

  - name: RNAWithUMIsPipeline
    subclass: WDL
    primaryDescriptorPath: /pipelines/broad/rna_seq/RNAWithUMIsPipeline.wdl
@@ -155,6 +163,10 @@ workflows:
    subclass: WDL
    primaryDescriptorPath: /verification/test-wdls/TestImputation.wdl

  - name: TestImputationBeagle
    subclass: WDL
    primaryDescriptorPath: /verification/test-wdls/TestImputationBeagle.wdl

  - name: TestJointGenotyping
    subclass: WDL
    primaryDescriptorPath: /verification/test-wdls/TestJointGenotyping.wdl
75 changes: 75 additions & 0 deletions .github/workflows/test_imputation_beagle.yml
@@ -0,0 +1,75 @@
name: Test ImputationBeagle

# Controls when the workflow will run
on:
  pull_request:
    branches: [ "develop", "staging", "master" ]
    # Only run if files in these paths changed:
    ####################################
    # SET PIPELINE SPECIFIC PATHS HERE #
    ####################################
    paths:
      - 'pipelines/broad/arrays/imputation_beagle/**'
      - 'structs/imputation/ImputationBeagleStructs.wdl'
      - 'tasks/broad/ImputationTasks.wdl'
      - 'tasks/broad/ImputationBeagleTasks.wdl'
      - 'verification/VerifyImputationBeagle.wdl'
      - 'verification/test-wdls/TestImputationBeagle.wdl'
      - 'tasks/broad/Utilities.wdl'
      - 'tasks/broad/TerraCopyFilesFromCloudToCloud.wdl'
      - '.github/workflows/test_imputation_beagle.yml'
      - '.github/workflows/warp_test_workflow.yml'


  # Allows you to run this workflow manually from the Actions tab
  workflow_dispatch:
    inputs:
      useCallCache:
        description: 'Use call cache (default: true)'
        required: false
        default: "true"
      updateTruth:
        description: 'Update truth files (default: false)'
        required: false
        default: "false"
      testType:
        description: 'Specify the type of test (Plumbing or Scientific)'
        required: false
        type: choice
        options:
          - Plumbing
          - Scientific
      truthBranch:
        description: 'Specify the branch for truth files (default: master)'
        required: false
        default: "master"

env:
  # pipeline configuration
  PIPELINE_NAME: TestImputationBeagle
  DOCKSTORE_PIPELINE_NAME: ImputationBeagle
  PIPELINE_DIR: "pipelines/broad/arrays/imputation_beagle"

  # workspace configuration
  TESTING_WORKSPACE: WARP Tests
  WORKSPACE_NAMESPACE: warp-pipelines

  # service account configuration
  SA_JSON_B64: ${{ secrets.PDT_TESTER_SA_B64 }}
  USER: [email protected]


jobs:
  TestImputationBeagle:
    uses: ./.github/workflows/warp_test_workflow.yml
    with:
      pipeline_name: TestImputationBeagle
      dockstore_pipeline_name: ImputationBeagle
      pipeline_dir: pipelines/broad/arrays/imputation_beagle
      use_call_cache: ${{ github.event.inputs.useCallCache || 'true' }}
      update_truth: ${{ github.event.inputs.updateTruth || 'false' }}
      test_type: ${{ github.event.inputs.testType }}
      truth_branch: ${{ github.event.inputs.truthBranch || 'master' }}
    secrets:
      PDT_TESTER_SA_B64: ${{ secrets.PDT_TESTER_SA_B64 }}
      DOCKSTORE_TOKEN: ${{ secrets.DOCKSTORE_TOKEN }}
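The `paths:` filter in the workflow above decides whether a pull request triggers the ImputationBeagle test. As a rough illustration of that decision, the Python sketch below matches changed file paths against the same list. It is a simplified approximation, not GitHub's implementation: it only handles `*` (any characters except `/`) and `**` (any characters including `/`), and the helper names `glob_to_regex` and `should_run` are hypothetical.

```python
import re

# Path filters copied from the `paths:` list in test_imputation_beagle.yml.
PATHS = [
    "pipelines/broad/arrays/imputation_beagle/**",
    "structs/imputation/ImputationBeagleStructs.wdl",
    "tasks/broad/ImputationTasks.wdl",
    "tasks/broad/ImputationBeagleTasks.wdl",
    "verification/VerifyImputationBeagle.wdl",
    "verification/test-wdls/TestImputationBeagle.wdl",
    "tasks/broad/Utilities.wdl",
    "tasks/broad/TerraCopyFilesFromCloudToCloud.wdl",
    ".github/workflows/test_imputation_beagle.yml",
    ".github/workflows/warp_test_workflow.yml",
]


def glob_to_regex(pattern):
    """Translate a simplified path glob into a compiled regex.

    Assumption: only `*` and `**` are supported; `?`, `[...]` and
    `!` negation are deliberately left out of this sketch.
    """
    parts = []
    i = 0
    while i < len(pattern):
        if pattern.startswith("**", i):
            parts.append(".*")      # crosses directory separators
            i += 2
        elif pattern[i] == "*":
            parts.append("[^/]*")   # stays within one path segment
            i += 1
        else:
            parts.append(re.escape(pattern[i]))
            i += 1
    return re.compile("".join(parts))


def should_run(changed_files, patterns=PATHS):
    """Return True if any changed file matches any path filter."""
    regexes = [glob_to_regex(p) for p in patterns]
    return any(r.fullmatch(f) for f in changed_files for r in regexes)
```

Under these assumptions, a change to `tasks/broad/Utilities.wdl` would trigger the test, while a change that only touches `pipelines/broad/arrays/imputation/Imputation.wdl` would not.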
1 change: 1 addition & 0 deletions .github/workflows/warp_test_workflow.yml
@@ -188,6 +188,7 @@ jobs:

echo "Starting hash comparison with retry mechanism..."


while [ $TOTAL_WAITED -lt $MAX_WAIT_TIME ]; do
echo "Fetching Dockstore Commit Hash..."
DOCKSTORE_COMMIT_HASH=$(python scripts/dockstore_api/fetch_dockstore_commit.py \
Expand Down
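The hunk above is cut off inside the retry loop that waits for Dockstore to report the expected commit hash. A minimal Python sketch of the same poll-until-timeout pattern follows. The function name `wait_for_hash`, the injected `fetch_hash` callable, and the default timing values are assumptions for illustration; the workflow's actual `MAX_WAIT_TIME` and sleep interval are not visible in this diff.

```python
import time


def wait_for_hash(fetch_hash, expected, max_wait=900, interval=15, sleep=time.sleep):
    """Poll `fetch_hash()` until it returns `expected` or `max_wait` seconds pass.

    `fetch_hash` is a hypothetical stand-in for the call to
    scripts/dockstore_api/fetch_dockstore_commit.py. `sleep` is injectable
    so the loop can be exercised without real delays.
    """
    total_waited = 0
    while total_waited < max_wait:
        if fetch_hash() == expected:
            return True
        sleep(interval)
        total_waited += interval
    return False
```

Passing the sleep function as a parameter keeps the loop testable: a caller can supply a no-op and a canned sequence of hashes.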
68 changes: 35 additions & 33 deletions pipeline_versions.txt
@@ -1,40 +1,42 @@
Pipeline Name Version Date of Last Commit
snm3C 4.0.4 2024-08-06
BuildIndices 4.0.0 2025-01-17
scATAC 1.3.2 2023-08-03
MultiSampleSmartSeq2SingleNucleus 2.0.8 2025-02-25
atac 2.7.1 2025-02-25
Optimus 7.9.2 2025-02-25
SmartSeq2SingleSample 5.1.21 2024-09-11
Multiome 5.11.0 2025-02-25
PairedTag 1.10.2 2025-02-25
SlideSeq 3.4.9 2025-02-25
MultiSampleSmartSeq2 2.2.22 2024-09-11
AnnotationFiltration 1.2.7 2024-11-04
RNAWithUMIsPipeline 1.0.18 2024-11-04
Imputation 1.1.15 2024-11-04
Arrays 2.6.30 2024-11-04
MultiSampleArrays 1.6.2 2024-08-02
ValidateChip 1.16.7 2024-11-04
JointGenotyping 1.7.2 2024-11-04
ArrayImputationQuotaConsumed 1.0.0 2025-02-24
ImputationBeagle 1.0.0 2025-02-24
Imputation 1.1.16 2025-02-24
MultiSampleArrays 1.6.2 2024-08-02
WholeGenomeReprocessing 3.3.3 2024-11-04
ExomeReprocessing 3.3.3 2024-11-04
CramToUnmappedBams 1.1.3 2024-08-02
ExternalWholeGenomeReprocessing 2.3.3 2024-11-04
ExternalExomeReprocessing 3.3.3 2024-11-04
BroadInternalArrays 1.1.14 2024-11-04
BroadInternalImputation 1.1.15 2025-02-24
BroadInternalRNAWithUMIs 1.0.36 2024-11-04
BroadInternalUltimaGenomics 1.1.3 2024-12-05
RNAWithUMIsPipeline 1.0.18 2024-11-04
IlluminaGenotypingArray 1.12.24 2024-11-04
AnnotationFiltration 1.2.7 2024-11-04
UltimaGenomicsWholeGenomeCramOnly 1.0.23 2024-11-04
GDCWholeGenomeSomaticSingleSample 1.3.4 2024-11-04
UltimaGenomicsWholeGenomeGermline 1.1.3 2024-12-05
WholeGenomeGermlineSingleSample 3.3.3 2024-11-04
ExomeGermlineSingleSample 3.2.3 2024-11-04
VariantCalling 2.2.4 2024-11-04
ReblockGVCF 2.4.0 2024-12-05
UltimaGenomicsJointGenotyping 1.2.2 2024-11-04
JointGenotypingByChromosomePartTwo 1.5.2 2024-11-04
JointGenotypingByChromosomePartOne 1.5.2 2024-11-04
ExomeGermlineSingleSample 3.2.3 2024-11-04
WholeGenomeGermlineSingleSample 3.3.3 2024-11-04
UltimaGenomicsWholeGenomeGermline 1.1.3 2024-12-05
VariantCalling 2.2.4 2024-11-04
GDCWholeGenomeSomaticSingleSample 1.3.4 2024-11-04
UltimaGenomicsWholeGenomeCramOnly 1.0.23 2024-11-04
JointGenotypingByChromosomePartTwo 1.5.2 2024-11-04
JointGenotyping 1.7.2 2024-11-04
CheckFingerprint 1.0.22 2024-10-28
BroadInternalRNAWithUMIs 1.0.36 2024-11-04
BroadInternalImputation 1.1.14 2024-11-04
BroadInternalArrays 1.1.14 2024-11-04
BroadInternalUltimaGenomics 1.1.3 2024-12-05
IlluminaGenotypingArray 1.12.24 2024-11-04
ExternalExomeReprocessing 3.3.3 2024-11-04
ExternalWholeGenomeReprocessing 2.3.3 2024-11-04
ExomeReprocessing 3.3.3 2024-11-04
CramToUnmappedBams 1.1.3 2024-08-02
WholeGenomeReprocessing 3.3.3 2024-11-04
scATAC 1.3.2 2023-08-03
MultiSampleSmartSeq2 2.2.22 2024-09-11
BuildIndices 4.0.0 2025-01-17
SlideSeq 3.4.9 2025-02-25
PairedTag 1.10.2 2025-02-25
MultiSampleSmartSeq2SingleNucleus 2.0.8 2025-02-25
atac 2.7.1 2025-02-25
snm3C 4.0.4 2024-08-06
SmartSeq2SingleSample 5.1.21 2024-09-11
Optimus 7.9.2 2025-02-25
Multiome 5.11.0 2025-02-25
5 changes: 5 additions & 0 deletions pipelines/broad/arrays/imputation/Imputation.changelog.md
@@ -1,3 +1,8 @@
# 1.1.16
2025-02-24 (Date of Last Commit)

* Updated runtime parameters in some ImputationTasks, and added an explicit definition of a vcf_index.

# 1.1.15
2024-11-04 (Date of Last Commit)

3 changes: 2 additions & 1 deletion pipelines/broad/arrays/imputation/Imputation.wdl
@@ -6,7 +6,7 @@ import "../../../../tasks/broad/Utilities.wdl" as utils

 workflow Imputation {

-  String pipeline_version = "1.1.15"
+  String pipeline_version = "1.1.16"

   input {
     Int chunkLength = 25000000
@@ -242,6 +242,7 @@ workflow Imputation {
       call tasks.SelectVariantsByIds {
         input:
           vcf = SetIdsVcfToImpute.output_vcf,
+          vcf_index = SetIdsVcfToImpute.output_vcf_index,
           ids = FindSitesUniqueToFileTwoOnly.missing_sites,
           basename = "imputed_sites_to_recover"
       }
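The Imputation changes above bump `pipeline_version` in the WDL and add a matching top entry to the changelog. The Python sketch below shows one way such a pairing could be checked. The helper names are hypothetical and this is not the repository's own validation script; it only assumes the two formats visible in this diff (a `String pipeline_version = "..."` declaration and a `# X.Y.Z` changelog heading).

```python
import re


def wdl_version(wdl_text):
    """Return the pipeline_version string declared in a WDL, or None."""
    m = re.search(r'String\s+pipeline_version\s*=\s*"([^"]+)"', wdl_text)
    return m.group(1) if m else None


def changelog_top_version(changelog_text):
    """Return the first `# X.Y.Z` heading in a changelog, or None."""
    for line in changelog_text.splitlines():
        m = re.match(r"#\s*(\d+(?:\.\d+)+)\s*$", line)
        if m:
            return m.group(1)
    return None


def versions_agree(wdl_text, changelog_text):
    """True when the WDL version equals the newest changelog entry."""
    version = wdl_version(wdl_text)
    return version is not None and version == changelog_top_version(changelog_text)
```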
Original file line number Diff line number Diff line change
@@ -0,0 +1,4 @@
# 1.0.0
2025-02-24 (Date of Last Commit)

* Initial release of pipeline to calculate the number of samples, i.e. quota used by an imputation service that uses ImputationBeagle.wdl.
Original file line number Diff line number Diff line change
@@ -0,0 +1,29 @@
version 1.0

import "../../../../tasks/broad/ImputationTasks.wdl" as tasks

workflow QuotaConsumed {
  String pipeline_version = "1.0.0"

  input {
    Int chunkLength = 25000000
    Int chunkOverlaps = 5000000

    File multi_sample_vcf

    File ref_dict
    Array[String] contigs
    String reference_panel_path_prefix
    String genetic_maps_path
    String output_basename
  }

  call tasks.CountSamples {
    input:
      vcf = multi_sample_vcf
  }

  output {
    Int quota_consumed = CountSamples.nSamples
  }
}
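The workflow above reports the number of samples in `multi_sample_vcf` as the quota consumed. The body of the `CountSamples` task is not part of this diff, so the Python sketch below is only an illustration of the underlying idea, relying on the VCF convention that sample names follow the nine fixed columns (`#CHROM` through `FORMAT`) on the header line. The function name `count_samples` is hypothetical.

```python
def count_samples(vcf_lines):
    """Count sample columns on the #CHROM header line of an uncompressed VCF.

    Assumption: `vcf_lines` is an iterable of text lines. A sites-only VCF
    (eight fixed columns, no FORMAT) yields 0.
    """
    for line in vcf_lines:
        if line.startswith("#CHROM"):
            columns = line.rstrip("\n").split("\t")
            # Nine fixed columns precede the per-sample columns.
            return max(len(columns) - 9, 0)
    raise ValueError("no #CHROM header line found")
```

For a file on disk, this could be called as `count_samples(open(path))` for plain text, or with `gzip.open(path, "rt")` for a compressed VCF.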
Original file line number Diff line number Diff line change
@@ -0,0 +1,6 @@
# 1.0.0
2025-02-24 (Date of Last Commit)

* * Initial public release of the ImputationBeagle pipeline.
Contributor
Suggested change
* * Initial public release of the ImputationBeagle pipeline.
* Initial public release of the ImputationBeagle pipeline.


Contributor
Suggested change

* The ImputationBeagle pipeline imputes missing genotypes from a multi-sample VCF using a large genomic reference panel. It is based on the Michigan Imputation Server pipeline but uses the Beagle imputation tool instead of minimac. Overall, the pipeline filters, phases, and performs imputation on a multi-sample VCF. It outputs the imputed VCF.