Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Update iChor WDLs for WDL 1.0 #77

Merged
merged 4 commits into from
Mar 28, 2025

Conversation

geoffjentry
Copy link
Contributor

@geoffjentry geoffjentry commented Jan 6, 2025

This PR does a couple of things:

  1. Updates to WDL 1.0, including necessary syntactical changes
  2. Removes the use of optional inputs when a default is provided
  3. Minor formatting adjustments

The removal of the optionals was useful because of constructs like disks: "local-disk " + diskGB + " HDD" which is not technically valid (e.g. what if diskGB is None?) although Cromwell does allow it. It also removed some warnings from miniwdl check listing.

I don't have access to the docker containers at the moment so can't run this to verify the changes are valid. They shouldn't be modifying any behavior, but it'd be good to run this first before merging.

EDIT NB: While this has been open we merged another commit, so now this is doing a bit more. Specifically (from the other PR description):

  • The majority of line changes are changes to spacing.
    • Pretty-format input JSON for readability
    • Changes to WDL indents
    • Changes to python spacing suggested by the tool ruff format
  • Changes suggested by shellcheck
    • Files should be quoted by strings to avoid breaking at white space
    • Lists of files are best handled in arrays to prevent splitting
  • General WDL best-practices
    • prefer ~{var_name} to ${var_name}
    • prefer "~{foo}-bar" to foo + "-bar"
    • set -euxo pipefail at the beginning of tasks so that errors cause task failures rather than returning nonsense
    • Replace optional input and select_first with default values when they are known in advance.
  • General python best-practices
    • Use context manager to open files so that they are auto-closed, always specify mode even if it's 'r'.

Float maxFracCNASubclone
Float maxFracGenomeSubclone
input {
File bam_file
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These weren't modified, they're just tabbed over one level

String sample_id
Int bin_size

Int min_qual = 20
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As mentioned, the optionals with a default were converted to standard types

Float altFracThreshold
Int lambdaScaleHyperParam
Int rmCentromereFlankLength
String plotFileType = "pdf"
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is one other example of the de-optioning


call bundle_plots {
input: CNA_plots = CNA_plots,
summary_script = summary_script,
sample_set_name = sample_set_name
}
output {
bundle_plots.*
File bundledPDF = bundle_plots.bundledPDF
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's only this one output from bundle_plots, so I just kept the same name from the task

}

command <<<
set -euxo pipefail
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The set additions aren't formatting. However there does not to appear to be any situation in this WDL where it'd not be indicated.

--estimateScPrevalence ~{estimateClonality} \
--scStates "~{scStates}" \
--centromere ~{centromere} \
~{"--exons.bed '" + exons + "'"} \
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Better handling for the optional exons

Copy link
Collaborator

@MicahR-Y MicahR-Y left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@geoffjentry Added a few suggested edits, but overall it looks good! Have you tested running this WDL on Terra?

"ichorCNA.mapWig": "gs://gptag/ichorCNA_resources/map_hg19_1000kb.wig",
"ichorCNA.maxFracCNASubclone": "${0.7}",
"ichorCNA.maxFracGenomeSubclone": "${0.5}",
"ichorCNA.normalPanel": "gs://gptag/ichorCNA_resources/Stover_Lennon_UH2_20HD_PON_1X_ULP_median.rds",
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since future samples will all be aligned on DRAGEN, I'd recommend using the PoN generated from DRAGEN-aligned samples: gs://gptag/ichorCNA_resources/Stover_Lennon_UH2_20HD_PON_1X_ULP_median.dragen_aln.rds

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@MicahR-Y For both this & the switch to Float - the intent here was to not change actual behavior in this PR. There'll be follow on PRs to change behavior (e.g. the next one is already baking in the Float update)

bam_index = bam_index,
sample_id = sample_id,
bin_size = bin_size,
chrs = chr_counter,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know if this throws an error, but I believe the comma after the last input in each of these input blocks is unnecessary.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah I think this is actually a bug in miniwdl. We'll update.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🤦 I misread. I thought this was an input block not a call block, and thus was mystified how it was working. This is fine.

String subclone_fraction = read_string("subclone_fraction")
String fraction_genome_subclonal = read_string("fraction_genome_subclonal")
String fraction_cna_subclonal = read_string("fraction_cna_subclonal")
String gc_map_correction_mad = read_string("gc-map_correction_mad")
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As discussed gc_map_correction_mad can be read in as a Float with read_float

@geoffjentry
Copy link
Contributor Author

@MicahR-Y re the three individual comments: looking to make the 2 changes to future PRs (one was already planned). And we can remove those commas.

We're not able to run this locally as we don't yet have access to the PON/WIG/etc files and I would have thought on Terra w/ the dock container having been private & us not having access to those buckets that'd also be an issue. The actual script changes (vs WDL syntax) were tested locally.

This raises another point which is the hg19 vs hg38. It looks like for OC we do need to keep support for both references. But the only delta is in the inputs JSON, not the WDL. Can also cross that bridge in a future PR.

@geoffjentry
Copy link
Contributor Author

@MicahR-Y So @TedBrookings pointed out I had been misreading the WDL surrounding your comment on the commas. It's fine.

"~{sample_id}.cna.seg" \
"~{sample_id}.seg.txt" \
"~{sample_id}.seg" \
"~{sample_id}.optimalSolution/"
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@MicahR-Y this was an issue in the original PR. I'm going to rerun w/ my rebased version just to make sure there wasn't a merge conflict.

Copy link
Collaborator

@MicahR-Y MicahR-Y left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Everything I asked about has been or will be addressed, looks good to me!

@MicahR-Y MicahR-Y merged commit 27672b3 into broadinstitute:master Mar 28, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants