Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Handle cases where the WDL workflow output contains exprs #201

Draft
wants to merge 2 commits into
base: main
Choose a base branch
from
Draft
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
18 changes: 13 additions & 5 deletions wdl2cwl/main.py
Original file line number Diff line number Diff line change
Expand Up @@ -727,7 +727,11 @@ def get_workflow_outputs(
with WDLSourceLine(item.info, ConversionException):
output_name = item.name
item_expr = item.info.expr
output_source = item_expr.expr.name[::-1].replace(".", "/", 1)[::-1]
if isinstance(item_expr, WDL.Expr.Get):
output_source = item_expr.expr.name[::-1].replace(".", "/", 1)[::-1]
else:
output_source = f"$({self.get_expr(item_expr)})"

# replace just the last occurrence of a period with a slash
# by first reversing the string and the replace the first occurrence
# then reversing the result
Expand Down Expand Up @@ -1024,16 +1028,20 @@ def get_expr_get(self, wdl_get_expr: WDL.Expr.Get) -> str:

def get_expr_ident(self, wdl_ident_expr: WDL.Expr.Ident) -> str:
"""Translate WDL Ident Expressions."""
id_name = wdl_ident_expr.name
id_name: str = wdl_ident_expr.name
referee = wdl_ident_expr.referee
optional = wdl_ident_expr.type.optional
if referee:
with WDLSourceLine(referee, ConversionException):
if isinstance(referee, WDL.Tree.Call):
return id_name
if referee.expr and (
wdl_ident_expr.name in self.optional_cwl_null
or wdl_ident_expr.name not in self.non_static_values
if (
hasattr(referee, "expr")
and referee.expr is not None
and (
wdl_ident_expr.name in self.optional_cwl_null
or wdl_ident_expr.name not in self.non_static_values
)
):
return self.get_expr(referee.expr)
ident_name = get_input(id_name)
Expand Down
335 changes: 335 additions & 0 deletions wdl2cwl/tests/cwl_files/blast.cwl
Original file line number Diff line number Diff line change
@@ -0,0 +1,335 @@
cwlVersion: v1.2
$graph:
- class: Workflow
id: main
inputs:
- id: blast_docker_override
type:
- string
- 'null'
- id: blast_docker
type: string
# TODO: how to translate the original?
# String blast_docker = select_first([blast_docker_override,"swr.cn-south-1.myhuaweicloud.com/cngbdb/blast:1.2"])
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

☝️

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For the general case of programmatic conversion, we would need to synthesize a new step to compute the usable value of blast_docker and use the output of that step everywhere that originally referred to the blast_docker input.

However, if I was converting this by hand I would have combined blast_docker_override and blast_docker into a single CWL input with a default value. However, this is being used to override the docker image name with a dynamic value, which is not yet supported in CWL; so it would be useless :-P

Anyhow, we still need a general way to convert WDL expressions in the definition of inputs. I recommend creating a separate issue for that; modifying this WDL to not need workflow input declaration expressions, and focusing on synthesizing new steps for workflow output expressions.

- id: queryfa
type: File
- id: fname
type: string
default: '/sfs/blastdb/2019_ncov/nucl/v6/ncov'
- id: method
type: string
default: 'blastn'
- id: outfmt
type: int
default: 7
- id: evalue
type: float
default: 10
# TODO: how to translate the original?
# String Outfile = basename(queryfa)+'.blast_result.txt'
- id: Outfile
type: string
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

☝️

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is another WDL expression in the definition of an input; I've made an issue for that #240

# runblastn task inputs, to be linked from workflow -> tool
- id: runblastn.docker
type: string
- id: runblastn.Queryfa
type: File
- id: runblastn.Fname
type: string
- id: runblastn.max_target_seqs
default: 100
type: int
- id: runblastn.word_size
default: 28
type: int
- id: runblastn.reward
default: 1
type: int
- id: runblastn.penalty
default: -2
type: int
- id: runblastn.strand
default: both
type: string
- id: runblastn.gapopen
default: 0
type: int
- id: runblastn.gapextend
default: 0
type: int
- id: runblastn.dust
default: "'20 64 1'"
type: string
- id: runblastn.max_hsps
type:
- int
- 'null'
- id: runblastn.tasks
default: megablast
type: string
- id: runblastn.taxids
type:
- string
- 'null'
- id: runblastn.negative_taxids
type:
- string
- 'null'
- id: runblastn.lcase_masking
default: false
type: boolean
# runblastp task inputs, to be linked from workflow -> tool
- id: runblastp.docker
type: string
- id: runblastp.Queryfa
type: File
- id: runblastp.Fname
type: string
- id: runblastp.max_target_seqs
default: 100
type: int
- id: runblastp.word_size
default: 6
type: int
- id: runblastp.seg
default: no
type: string
- id: runblastp.comp_based_stats
default: '2'
type: string
- id: runblastp.matrix
default: BLOSUM62
type: string
- id: runblastp.gapopen
default: 11
type: int
- id: runblastp.gapextend
default: 1
type: int
- id: runblastp.max_hsps
type:
- int
- 'null'
- id: runblastp.taxids
type:
- string
- 'null'
- id: runblastp.negative_taxids
type:
- string
- 'null'
- id: runblastp.lcase_masking
default: false
type: boolean
steps:
- id: runblastp
in:
Fname: fname
Queryfa: queryfa
docker: blast_docker
outfmt: outfmt
Outfile: Outfile
threads: threads
run: runblastp
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
run: runblastp
run: #runblastp

when: $(inputs.method == 'blastp')
out: [out]
- id: runblastn
in:
Fname: fname
Queryfa: queryfa
docker: blast_docker
outfmt: outfmt
evalue: evalue
Outfile: Outfile
threads: threads
run: runblastn
when: $(inputs.method == 'blastn')
out: [out]
outputs:
- id: fina_output
type: File
outputSource:
- runblastp/out
- runblastn/out
pickValue: first_non_null

- class: CommandLineTool
id: runblastn
requirements:
- class: InitialWorkDirRequirement
listing:
- entryname: script.bash
entry: |4
set -e
blastn -db "$(inputs.Fname)" \
-show_gis \
-query $(inputs.QueryFa) \
-outfmt $(inputs.outfmt) \
-out $(inputs.Outfile) \
-max_target_seqs $(inputs.max_target_seqs) \
-evalue $(inputs.evalue) \
-word_size $(inputs.word_size) \
-penalty $(inputs.penalty) \
-reward $(inputs.reward) \
-dust $(inputs.dust) \
-gapopen $(inputs.gapopen) \
-gapextend $(inputs.gapextend) \
-task $(inputs.tasks) \
-strand $(inputs.strand) \
-num_threads $(inputs.threads) \
$(true='-lcase_masking' false='' inputs.lcase_masking) $("-max_hsps "+ inputs.max_hsps) $("-taxids " +inputs.taxids) $("-negative_taxids " +inputs.negative_taxids)\
hints:
- class: DockerRequirement
- class: ResourceRequirement
coresMin: 8
ramMin: 16384
inputs:
- id: docker
type: string
- id: Queryfa
type: File
- id: Fname
type: string
- id: Outfile
type: string
- id: threads
type: int
- id: outfmt
type: int
- id: max_target_seqs
default: 100
type: int
- id: word_size
default: 28
type: int
- id: reward
default: 1
type: int
- id: penalty
default: -2
type: int
- id: strand
default: both
type: string
- id: gapopen
default: 0
type: int
- id: gapextend
default: 0
type: int
- id: dust
default: "'20 64 1'"
type: string
- id: max_hsps
type:
- int
- 'null'
- id: tasks
default: megablast
type: string
- id: taxids
type:
- string
- 'null'
- id: negative_taxids
type:
- string
- 'null'
- id: lcase_masking
default: false
type: boolean
baseCommand:
- bash
- script.bash
outputs:
- id: out
type: File
outputBinding:
glob: $(inputs.Outfile)

- class: CommandLineTool
id: runblastp
requirements:
- class: InitialWorkDirRequirement
listing:
- entryname: script.bash
entry: |4
set -e
blastp -db "$(inputs.Fname)" \
-query $(inputs.Queryfa) \
-outfmt $(inputs.outfmt) \
-out $(inputs.Outfile) \
-max_target_seqs $(inputs.max_target_seqs) \
-comp_based_stats $(inputs.comp_based_stats) \
-evalue $(inputs.evalue) \
-word_size $(inputs.word_size) \
-matrix $(inputs.matrix) \
-seg $(inputs.seg) \
-gapopen $(inputs.gapopen) \
-gapextend $(inputs.gapextend) \
-num_threads $(inputs.threads) \
$(true='-lcase_masking' false='' inputs.lcase_masking) $("-max_hsps "+inputs.max_hsps) $("-taxids " +inputs.taxids) $("-negative_taxids " +inputs.negative_taxids) \
hints:
- class: DockerRequirement
- class: ResourceRequirement
coresMin: 8
ramMin: 16384
inputs:
- id: docker
type: string
- id: Queryfa
type: File
- id: Fname
type: string
- id: outfmt
type: string
- id: Outfile
type: string
- id: evalue
type: float
- id: threads
type: int
- id: max_target_seqs
default: 100
type: int
- id: word_size
default: 6
type: int
- id: seg
default: no
type: string
- id: comp_based_stats
default: '2'
type: string
- id: matrix
default: BLOSUM62
type: string
- id: gapopen
default: 11
type: int
- id: gapextend
default: 1
type: int
- id: max_hsps
type:
- int
- 'null'
- id: taxids
type:
- string
- 'null'
- id: negative_taxids
type:
- string
- 'null'
- id: lcase_masking
default: false
type: boolean
baseCommand:
- bash
- script.bash
outputs:
- id: out
type: File
outputBinding:
glob: $(inputs.Outfile)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mr-c , I tried to manually partially translate the blast.wdl file. Could you take a look at the file above and see if it's going in the right direction here, please?

There are a couple TODO's where I am not sure how to translate something from WDL, 🙏

I used two tasks only, translating them both into CommandLineTool's. Also used that $graph trick to pack multiple processes in the same file. Finally, added Workflow named main for the blast.wdl workflow definition. The Workflow uses conditional to pick the first non null (i.e. select_first and if's from WDL).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great approach! Do we have test data so we can compare the WDL and CWL versions?

1 change: 1 addition & 0 deletions wdl2cwl/tests/test_cwl.py
Original file line number Diff line number Diff line change
Expand Up @@ -69,6 +69,7 @@ def test_meta(caplog: pytest.LogCaptureFixture) -> None:
("vt.wdl"),
("whatshap.wdl"),
("workflow_inputs.wdl"),
("blast.wdl"),
],
)
def test_wdls(description_name: str) -> None:
Expand Down
Loading
Loading