
Avoid downloading expression tool inputs and outputs to disk? #1666

Open
adamnovak opened this issue May 13, 2022 · 0 comments

@adamnovak

I want to be able to run a tool like this one and have it output a `File` without the data ever having to be on the local disk:

```yaml
#!/usr/bin/env cwl-runner
cwlVersion: v1.2
class: ExpressionTool
requirements:
  InlineJavascriptRequirement: {}
inputs:
  f1:
    type: File
outputs:
  f2:
    type: File
expression: |
  ${ return {f2: inputs.f1}; }
```

This tool never accesses the file data, and I'm not sure an ExpressionTool is even allowed to open files from JavaScript, so there's no reason the file should need to be downloaded beforehand. In vanilla cwltool I think this situation would arise when the `location`s are HTTP(S) URLs, but in Toil we pass around File and Directory objects with a variety of URL schemes in `location`. Also, an ExpressionTool might or might not be allowed to construct a `{class: 'File'}` object itself. So it makes sense for a tool to output File objects that don't correspond to anything on disk.
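To make that concrete, here is a minimal Python sketch (not cwltool or Toil code) of what the tool's expression does, treating a CWL File object as a plain dict. It only hands the object through; nothing reads or resolves `location`, so a remote URL survives untouched. The `s3://example-bucket/...` location is a made-up example.

```python
from typing import Any, Dict

# A CWL File object is just a mapping of metadata; "location" is a URI string.
CWLObject = Dict[str, Any]


def pass_through(inputs: Dict[str, CWLObject]) -> Dict[str, CWLObject]:
    """Python equivalent of the tool's expression: return f1 unchanged as f2.

    The File object is returned as-is; its data is never opened or downloaded.
    """
    return {"f2": inputs["f1"]}


f1 = {"class": "File", "location": "s3://example-bucket/reads.fq"}
outputs = pass_through({"f1": f1})
print(outputs["f2"]["location"])  # s3://example-bucket/reads.fq
```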

But at the end of a tool, the executor runs `relocateOutputs`, and that in turn seems to require the files to be on disk unless the `location` uses the magic `_:` scheme for files that are yet to be created:

From `cwltool/cwltool/process.py`, lines 397 to 398 at commit 898fddf:

```python
elif not location.startswith("_:") and ":" in location:
    ob["location"] = file_uri(fs_access.realpath(location))
```
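A small standalone Python sketch of which locations the quoted condition catches. It reproduces only the boolean test from the excerpt; since the quoted line is an `elif`, earlier branches (not shown) may handle some locations first, and this sketch ignores them. The example locations are made up.

```python
def hits_realpath_branch(location: str) -> bool:
    """Mirror of the quoted elif condition.

    Any location containing a ":" (i.e. carrying a URI scheme) is sent through
    fs_access.realpath() unless it starts with the "_:" literal prefix.
    """
    return not location.startswith("_:") and ":" in location


for loc in [
    "_:b0e4",                        # literal file to be created: skipped
    "https://example.com/data.txt",  # remote URL: goes through realpath
    "s3://example-bucket/reads.fq",  # any other scheme: also goes through realpath
    "relative/path.txt",             # no scheme at all: not this branch
]:
    print(loc, hits_realpath_branch(loc))
```

So every non-`_:` URL scheme, including whatever custom schemes Toil uses, ends up in `fs_access.realpath()`.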

It looks like this is done for checksumming, but checksumming might not be enabled, and the `checksum` field might already be set.
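If checksumming really is the only reason for resolving the location locally, the condition could in principle be narrowed. The following is a hypothetical guard, not existing cwltool code: `needs_local_copy` is an illustrative name, `compute_checksum` is just a boolean standing in for whatever setting controls checksumming, and the `sha1$...` value is made up.

```python
from typing import Any, Dict


def needs_local_copy(file_obj: Dict[str, Any], compute_checksum: bool) -> bool:
    """Hypothetical guard: fetch a scheme-qualified File only when a checksum
    must actually be computed, i.e. checksumming is enabled and the File does
    not already carry a "checksum" field."""
    location = file_obj["location"]
    if location.startswith("_:") or ":" not in location:
        return False  # outside the quoted branch; nothing to fetch here
    return compute_checksum and "checksum" not in file_obj


remote = {"class": "File", "location": "s3://example-bucket/reads.fq"}
print(needs_local_copy(remote, compute_checksum=False))  # False
print(needs_local_copy(remote, compute_checksum=True))   # True
print(needs_local_copy({**remote, "checksum": "sha1$0123abcd"}, compute_checksum=True))  # False
```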

Is there a way to get a hook in here so that Toil can let File objects through expression tools without downloading them?
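One possible shape for such a hook, sketched in standalone Python under the assumption that the executor could supply a predicate saying which locations to leave alone. `relocate_location`, `keep_remote`, `fake_download_realpath`, and `is_s3` are all hypothetical names, not cwltool or Toil APIs; `local_realpath` stands in for the `fs_access.realpath` call plus any download it implies.

```python
import os
from typing import Callable, Optional
from urllib.parse import urlparse

# Hypothetical hook type: returns True if the executor wants a location
# left untouched (e.g. a URL scheme it can fetch lazily later on).
KeepRemote = Callable[[str], bool]


def relocate_location(location: str,
                      local_realpath: Callable[[str], str],
                      keep_remote: Optional[KeepRemote] = None) -> str:
    """Sketch of output-location handling with an opt-out hook.

    Locations without a scheme, and "_:" literals, pass through unchanged.
    """
    if location.startswith("_:") or ":" not in location:
        return location
    if keep_remote is not None and keep_remote(location):
        return location  # executor opted out: no download, no rewrite
    return local_realpath(location)


def fake_download_realpath(location: str) -> str:
    # Stand-in for "download, then return a local file: URI".
    name = os.path.basename(urlparse(location).path)
    return "file:///tmp/downloaded/" + name


def is_s3(location: str) -> bool:
    return urlparse(location).scheme == "s3"


loc = "s3://example-bucket/reads.fq"
print(relocate_location(loc, fake_download_realpath))         # file:///tmp/downloaded/reads.fq
print(relocate_location(loc, fake_download_realpath, is_s3))  # s3://example-bucket/reads.fq
```

Defaulting the hook to `None` keeps the current behavior for executors that don't opt in.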
