cached cwl.output.json contains docker file path references #1573

Open
tschoonj opened this issue Dec 7, 2021 · 1 comment
tschoonj commented Dec 7, 2021

Hi all,

I am experimenting with cwl.output.json to return results from a CommandLineTool that runs in a Docker container. This works on the first run, but re-running the same workflow fails: cwltool correctly recognizes that the cache can be used, but it chokes on the file path saved in cwl.output.json, which points into a randomly generated folder that was used during the first run.

Expected Behavior

Re-running the workflow should reuse the cached output and complete successfully.

Actual Behavior

When re-running the workflow, I get the following error:

cwltool --outdir output --cachedir cache spike.cwl spike.yaml
INFO /usr/local/miniforge3/bin/cwltool 3.1.20211107152837
INFO Resolved 'spike.cwl' to 'file:///home/tom/gitlab/cwl-workflows/workflows/spike.cwl'
spike.cwl:8:3: Warning: checking item
                      Warning:   Field `class` contains undefined reference to
                      `http://commonwl.org/cwltool#Secrets`
INFO spike.cwl:8:3: Unknown hint http://commonwl.org/cwltool#Secrets
INFO [workflow ] start
INFO [workflow ] starting step arv_get
INFO [step arv_get] start
INFO [job arv_get] Using cached output in /home/tom/gitlab/cwl-workflows/workflows/cache/d04184a5b32119f4058d7e8fbc6ff511
ERROR Workflow error, try again with --debug for more information:
Output file path /cByOZc/ubuntu.sif must be within designated output directory (/nnKPqR) or an input file pass through.
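
To make the failure concrete, here is a minimal sketch of the kind of containment check the error message describes. It is not cwltool's actual code; the function name `is_within_outdir` is hypothetical, and the paths are the ones from the log above.

```python
import posixpath


def is_within_outdir(path, outdir):
    """Return True if `path` lies inside `outdir` (hypothetical helper, not cwltool's API)."""
    path = posixpath.normpath(path)
    outdir = posixpath.normpath(outdir)
    # commonpath yields the longest shared directory prefix of both paths.
    return posixpath.commonpath([path, outdir]) == outdir


# Path recorded in the cached cwl.output.json during the first run,
# checked against the container directory chosen for the second run.
print(is_within_outdir("/cByOZc/ubuntu.sif", "/nnKPqR"))
```

Because the container directory name is random per run, the path stored in the cache can never fall inside the directory chosen for a later run.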

The initial run produced:

INFO /usr/local/miniforge3/bin/cwltool 3.1.20211107152837
INFO Resolved 'spike.cwl' to 'file:///home/tom/gitlab/cwl-workflows/workflows/spike.cwl'
spike.cwl:8:3: Warning: checking item
                      Warning:   Field `class` contains undefined reference to
                      `http://commonwl.org/cwltool#Secrets`
INFO spike.cwl:8:3: Unknown hint http://commonwl.org/cwltool#Secrets
INFO [workflow ] start
INFO [workflow ] starting step arv_get
INFO [step arv_get] start
INFO [job arv_get] Output of job will be cached in /home/tom/gitlab/cwl-workflows/workflows/cache/d04184a5b32119f4058d7e8fbc6ff511
INFO [job arv_get] /home/tom/gitlab/cwl-workflows/workflows/cache/d04184a5b32119f4058d7e8fbc6ff511$ docker \
    run \
    -i \
    --mount=type=bind,source=/home/tom/gitlab/cwl-workflows/workflows/cache/d04184a5b32119f4058d7e8fbc6ff511,target=/cByOZc \
    --mount=type=bind,source=/tmp/0n3iy0j2,target=/tmp \
    --workdir=/cByOZc \
    --read-only=true \
    --user=1002:1002 \
    --rm \
    --cidfile=/tmp/z5bv8n5s/20211207145621-197335.cid \
    --env=TMPDIR=/tmp \
    --env=HOME=/cByOZc \
    arv-cli:build-tar-fd145ede211e86f23f7aeab39e45de43 \
    arv-get-cwl
INFO [job arv_get] Max memory used: 47MiB
INFO [job arv_get] completed success
INFO [step arv_get] completed success
INFO [workflow ] completed success
{
    "collection_file": [
        {
            "class": "File",
            "basename": "ubuntu.sif",
            "location": "file:///home/tom/gitlab/cwl-workflows/workflows/output/ubuntu.sif",
            "checksum": "sha1$8a13313f5de5ace0d943ff7a3257fc83c0538829",
            "size": 27742208,
            "path": "/home/tom/gitlab/cwl-workflows/workflows/output/ubuntu.sif"
        }
    ]
}
INFO Final process status is success

Workflow Code

CommandLineTool arv-get.cwl:

cwlVersion: v1.2
class: CommandLineTool

requirements:
  DockerRequirement:
    dockerPull: arv-cli
  NetworkAccess:
    networkAccess: true
  InitialWorkDirRequirement:
    listing:
      - entryname: cwl.inputs.json
        entry: '{"inputs": $(inputs)}'

baseCommand:
  - arv-get-cwl

inputs:
  arvados_collection_locator: string
  arvados_api_token: string
  arvados_api_host: string

outputs:
  collection_file: File

The arv-get-cwl script inside the container reads its inputs from cwl.inputs.json and passes them to the arv-get command. It then writes a cwl.output.json file that records the path of the downloaded file:

cat > ${outputfile} <<EOL
{
  "collection_file": {
    "path": "${download_destination}",
    "class": "File"
  }
}
EOL

Workflow spike.cwl:

cwlVersion: v1.2
class: Workflow

$namespaces:
  cwltool: "http://commonwl.org/cwltool#"

hints:
  "cwltool:Secrets":
    secrets: [arvados_api_token]

requirements:
  InlineJavascriptRequirement: {}
  ScatterFeatureRequirement: {}
  StepInputExpressionRequirement: {}
  MultipleInputFeatureRequirement: {}

inputs:
  arvados_input_collection_locators: string[]
  arvados_output_collection_name: string
  arvados_api_host: string
  arvados_api_token: string

outputs:
  collection_file:
    type: File[]
    outputSource: arv_get/collection_file

steps:
  arv_get:
    run: arv-get.cwl
    scatter: arvados_collection_locator
    in:
      arvados_api_token: arvados_api_token
      arvados_api_host: arvados_api_host
      arvados_collection_locator: arvados_input_collection_locators
    out:
      - collection_file

Your Environment

  • cwltool version: 3.1.20211107152837

CC @jrandall

tetron commented Dec 8, 2021

As a workaround, it might work to use relative paths in cwl.output.json.
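
A minimal sketch of that workaround, assuming the downloaded file sits inside the tool's working directory and that cwltool resolves relative paths against the output directory (untested against the caching code path). The helper names `relative_output_doc` and `write_cwl_output` are hypothetical; `collection_file` is the output name from arv-get.cwl.

```python
import json
import os


def relative_output_doc(download_destination, outdir):
    """Build the cwl.output.json content with a path relative to `outdir`."""
    rel = os.path.relpath(download_destination, start=outdir)
    # A relative path that climbs out of outdir would not help here.
    if rel == os.pardir or rel.startswith(os.pardir + os.sep):
        raise ValueError(f"{download_destination} is not inside {outdir}")
    return {"collection_file": {"class": "File", "path": rel}}


def write_cwl_output(download_destination, outdir="."):
    """Write cwl.output.json into `outdir` (hypothetical helper)."""
    doc = relative_output_doc(download_destination, outdir)
    with open(os.path.join(outdir, "cwl.output.json"), "w") as fh:
        json.dump(doc, fh, indent=2)
    return doc
```

With this approach the random container directory name never ends up in the cached file, so a later run has nothing stale to trip over.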

For the general case, cwltool would probably need to apply reverse path mapping to cwl.output.json so that paths inside the container are translated back to paths outside the container.
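
A rough sketch of what such a reverse mapping could look like, not a patch against cwltool. The function name `reverse_map_paths` is hypothetical, and it assumes the container directory of the run that produced the file (here `/cByOZc`) is known at the time the mapping is applied, for example by rewriting the file before it is stored in the cache.

```python
import posixpath


def reverse_map_paths(obj, container_dir, host_dir):
    """Recursively rewrite File/Directory paths under container_dir to host_dir."""
    if isinstance(obj, list):
        return [reverse_map_paths(item, container_dir, host_dir) for item in obj]
    if isinstance(obj, dict):
        mapped = {
            key: reverse_map_paths(value, container_dir, host_dir)
            for key, value in obj.items()
        }
        path = mapped.get("path")
        if mapped.get("class") in ("File", "Directory") and isinstance(path, str):
            prefix = container_dir.rstrip("/") + "/"
            if path.startswith(prefix):
                # Swap the container prefix for the host-side directory.
                mapped["path"] = posixpath.join(host_dir, path[len(prefix):])
        return mapped
    return obj


cached = {"collection_file": {"path": "/cByOZc/ubuntu.sif", "class": "File"}}
host = "/home/tom/gitlab/cwl-workflows/workflows/cache/d04184a5b32119f4058d7e8fbc6ff511"
print(reverse_map_paths(cached, "/cByOZc", host))
```

Paths that do not start with the container directory are left untouched, so input files that are passed through would keep their original locations.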
