Failed to evaluate job outputs - IOException: Could not read from s3... #4687

doron-st · 2019-02-28T13:15:57Z

While testing cromwell-36 with AWS batch I was able to reproduce this error:

2019-02-25 09:38:52,508 cromwell-system-akka.dispatchers.engine-dispatcher-24 ERROR - WorkflowManagerActor Workflow b6b9322c-3929-4b72-9598-45d97dfb858d failed (during ExecutingWorkflowState): cromwell.backend.standard.StandardAsyncExecutionActor$$anon$2: Failed to evaluate job outputs:
Bad output 'print_nach_nachman_meuman.out': [Attempted 1 time(s)] - IOException: Could not read from s3://nrglab-cromwell-genomics/cromwell-execution/run_multiple_tests/b6b9322c-3929-4b72-9598-45d97dfb858d/call-test_cromwell_on_aws/shard-61/SingleTest.test_cromwell_on_aws/f8ecf673-ed61-4b06-b1d6-c20f7efe986e/call-print_nach_nachman_meuman/print_nach_nachman_meuman-stdout.log: Cannot access file: s3://s3.amazonaws.com/nrglab-cromwell-genomics/cromwell-execution/run_multiple_tests/b6b9322c-3929-4b72-9598-45d97dfb858d/call-test_cromwell_on_aws/shard-61/SingleTest.test_cromwell_on_aws/f8ecf673-ed61-4b06-b1d6-c20f7efe986e/call-print_nach_nachman_meuman/print_nach_nachman_meuman-stdout.log
        at cromwell.backend.standard.StandardAsyncExecutionActor.$anonfun$handleExecutionSuccess$1(StandardAsyncExecutionActor.scala:867)

The error occurs when running many sub-workflows within a single wrapping workflow.
The environment is configured correctly, and the test usually passes when running <30 subworkflows.

Here are the workflows:

run_multiple_test.wdl

import "three_task_sequence.wdl" as SingleTest

workflow run_multiple_tests {
    scatter (i in range(30)){
        call SingleTest.three_task_sequence{}
    }
}

three_task_sequence.wdl

workflow three_task_sequence{
    call print_nach

    call print_nach_nachman {
        input:
            previous = print_nach.out
    }

    call print_nach_nachman_meuman{
        input:
                previous = print_nach_nachman.out
    }
    output{
        Array[String] out = print_nach_nachman_meuman.out
    }
}

task print_nach{
     command{
         echo "nach"
     }
     output{
         Array[String] out = read_lines(stdout())
     }
     runtime {
	    docker: "ubuntu:latest"
	    maxRetries: 3
     }
 }

 task print_nach_nachman{
    Array[String] previous

     command{
         echo ${sep=' ' previous} " nachman"
     }
     output{
         Array[String] out = read_lines(stdout())
     }
     runtime {
        docker: "ubuntu:latest"
        maxRetries: 3
     }
     
 }

 task print_nach_nachman_meuman{
     Array[String] previous

      command{
        echo ${sep=' ' previous} " meuman"
      }
      output{
        Array[String] out = read_lines(stdout())
      }
      runtime {
        docker: "ubuntu:latest"
        maxRetries: 3
      }
  }

Here is the cromwell-conf:

// aws.conf
include required(classpath("application"))

webservice {
  port = 8001
  interface = 0.0.0.0
}

aws {
  application-name = "cromwell"
  auths = [{
      name = "default"
      scheme = "default"
  }]
  region = "us-east-1"
}

engine {
  filesystems {
    s3 { auth = "default" }
  }
}

backend {
  default = "AWSBATCH"
  providers {
    AWSBATCH {
      actor-factory = "cromwell.backend.impl.aws.AwsBatchBackendLifecycleActorFactory"
      config {
        root = "s3://nrglab-cromwell-genomics/cromwell-execution"
        auth = "default"

        numSubmitAttempts = 3
        numCreateDefinitionAttempts = 3

        concurrent-job-limit = 100

        default-runtime-attributes {
          queueArn: "arn:aws:batch:us-east-1:66:job-queue/GenomicsDefaultQueue"
        }

        filesystems {
          s3 {
            auth = "default"
          }
        }
      }
    }
  }
}

system {
  job-rate-control {
    jobs = 1
    per = 1 second
  }
}

Would appreciate help on this.
I wonder if cromwell was ever tested for many parallel sub-workflows running on AWS?

Thanks!

The text was updated successfully, but these errors were encountered:

caaespin · 2019-07-25T07:38:53Z

Hey, did you ever manage to get a workaround for this error?

geoffjentry · 2019-07-25T07:44:51Z

@caaespin I'm assuming that means you still see this. Are you using a recent Cromwell version? (42+)

caaespin · 2019-07-25T07:50:40Z

@geoffjentry yes. My current deployment is v42.

If you have access to the GATK forums, i put more details in my post there: https://gatkforums.broadinstitute.org/wdl/discussion/24268/aws-batch-randomly-fails-when-running-multiple-workflows/p1?new=1

marpiech · 2019-08-01T08:20:35Z

One up. I have similar error

caaespin · 2019-08-18T18:20:15Z

@geoffjentry from inspecting logs and AWS Batch console, i think what is happening is that the jobs fail because Cromwell shutdowns the VMs earlier than expected. So one of shard hasn't finished and is unable to upload to S3, hence the problem here occurs. Anyways this is a hypothesis based on what I saw, hopefully is helpful.

alexwaldrop · 2020-02-04T21:35:16Z

@geoffjentry Any movement on this? I'm having this same issue sporadically (v48 + AWS backend) with workflows that contain large scatter operations.

geoffjentry · 2020-02-04T22:00:40Z

@alexwaldrop NB that I don't work there anymore and sadly haven't had the energy to actively contribute. Perhaps @aednichols can chime in

blindmouse · 2020-03-11T20:42:04Z

I am having the same error with the example "Using Data on S3" on https://docs.opendata.aws/genomics-workflows/orchestration/cromwell/cromwell-examples/ . I have changed the S3 bucket name in the .json file to my bucket name, but the run still failed. After reporting running failure, I have got the same error message. I am using cromwell-48. The S3 bucket has all public access, and I was logged in as the Admin in two terminal windows, one running the server and the other submitting the job. The previous two hello-world example were successful. There is no log file in the bucket and in the cromwell-execution, the only file create was the script. There is no rc or stderr or stdout created.

sripaladugu · 2020-07-21T20:08:57Z

I am having the same error with the example "Using Data on S3" on https://docs.opendata.aws/genomics-workflows/orchestration/cromwell/cromwell-examples/ . I have changed the S3 bucket name in the .json file to my bucket name, but the run still failed. After reporting running failure, I have got the same error message. I am using cromwell-48. The S3 bucket has all public access, and I was logged in as the Admin in two terminal windows, one running the server and the other submitting the job. The previous two hello-world example were successful. There is no log file in the bucket and in the cromwell-execution, the only file create was the script. There is no rc or stderr or stdout created.

@blindmouse Were you able to resolve your issue? I am encountering the same problem. Thanks.

markjschreiber · 2020-07-21T23:14:11Z

This can happen if the job fails meaning that an rc.txt file isn’t created. It would be worth looking at the CloudWatch log for the batch job.

…

On Tue, Jul 21, 2020 at 4:07 PM Sri Paladugu ***@***.***> wrote: Is there any progress on this issue? I am the getting the following exception: IOException: Could not read from s3:///results/ReadFile/5fec5c4a-2e3f-49ed-8f9e-6d9d2d759449/call-read_file/read_file-rc.txt Caused by: java.nio.file.NoSuchFileException: s3:// s3.amazonaws.com/s3bucketname/results/ReadFile/5fec5c4a-2e3f-49ed-8f9e-6d9d2d759449/call-read_file/read_file-rc.txt — You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub <#4687 (comment)>, or unsubscribe <https://github.com/notifications/unsubscribe-auth/AF2E6EMJZ66Z5PIAEUX3IBLR4XYPZANCNFSM4G23FFUQ> .

sripaladugu · 2020-07-22T00:18:20Z

This can happen if the job fails meaning that an rc.txt file isn’t created. It would be worth looking at the CloudWatch log for the batch job.
…
On Tue, Jul 21, 2020 at 4:07 PM Sri Paladugu @.***> wrote: Is there any progress on this issue? I am the getting the following exception: IOException: Could not read from s3:///results/ReadFile/5fec5c4a-2e3f-49ed-8f9e-6d9d2d759449/call-read_file/read_file-rc.txt Caused by: java.nio.file.NoSuchFileException: s3:// s3.amazonaws.com/s3bucketname/results/ReadFile/5fec5c4a-2e3f-49ed-8f9e-6d9d2d759449/call-read_file/read_file-rc.txt — You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub <#4687 (comment)>, or unsubscribe https://github.com/notifications/unsubscribe-auth/AF2E6EMJZ66Z5PIAEUX3IBLR4XYPZANCNFSM4G23FFUQ .

Cloudwatch logs contained the following message: "/bin/bash: /var/scratch/fetch_and_run.sh: Is a directory"

markjschreiber · 2020-08-08T22:01:02Z

It may be that you’re running Cromwell 52 or later with an older AWS CloudFormation built infrastructure. Can you share which build of Cromwell you’re using and the build/ version/ origin of the CloudFormation template? On Tue, Jul 21, 2020 at 8:18 PM Sri Paladugu <[email protected]> wrote:

…

This can happen if the job fails meaning that an rc.txt file isn’t created. It would be worth looking at the CloudWatch log for the batch job. … <#m_-7712250081708699723_> On Tue, Jul 21, 2020 at 4:07 PM Sri Paladugu *@*.***> wrote: Is there any progress on this issue? I am the getting the following exception: IOException: Could not read from s3:///results/ReadFile/5fec5c4a-2e3f-49ed-8f9e-6d9d2d759449/call-read_file/read_file-rc.txt Caused by: java.nio.file.NoSuchFileException: s3:// s3.amazonaws.com/s3bucketname/results/ReadFile/5fec5c4a-2e3f-49ed-8f9e-6d9d2d759449/call-read_file/read_file-rc.txt — You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub <#4687 (comment) <#4687 (comment)>>, or unsubscribe https://github.com/notifications/unsubscribe-auth/AF2E6EMJZ66Z5PIAEUX3IBLR4XYPZANCNFSM4G23FFUQ . Cloudwatch logs contained the following message: "/bin/bash: /var/scratch/fetch_and_run.sh: Is a directory" — You are receiving this because you commented. Reply to this email directly, view it on GitHub <#4687 (comment)>, or unsubscribe <https://github.com/notifications/unsubscribe-auth/AF2E6ENOHHXQP6VC5XUGZ5TR4YV5XANCNFSM4G23FFUQ> .

mderan-da · 2020-09-09T13:20:09Z

Hi @markjschreiber I'm also running into this error. I am using cromwell 53 with a custom cdk stack based on the CloudFormation infrastructure described here: https://docs.opendata.aws/genomics-workflows/

Are modifications needed for compatibility with newer versions of Cromwell? Are these documented somewhere?

markjschreiber · 2020-09-11T21:46:37Z

Attached is some documentation that works for v52 and should work for v53

…

On Wed, Sep 9, 2020 at 9:20 AM mderan-da ***@***.***> wrote: Hi @markjschreiber <https://github.com/markjschreiber> I'm also running into this error. I am using cromwell 53 with a custom cdk stack based on the CloudFormation infrastructure described here: https://docs.opendata.aws/genomics-workflows/ Are modifications needed for compatibility with newer versions of Cromwell? Are these documented somewhere? — You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub <#4687 (comment)>, or unsubscribe <https://github.com/notifications/unsubscribe-auth/AF2E6EO6WEE4BYYPTX4HZ2LSE56JXANCNFSM4G23FFUQ> .

mderan-da · 2020-09-12T19:16:19Z

Hi @markjschreiber Thanks but it looks like the attachment didn't come through.

yaomin · 2020-09-13T20:47:51Z

@markjschreiber running into the same error for both v52 and v53.1. I am using the same CloudFormation @mderan-da mentioned . Appreciate your newer documentation on this.

markjschreiber · 2020-09-14T14:19:20Z

Documentation can be downloaded from here https://cromwell-share-ad485.s3.us-east-2.amazonaws.com/InstallingGenomicsWorkflowCoreWithCromwel52.pdf

…

On Sun, Sep 13, 2020 at 4:48 PM Yaomin Xu ***@***.***> wrote: @markjschreiber <https://github.com/markjschreiber> running into the same error for both v52 and v53.1. I am using the same CloudFormation @mderan-da <https://github.com/mderan-da> mentioned . Appreciate your newer documentation on this. — You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub <#4687 (comment)>, or unsubscribe <https://github.com/notifications/unsubscribe-auth/AF2E6EKCM56WST3J6NO5CS3SFUVYLANCNFSM4G23FFUQ> .

dfeinzeig · 2020-11-14T18:21:37Z

Cloudwatch logs contained the following message: "/bin/bash: /var/scratch/fetch_and_run.sh: Is a directory"

Also have this error. Anyone figure out what the issue is?

geertvandeweyer · 2020-12-17T17:35:02Z

Also have this error, using Cromwell 52, installed using this manual :

https://aws-genomics-workflows.s3.amazonaws.com/Installing+the+Genomics+Workflow+Core+and+Cromwell.pdf

logs say : fetch_and_run.is is a directory.

geertvandeweyer · 2020-12-17T18:02:46Z

Also have this error, using Cromwell 52, installed using this manual :

https://aws-genomics-workflows.s3.amazonaws.com/Installing+the+Genomics+Workflow+Core+and+Cromwell.pdf

logs say : fetch_and_run.is is a directory.

Extra info : cloning job & resubmitting through aws console runs fine. so it seems to be a temporary issue

sscho · 2021-05-13T22:35:46Z

Hmmm, still stuck on this - any updates from your guys' end? I tried cloning and resubmitting, still getting the same error.

ptdtan · 2021-06-08T11:19:25Z

Still getting this error today.

alimayy · 2022-09-12T22:09:21Z

I'm getting this error almost certainly when I run workflows where more samples (e.g. 96) than usual are scattered.
Cromwell version: 60-6048d0e-SNAP.

Is there a workaround to this?

rnaidu · 2025-01-29T20:14:26Z

Hi all, are there any updates to a workaround for this error? I'm getting the same error using Cromwell v87

gemmalam added the Needs Triage Ticket needs further investigation and refinement prior to moving to milestones label Mar 4, 2019

alexiswl mentioned this issue Sep 29, 2021

Getting error /var/scratch/fetch_and_run.sh: Is a directory soon after launch of workflow aws/amazon-genomics-cli#40

Closed

alexiswl mentioned this issue Nov 17, 2021

Remove and reinstall latest versions of amazon ssm agent for alinux2 aws-samples/aws-genomics-workflows#183

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Failed to evaluate job outputs - IOException: Could not read from s3... #4687

Failed to evaluate job outputs - IOException: Could not read from s3... #4687

doron-st commented Feb 28, 2019

caaespin commented Jul 25, 2019

geoffjentry commented Jul 25, 2019

caaespin commented Jul 25, 2019 •

edited

Loading

marpiech commented Aug 1, 2019

caaespin commented Aug 18, 2019

alexwaldrop commented Feb 4, 2020

geoffjentry commented Feb 4, 2020

blindmouse commented Mar 11, 2020

sripaladugu commented Jul 21, 2020 •

edited

Loading

markjschreiber commented Jul 21, 2020 via email

sripaladugu commented Jul 22, 2020

markjschreiber commented Aug 8, 2020 via email

mderan-da commented Sep 9, 2020

markjschreiber commented Sep 11, 2020 via email

mderan-da commented Sep 12, 2020

yaomin commented Sep 13, 2020

markjschreiber commented Sep 14, 2020 via email

dfeinzeig commented Nov 14, 2020

geertvandeweyer commented Dec 17, 2020

geertvandeweyer commented Dec 17, 2020

sscho commented May 13, 2021

ptdtan commented Jun 8, 2021

alimayy commented Sep 12, 2022

rnaidu commented Jan 29, 2025

Failed to evaluate job outputs - IOException: Could not read from s3... #4687

Failed to evaluate job outputs - IOException: Could not read from s3... #4687

Comments

doron-st commented Feb 28, 2019

caaespin commented Jul 25, 2019

geoffjentry commented Jul 25, 2019

caaespin commented Jul 25, 2019 • edited Loading

marpiech commented Aug 1, 2019

caaespin commented Aug 18, 2019

alexwaldrop commented Feb 4, 2020

geoffjentry commented Feb 4, 2020

blindmouse commented Mar 11, 2020

sripaladugu commented Jul 21, 2020 • edited Loading

markjschreiber commented Jul 21, 2020 via email

sripaladugu commented Jul 22, 2020

markjschreiber commented Aug 8, 2020 via email

mderan-da commented Sep 9, 2020

markjschreiber commented Sep 11, 2020 via email

mderan-da commented Sep 12, 2020

yaomin commented Sep 13, 2020

markjschreiber commented Sep 14, 2020 via email

dfeinzeig commented Nov 14, 2020

geertvandeweyer commented Dec 17, 2020

geertvandeweyer commented Dec 17, 2020

sscho commented May 13, 2021

ptdtan commented Jun 8, 2021

alimayy commented Sep 12, 2022

rnaidu commented Jan 29, 2025

caaespin commented Jul 25, 2019 •

edited

Loading

sripaladugu commented Jul 21, 2020 •

edited

Loading