v.2.2.0 fails on multiple replicates of the same sample #159

Open
dmitrymyl opened this issue Aug 14, 2024 · 6 comments
Labels
bug Something isn't working

Comments

@dmitrymyl

Description of the bug

Hey!

I was running the nascent pipeline v2.2.0 on two replicates of the same sample and hit the following error at the FASTQC step:

Aug-14 10:33:55.296 [Actor Thread 74] ERROR nextflow.processor.TaskProcessor - Error executing process > 'NFCORE_NASCENT:NASCENT:FASTQC (1)'

Caused by:
  Process `NFCORE_NASCENT:NASCENT:FASTQC` input file name collision -- There are multiple input files for each of the following file names: other.fq.gz


Tip: you can try to figure out what's wrong by changing to the process work dir and showing the script file named `.command.sh`

v2.1.1, by contrast, runs without this error. It might be related to #143.

Command used and terminal output

No response

Relevant files

No response

System information

No response

@dmitrymyl dmitrymyl added the bug Something isn't working label Aug 14, 2024
@edmundmiller
Collaborator

Hey! Could you share some steps to reproduce this error? Maybe a minimal samplesheet?

Are you using something like:

sample,fastq_1,fastq_2
other,https://raw.githubusercontent.com/nf-core/test-datasets/nascent/testdata/SRX882903_T1.fastq.gz,
other,https://raw.githubusercontent.com/nf-core/test-datasets/nascent/testdata/SRX882903_T2.fastq.gz,

Or whatever name you're using?

@dmitrymyl
Author

The command I run is similar to this:

nextflow -bg run nf-core/nascent -r 2.2.0 -profile cbe -work-dir $NXF_WRK -params-file params.json

My samplesheet looks like this:

sample,fastq_1,fastq_2
SAMPLE_REP1,path/to/reads.fq.gz,
SAMPLE_REP1,path/to_other/reads.fq.gz,

And my params.json looks like this:

{
    "input": ".\/samplesheet.csv",
    "outdir": ".\/outputdir",
    "assay_type": "GROseq",
    "fasta": "..\/data\/hg19.fa",
    "gtf": "..\/data\/gencode.v46lift37.basic.annotation.gtf",
    "bwa_index": "..\/data\/hg19.p13.plusMT.no_alt_analysis_set\/"
}

These input files work fine with pipeline v2.1.1.

@dmitrymyl
Author

Just tried running the dev version; it also fails with the same error.

@edmundmiller
Collaborator

> The command I run is similar to this:

Is there any way I could get the exact command, not just a similar one?

I only ask because the error message has other.fq.gz which doesn't match up with the sample IDs in your samplesheet or the name of the fastq file directly.

Only thing I could think of is if you're naming both of the files reads.fastq.gz which is something I've seen throw an error in other pipelines. To that, I'd suggest changing your file names to SAMPLE_REP1,path/to/sample.fq.gz and path/to_other/sample_other.fq.gz.

This causes issues because nf-validation is checking to make sure you didn't accidentally include the sample file twice. But I'd expect that to fail out sooner.

@dmitrymyl
Author

> Is there any way I could get the exact command, not just a similar one?

I replaced unnecessary details of my local paths preserving the conceptual structure. I'm not sure if knowledge of a path to my work directory would help :)

> Only thing I could think of is if you're naming both of the files reads.fastq.gz which is something I've seen throw an error in other pipelines. To that, I'd suggest changing your file names to SAMPLE_REP1,path/to/sample.fq.gz and path/to_other/sample_other.fq.gz.

My samplesheet is:

sample,fastq_1,fastq_2
ANDERSSON_REP1,sortmerna/SRR1596500/out/other.fq.gz,
ANDERSSON_REP1,sortmerna/SRR1596501/out/other.fq.gz,

Filepaths are different, the only shared thing is the final name itself, other.fq.gz. This is a default output of sortmerna: the reads unmapped to rRNAs are collected in other.fq.gz. Two replicates of the same sample are clearly in different directories.
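
The collision described above can be spotted before launching the pipeline. The following is a minimal Python sketch, not part of nf-core/nascent: the helper name `find_basename_collisions` is hypothetical, and it assumes that the files listed under the same `sample` value are the ones staged together into a single task.

```python
import csv
import io
from collections import defaultdict
from pathlib import PurePosixPath


def find_basename_collisions(samplesheet_text):
    """Map (sample, file name) to its paths when one sample lists that name more than once."""
    seen = defaultdict(list)
    for row in csv.DictReader(io.StringIO(samplesheet_text)):
        for column in ("fastq_1", "fastq_2"):
            path = (row.get(column) or "").strip()
            if path:
                # Only the final path component matters for the staged file name.
                seen[(row["sample"], PurePosixPath(path).name)].append(path)
    return {key: paths for key, paths in seen.items() if len(paths) > 1}


SAMPLESHEET = """\
sample,fastq_1,fastq_2
ANDERSSON_REP1,sortmerna/SRR1596500/out/other.fq.gz,
ANDERSSON_REP1,sortmerna/SRR1596501/out/other.fq.gz,
"""

for (sample, name), paths in find_basename_collisions(SAMPLESHEET).items():
    print(f"{sample}: {name} is listed {len(paths)} times: {paths}")
```

For the samplesheet above, the sketch flags `other.fq.gz` for `ANDERSSON_REP1`, which matches the file name reported in the FASTQC error.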

To try out your suggestion, I created a symlink for the second file, so the samplesheet becomes:

sample,fastq_1,fastq_2
ANDERSSON_REP1,sortmerna/SRR1596500/out/other.fq.gz,
ANDERSSON_REP1,sortmerna/SRR1596501/out/other_other.fq.gz,

So both the directories and the names are different. And I ran the dev version. FASTQC successfully completes, but now it fails on SAMTOOLS_INDEX:

ERROR ~ Error executing process > 'NFCORE_NASCENT:NASCENT:FASTQ_ALIGN_BWA:BAM_SORT_STATS_SAMTOOLS:SAMTOOLS_INDEX (ANDERSSON_REP1)'

Caused by:
  Process `NFCORE_NASCENT:NASCENT:FASTQ_ALIGN_BWA:BAM_SORT_STATS_SAMTOOLS:SAMTOOLS_INDEX (ANDERSSON_REP1)` terminated with an error exit status (1)

Command executed:

  samtools \
      index \
      -@ 1 \
       \
      ANDERSSON_REP1.sorted.bam

  cat <<-END_VERSIONS > versions.yml
  "NFCORE_NASCENT:NASCENT:FASTQ_ALIGN_BWA:BAM_SORT_STATS_SAMTOOLS:SAMTOOLS_INDEX":
      samtools: $(echo $(samtools --version 2>&1) | sed 's/^.*samtools //; s/Using.*$//')
  END_VERSIONS

Command exit status:
  1

Command output:
  (empty)

Command error:
  INFO:    Environment variable SINGULARITYENV_TMPDIR is set, but APPTAINERENV_TMPDIR is preferred
  INFO:    Environment variable SINGULARITYENV_NXF_TASK_WORKDIR is set, but APPTAINERENV_NXF_TASK_WORKDIR is preferred
  INFO:    Environment variable SINGULARITYENV_NXF_DEBUG is set, but APPTAINERENV_NXF_DEBUG is preferred
  [E::bgzf_read] Read block operation failed with error 4 after 0 of 4 bytes
  samtools index: failed to create index for "ANDERSSON_REP1.sorted.bam"

I tried to build the index manually (another node, another samtools version, another working folder), but it failed with the same error. At the same time, samtools head works fine.
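
One thing that could be ruled out here is a truncated BAM: the header may still be readable, which is all `samtools head` needs, while a later block is missing. This is only a guess and nothing in this thread confirms it. A minimal Python sketch of the check, assuming the file should end with the 28-byte BGZF end-of-file marker from the SAM/BAM specification; `has_bgzf_eof` is a hypothetical helper, and `samtools quickcheck` performs a similar test:

```python
import gzip
import os
import tempfile
from pathlib import Path

# The 28-byte empty BGZF block that the SAM/BAM specification defines as the
# end-of-file marker.
BGZF_EOF = bytes.fromhex(
    "1f8b0804" "00000000" "00ff" "0600" "4243" "0200"
    "1b00" "0300" "00000000" "00000000"
)


def has_bgzf_eof(path):
    """Return True if the file ends with the BGZF end-of-file marker."""
    with open(path, "rb") as handle:
        handle.seek(0, os.SEEK_END)
        if handle.tell() < len(BGZF_EOF):
            return False
        handle.seek(-len(BGZF_EOF), os.SEEK_END)
        return handle.read() == BGZF_EOF


if __name__ == "__main__":
    # Placeholder files, not real BAM data: only the trailing bytes matter here.
    with tempfile.TemporaryDirectory() as tmp:
        complete = Path(tmp) / "complete.bam"
        truncated = Path(tmp) / "truncated.bam"
        complete.write_bytes(b"placeholder" + BGZF_EOF)
        truncated.write_bytes(b"placeholder" + BGZF_EOF[:10])
        print(has_bgzf_eof(complete), has_bgzf_eof(truncated))
        # The marker itself is a valid gzip member with an empty payload.
        print(gzip.decompress(BGZF_EOF) == b"")
```

A missing marker would point at an incomplete write of `ANDERSSON_REP1.sorted.bam` upstream of SAMTOOLS_INDEX; a present marker would mean the problem lies elsewhere in the file.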

So, regarding replicates of the same sample, the issue is indeed the shared file name. This is unexpected behaviour: the files are in different folders, I created them in a standard way, and it seems intuitive for the same type of output to have the same file name. I don't think a user can be expected to know about this problem. Anyway, I can approve #161, but the UX is still not optimal.
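
Until the pipeline handles this itself, the symlink workaround described above can be scripted. A minimal Python sketch, assuming the last few path components are enough to tell the files apart; `unique_link_name`, `link_inputs`, and the `links` directory are hypothetical names, not something the pipeline provides:

```python
import tempfile
from pathlib import Path


def unique_link_name(path, n_parts=3):
    """Join the last n_parts path components, e.g. SRR1596500_out_other.fq.gz."""
    return "_".join(Path(path).parts[-n_parts:])


def link_inputs(paths, link_dir, n_parts=3):
    """Create one symlink per input in link_dir and return the link paths."""
    link_dir = Path(link_dir)
    link_dir.mkdir(parents=True, exist_ok=True)
    links = []
    for path in paths:
        link = link_dir / unique_link_name(path, n_parts)
        if link.exists() or link.is_symlink():
            # n_parts was too small to separate two inputs; refuse to overwrite.
            raise FileExistsError(f"link name already taken: {link}")
        link.symlink_to(Path(path).resolve())
        links.append(link)
    return links


if __name__ == "__main__":
    # Demo on placeholder files in a temporary directory.
    with tempfile.TemporaryDirectory() as tmp:
        inputs = []
        for run in ("SRR1596500", "SRR1596501"):
            out_dir = Path(tmp) / "sortmerna" / run / "out"
            out_dir.mkdir(parents=True)
            fastq = out_dir / "other.fq.gz"
            fastq.write_bytes(b"")
            inputs.append(fastq)
        for link in link_inputs(inputs, Path(tmp) / "links"):
            print(link.name)
```

The resulting link paths would then go into the `fastq_1` column of the samplesheet in place of the original `other.fq.gz` paths.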

Regarding the samtools index problem, I am rerunning the pipeline with version 2.2.0.

@dmitrymyl
Author

Okay, samtools index works on 2.2.0 but not on dev.
