Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Incomplete result from minimap2-aligned bam files #395

Closed
zhangkn3 opened this issue Oct 13, 2023 · 19 comments
Closed

Incomplete result from minimap2-aligned bam files #395

zhangkn3 opened this issue Oct 13, 2023 · 19 comments

Comments

@zhangkn3
Copy link

Hi,
I have been using bambu lately for novel isoform identification from long-read RNA-seq data these days. And it is efficient and useful.

However, an issue happened when performing novel isoform identification from minimap2 aligned bam files. The metadata only included 4 columns, and the novelTranscript information is lacking. Hence, the novel isoforms can not be identified.

Pic

But running bambu from STAR-aligned bam files could be well performed.
Pic2

I hope I have described this issue clearly. Any hint on how to solve the issue is very appreciated!

Best,
Kenan

@zhangkn3
Copy link
Author

Additionally, the UCSC reference genome (hg38.fa) and annotation file (hg38.knownGene.gtf) were used in the code.
Thanks!

@andredsim
Copy link
Collaborator

Hi Kenan,

Thanks for reaching out. I am not yet too sure what could be causing this, as the aligner theoretically should not impact the output format of Bambu. Could you please share the lines of code you used to generate these two screenshots so I can get a better picture of what might be happening. You can rename all sample files if you need to maintain privacy.

@zhangkn3
Copy link
Author

Hi Andre,
Thanks for your prompt response. Here are my codes used for the bambu project. The only difference is the input bam files I gave to these codes.
Also, thanks for your kind help.

#bambu project
library(MatrixGenerics)
library(sparseMatrixStats)
library(DelayedMatrixStats)
library(Rsamtools)
library(bambu)
library(BSgenome)
gencode_seq <- "/project/lrRNA/bambu/hg38.fa"
gencode_gtf <- prepareAnnotations("/project/lrRNA/bambu/hg38.knownGene.gtf")
bam_file <- "/project/lrRNA/STARlong/GBM2010Aligned.sortedByCoord.out.bam"
ont_iso_analysis <- bambu(reads = bam_file, annotations = gencode_gtf, genome = gencode_seq, quant = FALSE, ncore = 8)
writeToGTF(ont_iso_analysis, "/project/lrRNA/bambu/output10M.gtf")
ont_iso_analysis.novel <- ont_iso_analysis[mcols(ont_iso_analysis)$novelTranscript,]
writeToGTF(ont_iso_analysis.novel, "/project/lrRNA/bambu/output.novel_isoform10M.gtf")

@andredsim
Copy link
Collaborator

Hi Kenen,

Thanks for this, I noticed in the first screen shot you use mcols(newAnnotations) which isn't a variable you included in the code you posted, whereas the second is of ont_iso_analysis. Could I ask where newAnnotations comes from.
If you save the output to GTF and then reread that GTF in with prepareAnnotations it unfortunately it does not keep all the metadata as we only save the standard GTF fields. (In an upcoming update we will be saving the NDR values in the gtf file)
If you would like to save the annotations with the metadata included so you can use it for future analysis I recommend using
saveRDS(ont_iso_analysis, "/save/path")
Let me know if this might be the cause, otherwise I can keep trying to figure out what might have caused this.

Kind Regards,
Andre Sim

@zhangkn3
Copy link
Author

Thanks Andre!
The newAnnotations is also the same as not_iso_analysis, I did not save the GTF and reread it.
But I will try the code you recommended, and get back to you.

Best,
Kenan

@andredsim
Copy link
Collaborator

Hi Kenen,

I tried some more tests on my side and havn't figured out how this could occur. Perhaps you could sample your two bam files down to an easily transferable size, check the issue still occurs, and if so and it is not sensitive data, upload them here and I can try run them to see if I can replicate the issue. This will make it a lot easier for me to troubleshoot.

Kind Regards,
Andre

@zhangkn3
Copy link
Author

Hi Andre,
I have rerun the codes and here is the result from the minimap2 aligned bam. I have added the warnings in the screenshot. Hope this could help you find the trouble.
Thx, Kenan
image

@andredsim
Copy link
Collaborator

Oh I think I see the issue
The warning "The current filtering criteria filters out all new read classes, please consider less stringent critera"
What this means is no new transcript models are predicted so it returns the original annotations. There are a few reasons this could happen.

  1. The default NDR threshold is too low and Bambu is being too stringent for your sample. You could try run it with NDR = 1 which is maximum sensitivity.
  2. The two bam files were not aligned to the same genome and for the bam file that is not working, the chromosome names do not match the chromosome names in the genome you have provided. This is a very common issue and often occurs from differences between "chr1" and "1" as scaffold names.
  3. It could be that no read classes were constructed, potentially as a consequence of 2. or other reasons. If the above to steps do not solve the issue can you send me the output from the metadata(output)$warnings
  4. Is this a standard sequencing run or is special prep involved? Certain protocols such as targeted sequencing need to be handled different or this error will be encountered.

Let me know how it goes.

@zhangkn3
Copy link
Author

Thanks Andre.
I have checked your suggestions.

  1. In the codes I used, "quant = FALSE" should be echos to "NDR = 1" according to the bambu manual. I also tried the new code with "NDR =1", and the result was still the same.
  2. The bam files aligned by minimap2 and STAR_long used the same reference genome. I also checked the scaffold names of the bam files and the reference genome are both "chr1".
  3. Here are the warnings. Please check.
    image
  4. This is a standard long-read RNA sequencing data produced by NanoPore from tumor tissue.

Great thanks for your kind help!
Kenan

@andredsim
Copy link
Collaborator

Hi Kenan,

Thanks for checking that. The warning number 3 is the important one here. Do you expect no spliced reads in your dataset? Given this is a standard tumor tissue in human I assume that is unlikely. Common causes for this is if the alignment was done without splice aware mode. Minimap2 is recommended to be run like this ./minimap2 -ax splice -uf -k14 ref.fa reads.fa > aln.sam for nanopore reads. Alternatively alignments need to be done to the genome and not the transcriptome, which would also cause unspliced reads, however given you said the scaffold names are "chr1" this is less likely.

Regarding point 1. NDR = 1 and quant = FALSE are very different. Setting quant to false stops Bambu from quantifying the output and will just return the filtered annotations (Filtered by NDR value). NDR = 1 runs the discovery step at its most sensitive meaning it will return all annotations regardless of their NDR value. If it doesn't trouble you, could you let me know what part of the manual this was, so that I can adjust it so it is clearer for future users!

Let me know if the alignment was the issue or if you do only expect un-spliced reads.
Thanks,
Andre

@zhangkn3
Copy link
Author

Thanks, Andre.
I think the issue must be happened with the minimap2 alignment. I will revise the code and rerun bambu, and come back to you later.
Thanks a lot for your kind help!
Kenan

@DepledgeLab
Copy link

Hello, I've been testing bambu on a dataset i am interested in and also get the warning ""The current filtering criteria filters out all new read classes, please consider less stringent criteria"

I'm running it as follows

se <- bambu(reads = "../rawdata/input.genome.sorted.bam", annotations = NULL, genome = "../rawdata/ref.fasta", NDR = 1, quant = FALSE) writeToGTF(se, "bambu.gtf")

and get the following warnings/errors

Detected 4 warnings across the samples during read class construction. Access warnings with metadata(bambuOutput)$warnings
--- Start extending annotations ---
The current filtering criteria filters out all new read 
            classes, please consider less strigent critria!


Error in `auto_copy()`:
! `x` and `y` must share the same src.
i `x` is a <tbl_df/tbl/data.frame> object.
i `y` is `NULL`.
i Set `copy = TRUE` if `y` can be copied to the same source as `x` (may be slow).
Run `rlang::last_trace()` to see where the error occurred.
Warning message:
Unknown or uninitialised column: `exon_rank`. 
> se.novel = se[mcols(se)$fullLengthCounts >= 0.1,]
> writeToGTF(se, "bambu.gtf")
Error in `auto_copy()`:
! `x` and `y` must share the same src.
i `x` is a <tbl_df/tbl/data.frame> object.
i `y` is `NULL`.
i Set `copy = TRUE` if `y` can be copied to the same source as `x` (may be slow).
Run `rlang::last_trace()` to see where the error occurred.
Warning message:
Unknown or uninitialised column: `exon_rank`. 

The system I'm working in has predominantly single exon transcripts and so I am wondering if I need to configure bambu differently to deal with this?

@andredsim
Copy link
Collaborator

Hi,

Yes by default Bambu excludes all single exon transcript. Please see this section of our documentation https://github.com/GoekeLab/bambu#Including-single-exons. You need to add this parameter to prevent them being filtered out.
opt.discovery = list(min.txScore.singleExon = 0))
Bambu is however designed for spliced transcripts and was only evaluated using them so I cannot vouch for the performance when there is predominately single exon transcripts as this is something we were hoping to look into in the future. I would be interested to hear how it performs.

Kind Regards,
Andre Sim

@DepledgeLab
Copy link

That is super helpful. thank you very much!

@andredsim
Copy link
Collaborator

Hi Zhang, Since we have not heard back I hope this solved you issue and will close it for now. If it hasn't let us know please write back with a detailed description of what is occurring and preferentially a reproducible example and we can reopen it.

@zhangkn3
Copy link
Author

Hi Andre,
Thanks for your response! Our HPC broke out last week, and all the jobs were killed. I am now rerunning the minimap2 alignment pipeline.
I will get back to you after I get the results.
Thanks again.

@andredsim
Copy link
Collaborator

andredsim commented Oct 30, 2023

Ok sorry to hear about the killed jobs, if you are still having issues once it finishes I can reopen this.

@zhangkn3
Copy link
Author

zhangkn3 commented Nov 2, 2023

Hi Andre,
Thanks for your kind help. The issue is because of the minimap2 alignment codes. Bambu can run well with the minimap2 -ax splice results.
Additionally, I rechecked the manual and found the misunderstanding of NDR = 1 and quant = FALSE should be my fault.

Great thanks for your assistance, and thanks again for the brilliant software.
Kenan

@andredsim
Copy link
Collaborator

Glad to hear everything is resolved! I hope you get some nice results!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants