Large mismatches between unique TCC counts and estimated counts for transcripts with very large abundances in lr-kallisto #482

dnwissel · 2025-03-09T09:54:56Z

Hi,

first, thanks a lot for developing kallisto and lr-kallisto!

We've recently been benchmarking different quantification methods on PacBio spike-in (SIRV) data and noticed an odd mismatch for lr-kallisto: the estimated abundance for some spike-ins was much smaller than expected. Looking at this in more detail, we noticed that for these spike-ins the TCC that is unique to the isoform seems to be ignored, since the final estimated count is far smaller than that unique TCC count. Here is an example for two spike-in isoforms from SIRV4 and SIRV5:

transcript_id	estimated_counts	tcc_count	delta_est_tcc
SIRV410	33.00	1937240	-1937207
SIRV508	1573.18	2075800	-2074227

In addition, this issue wasn't present in some downsampling experiments that we ran; it seems to happen only for isoforms with very large unique TCC counts (roughly > 1e6). No other isoforms had a negative delta between estimated counts and unique TCC counts (which makes sense, since the unique TCC count should be a lower bound on the estimated counts, I suppose).

Estimated counts were taken from matrix.abundance.mtx and unique TCC counts from count.mtx (filtered to only the equivalence classes that map to a single transcript). We've accounted for the different index offsets, and, as mentioned, these problems disappear completely when downsampling. That is how we noticed it in the first place: performance on downstream tasks such as DTE degrades drastically at full depth compared to the downsampled runs.
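For reference, the check described above can be sketched roughly as follows. This is a minimal illustration, not the exact script used for the benchmark: the function name `unique_tcc_deltas` and the in-memory dictionaries are hypothetical stand-ins for values parsed from count.ec.txt, count.mtx and matrix.abundance.mtx, and the count for the shared equivalence class is made up. The two single-transcript counts and the estimated counts are the ones from the table above.

```python
from collections import defaultdict


def unique_tcc_deltas(ec_to_transcripts, ec_counts, est_counts):
    """Return estimated count minus unique TCC count per transcript.

    ec_to_transcripts: EC id -> list of transcript ids in that EC
    ec_counts:         EC id -> TCC count
    est_counts:        transcript id -> estimated count
    Only ECs that contain exactly one transcript count as "unique".
    """
    unique = defaultdict(float)
    for ec, count in ec_counts.items():
        transcripts = ec_to_transcripts[ec]
        if len(transcripts) == 1:
            unique[transcripts[0]] += count
    return {tx: est_counts.get(tx, 0.0) - u for tx, u in unique.items()}


# EC 1 and its count are invented for illustration; it is shared between
# two transcripts and is therefore skipped by the unique-TCC filter.
ec_to_transcripts = {0: ["SIRV410"], 1: ["SIRV410", "SIRV508"], 2: ["SIRV508"]}
ec_counts = {0: 1937240, 1: 10, 2: 2075800}
est_counts = {"SIRV410": 33.00, "SIRV508": 1573.18}

deltas = unique_tcc_deltas(ec_to_transcripts, ec_counts, est_counts)

# A negative delta means the estimate is below the reads that can only
# have come from that transcript.
flagged = sorted(tx for tx, d in deltas.items() if d < 0)
print(flagged)
```

Every read in a single-transcript equivalence class can only be assigned to that transcript, which is why we would expect the delta to be non-negative for every isoform.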

Could you think of a possible explanation for this (including user error on our side) or is this a bug?

This is on kallisto 0.51.1 and bustools 0.44.1 on Ubuntu 22.04. Full commands are below, with enough of the Snakemake syntax removed to make them easy to read (hopefully).

Indexing:

  kallisto index \
      -k 63 -t 12 -i {output.index} \
      results/prepare/extract_transcriptomes/sirv_transcriptome.fa &> {log}

Quantification:

  kallisto bus -t 12 --long --threshold 0.8 \
      -x bulk -i {input.index} \
      -o {params.outdir} {input.reads} &>> {log};
  bustools sort -t 12 {params.outdir}/output.bus \
      -o {params.outdir}/sorted.bus &>> {log};
  bustools count {params.outdir}/sorted.bus \
      -t {params.outdir}/transcripts.txt \
      -e {params.outdir}/matrix.ec \
      -g {input.sirv_four_transcriptome_gmap} \
      -o {params.outdir}/count --cm -m \
      &>> {log};
  kallisto quant-tcc -t 12 \
      --long -P PacBio -f {params.outdir}/flens.txt \
      {params.outdir}/count.mtx -i {input.index} \
      -e {params.outdir}/count.ec.txt \
      -o {params.outdir} &>> {log}

Happy to provide a full reprex or any other information/details, although I would have to share full BAM files, given that this only seems to happen at sufficient depth.

Thanks a lot!

Best
David

bound-to-love · 2025-03-10T02:33:44Z

Thanks for submitting an issue for this! I’ll look into it. Would you be able to share the output folder with me? My email is: [email protected]
