Hi,
first, thanks a lot for developing kallisto and lr-kallisto!
We've been running a benchmark on quantifying some PacBio spike-in (SIRV) data recently with different quantification methods and noticed an odd mismatch for lr-kallisto where the estimated abundance for some spike-ins was much smaller than expected. When looking at this in more detail, we noticed that for these spike-ins, the unique TCC corresponding to them seems to be potentially ignored (based on the fact that the final abundance is significantly smaller than the TCC count that is unique to that isoform). Here is an example for two spike-in isoforms from SIRV4 and SIRV5:
| transcript_id | estimated_counts | tcc_count | delta_est_tcc |
|---|---|---|---|
| SIRV410 | 33.00 | 1937240 | -1937207 |
| SIRV508 | 1573.18 | 2075800 | -2074227 |
In addition, we noticed that this issue wasn't present in some downsampling experiments that we ran and seems to only happen for isoforms which have very large unique TCC values (~ > 1e6). No other isoforms had a negative delta between estimated counts and unique TCC counts (which makes sense, since the unique TCC count should lower bound the estimated counts, I suppose).
Estimated counts were taken from matrix.abundance.mtx and unique EC counts from count.mtx (and filtered to only contain unique TCCs). We've accounted for the different offsets and as mentioned, when downsampling, these problems disappear completely (which is the reason we noticed it in the first place, since performance degrades drastically for downstream tasks such as DTE at full depth compared to downsamplings).
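For reference, the check described above can be sketched roughly as follows. This is a minimal illustration and not our actual pipeline code: the function name `unique_tcc_violations`, the dict-based inputs, and the toy equivalence classes are all hypothetical, and it assumes the EC-to-transcript mapping and both matrices have already been parsed and brought to the same index offset. Only the numbers for transcript 0 are taken from the SIRV410 row in the table; everything else is made up.

```python
from collections import defaultdict


def unique_tcc_violations(ec_to_transcripts, tcc_counts, est_counts, tol=1e-6):
    """Return {transcript: delta} where estimated count < unique-TCC count.

    ec_to_transcripts: {ec_id: [transcript indices]} (hypothetical, pre-parsed)
    tcc_counts:        {ec_id: count} for a single sample
    est_counts:        {transcript index: estimated count}
    """
    # Sum the counts of all ECs that map to exactly one transcript.
    unique = defaultdict(float)
    for ec, transcripts in ec_to_transcripts.items():
        if len(transcripts) == 1:
            unique[transcripts[0]] += tcc_counts.get(ec, 0.0)

    # The unique-TCC count should lower-bound the estimate, so a
    # negative delta (beyond a small tolerance) marks a violation.
    violations = {}
    for transcript, unique_count in unique.items():
        delta = est_counts.get(transcript, 0.0) - unique_count
        if delta < -tol:
            violations[transcript] = delta
    return violations


# Toy input: EC 0 and EC 1 are unique to one transcript each, EC 2 is shared.
ec_to_transcripts = {0: [0], 1: [1], 2: [0, 1]}
tcc_counts = {0: 1937240, 1: 50, 2: 10}
est_counts = {0: 33.0, 1: 58.0}

violations = unique_tcc_violations(ec_to_transcripts, tcc_counts, est_counts)
print(violations)
```

In this toy input only transcript 0 is flagged, since its estimate (33) is far below its unique-TCC count, while transcript 1 stays above its lower bound.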
Could you think of a possible explanation for this (including user error on our side) or is this a bug?
This is on Kallisto 0.51.1 and bustools 0.44.1 on Ubuntu 22.04. Full commands below with enough Snakemake removed to make it easily parseable (hopefully).
Happy to provide a full reprex or any other information/details, although I would have to share full BAM files, given that this only seems to happen at sufficient depth.
Thanks a lot!
Best
David