Can you recommend some reference databases (Positive mode) for ITS and 18S sequences? #205
Hi,
I do not have experience with ITS/18S databases.
In general, the final filtering step (the one you are looking for a
database for) is not mandatory. I would recommend deblurring without this
filtering, then looking at the results to see whether any non-ITS/18S
sequences are dominant and what their identity/origin is. You can then
decide whether to apply the filtering and assess its performance.
Amnon
…On Fri, Nov 27, 2020 at 6:44 AM Li shuzhen ***@***.***> wrote:
Hi,
The default database in Deblur for 16S sequences is Greengenes. Can you
recommend some reference databases (positive mode) for ITS and 18S
sequences?
Thank you for your prompt reply!
A simple way would be to run the deblur workflow without supplying a
positive filtering database. Deblur will then use its default Greengenes
database (which is not relevant to your data), but from the deblur output
you can use the "all.biom" table instead of "reference-hit.biom".
The "all.biom" table contains all deblurred sequences (without the
positive filtering step).
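
A run like the one described above might be set up as in the following sketch. It only assembles the command line and does not execute it. The input file name, output directory, and trim length are placeholders (assumptions, not taken from this thread), and the flags should be checked against `deblur workflow --help` for your installed version.

```python
import shlex


def deblur_command(seqs_fp, output_dir, trim_length):
    """Build a `deblur workflow` command that omits --pos-ref-fp.

    With no positive reference supplied, deblur uses its default
    database, so for ITS/18S data only all.biom should be used.
    """
    return [
        "deblur", "workflow",
        "--seqs-fp", seqs_fp,
        "--output-dir", output_dir,
        "-t", str(trim_length),
    ]


# Hypothetical inputs, for illustration only.
cmd = deblur_command("its_seqs.fna", "deblur_out", 150)
print(shlex.join(cmd))
# After the run, take deblur_out/all.biom rather than reference-hit.biom.
```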
Good luck
Amnon
…On Sun, Nov 29, 2020 at 1:18 PM Li shuzhen ***@***.***> wrote:
Thank you for your prompt reply!
A positive filtering database is required according to the Deblur help
documentation. How can I skip this?
Thank you so much!
Hi,
Deblur works on each sample independently (so it doesn't do any pooling)
and then combines the results into a single table. The only (optional)
Deblur step that takes into account ASV distribution across multiple
samples is the final --min-reads filter (default 10), which removes all
ASVs with fewer than min-reads reads in total over all samples combined
(you can disable this step by providing --min-reads 0).
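
As a rough illustration of this study-wide filter (not Deblur's actual code), the sketch below joins per-sample counts and drops rare ASVs. It assumes an ASV is kept when its total over all samples is at least `min_reads`. The function name and the toy data are made up.

```python
from collections import Counter


def combine_and_filter(per_sample_counts, min_reads=10):
    """Join per-sample ASV counts and drop ASVs that are rare study-wide.

    per_sample_counts: dict of sample id -> dict of ASV sequence -> reads.
    An ASV is kept only if its total over all samples is >= min_reads;
    min_reads=0 keeps everything.
    """
    totals = Counter()
    for counts in per_sample_counts.values():
        totals.update(counts)
    keep = {asv for asv, total in totals.items() if total >= min_reads}
    return {
        sample: {asv: n for asv, n in counts.items() if asv in keep}
        for sample, counts in per_sample_counts.items()
    }


# Toy data (made up for illustration).
samples = {
    "s1": {"ACGT": 6, "TTGA": 3},
    "s2": {"ACGT": 5, "GGCC": 4},
}
filtered = combine_and_filter(samples, min_reads=10)
unfiltered = combine_and_filter(samples, min_reads=0)
```

Here only "ACGT" (11 reads in total) survives the default threshold, while `min_reads=0` leaves the table unchanged.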
Regarding looking at singletons, this can be problematic. Deblur (when
working on each sample) first throws away all singletons and then processes
all remaining reads. This is because singletons add a lot of discreteness
to the denoising process (which assumes a smooth noise distribution), so
they are hard to clean, and they contain a very large fraction of
sequencing artifacts.
Deblur therefore introduces a non-linearity in the singletons, which may
be problematic for downstream analysis that specifically uses the number of
singletons.
As an alternative, maybe try deblurring the data and then rarefying to 1/2
the number of resulting reads per sample (or even 1/3 or lower if you have
enough reads?). I think this can produce singletons after the denoising,
and these singletons should be identical to what you would get if you
sampled your population (without read errors) to a lower depth. So I think
this may be a relatively valid way to overcome the problem of singletons?
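
A minimal sketch of this kind of rarefaction, assuming the denoised counts for one sample are available as a plain dict (the toy sample and the function name are made up for illustration):

```python
import random
from collections import Counter


def rarefy(counts, depth, seed=0):
    """Subsample `depth` reads without replacement from one sample.

    counts: dict of ASV sequence -> read count after denoising.
    """
    # Expand the counts into one entry per read, then draw a random subset.
    reads = [asv for asv, n in counts.items() for _ in range(n)]
    if depth > len(reads):
        raise ValueError("depth exceeds the number of reads in the sample")
    rng = random.Random(seed)
    return dict(Counter(rng.sample(reads, depth)))


# Toy denoised sample (made up); it contains no singletons.
sample = {"ACGT": 40, "TTGA": 6, "GGCC": 2}
half = rarefy(sample, depth=sum(sample.values()) // 2)
singletons = [asv for asv, n in half.items() if n == 1]
```

Whether any singletons show up in `half` depends on the random draw, so it is worth repeating the subsampling with several seeds.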
Does this help?
Amnon
…On Sun, Nov 29, 2020 at 2:03 PM Li shuzhen ***@***.***> wrote:
Thank you so much!
By the way, may I ask you another question? Deblur performs quality
control on each sample separately. After this, are all the samples
mixed together to get the ASV table and representative sequences? Or do
you get ASVs separately for each sample and then merge these tables?
Recently I have found that pooling samples or not has a great impact
on ASV numbers. Here is the link: benjjneb/dada2#1194. Do you have any
thoughts on this?
Thanks again for your kind help.
Thank you for your answer! Compared to singletons, I think uniques (i.e. species that occur in only one sample) are more likely to be a problem across different ASV algorithms. I found that in DADA2 a high number of ASVs appear in only one sample, even though all samples were parallel replicates. Deblur performed well in my test. So does Deblur do anything specific about this situation?
Deblur treats each sample independently. After deblurring each sample, all
deblurred samples are joined into a single biom table, and sequences with
fewer than 10 reads in total (over all samples) are removed (this is
controlled by the --min-reads parameter, which can be set to 0 to disable
this step).
When deblurring each sample, Deblur first throws away all singleton reads
from the sample, and then proceeds to the rest of the denoising steps on
the sample (removing phiX sequences, denoising, removing chimeras).
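
To illustrate what independent per-sample processing means for rare sequences, here is a small sketch of the singleton step only. The phiX removal, denoising, and chimera removal are left out, and the function names and data are made up.

```python
def drop_singletons(read_counts):
    """Remove sequences seen only once in a single sample."""
    return {seq: n for seq, n in read_counts.items() if n > 1}


def process_independently(per_sample_reads):
    """Apply the singleton filter to each sample on its own (no pooling)."""
    return {
        sample: drop_singletons(counts)
        for sample, counts in per_sample_reads.items()
    }


# Toy raw read counts (made up for illustration).
raw = {
    "s1": {"ACGT": 12, "TTGA": 1},
    "s2": {"ACGT": 9, "TTGA": 1},
}
processed = process_independently(raw)
# "TTGA" occurs twice study-wide but only once per sample,
# so it is dropped from both samples.
```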
Does this make sense?
Thank you very much for your reply!