When we originally created the `csv_summary` output format, we had no way to check that summarization happened independently for each query, so we limited this output to single-query runs.
That issue has long since been fixed, and `csv_summary` can now output results for multiple queries without trouble; the only remaining obstacle is that we still explicitly prohibit it.
Currently, when we generate a `csv_summary` with LINs, we get a summary
at every single LIN rank, which is a lot of results and not very
helpful. LINgroups are our way of linking the LINs (e.g.
`14;1;0;0;0;0;0;0;0;0`) to a known name/taxonomic group (e.g. "Phylotype
I").
This PR changes the behavior of `csv_summary` when a `lingroup` file is
provided, limiting summarized reporting to just the named lingroups.
While the output is very similar to the `lingroup` output we already
have, the most important difference is that the sample name is included
in the output, meaning that we get intelligible results when running
`tax metagenome` on more than one sample.
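
As a rough illustration of the filtering idea described above (not the actual sourmash implementation), the sketch below keeps only summary rows whose LIN prefix matches a named lingroup and carries the sample name through. The column names `query_name`, `lineage`, `fraction`, and `lingroup_name`, the helper `filter_to_lingroups`, and all sample names and numbers are assumptions made up for this example; only the `14;1;0;0;0;0;0;0;0;0` to "Phylotype I" mapping comes from the text.

```python
# Hypothetical lingroup table: LIN prefix -> known name.
lingroups = {
    "14;1;0;0;0;0;0;0;0;0": "Phylotype I",
}

# Made-up summary rows, one per (sample, LIN rank), as a per-rank
# summary would produce them. Values are illustrative only.
rows = [
    {"query_name": "sampleA", "lineage": "14", "fraction": 0.9},
    {"query_name": "sampleA", "lineage": "14;1", "fraction": 0.6},
    {"query_name": "sampleA", "lineage": "14;1;0;0;0;0;0;0;0;0", "fraction": 0.5},
    {"query_name": "sampleB", "lineage": "14", "fraction": 0.3},
    {"query_name": "sampleB", "lineage": "14;1;0;0;0;0;0;0;0;0", "fraction": 0.2},
]


def filter_to_lingroups(summary_rows, named_groups):
    """Keep rows whose LIN prefix is a named lingroup and attach the name."""
    kept = []
    for row in summary_rows:
        name = named_groups.get(row["lineage"])
        if name is not None:
            # Copy the row so the input list is left unmodified.
            kept.append({**row, "lingroup_name": name})
    return kept


named = filter_to_lingroups(rows, lingroups)
for row in named:
    print(row["query_name"], row["lingroup_name"], row["fraction"])
```

Because each retained row still carries its `query_name`, results from several samples can sit in one file without becoming ambiguous.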
Prior `tax metagenome` behavior was to always generate a `lingroup`
output file when a `lingroups` file is provided. Here, I disable that
for multiple queries, since the results wouldn't make sense. I do not
replace it with another default, but I did add a recommendation to the
help + doc.
In the future, we could consider changing the default `lingroup` output
to `csv_summary`, since it's actually useful for multiple files. Or, we
could modify the `lingroup` output to include query information.
- Also fixes #3315
---------
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Fix in progress in #3311