csv_summary format not created for multiple queries (tax metagenome) #3315

Closed
bluegenes opened this issue Sep 4, 2024 · 0 comments · Fixed by #3311

Comments

@bluegenes
Contributor

When we originally created csv_summary output format, we didn't have a way to properly check that summarization happened independently on separate queries, so we limited this output to single-query runs.

That issue has long since been fixed, and csv_summary can now report results for multiple queries without problems. The only remaining obstacle is that we still explicitly prohibit it.

Fix in progress in #3311

bluegenes added a commit that referenced this issue Oct 24, 2024
Currently, when we generate a `csv_summary` with LINs, we get a summary
at every single LIN rank, which is a lot of results and not very
helpful. LINgroups are our way of linking the LINs (e.g.
`14;1;0;0;0;0;0;0;0;0`) to a known name/taxonomic group (e.g. "Phylotype
I").

This PR changes the behavior of `csv_summary` when a `lingroup` file is
provided, limiting summarized reporting to just the named lingroups.
While the output is very similar to the `lingroup` output we already
have, the most important difference is that the sample name is included
in the output, meaning that we get intelligible results when running
`tax metagenome` on more than one sample.
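
To illustrate the idea (this is not the sourmash implementation), here is a minimal Python sketch of summarizing only to named lingroups while keeping the sample name on every row. The function `summarize_by_lingroup`, the row fields, the sample names, and the fractions are all hypothetical; only the example LIN prefix and the name "Phylotype I" come from the description above.

```python
from collections import defaultdict


def summarize_by_lingroup(matches, lingroups):
    """Sum match fractions per (query, lingroup).

    matches:   iterable of (query_name, lin, fraction) tuples (hypothetical layout)
    lingroups: dict mapping a LIN prefix such as "14;1;0" to a group name
    """
    totals = defaultdict(float)
    for query_name, lin, fraction in matches:
        lin_parts = lin.split(";")
        for prefix, name in lingroups.items():
            prefix_parts = prefix.split(";")
            # A match belongs to a lingroup when its LIN starts with the
            # group's prefix, compared position by position.
            if lin_parts[: len(prefix_parts)] == prefix_parts:
                totals[(query_name, name, prefix)] += fraction
    # Matches outside every named lingroup are never added, so they are
    # absent from the summary.
    return [
        {"query_name": q, "name": n, "lin": p, "fraction": f}
        for (q, n, p), f in sorted(totals.items())
    ]


# Made-up input: two samples, one named lingroup.
lingroups = {"14;1;0;0;0;0;0;0;0;0": "Phylotype I"}
matches = [
    ("sampleA", "14;1;0;0;0;0;0;0;0;0;3;0", 0.25),
    ("sampleA", "14;1;0;0;0;0;0;0;0;0;7;1", 0.5),
    ("sampleA", "9;0;0", 0.125),  # not in any named lingroup
    ("sampleB", "14;1;0;0;0;0;0;0;0;0;3;0", 0.125),
]
rows = summarize_by_lingroup(matches, lingroups)
for row in rows:
    print(row)
```

Because each row carries `query_name`, results from several samples can share one file without becoming ambiguous, which is the property the plain `lingroup` output lacks.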

Prior `tax metagenome` behavior was to always generate a `lingroup`
output file when a `lingroups` file is provided. Here, I disable that
for multiple queries, since the results wouldn't make sense. I do not
replace it with another default, but I did add a recommendation to the
help + doc.

In the future, we could consider changing the default `lingroup` output
to `csv_summary`, since it's actually useful for multiple files. Or, we
could modify the `lingroup` output to include query information.

- Also fixes #3315

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>