MRG: when lingroups are provided, use them for csv_summary
#3311
Codecov Report
All modified and coverable lines are covered by tests ✅
Additional details and impacted files
@@ Coverage Diff @@
## latest #3311 +/- ##
==========================================
+ Coverage 86.45% 92.40% +5.94%
==========================================
Files 137 104 -33
Lines 16070 12925 -3145
Branches 2211 2219 +8
==========================================
- Hits 13894 11943 -1951
+ Misses 1869 675 -1194
Partials 307 307
@sourmash-bio/devs ready for review
…/sourmash into better-summarized-lingroups
looks good - esp appreciate the documentation update.
there's some missing code coverage - is this just buggy codecov? I haven't dug in at all.
oh! I wanted to suggest that you put the suggested changes to behavior in the PR description into new issues, too; I think they require a major version bump?
looks like it was just buggy codecov!
now in #3361
Currently, when we generate a `csv_summary` with LINs, we get a summary at every single LIN rank, which is a lot of results and not very helpful. LINgroups are our way of linking the LINs (e.g. `14;1;0;0;0;0;0;0;0;0`) to a known name/taxonomic group (e.g. "Phylotype I").

This PR changes the behavior of `csv_summary` when a `lingroup` file is provided, limiting summarized reporting to just the named lingroups. While the output is very similar to the `lingroup` output we already have, the most important difference is that the sample name is included in the output, meaning that we get intelligible results when running `tax metagenome` on more than one sample.

Prior `tax metagenome` behavior was to always generate a `lingroup` output file when a `lingroups` file is provided. Here, I disable that for multiple queries, since the results wouldn't make sense. I do not replace it with another default, but I did add a recommendation to the help + doc.

In the future, we could consider changing the default `lingroup` output to `csv_summary`, since it's actually useful for multiple files. Or, we could modify the `lingroup` output to include query information.

`csv_summary` format not created for multiple queries (`tax metagenome`) #3315
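
To make the intended behavior concrete, here is a rough, standalone sketch of the idea: take per-rank summary rows for several samples and keep only the rows whose LIN matches a named lingroup, carrying the sample name along. This is not sourmash code. The column names (`query_name`, `lineage`, `fraction`), the helper names, and the second lingroup are all assumptions made up for illustration.

```python
import csv
import io

# Hypothetical lingroups table: LIN -> human-readable name.
# "Phylotype I" is the example from the description above; the second
# entry is invented purely so the sketch has more than one group.
LINGROUPS = {
    "14;1;0;0;0;0;0;0;0;0": "Phylotype I",
    "14;2": "Example group B",
}

# Hypothetical per-rank summary rows for two samples (column names assumed).
SUMMARY_ROWS = [
    {"query_name": "sampleA", "lineage": "14", "fraction": 0.5},
    {"query_name": "sampleA", "lineage": "14;1", "fraction": 0.4},
    {"query_name": "sampleA", "lineage": "14;1;0;0;0;0;0;0;0;0", "fraction": 0.2},
    {"query_name": "sampleB", "lineage": "14", "fraction": 0.6},
    {"query_name": "sampleB", "lineage": "14;1;0;0;0;0;0;0;0;0", "fraction": 0.1},
    {"query_name": "sampleB", "lineage": "14;2", "fraction": 0.3},
]


def filter_to_lingroups(rows, lingroups):
    """Keep only rows whose LIN is a named lingroup; attach the group name."""
    kept = []
    for row in rows:
        name = lingroups.get(row["lineage"])
        if name is None:
            continue  # an unnamed LIN rank: skip it
        kept.append(
            {
                "query_name": row["query_name"],  # sample name stays in the output
                "lingroup_name": name,
                "lineage": row["lineage"],
                "fraction": row["fraction"],
            }
        )
    return kept


def to_csv(rows):
    """Serialize the filtered rows to CSV text."""
    fieldnames = ["query_name", "lingroup_name", "lineage", "fraction"]
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=fieldnames)
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


if __name__ == "__main__":
    print(to_csv(filter_to_lingroups(SUMMARY_ROWS, LINGROUPS)))
```

Because each output row keeps its `query_name`, results from multiple samples can share one file without becoming ambiguous, which is the main difference from a lingroup-only report with no query column.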