-
Notifications
You must be signed in to change notification settings - Fork 602
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Make it possible to define min/max and disabling values for numeric argument values #143
Comments
3)... '-1' might not make sense in some cases (perhaps -1 is sometimes a legit value) and it seems that it would need to be handled explicitly by the tool... Perhaps it should be addressed by a '--no' (or '--dont' for verbs) prefix to the argument name. This way this can be handled by the command-line parser in a general way regardless of the argument type and concrete semantics. --trim-min -1 --prune-unused-alleles FALSE Perhaps verbs can be handled as boolean arguments. |
Oh sure, I'm using -1 as an example of value that would be out of the normal range. Could be zero in some cases, could be something else entirely. The problem with having a separate argument altogether for disabling behaviors is that this decouples the action of setting a value for a behavior and the action of disabling the behavior. That in turn opens avenues for collisions (users specifying both in the same command by accident or ignorance) which you then have to handle explicitly in the argument parsing. Also, the separate arguments end up in different places in the docs, which are typically organized alphabetically. So it creates more complexity than it resolves. |
mmm... the collision problem is no different to the user trying to set up a single value argument twice (or more times); the command parser can catch this as an attempt to give two values to the same underlaying argument thus wrong. I would say that a documentation auto generator can handle this (show no/dont option next to the main one). (Notice the indentation). --prune-unused-alleles|-pua blah blah blah blah blah blah blah blah blah blah blah blah blah --other-option INT blah blah blah However, I don't see any issue in having two separate entries where the donts and/or dos paraphrase or make reference to each other. At this point the user is looking into a Reference manual which is never going to be as "nice" as a good set of examples with or without dont's. Finally, I think the --no/--dont has the nice feature that makes the command line more readable and compact .... e.g. --dont-trim vs --trim-min -1 --trim-max -1 What does -1,-1 mean? need to check the manual? or vs --trim NO / DISABLE / FALSE / NONE / -1 / 0 longer (ok ok except NO, -1 or 0), and there is no need to go the reference to find out the disabling value ... --no (for value full options) or --dont (for value less flags) always work so less trips to the Reference. |
I'm not completely opposed to that way of dealing with this, but I'm not yet convinced either. I'm not sure I see how having an extra argument is somehow shorter than having one special value that is included in the description of the original argument. As in: --trimWhatever | -trimWvr -- bla bla bla; default w; min x max y; to disable trimming, use z. As for the documentation auto-generator showing the two args together, that is dependent on setting up the arguments so that the code specifies they are related, and adding some logic to the auto-generation to pull related arguments together. (As a contributing developer to a documentation auto-generator --the GATKDocs-- I can tell you that is not necessarily trivial and adds even more moving parts.) This also generates additional complexity for third-party developers of wrappers (such as Galaxy). Finally, it can be a source of confusion for users who are trying to look up an argument called "-dont-Trim-whatever" since presumably it's only going to be listed under T (-Trim-whatever) and not under D in the alphabetical list. Or should it be listed twice? A reference manual can be very "nice" and helpful, and it must be organized in the most intuitive way possible, especially since there is no way we can provide examples that cover every single use case under the sun (trust me, there's not). |
Documented Range bounds on numeric checks seem like a good idea. In general I'm not a fan of special values to signal abnormal behavior. A disabling flag seems like a cleaner idea to me. We can enforce them as being mutually exclusive, which also has the effect of documenting them together. If it really makes sense to have a special value, right now that can just be explained in the doc string. |
I also prefer the use of flags over special values, because of the case where the special value is accidentally triggered: maybe that value is obviously out of a reasonable range to us but not an unsophisticated user. I worry about complaints like "I clearly set the value, but it didn't take effect." At the least, there needs to be something like: |
Barclay now has implementations for min/max and minRecommended/maxRecommended values for integer/float args (and these are integrated with the help doclet). Once GATK is upgraded to the next Barclay version, we can start using them. i.e., ApplyBQSRArgumentCollection. Also, there are a ton of places in gatk-protected where there is custom validation code that could be replaced with annotations. |
@cmnbroad Is this one done? |
The max/min/recommended values implementation for integer/float args is already in the barclay version in use for GATK, @droazen. The usage of them I don't know... |
@droazen We should keep this open - we have the range implementation, but not the generalized "disabling" feature. |
This is something we got to only in a rather basic way in GATK but is very useful to enable in order to save users from themselves. This would involve three components:
Hard min/max values that correspond to limits beyond which values could cause errors/program failures; violation should throw a User Exception;
Recommended min/max values that correspond to limits beyond which values do not make sense for a given analysis functionality for standard use cases; violation should log a WARN entry.
Behavior-disabling value if applicable. Let's say we have an argument that provides a threshold for filtering; and it takes min. 4, max. 20. We may want to set it up so that passing -1 disables the behavior controlled by the argument (so in the filtering case, "-1" means "don't filter at all") without tripping the min value check.
These should all be accessible to the GATKDoclet (or equivalent) for documentation purposes.
The text was updated successfully, but these errors were encountered: