Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make it possible to define min/max and disabling values for numeric argument values #143

Open
vdauwera opened this issue Jan 22, 2015 · 11 comments

Comments

@vdauwera
Copy link
Contributor

This is something we got to only in a rather basic way in GATK but is very useful to enable in order to save users from themselves. This would involve three components:

  1. Hard min/max values that correspond to limits beyond which values could cause errors/program failures; violation should throw a User Exception;

  2. Recommended min/max values that correspond to limits beyond which values do not make sense for a given analysis functionality for standard use cases; violation should log a WARN entry.

  3. Behavior-disabling value if applicable. Let's say we have an argument that provides a threshold for filtering; and it takes min. 4, max. 20. We may want to set it up so that passing -1 disables the behavior controlled by the argument (so in the filtering case, "-1" means "don't filter at all") without tripping the min value check.

These should all be accessible to the GATKDoclet (or equivalent) for documentation purposes.

@vruano
Copy link
Contributor

vruano commented Jan 22, 2015

3)... '-1' might not make sense in some cases (perhaps -1 is sometimes a legit value) and it seems that it would need to be handled explicitly by the tool... Perhaps it should be addressed by a '--no' (or '--dont' for verbs) prefix to the argument name. This way this can be handled by the command-line parser in a general way regardless of the argument type and concrete semantics.

--trim-min -1
would be
--no-trim-min

--prune-unused-alleles FALSE
would be
--dont-prune-unused-alleles

Perhaps verbs can be handled as boolean arguments.

@vdauwera
Copy link
Contributor Author

Oh sure, I'm using -1 as an example of value that would be out of the normal range. Could be zero in some cases, could be something else entirely.

The problem with having a separate argument altogether for disabling behaviors is that this decouples the action of setting a value for a behavior and the action of disabling the behavior. That in turn opens avenues for collisions (users specifying both in the same command by accident or ignorance) which you then have to handle explicitly in the argument parsing. Also, the separate arguments end up in different places in the docs, which are typically organized alphabetically. So it creates more complexity than it resolves.

@vruano
Copy link
Contributor

vruano commented Jan 22, 2015

mmm... the collision problem is no different to the user trying to set up a single value argument twice (or more times); the command parser can catch this as an attempt to give two values to the same underlaying argument thus wrong.

I would say that a documentation auto generator can handle this (show no/dont option next to the main one).

(Notice the indentation).

--prune-unused-alleles|-pua blah blah blah blah blah blah blah blah blah blah blah blah blah
blah blah blah blah blah blah blah blah
--dont-prune-unused-alleles|--dont-pua ... to explicitly disable this action.

--other-option INT blah blah blah
...

However, I don't see any issue in having two separate entries where the donts and/or dos paraphrase or make reference to each other. At this point the user is looking into a Reference manual which is never going to be as "nice" as a good set of examples with or without dont's.

Finally, I think the --no/--dont has the nice feature that makes the command line more readable and compact ....

e.g.

--dont-trim

vs

--trim-min -1 --trim-max -1

What does -1,-1 mean? need to check the manual?

or vs

--trim NO / DISABLE / FALSE / NONE / -1 / 0

longer (ok ok except NO, -1 or 0), and there is no need to go the reference to find out the disabling value ... --no (for value full options) or --dont (for value less flags) always work so less trips to the Reference.

@vdauwera
Copy link
Contributor Author

I'm not completely opposed to that way of dealing with this, but I'm not yet convinced either.

I'm not sure I see how having an extra argument is somehow shorter than having one special value that is included in the description of the original argument. As in:

--trimWhatever | -trimWvr -- bla bla bla; default w; min x max y; to disable trimming, use z.

As for the documentation auto-generator showing the two args together, that is dependent on setting up the arguments so that the code specifies they are related, and adding some logic to the auto-generation to pull related arguments together. (As a contributing developer to a documentation auto-generator --the GATKDocs-- I can tell you that is not necessarily trivial and adds even more moving parts.) This also generates additional complexity for third-party developers of wrappers (such as Galaxy). Finally, it can be a source of confusion for users who are trying to look up an argument called "-dont-Trim-whatever" since presumably it's only going to be listed under T (-Trim-whatever) and not under D in the alphabetical list. Or should it be listed twice?

A reference manual can be very "nice" and helpful, and it must be organized in the most intuitive way possible, especially since there is no way we can provide examples that cover every single use case under the sun (trust me, there's not).

@lbergelson
Copy link
Member

Documented Range bounds on numeric checks seem like a good idea. In general I'm not a fan of special values to signal abnormal behavior. A disabling flag seems like a cleaner idea to me. We can enforce them as being mutually exclusive, which also has the effect of documenting them together.

If it really makes sense to have a special value, right now that can just be explained in the doc string.

@jmthibault79
Copy link
Contributor

I also prefer the use of flags over special values, because of the case where the special value is accidentally triggered: maybe that value is obviously out of a reasonable range to us but not an unsophisticated user. I worry about complaints like "I clearly set the value, but it didn't take effect." At the least, there needs to be something like: WARN: disabling trim feature due to parameter value -1

@cmnbroad
Copy link
Collaborator

Barclay now has implementations for min/max and minRecommended/maxRecommended values for integer/float args (and these are integrated with the help doclet). Once GATK is upgraded to the next Barclay version, we can start using them. i.e., ApplyBQSRArgumentCollection. Also, there are a ton of places in gatk-protected where there is custom validation code that could be replaced with annotations.

@cmnbroad cmnbroad self-assigned this Jan 17, 2017
@droazen
Copy link
Contributor

droazen commented Mar 22, 2017

@cmnbroad Is this one done?

@droazen droazen added this to the 4.0 release milestone Mar 22, 2017
@magicDGS
Copy link
Contributor

The max/min/recommended values implementation for integer/float args is already in the barclay version in use for GATK, @droazen. The usage of them I don't know...

@droazen
Copy link
Contributor

droazen commented Mar 23, 2017

I'll wait for @cmnbroad 's response as to how much of the original request from @vdauwera is implemented.

@cmnbroad
Copy link
Collaborator

@droazen We should keep this open - we have the range implementation, but not the generalized "disabling" feature.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

9 participants