Make it possible to define min/max and disabling values for numeric argument values #143

vdauwera · 2015-01-22T20:03:12Z

This is something we got to only in a rather basic way in GATK but is very useful to enable in order to save users from themselves. This would involve three components:

Hard min/max values that correspond to limits beyond which values could cause errors/program failures; violation should throw a User Exception;
Recommended min/max values that correspond to limits beyond which values do not make sense for a given analysis functionality for standard use cases; violation should log a WARN entry.
Behavior-disabling value if applicable. Let's say we have an argument that provides a threshold for filtering; and it takes min. 4, max. 20. We may want to set it up so that passing -1 disables the behavior controlled by the argument (so in the filtering case, "-1" means "don't filter at all") without tripping the min value check.

These should all be accessible to the GATKDoclet (or equivalent) for documentation purposes.

vruano · 2015-01-22T20:49:51Z

3)... '-1' might not make sense in some cases (perhaps -1 is sometimes a legit value) and it seems that it would need to be handled explicitly by the tool... Perhaps it should be addressed by a '--no' (or '--dont' for verbs) prefix to the argument name. This way this can be handled by the command-line parser in a general way regardless of the argument type and concrete semantics.

--trim-min -1
would be
--no-trim-min

--prune-unused-alleles FALSE
would be
--dont-prune-unused-alleles

Perhaps verbs can be handled as boolean arguments.

vdauwera · 2015-01-22T20:56:27Z

Oh sure, I'm using -1 as an example of value that would be out of the normal range. Could be zero in some cases, could be something else entirely.

The problem with having a separate argument altogether for disabling behaviors is that this decouples the action of setting a value for a behavior and the action of disabling the behavior. That in turn opens avenues for collisions (users specifying both in the same command by accident or ignorance) which you then have to handle explicitly in the argument parsing. Also, the separate arguments end up in different places in the docs, which are typically organized alphabetically. So it creates more complexity than it resolves.

vruano · 2015-01-22T22:50:46Z

mmm... the collision problem is no different to the user trying to set up a single value argument twice (or more times); the command parser can catch this as an attempt to give two values to the same underlaying argument thus wrong.

I would say that a documentation auto generator can handle this (show no/dont option next to the main one).

(Notice the indentation).

--prune-unused-alleles|-pua blah blah blah blah blah blah blah blah blah blah blah blah blah
blah blah blah blah blah blah blah blah
--dont-prune-unused-alleles|--dont-pua ... to explicitly disable this action.

--other-option INT blah blah blah
...

However, I don't see any issue in having two separate entries where the donts and/or dos paraphrase or make reference to each other. At this point the user is looking into a Reference manual which is never going to be as "nice" as a good set of examples with or without dont's.

Finally, I think the --no/--dont has the nice feature that makes the command line more readable and compact ....

e.g.

--dont-trim

vs

--trim-min -1 --trim-max -1

What does -1,-1 mean? need to check the manual?

or vs

--trim NO / DISABLE / FALSE / NONE / -1 / 0

longer (ok ok except NO, -1 or 0), and there is no need to go the reference to find out the disabling value ... --no (for value full options) or --dont (for value less flags) always work so less trips to the Reference.

vdauwera · 2015-01-22T23:45:26Z

I'm not completely opposed to that way of dealing with this, but I'm not yet convinced either.

I'm not sure I see how having an extra argument is somehow shorter than having one special value that is included in the description of the original argument. As in:

--trimWhatever | -trimWvr -- bla bla bla; default w; min x max y; to disable trimming, use z.

As for the documentation auto-generator showing the two args together, that is dependent on setting up the arguments so that the code specifies they are related, and adding some logic to the auto-generation to pull related arguments together. (As a contributing developer to a documentation auto-generator --the GATKDocs-- I can tell you that is not necessarily trivial and adds even more moving parts.) This also generates additional complexity for third-party developers of wrappers (such as Galaxy). Finally, it can be a source of confusion for users who are trying to look up an argument called "-dont-Trim-whatever" since presumably it's only going to be listed under T (-Trim-whatever) and not under D in the alphabetical list. Or should it be listed twice?

A reference manual can be very "nice" and helpful, and it must be organized in the most intuitive way possible, especially since there is no way we can provide examples that cover every single use case under the sun (trust me, there's not).

lbergelson · 2015-01-23T16:05:29Z

Documented Range bounds on numeric checks seem like a good idea. In general I'm not a fan of special values to signal abnormal behavior. A disabling flag seems like a cleaner idea to me. We can enforce them as being mutually exclusive, which also has the effect of documenting them together.

If it really makes sense to have a special value, right now that can just be explained in the doc string.

jmthibault79 · 2015-01-23T17:42:08Z

I also prefer the use of flags over special values, because of the case where the special value is accidentally triggered: maybe that value is obviously out of a reasonable range to us but not an unsophisticated user. I worry about complaints like "I clearly set the value, but it didn't take effect." At the least, there needs to be something like: WARN: disabling trim feature due to parameter value -1

cmnbroad · 2017-01-17T16:09:56Z

Barclay now has implementations for min/max and minRecommended/maxRecommended values for integer/float args (and these are integrated with the help doclet). Once GATK is upgraded to the next Barclay version, we can start using them. i.e., ApplyBQSRArgumentCollection. Also, there are a ton of places in gatk-protected where there is custom validation code that could be replaced with annotations.

droazen · 2017-03-22T17:46:22Z

@cmnbroad Is this one done?

magicDGS · 2017-03-23T06:45:58Z

The max/min/recommended values implementation for integer/float args is already in the barclay version in use for GATK, @droazen. The usage of them I don't know...

droazen · 2017-03-23T15:01:26Z

I'll wait for @cmnbroad 's response as to how much of the original request from @vdauwera is implemented.

cmnbroad · 2017-03-23T16:50:03Z

@droazen We should keep this open - we have the range implementation, but not the generalized "disabling" feature.

lbergelson added the CLI label Feb 11, 2015

akiezun added the Doc label Feb 26, 2015

akiezun added enhancement UserDoc and removed DeveloperDoc labels Apr 15, 2015

akiezun added the PRIORITY_LOW label May 6, 2015

lbergelson mentioned this issue Dec 9, 2015

bad argument value shows up as IllegalArgumentException #1292

Open

cmnbroad self-assigned this Jan 17, 2017

droazen added this to the 4.0 release milestone Mar 22, 2017

droazen removed this from the Engine-4.0 milestone Oct 17, 2017

droazen modified the milestones: Engine-2Q2018, Engine-1Q2018 Feb 8, 2018

cmnbroad mentioned this issue Mar 12, 2018

Factor out ArgumentDefinition classes and refactor CommandLineArgumentParser. broadinstitute/barclay#133

Merged

droazen modified the milestones: Engine-1Q2018, Engine-2Q2018 Apr 6, 2018

droazen removed this from the Engine-2Q2018 milestone Oct 4, 2018

davidbenjamin added Documentation and removed UserDoc labels Feb 25, 2020

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Make it possible to define min/max and disabling values for numeric argument values #143

Make it possible to define min/max and disabling values for numeric argument values #143

vdauwera commented Jan 22, 2015

vruano commented Jan 22, 2015

vdauwera commented Jan 22, 2015

vruano commented Jan 22, 2015

vdauwera commented Jan 22, 2015

lbergelson commented Jan 23, 2015

jmthibault79 commented Jan 23, 2015

cmnbroad commented Jan 17, 2017

droazen commented Mar 22, 2017

magicDGS commented Mar 23, 2017

droazen commented Mar 23, 2017

cmnbroad commented Mar 23, 2017

Make it possible to define min/max and disabling values for numeric argument values #143

Make it possible to define min/max and disabling values for numeric argument values #143

Comments

vdauwera commented Jan 22, 2015

vruano commented Jan 22, 2015

vdauwera commented Jan 22, 2015

vruano commented Jan 22, 2015

vdauwera commented Jan 22, 2015

lbergelson commented Jan 23, 2015

jmthibault79 commented Jan 23, 2015

cmnbroad commented Jan 17, 2017

droazen commented Mar 22, 2017

magicDGS commented Mar 23, 2017

droazen commented Mar 23, 2017

cmnbroad commented Mar 23, 2017