I have a data set of 11 variables, including one outcome variable (ytr) and 10 predictors (xtr). There are 368 observations. I am running the following simple code:
cc <- cubistControl(rules = 100, extrapolation = 5)
or
cc <- cubistControl(rules = 50, extrapolation = 5)
tuned <- cubist(x = xtr, y = ytr,
                committees = 1,
                neighbors = 0,
                control = cc)
On this data set, I get 7 rules when rules = 100 but only 5 rules when rules = 50. The only thing I change is the rules argument in cubistControl(). As far as I understand, this parameter limits the maximum number of rules that Cubist will identify, so a cap of 50 should make no difference when only 7 rules are found. This seems strange to me.
To be honest, I am not sure whether this is a bug, because I have not checked with other data sets, but I thought I should report it to you.
I am sorry that I can't post the exact data set here. If you want to look into this, please let me know and I will email you the data and script. If this is not a bug and can legitimately happen, please correct me. Thanks!
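Since the original data cannot be shared, the comparison can at least be sketched on simulated data. Everything below other than the cubist() and cubistControl() calls is an assumption made for illustration: the simulated xtr and ytr, the helper names fit_with_cap and count_rules, and the idea of counting rules as the rows of the fitted model's coefficients table (which appears to hold one row per rule when committees = 1). It will not necessarily reproduce the 7 versus 5 result.

```r
library(Cubist)

# Simulated stand-in for the real data: 368 observations, 10 predictors.
set.seed(1)
n <- 368
p <- 10
xtr <- as.data.frame(matrix(rnorm(n * p), nrow = n))
names(xtr) <- paste0("x", seq_len(p))
ytr <- ifelse(xtr$x1 > 0, 2 * xtr$x2, -3 * xtr$x3) + rnorm(n, sd = 0.5)

# Hypothetical helper: fit the same model, varying only the rule cap.
fit_with_cap <- function(max_rules) {
  cc <- cubistControl(rules = max_rules, extrapolation = 5)
  cubist(x = xtr, y = ytr, committees = 1, neighbors = 0, control = cc)
}

# Hypothetical helper: assumes one coefficients row per rule
# for a single-committee model.
count_rules <- function(model) nrow(model$coefficients)

n_rules_100 <- count_rules(fit_with_cap(100))
n_rules_50 <- count_rules(fit_with_cap(50))
print(c(cap_100 = n_rules_100, cap_50 = n_rules_50))
```

If the two counts differ while both stay well below the cap, that would match the behaviour described above.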
tag-dad changed the title from "Rules are not working in cubistControl?" to "Rules are not working properly in cubistControl?" on Dec 18, 2017
The complexity of a model can be controlled by restricting the number of rules that it may contain (the default value being 500 rules). The option -r rules sets the maximum number of rules that may be used in a model.
so your understanding is correct.
I think that the difference in the number of rules that you show is related to the pruning process. I would have to look at the source code in detail, but my guess is that the pruning process takes into account the number of rules that it starts with (directly or indirectly). Unfortunately, there is no verbose option to get a better understanding.
If you want a definitive answer, I suggest that you dump the data into the formats needed by the command-line version (see the R functions makeDataFile and makeNamesFile) and verify this in the command line version. At that point, you could email Quinlan and ask. He says that the GPL versions are unsupported but I don't think that the compiled versions on the downloads page fall under that license. He's a nice guy so you have a good chance at getting a response.
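A minimal sketch of that export step, assuming makeNamesFile and makeDataFile each return the file contents as a character string. The simulated data, the file stem "example", the use of tempdir(), and the -f option in the final comment are assumptions for illustration and should be checked against the command-line documentation.

```r
library(Cubist)

# Simulated stand-in for the real xtr / ytr (hypothetical data).
set.seed(1)
xtr <- as.data.frame(matrix(rnorm(368 * 10), nrow = 368))
names(xtr) <- paste0("x", 1:10)
ytr <- 2 * xtr$x1 - xtr$x2 + rnorm(368, sd = 0.5)

# Build the text of the .names and .data files.
names_txt <- makeNamesFile(x = xtr, y = ytr, label = "outcome")
data_txt <- makeDataFile(x = xtr, y = ytr)

# Write both files under a shared stem ("example" is an arbitrary choice).
out_dir <- tempdir()
names_path <- file.path(out_dir, "example.names")
data_path <- file.path(out_dir, "example.data")
cat(names_txt, file = names_path)
cat(data_txt, file = data_path)

# The command-line program could then be run from out_dir, once per cap,
# for example (option names assumed, apart from -r quoted above):
#   cubist -f example -r 50
#   cubist -f example -r 100
```

Comparing the two command-line outputs would show whether the difference comes from Cubist itself or from the R wrapper.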
If you do get an answer, let me know and we can put it in the documentation for cubistControl.