Rules are not working properly in cubistControl? #16

Open
tag-dad opened this issue Dec 18, 2017 · 1 comment

tag-dad commented Dec 18, 2017

I have a data set of 11 variables, including one outcome variable (ytr) and 10 predictors (xtr). There are 368 observations. I am running the following simple code:

cc <- cubistControl(rules = 100, extrapolation = 5)
# or
cc <- cubistControl(rules = 50, extrapolation = 5)

tuned <- cubist(x = xtr, y = ytr,
                committees = 1,
                neighbors = 0,
                control = cc)

On this data set, I get 7 rules when I set rules = 100, but only 5 rules when I set rules = 50. The only thing I change between the two runs is the rules argument in cubistControl(). As far as I understand, this parameter limits the maximum number of rules that Cubist will produce, and both results are well below either limit, so the difference seems strange to me.

To be honest, I am not sure whether this is a bug, because I have not checked it against other data sets. But I wanted to report it to you.

I am sorry that I can't post the exact data set here. If you want to look into this, please let me know and I will email you the data and script. If this is not a bug and is expected behavior, please correct me. Thanks!
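Since the original data cannot be shared, the comparison above can be sketched on synthetic data. This is only an illustration under stated assumptions: the simulated `xtr` and `ytr` are made up (same shape as described, 368 rows and 10 predictors), `count_rules` is a hypothetical helper, and the rule count is read from the fitted object's `coefficients` data frame on the assumption that it holds one row per rule when `committees = 1`. The `neighbors` argument is left out because, as far as I know, it belongs to `predict()` rather than `cubist()`. The counts will not match the 7 and 5 reported above.

```r
library(Cubist)

# Synthetic stand-in for the unshared data: 368 rows, 10 numeric predictors.
set.seed(16)
xtr <- as.data.frame(matrix(rnorm(368 * 10), ncol = 10))
names(xtr) <- paste0("x", 1:10)
ytr <- ifelse(xtr$x1 > 0, 3 * xtr$x2, -2 * xtr$x3) +
  xtr$x4^2 + rnorm(368, sd = 0.5)

# Hypothetical helper: fit one model for a given rule cap and count its rules.
count_rules <- function(max_rules) {
  cc <- cubistControl(rules = max_rules, extrapolation = 5)
  fit <- cubist(x = xtr, y = ytr, committees = 1, control = cc)
  # Assumption: with a single committee, `coefficients` has one row per rule.
  nrow(fit$coefficients)
}

caps <- c(10, 50, 100)
rule_counts <- vapply(caps, count_rules, integer(1))

# Side-by-side view of the cap that was requested and the rules that survived.
print(data.frame(rules_cap = caps, rules_found = rule_counts))
```

If the counts differ between caps that are all far above the number of rules found, that reproduces the behavior described in this report without needing the private data.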

tag-dad changed the title from "Rules are not working in cubistControl?" to "Rules are not working properly in cubistControl?" on Dec 18, 2017
topepo (Owner) commented Dec 18, 2017

Checking the RuleQuest website, it says that:

The complexity of a model can be controlled by restricting the number of rules that it may contain (the default value being 500 rules). The option -r rules sets the maximum number of rules that may be used in a model.

so your understanding is correct.

I think that the difference in the number of rules that you show is related to the pruning process. I would have to look at the source code in detail, but my guess is that the pruning process takes into account the maximum number of rules that it is given (directly or indirectly). Unfortunately, there is no verbose option to get a better understanding.

If you want a definitive answer, I suggest that you dump the data into the formats needed by the command-line version (see the R functions makeDataFile and makeNamesFile) and verify the behavior there. At that point, you could email Quinlan and ask. He says that the GPL versions are unsupported, but I don't think that the compiled versions on the downloads page fall under that license. He's a nice guy, so you have a good chance of getting a response.

If you do get an answer, let me know and we can put it in the documentation for cubistControl.
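A rough sketch of the file dump suggested above, assuming makeDataFile and makeNamesFile each take `x` and `y` and return the file contents as a character string. The functions are fetched with `getFromNamespace()` in case they are not exported by the installed version of the package. The file stem `issue16`, the synthetic `xtr` and `ytr`, and the temporary directory are all placeholders for illustration.

```r
library(Cubist)

# Placeholder data; substitute the real xtr (predictors) and ytr (outcome).
set.seed(16)
xtr <- as.data.frame(matrix(rnorm(368 * 10), ncol = 10))
names(xtr) <- paste0("x", 1:10)
ytr <- 2 * xtr$x1 - xtr$x2 + rnorm(368, sd = 0.5)

# Look the helpers up in the package namespace (works whether or not exported).
make_names <- getFromNamespace("makeNamesFile", "Cubist")
make_data  <- getFromNamespace("makeDataFile", "Cubist")

# The command-line program reads <stem>.names and <stem>.data.
stem <- file.path(tempdir(), "issue16")
names_path <- paste0(stem, ".names")
data_path  <- paste0(stem, ".data")

writeLines(make_names(x = xtr, y = ytr), names_path)
writeLines(make_data(x = xtr, y = ytr), data_path)

# Then, from a shell in that directory, something like:
#   cubist -f issue16 -r 50
#   cubist -f issue16 -r 100
```

Running the command-line program twice with different `-r` values on the same files should show whether the rule count changes outside of R as well.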
