Cubist #28

gtalckmin · 2020-11-10T18:58:54Z

I am working with raster datasets and employing different rule-based algorithms (namely CART, Cubist, bagged trees, boosted trees, and random forests, "RF") for a regression problem (biomass per area).

My initial reasoning was that Cubist would give the best predictive performance while requiring little processing power and time for predictions, because the fitted relationship between the predictors and the response variable has low complexity.

In terms of results, Cubist performed as well as RF (according to Dunn's test applied to the results of a repeated k-fold cross-validation). M5, on the other hand, is lightning-fast (3 seconds), but not as accurate as RF.

However, and quite surprisingly, Cubist took around one minute to predict the same raster, whereas Random Forest needed 19 seconds. The same results were reported in this paper: https://doi.org/10.1016/j.neunet.2018.12.010

I would be happy to provide a reprex if given a mock-up raster (on which I could perform regression rather than classification, although computing time should not depend on the task). I've seen one of your talks, where you mentioned that Cubist should be faster than Random Forest (given that it is coded in C and is far smaller and more optimized than Random Forest).

The size of the Cubist model is around 100 kB, whereas the RF model is around 5 MB. However, in the context where I am working, model size is not a limiting factor.

Is there something I am doing wrong? I would argue that Cubist should be the workhorse for tasks such as mine, rather than Random Forest; however, as it stands, Cubist will be limited by its processing time.

Cheers, Gustavo
PS: I also posted this question on Stack Overflow, but I reckon it would be useful to have it here, as I am using your package as the basis for these statements.

pjaselin mentioned this issue Dec 27, 2021

Composite option, dataString compression, and correct splits percentage #40

Open
