Hi @topepo,
I am working with raster datasets and applying several tree- and rule-based algorithms (namely CART, Cubist, bagged trees, boosted trees, and random forests, "RF") to a regression problem (biomass per unit area).
My initial expectation was that Cubist would achieve near-optimal predictive performance while requiring little processing power and time for prediction, since it fits a low-complexity (rules plus linear models) relationship between the predictors and the response.
Accuracy-wise, Cubist has performed as well as RF (according to Dunn's test applied to the results of repeated k-fold cross-validation). M5, on the other hand, is lightning-fast (3 seconds) but not as accurate as RF.
However, quite surprisingly, Cubist took around one minute to predict the same raster, whereas Random Forest needed only 19 seconds. The same behaviour was reported in this paper: https://doi.org/10.1016/j.neunet.2018.12.010
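For reference, this is roughly how I measured the prediction times. It is a minimal sketch with synthetic data: the package choices (Cubist, randomForest), the data sizes, and the synthetic response are all stand-ins for my actual setup.

```r
# Hypothetical timing sketch: synthetic data standing in for raster cells.
library(Cubist)
library(randomForest)

set.seed(42)
n <- 1e5  # stand-in for the number of raster cells
dat <- data.frame(x1 = runif(n), x2 = runif(n), x3 = runif(n))
dat$y <- 2 * dat$x1 + dat$x2^2 + rnorm(n, sd = 0.1)

cb <- cubist(x = dat[, c("x1", "x2", "x3")], y = dat$y, committees = 20)
# Train RF on a subsample to keep fitting time reasonable;
# only the prediction step is being timed here.
rf <- randomForest(y ~ ., data = dat[sample(n, 5000), ], ntree = 500)

system.time(predict(cb, dat))  # Cubist prediction time
system.time(predict(rf, dat))  # RF prediction time
```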
I would be happy to provide a reprex if given a mock-up raster (on which I could run regression rather than classification, although computing time should not depend on the task). I have seen one of your talks in which you mentioned that Cubist should be faster than Random Forests, given that it is coded in C and produces a far smaller, more optimized model than Random Forest.
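In the meantime, here is a minimal sketch of the kind of mock-up reprex I have in mind, assuming the terra package for the raster side; the raster dimensions, layer names, and synthetic response are placeholders, not my real data.

```r
library(terra)
library(Cubist)
library(randomForest)

set.seed(1)
# Mock-up predictor raster: three layers of random values
r <- rast(nrows = 500, ncols = 500, nlyrs = 3, vals = runif(500 * 500 * 3))
names(r) <- c("x1", "x2", "x3")

# Training data sampled from the raster, with a synthetic response
train <- spatSample(r, 1000, as.df = TRUE)
train$y <- 2 * train$x1 + train$x2^2 + rnorm(1000, sd = 0.1)

cb <- cubist(x = train[, c("x1", "x2", "x3")], y = train$y, committees = 20)
rf <- randomForest(y ~ ., data = train, ntree = 500)

# Compare wall-clock prediction times over the full raster
system.time(p_cb <- predict(r, cb))
system.time(p_rf <- predict(r, rf))
```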
The size of a Cubist model is around 100 kB, whereas the RF model is about 5 MB. However, in my context this is not a limiting factor.
Is there something I am doing wrong? I would argue that Cubist should be the workhorse for tasks such as mine rather than Random Forest; however, as it stands, Cubist is limited by its prediction time.
Cheers, Gustavo
PS: I also posted this question on Stack Overflow, but I reckon it is useful to have it here as well, since I am using your package as the basis for these statements.