BreakNBuild is designed to evaluate model performance through progressively sampled training data. It offers a structured way to analyze how a model’s accuracy, error, or other metrics evolve as the amount of data increases. This iterative sampling approach is particularly useful for identifying bias-variance trade-offs, diagnosing overfitting or underfitting, and understanding how much data is needed to achieve optimal model performance. With BreakNBuild, users can visualize learning curves, helping to fine-tune algorithms, assess generalization, and debug machine learning models efficiently.
- Progressive Data Splitting: Partition your dataset into a validation subset and progressively larger training subsets.
- Customizable Sample Sizes: Control the size of your training data to understand model performance under different conditions.
- Easy Integration: Built on the rsample package, BreakNBuild integrates with the tidymodels framework.
![Schema of progressive splits](man/figures/schema_progressive_splits.svg)
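The idea behind progressive splits can be sketched in base R without the package. This is only an illustration of the concept, not BreakNBuild's implementation: `make_progressive_indices()` is a hypothetical helper, the `step` argument is an assumption, and `validation_size` and `start_size` simply mirror the argument names used in this README.

```r
# Hypothetical helper (not part of BreakNBuild): returns one fixed validation
# index set and a list of progressively larger, nested training index sets.
make_progressive_indices <- function(n, validation_size = 0.2,
                                     start_size = 10, step = 10) {
  stopifnot(n > 0, validation_size > 0, validation_size < 1, step > 0)

  shuffled <- sample.int(n)              # random row order
  n_val <- floor(n * validation_size)    # number of rows held out
  stopifnot(n_val >= 1)

  val_idx <- shuffled[seq_len(n_val)]
  train_pool <- shuffled[-seq_len(n_val)]  # rows available for training
  stopifnot(start_size <= length(train_pool))

  # Training sizes grow by `step`; the full pool is always the last size
  sizes <- unique(c(seq(start_size, length(train_pool), by = step),
                    length(train_pool)))

  list(
    validation = val_idx,
    training = lapply(sizes, function(k) train_pool[seq_len(k)])
  )
}

set.seed(123)
idx <- make_progressive_indices(n = 100, validation_size = 0.2,
                                start_size = 10, step = 30)
lengths(idx$training)  # sizes of the nested training sets
```

Each training set contains the previous one, and the validation rows are never used for training, so metrics computed at different sizes are compared against the same held-out data.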
To install the latest version from GitHub, use:
# install.packages("devtools")
devtools::install_github("focardozom/BreakNBuild")
Here's a quick example to get you started:
library(BreakNBuild)
splits <- progressive_splits(data, validation_size = 0.2, start_size = 10)
This creates a splits object that you can use to train your model with the tidymodels ecosystem for machine learning.
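The structure of the returned object is not described here, so the following sketch does not use it. Instead it shows, in base R with the built-in `mtcars` data and a plain `lm()` model (both chosen only for illustration), the kind of loop that progressive splits support: fit on each training size, then score on the fixed validation set.

```r
# Illustrative learning curve in base R; does not use BreakNBuild objects.
set.seed(42)

n <- nrow(mtcars)
shuffled <- sample.int(n)
n_val <- floor(0.2 * n)                        # rows held out for validation

val  <- mtcars[shuffled[seq_len(n_val)], ]     # fixed validation set
pool <- mtcars[shuffled[-seq_len(n_val)], ]    # rows available for training

sizes <- seq(10, nrow(pool), by = 4)           # progressively larger sizes

# Root mean squared error between observed and predicted values
rmse <- function(truth, estimate) sqrt(mean((truth - estimate)^2))

curve <- do.call(rbind, lapply(sizes, function(k) {
  train <- pool[seq_len(k), ]                  # first k rows of the pool
  fit <- lm(mpg ~ wt + hp, data = train)
  data.frame(
    train_size      = k,
    train_rmse      = rmse(train$mpg, predict(fit, newdata = train)),
    validation_rmse = rmse(val$mpg, predict(fit, newdata = val))
  )
}))

print(curve)
```

Plotting `validation_rmse` and `train_rmse` against `train_size` gives a learning curve: a large, persistent gap between the two suggests overfitting, while two high, close curves suggest underfitting.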
For more details on how to use the BreakNBuild package, please refer to the package vignette.