ScalaGBM

Overview

ScalaGBM is an efficient GPU-based GBDT system, which can handle high-dimensional and large-scale dataset and train fast.

Prerequisites

cmake 2.8 or above
gcc 11.x for Linux
CUDA 11.7

Introduction

Download

git clone https://github.com/BoruiXu/ScalaGBM.git

Build on Linux. Before building, it is necessary to set the architecture of the GPU on line 28 (-arch) in CMakeLists.txt. For example, when using Nvidia RTX A6000, -arch=compute_86.

cd ScalaGNM
mkdir build
cd build
cmake ..
make -j

Usage example

./bin/scalagbm-train data=dataset/datasetname objective=binary:logistic tree_method=hist n_trees=40 depth=6

Datasets

All test datasts can be downloaded through the script in dataset floader.

sh ./dataset/get_datasets.sh

Parameter and Test

The meaning of parameters is the same as that in ThunderGBM. At present, only histogram-based training method is supported. We provide a bash script (train_test.sh) to train datasets mentioned in our paper. Befor running this script, please copy this script into the build floder. If you want to test the real-sim dataset. Please run:

sh train_test.sh real-sim

NOTED: all datasets need to be stored in the dataset folder!

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

README.md

README.md

ScalaGBM

Overview

Prerequisites

Introduction

Download

Usage example

Datasets

Parameter and Test

Files

README.md

Latest commit

History

README.md

File metadata and controls

ScalaGBM

Overview

Prerequisites

Introduction

Download

Usage example

Datasets

Parameter and Test