This repo implements an efficient GPU-based GBDT system. It can handle high-dimensional data on GPU and train fast.

BoruiXu/ScalaGBM

ScalaGBM

Overview

ScalaGBM is an efficient GPU-based GBDT system that handles high-dimensional, large-scale datasets and trains quickly.

Prerequisites

  • cmake 2.8 or above
  • gcc 11.x for Linux
  • CUDA 11.7

Introduction

Download

git clone https://github.com/BoruiXu/ScalaGBM.git

Build on Linux. Before building, set the GPU architecture with the -arch flag on line 28 of CMakeLists.txt. For example, for an NVIDIA RTX A6000, use -arch=compute_86.

cd ScalaGBM
mkdir build
cd build
cmake ..
make -j
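
If you are unsure which -arch value matches your GPU, the following sketch may help. It assumes a driver recent enough that nvidia-smi supports the compute_cap query field; the helper name cap_to_arch is hypothetical and not part of this repo. It only prints a suggested value and does not edit CMakeLists.txt.

```shell
# Hypothetical helper: turn a compute capability such as "8.6"
# into the "compute_86" form used by the -arch flag.
cap_to_arch() {
    # Strip the dot (and any stray spaces) from the capability string.
    printf 'compute_%s\n' "$(printf '%s' "$1" | tr -d '. ')"
}

# Query the first GPU only when nvidia-smi is installed.
# Assumption: the driver supports --query-gpu=compute_cap.
if command -v nvidia-smi >/dev/null 2>&1; then
    cap=$(nvidia-smi --query-gpu=compute_cap --format=csv,noheader 2>/dev/null | head -n 1)
    if [ -n "$cap" ]; then
        echo "Suggested flag: -arch=$(cap_to_arch "$cap")"
    fi
fi
```

On an RTX A6000, which reports compute capability 8.6, the helper yields compute_86, matching the example above.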

Usage example

./bin/scalagbm-train data=dataset/datasetname objective=binary:logistic tree_method=hist n_trees=40 depth=6
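
To try different tree settings without retyping the whole command, a small wrapper can assemble it. This is only a sketch: train_cmd is a hypothetical helper, it uses just the parameters shown in the usage example above, and it prints the command instead of running it.

```shell
# Hypothetical helper: print the training command for a dataset.
# Arguments: dataset name, number of trees (default 40), depth (default 6).
train_cmd() {
    dataset=$1
    n_trees=${2:-40}
    depth=${3:-6}
    printf './bin/scalagbm-train data=dataset/%s objective=binary:logistic tree_method=hist n_trees=%s depth=%s\n' \
        "$dataset" "$n_trees" "$depth"
}

# Dry run: show the command for 100 trees of depth 8.
train_cmd datasetname 100 8
```

Piping the output to sh (for example, train_cmd datasetname 100 8 | sh) would execute the command from the directory that contains bin.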

Datasets

All test datasets can be downloaded with the script in the dataset folder.

sh ./dataset/get_datasets.sh

Parameter and Test

The parameters have the same meaning as in ThunderGBM. At present, only the histogram-based training method is supported. We provide a bash script (train_test.sh) to train on the datasets mentioned in our paper. Before running this script, copy it into the build folder. For example, to test the real-sim dataset, run:

sh train_test.sh real-sim

Note: all datasets must be stored in the dataset folder!
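
A quick check before training can catch a missing dataset early. The sketch below rests on assumptions: dataset_present is a hypothetical helper, and it assumes the file on disk is named exactly like the dataset (for example dataset/real-sim) relative to the directory you run it from. Pass a different directory as the second argument if your layout differs.

```shell
# Hypothetical pre-flight check: succeed only if the dataset file exists.
# Arguments: dataset name, dataset directory (default: dataset).
dataset_present() {
    name=$1
    dir=${2:-dataset}
    [ -f "$dir/$name" ]
}

if dataset_present real-sim; then
    echo "found dataset/real-sim"
else
    # Point the user at the download script shipped with the repo.
    echo "dataset/real-sim is missing; try: sh ./dataset/get_datasets.sh" >&2
fi
```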
