6 changes: 3 additions & 3 deletions build.sh
@@ -291,9 +291,9 @@ if (! hasArg --configure-only) && (completeBuild || hasArg libnvforest); then
fi
fi
MSG="${MSG}<br/>parallel setting: $PARALLEL_LEVEL"
- if [[ -f "${LIBNVFOREST_BUILD_DIR}/libnvforest++.so" ]]; then
- LIBNVFOREST_FS=$(find "${LIBNVFOREST_BUILD_DIR}" -name libnvforest++.so -printf '%s'| awk '{printf "%.2f MB", $1/1024/1024}')
- MSG="${MSG}<br/>libnvforest++.so size: $LIBNVFOREST_FS"
+ if [[ -f "${LIBNVFOREST_BUILD_DIR}/libnvforest.so" ]]; then
+ LIBNVFOREST_FS=$(find "${LIBNVFOREST_BUILD_DIR}" -name libnvforest.so -printf '%s'| awk '{printf "%.2f MB", $1/1024/1024}')
+ MSG="${MSG}<br/>libnvforest.so size: $LIBNVFOREST_FS"
fi
BMR_DIR=${RAPIDS_ARTIFACTS_DIR:-"${LIBNVFOREST_BUILD_DIR}"}
echo "The HTML report can be found at [${BMR_DIR}/ninja_log.html]. In CI, this report"
2 changes: 1 addition & 1 deletion ci/build_wheel_nvforest.sh
@@ -20,7 +20,7 @@ LIBNVFOREST_WHEELHOUSE=$(RAPIDS_PY_WHEEL_NAME="libnvforest_${RAPIDS_PY_CUDA_SUFF
echo "libnvforest-${RAPIDS_PY_CUDA_SUFFIX} @ file://$(echo "${LIBNVFOREST_WHEELHOUSE}"/libnvforest_*.whl)" >> "${PIP_CONSTRAINT}"

EXCLUDE_ARGS=(
- --exclude "libnvforest++.so"
+ --exclude "libnvforest.so"
--exclude "libraft.so"
--exclude "libcublas.so.*"
--exclude "libcublasLt.so.*"
2 changes: 1 addition & 1 deletion conda/recipes/libnvforest/recipe.yaml
@@ -91,7 +91,7 @@ outputs:
prefix_detection:
ignore:
# See https://github.com/rapidsai/build-planning/issues/160
- - lib/libnvforest++.so
+ - lib/libnvforest.so
string: cuda${{ cuda_major }}_${{ date_string }}_${{ head_rev }}
requirements:
build:
6 changes: 3 additions & 3 deletions cpp/CMakeLists.txt
@@ -60,7 +60,7 @@ option(NVTX "Enable nvtx markers" OFF)
option(USE_CCACHE "Cache build artifacts with ccache" OFF)
option(NVFOREST_USE_RAFT_STATIC "Build and statically link the RAFT library" OFF)
option(NVFOREST_USE_TREELITE_STATIC "Build and statically link the treelite library" OFF)
- option(NVFOREST_EXPORT_TREELITE_LINKAGE "Whether to publicly or privately link treelite to libnvforest++" OFF)
+ option(NVFOREST_EXPORT_TREELITE_LINKAGE "Whether to publicly or privately link treelite to libnvforest" OFF)
option(CUDA_WARNINGS_AS_ERRORS "Enable -Werror=all-warnings for all CUDA compilation" ON)

# The options below allow incorporating libnvforest into another build process without installing all its components.
@@ -123,7 +123,7 @@ endif()
# ######################################################################################################################
# * Target names -------------------------------------------------------------

- set(NVFOREST_CPP_TARGET "nvforest++")
+ set(NVFOREST_CPP_TARGET "nvforest")

# ######################################################################################################################
# * Conda environment detection ----------------------------------------------
@@ -193,7 +193,7 @@ if(BUILD_NVFOREST_TESTS)
endif()

# ######################################################################################################################
- # * build libnvforest++ shared library -------------------------------------------
+ # * build libnvforest shared library -------------------------------------------

file(
WRITE "${CMAKE_CURRENT_BINARY_DIR}/fatbin.ld"
19 changes: 4 additions & 15 deletions cpp/include/nvforest/Implementation.md
@@ -6,17 +6,6 @@ does *not* require nvcc, CUDA or any other GPU-related library for its CPU-only
build, we also go over general strategies for CPU/GPU interoperability as used
by nvForest.

- **A NOTE ON THE `raft_proto` NAMESPACE:** In addition to nvForest-specific code, the new
- implementation requires some more general-purpose CPU-GPU interoperable
- utilities. Many of these utilities are either already implemented in RAFT (but
- do not provide the required CPU-interoperable compilation guarantees) or are a
- natural fit for incorporation in RAFT. In order to allow for more careful
- integration with the existing RAFT codebase and interoperability
- strategies, these utilities are currently provided in the `raft_proto`
- namespace but will be moved into RAFT over time. Other algorithms should
- not make use of the `raft_proto` namespace but instead wait until this
- transition has taken place.
-
## Design Goals
1. Provide state-of-the-art runtime performance for forest models on GPU,
especially for cases where CPU performance will not suffice (e.g. large
@@ -43,7 +32,7 @@ codebase.

It is also occasionally useful to make use of a `constexpr` value
indicating whether or not `NVFOREST_ENABLE_GPU` is set, which we introduce as
- `raft_proto::GPU_ENABLED`.
+ `nvforest::detail::GPU_ENABLED`.

### Avoiding CUDA symbols in CPU-only builds
The most significant challenge of attempting to create a unified CPU/GPU
@@ -88,7 +77,7 @@ between GPU and CPU.
Where we _need_ to provide distinct logic between GPU and CPU
implementations, we do so in implementation headers. In `infer/cpu.hpp`, we
have a fully-defined template for CPU specializations of
- `detail::inference::infer`. If `raft_proto::GPU_ENABLED` is `false`, we also
+ `detail::inference::infer`. If `nvforest::detail::GPU_ENABLED` is `false`, we also
include the GPU specializations, which will simply throw an exception if
invoked. In `infer/gpu.hpp` we *declare* but do not *define* the GPU
specializations. In `infer/gpu.cuh` we provide the full working definition for
@@ -158,8 +147,8 @@ a standard benchmark) on the CPU.

With some motivation for the general approach to CPU-GPU interoperability, we
now offer an overview of the layout of the codebase to help guide future
- improvements. Because `raft_proto` utilities are going to be moved to RAFT or other
- general-purpose libraries, we will not review anything within the `raft_proto`
+ improvements. Because `nvforest::detail` utilities are going to be moved to RAFT or other
+ general-purpose libraries, we will not review anything within the `nvforest::detail`
directory here.

### Public Headers
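Reviewer note: the Implementation.md hunks above rename `raft_proto::GPU_ENABLED` to `nvforest::detail::GPU_ENABLED` and keep the description of GPU specializations that throw when invoked in a CPU-only build. For readers unfamiliar with that pattern, here is a minimal C++ sketch of it. Everything in it is an assumption made for illustration: the `sketch` namespace, the `SKETCH_ENABLE_GPU` macro, the `gpu_unsupported` exception, and the string-returning `infer` are hypothetical stand-ins, not nvForest's headers or signatures.

```cpp
#include <stdexcept>
#include <string>

namespace sketch {  // hypothetical namespace, not nvforest::detail

// Compile-time flag mirroring the idea of a GPU_ENABLED constant.
// SKETCH_ENABLE_GPU is a made-up macro standing in for NVFOREST_ENABLE_GPU.
#ifdef SKETCH_ENABLE_GPU
constexpr bool GPU_ENABLED = true;
#else
constexpr bool GPU_ENABLED = false;
#endif

enum class device_type { cpu, gpu };

// Hypothetical exception thrown when GPU work is requested without GPU support.
struct gpu_unsupported : std::runtime_error {
  gpu_unsupported() : std::runtime_error("GPU inference requested in a CPU-only build") {}
};

// Primary template: declared for every device type, defined per specialization.
template <device_type D>
std::string infer(int num_rows);

// CPU specialization: always fully defined.
template <>
inline std::string infer<device_type::cpu>(int num_rows)
{
  return "cpu:" + std::to_string(num_rows);
}

#ifndef SKETCH_ENABLE_GPU
// CPU-only build: the GPU specialization still exists so callers link,
// but it only throws. A GPU build would define it in a CUDA source instead.
template <>
inline std::string infer<device_type::gpu>(int /*num_rows*/)
{
  throw gpu_unsupported{};
}
#endif

// True if this build can actually execute on the requested device.
inline bool can_run_on(device_type dev) { return dev == device_type::cpu || GPU_ENABLED; }

// Runtime dispatch onto the compile-time specializations.
inline std::string run(device_type dev, int num_rows)
{
  return dev == device_type::gpu ? infer<device_type::gpu>(num_rows)
                                 : infer<device_type::cpu>(num_rows);
}

}  // namespace sketch
```

Compiled without the macro, calling `sketch::run(sketch::device_type::gpu, n)` throws `gpu_unsupported`, while the CPU path works with no CUDA headers on the include path. The sketch uses the preprocessor to select the throwing specialization; the real headers may select it differently.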
25 changes: 9 additions & 16 deletions cpp/include/nvforest/README.md
@@ -19,13 +19,6 @@ available in the top-level include directory. The `detail` directory
contains implementation details that are not required to use nvForest and which
will certainly change over time.

- **A NOTE ON THE `raft_proto` NAMESPACE:** For the first iteration of this nvForest
- implementation, much of the more general-purpose CPU-GPU interoperable code
- has temporarily been put in the `raft_proto` namespace. As the name suggests,
- the intention is that most or all of this functionality will either be moved
- to RAFT or that RAFT features will be updated to provide CPU-GPU
- compatible versions of the same.
-
### Importing a model
nvForest uses Treelite as a common translation layer for all its input types.
To load a forest model, we first create a Treelite model handle as
@@ -50,7 +43,7 @@ auto nvforest_model = import_from_treelite_model(
tree_layout::depth_first, // layout
128u, // align_bytes
false, // use_double_precision
- raft_proto::device_type::gpu, // mem_type
+ nvforest::device_type::gpu, // mem_type
0, // device_id
stream // CUDA stream
);
@@ -74,17 +67,17 @@ serialization format will be used. Otherwise, the model will be evaluated
at double precision if this value is set to `true` or single precision if this
value is set to `false`.

- **dev_type**: This argument controls where the model will be executed. If `raft_proto::device_type::gpu`, then it will be executed on GPU. If `raft_proto::device_type::cpu`, then it will be executed on CPU.
+ **dev_type**: This argument controls where the model will be executed. If `nvforest::device_type::gpu`, then it will be executed on GPU. If `nvforest::device_type::cpu`, then it will be executed on CPU.

**device_id**: This integer indicates the ID of the GPU which should be used.
If CPU is being used, this argument is ignored.

**stream**: The CUDA stream which will be used for the actual model import.
If CPU is being used, this argument is ignored. Note that you do *not* need
CUDA headers if you are working with a CPU-only build of nvForest. This
- argument uses a `raft_proto::cuda_stream` type which evaluates to a
+ argument uses a `nvforest::cuda_stream` type which evaluates to a
placeholder type in CPU-only builds. For applications which themselves want to
- implement CPU-GPU interoperable builds, the `raft_proto::cuda_stream` type can be
+ implement CPU-GPU interoperable builds, the `nvforest::cuda_stream` type can be
used directly.


@@ -106,24 +99,24 @@ cudaMalloc((void**)&output, num_rows * num_outputs * sizeof(float));

// Assuming that input is a float* pointing to data already located on-device

- auto handle = raft_proto::handle_t{};
+ auto handle = nvforest::handle_t{};

nvforest_model.predict(
handle,
output,
input,
num_rows,
- raft_proto::device_type::gpu, // out_mem_type
- raft_proto::device_type::gpu, // in_mem_type
+ nvforest::device_type::gpu, // out_mem_type
+ nvforest::device_type::gpu, // in_mem_type
4 // chunk_size
);
```

**handle**: To provide a unified interface on CPU and GPU, we introduce
- `raft_proto::handle_t` as a wrapper for `raft::handle_t`. This is currently just a
+ `nvforest::handle_t` as a wrapper for `raft::handle_t`. This is currently just a
placeholder in CPU-only builds, and using it does not require any CUDA
functionality. For GPU-enabled builds, you can construct a
- `raft_proto_handle_t` directly from the `raft::handle_t` you wish to use.
+ `nvforest::handle_t` directly from the `raft::handle_t` you wish to use.

**output**: Pointer to pre-allocated buffer where results should be
written. If the model has been loaded at single precision, this should be a
Expand Down
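Reviewer note: the README hunks above describe `nvforest::cuda_stream` as a type that becomes a placeholder in CPU-only builds, and say that `device_id` and `stream` are ignored when running on CPU. A minimal C++ sketch of how such a placeholder alias can work is below. It rests on stated assumptions: the `sketch` namespace, the `SKETCH_ENABLE_GPU` macro, the `int` placeholder, the `placement` struct, and `resolve_placement` (including its use of `-1` for "no device") are hypothetical and are not taken from the nvForest headers.

```cpp
// Only pull in CUDA headers when the (hypothetical) GPU macro is defined.
#ifdef SKETCH_ENABLE_GPU
#include <cuda_runtime_api.h>
#endif

namespace sketch {  // hypothetical namespace, not nvforest

#ifdef SKETCH_ENABLE_GPU
using cuda_stream = cudaStream_t;
#else
// Placeholder so that signatures mentioning a stream still compile
// without any CUDA headers. The choice of int is an assumption.
using cuda_stream = int;
#endif

enum class device_type { cpu, gpu };

// Hypothetical record of where a model would end up.
struct placement {
  device_type dev;
  int device_id;  // -1 is this sketch's marker for "no GPU device"
};

// Same signature in CPU-only and GPU builds. device_id and stream are
// ignored on CPU, mirroring the README's description of those arguments.
inline placement resolve_placement(device_type mem_type,
                                   int device_id      = 0,
                                   cuda_stream stream = cuda_stream{})
{
  (void)stream;  // unused in this sketch
  if (mem_type == device_type::cpu) { return placement{device_type::cpu, -1}; }
  return placement{device_type::gpu, device_id};
}

}  // namespace sketch
```

Because the alias exists in both build modes, application code that wants its own CPU-GPU interoperable build can mention `cuda_stream` in its signatures without guarding every call site.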