rapidsai · hcho3 · May 22, 2026 · May 25, 2026
@@ -291,9 +291,9 @@ if (! hasArg --configure-only) && (completeBuild || hasArg libnvforest); then
           fi
         fi
         MSG="${MSG}<br/>parallel setting: $PARALLEL_LEVEL"
-        if [[ -f "${LIBNVFOREST_BUILD_DIR}/libnvforest++.so" ]]; then
-            LIBNVFOREST_FS=$(find "${LIBNVFOREST_BUILD_DIR}" -name libnvforest++.so -printf '%s'| awk '{printf "%.2f MB", $1/1024/1024}')
-            MSG="${MSG}<br/>libnvforest++.so size: $LIBNVFOREST_FS"
+        if [[ -f "${LIBNVFOREST_BUILD_DIR}/libnvforest.so" ]]; then
+            LIBNVFOREST_FS=$(find "${LIBNVFOREST_BUILD_DIR}" -name libnvforest.so -printf '%s'| awk '{printf "%.2f MB", $1/1024/1024}')
+            MSG="${MSG}<br/>libnvforest.so size: $LIBNVFOREST_FS"
         fi
         BMR_DIR=${RAPIDS_ARTIFACTS_DIR:-"${LIBNVFOREST_BUILD_DIR}"}
         echo "The HTML report can be found at [${BMR_DIR}/ninja_log.html]. In CI, this report"

@@ -20,7 +20,7 @@ LIBNVFOREST_WHEELHOUSE=$(RAPIDS_PY_WHEEL_NAME="libnvforest_${RAPIDS_PY_CUDA_SUFF
 echo "libnvforest-${RAPIDS_PY_CUDA_SUFFIX} @ file://$(echo "${LIBNVFOREST_WHEELHOUSE}"/libnvforest_*.whl)" >> "${PIP_CONSTRAINT}"
 
 EXCLUDE_ARGS=(
-  --exclude "libnvforest++.so"
+  --exclude "libnvforest.so"
   --exclude "libraft.so"
   --exclude "libcublas.so.*"
   --exclude "libcublasLt.so.*"

@@ -91,7 +91,7 @@ outputs:
       prefix_detection:
         ignore:
           # See https://github.com/rapidsai/build-planning/issues/160
-          - lib/libnvforest++.so
+          - lib/libnvforest.so
       string: cuda${{ cuda_major }}_${{ date_string }}_${{ head_rev }}
     requirements:
       build:

@@ -60,7 +60,7 @@ option(NVTX "Enable nvtx markers" OFF)
 option(USE_CCACHE "Cache build artifacts with ccache" OFF)
 option(NVFOREST_USE_RAFT_STATIC "Build and statically link the RAFT library" OFF)
 option(NVFOREST_USE_TREELITE_STATIC "Build and statically link the treelite library" OFF)
-option(NVFOREST_EXPORT_TREELITE_LINKAGE "Whether to publicly or privately link treelite to libnvforest++" OFF)
+option(NVFOREST_EXPORT_TREELITE_LINKAGE "Whether to publicly or privately link treelite to libnvforest" OFF)
 option(CUDA_WARNINGS_AS_ERRORS "Enable -Werror=all-warnings for all CUDA compilation" ON)
 
 # The options below allow incorporating libnvforest into another build process without installing all its components.
@@ -123,7 +123,7 @@ endif()
 # ######################################################################################################################
 # * Target names -------------------------------------------------------------
 
-set(NVFOREST_CPP_TARGET "nvforest++")
+set(NVFOREST_CPP_TARGET "nvforest")
 
 # ######################################################################################################################
 # * Conda environment detection ----------------------------------------------
@@ -193,7 +193,7 @@ if(BUILD_NVFOREST_TESTS)
 endif()
 
 # ######################################################################################################################
-# * build libnvforest++ shared library -------------------------------------------
+# * build libnvforest shared library -------------------------------------------
 
 file(
   WRITE "${CMAKE_CURRENT_BINARY_DIR}/fatbin.ld"

@@ -6,17 +6,6 @@ does *not* require nvcc, CUDA or any other GPU-related library for its CPU-only
 build, we also go over general strategies for CPU/GPU interoperability as used
 by nvForest.
 
-**A NOTE ON THE `raft_proto` NAMESPACE:** In addition to nvForest-specific code, the new
-implementation requires some more general-purpose CPU-GPU interoperable
-utilities. Many of these utilities are either already implemented in RAFT (but
-do not provide the required CPU-interoperable compilation guarantees) or are a
-natural fit for incorporation in RAFT. In order to allow for more careful
-integration with the existing RAFT codebase and interoperability
-strategies, these utilities are currently provided in the `raft_proto`
-namespace but will be moved into RAFT over time. Other algorithms should
-not make use of the `raft_proto` namespace but instead wait until this
-transition has taken place.
-
 ## Design Goals
 1. Provide state-of-the-art runtime performance for forest models on GPU,
    especially for cases where CPU performance will not suffice (e.g. large
@@ -43,7 +32,7 @@ codebase.
 
 It is also occasionally useful to make use of a `constexpr` value
 indicating whether or not `NVFOREST_ENABLE_GPU` is set, which we introduce as
-`raft_proto::GPU_ENABLED`.
+`nvforest::detail::GPU_ENABLED`.
 
 ### Avoiding CUDA symbols in CPU-only builds
 The most significant challenge of attempting to create a unified CPU/GPU
@@ -88,7 +77,7 @@ between GPU and CPU.
 Where we _need_ to provide distinct logic between GPU and CPU
 implementations, we do so in implementation headers. In `infer/cpu.hpp`, we
 have a fully-defined template for CPU specializations of
-`detail::inference::infer`. If `raft_proto::GPU_ENABLED` is `false`, we also
+`detail::inference::infer`. If `nvforest::detail::GPU_ENABLED` is `false`, we also
 include the GPU specializations, which will simply throw an exception if
 invoked. In `infer/gpu.hpp` we *declare* but do not *define* the GPU
 specializations. In `infer/gpu.cuh` we provide the full working definition for
@@ -158,8 +147,8 @@ a standard benchmark) on the CPU.
 
 With some motivation for the general approach to CPU-GPU interoperability, we
 now offer an overview of the layout of the codebase to help guide future
-improvements. Because `raft_proto` utilities are going to be moved to RAFT or other
-general-purpose libraries, we will not review anything within the `raft_proto`
+improvements. Because `nvforest::detail` utilities are going to be moved to RAFT or other
+general-purpose libraries, we will not review anything within the `nvforest::detail`
 directory here.
 
 ### Public Headers

@@ -19,13 +19,6 @@ available in the top-level include directory. The `detail` directory
 contains implementation details that are not required to use nvForest and which
 will certainly change over time.
 
-**A NOTE ON THE `raft_proto` NAMESPACE:** For the first iteration of this nvForest
-implementation, much of the more general-purpose CPU-GPU interoperable code
-has temporarily been put in the `raft_proto` namespace. As the name suggests,
-the intention is that most or all of this functionality will either be moved
-to RAFT or that RAFT features will be updated to provide CPU-GPU
-compatible versions of the same.
-
 ### Importing a model
 nvForest uses Treelite as a common translation layer for all its input types.
 To load a forest model, we first create a Treelite model handle as
@@ -50,7 +43,7 @@ auto nvforest_model = import_from_treelite_model(
   tree_layout::depth_first, // layout
   128u,  // align_bytes
   false,  // use_double_precision
-  raft_proto::device_type::gpu,  // mem_type
+  nvforest::device_type::gpu,  // mem_type
   0,  // device_id
   stream  // CUDA stream
 );
@@ -74,17 +67,17 @@ serialization format will be used. Otherwise, the model will be evaluated
 at double precision if this value is set to `true` or single precision if this
 value is set to `false`.
 
-**dev_type**: This argument controls where the model will be executed. If `raft_proto::device_type::gpu`, then it will be executed on GPU. If `raft_proto::device_type::cpu`, then it will be executed on CPU.
+**dev_type**: This argument controls where the model will be executed. If `nvforest::device_type::gpu`, then it will be executed on GPU. If `nvforest::device_type::cpu`, then it will be executed on CPU.
 
 **device_id**: This integer indicates the ID of the GPU which should be used.
 If CPU is being used, this argument is ignored.
 
 **stream**: The CUDA stream which will be used for the actual model import.
 If CPU is being used, this argument is ignored. Note that you do *not* need
 CUDA headers if you are working with a CPU-only build of nvForest. This
-argument uses a `raft_proto::cuda_stream` type which evaluates to a
+argument uses a `nvforest::cuda_stream` type which evaluates to a
 placeholder type in CPU-only builds. For applications which themselves want to
-implement CPU-GPU interoperable builds, the `raft_proto::cuda_stream` type can be
+implement CPU-GPU interoperable builds, the `nvforest::cuda_stream` type can be
 used directly.
 
 
@@ -106,24 +99,24 @@ cudaMalloc((void**)&output, num_rows * num_outputs * sizeof(float));
 
 // Assuming that input is a float* pointing to data already located on-device
 
-auto handle = raft_proto::handle_t{};
+auto handle = nvforest::handle_t{};
 
 nvforest_model.predict(
   handle,
   output,
   input,
   num_rows,
-  raft_proto::device_type::gpu,  // out_mem_type
-  raft_proto::device_type::gpu,  // in_mem_type
+  nvforest::device_type::gpu,  // out_mem_type
+  nvforest::device_type::gpu,  // in_mem_type
   4  // chunk_size
 );
 ```
 
 **handle**: To provide a unified interface on CPU and GPU, we introduce
-`raft_proto::handle_t` as a wrapper for `raft::handle_t`. This is currently just a
+`nvforest::handle_t` as a wrapper for `raft::handle_t`. This is currently just a
 placeholder in CPU-only builds, and using it does not require any CUDA
 functionality. For GPU-enabled builds, you can construct a
-`raft_proto_handle_t` directly from the `raft::handle_t` you wish to use.
+`nvforest::handle_t` directly from the `raft::handle_t` you wish to use.
 
 **output**: Pointer to pre-allocated buffer where results should be
 written. If the model has been loaded at single precision, this should be a