Merge `raft_proto` into `nvforest` namespace by hcho3 · Pull Request #132 · rapidsai/nvforest

hcho3 · 2026-05-25T03:30:21Z

Closes #124
Closes #129

coderabbitai · 2026-05-25T03:43:27Z

📝 Walkthrough

Summary by CodeRabbit

Refactor
- Updated library name from libnvforest++.so to libnvforest.so and CMake target from nvforest++ to nvforest.
- Migrated C++ public APIs from raft_proto namespace to nvforest namespace, affecting device types, CUDA streams, buffers, and handles.
- Reorganized C++ header paths to use nvforest/ and nvforest/detail/ instead of nvforest/detail/raft_proto/.
- Updated Python bindings and documentation to reflect new C++ API structure.

Walkthrough

Global namespace migration from raft_proto to nvforest across C++/CUDA, Python/Cython, and tests. Introduces nvforest::device_type and moves stream, buffers, copy, inference, model, importer, and device init. Separately renames CMake target and shared library from nvforest++ to nvforest, updating build/packaging and docs.

Changes

API and namespace migration to nvforest

Layer / File(s)	Summary
Core primitives and support moved to nvforest `cpp/include/nvforest/device_type.hpp`, `cpp/include/nvforest/cuda_stream.hpp`, `cpp/include/nvforest/detail/{exceptions.hpp,gpu_support.hpp,const_agnostic.hpp,cuda_check.hpp,host_only_throw.hpp,ceildiv.hpp,bitset.hpp,forest.hpp,node.hpp,evaluate_tree.hpp,postprocessor.hpp,padding.hpp,gpu_introspection.hpp}`, `cpp/include/nvforest/detail/raft_proto/*` (removed), `cpp/include/nvforest/detail/specializations/forest_macros.hpp`	Defines nvforest::device_type and migrates streams, exceptions, GPU utilities, math helpers, and removes raft_proto counterparts.
Device id, setter, and initialization `cpp/include/nvforest/detail/device_id.hpp`, `cpp/include/nvforest/detail/device_setter.hpp`, `cpp/include/nvforest/detail/device_initialization*.{hpp,cuh}`, `cpp/include/nvforest/detail/specializations/device_initialization_macros.hpp`	Adds nvforest device_id variant/setter and updates CPU/GPU initialization and macros.
Buffers and copy utilities under nvforest `cpp/include/nvforest/detail/{non_owning_buffer.hpp,owning_buffer.hpp}`, `cpp/include/nvforest/buffer.hpp`, `cpp/include/nvforest/detail/copy*.hpp`	Moves buffer and copy APIs to nvforest with updated construction/move/copy semantics.
Inference APIs and kernels migrated `cpp/include/nvforest/detail/infer.{hpp,cuh}`, `cpp/include/nvforest/detail/infer_kernel/`, `cpp/include/nvforest/decision_forest.hpp`, `cpp/include/nvforest/detail/decision_forest_builder.hpp`, `cpp/src/infer*.{cpp,cu}`	Refactors inference dispatch, kernels, and decision_forest/builder to nvforest and updates explicit instantiations.
Forest model, handle, treelite importer, docs/macros `cpp/include/nvforest/{forest_model.hpp,handle.hpp,treelite_importer.hpp}`, `cpp/include/nvforest/detail/specializations/infer_macros.hpp`, `cpp/include/nvforest/{README.md,Implementation.md}`	Updates public model/handle/importer APIs to nvforest and refreshes docs and macros.
Python/Cython bindings migrated to nvforest `python/nvforest/nvforest/detail/{cuda_stream.pxd,device_type.pxd,handle.pxd,forest_inference.pyx}`, `python/nvforest/nvforest/detail/raft_proto/*.pxd` (removed)	Switches Cython externs and ForestInference to nvforest types and removes raft_proto bindings.
Tests aligned with nvforest types `cpp/tests/CMakeLists.txt`, `cpp/tests/buffer/*`, `cpp/tests/treelite_importer.cpp`	Updates test paths, namespaces, exceptions, and device types for nvforest.

Build, packaging, and soname rename

Layer / File(s)	Summary
CMake target/library, scripts, packaging, docs linking `cpp/CMakeLists.txt`, `build.sh`, `ci/build_wheel_nvforest.sh`, `conda/recipes/libnvforest/recipe.yaml`, `python/libnvforest/CMakeLists.txt`, `python/nvforest/CMakeLists.txt`, `python/libnvforest/libnvforest/load.py`, `docs/source/getting_started.rst`	Renames target to nvforest, updates soname to libnvforest.so, adjusts wheel/conda handling, Python loader, and docs link lines.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~75 minutes

Suggested labels

improvement, Cython / Python, CMake, CUDA/C++

Suggested reviewers

csadorf
gforsyth

✨ Finishing Touches

🧪 Generate unit tests (beta)

Create PR with unit tests

coderabbitai

Actionable comments posted: 7

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (8)

cpp/include/nvforest/cuda_stream.hpp (1)

16-20: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Propagate CUDA synchronization failures from nvforest::synchronize

cpp/include/nvforest/cuda_stream.hpp calls cudaStreamSynchronize(stream) and drops the returned cudaError_t, making CUDA failures silent at the API boundary (Line 19).

Suggested fix

 `#ifdef` NVFOREST_ENABLE_GPU
 `#include` <cuda_runtime_api.h>
+#include <nvforest/detail/cuda_check.hpp>
 `#endif`
@@
 `#ifdef` NVFOREST_ENABLE_GPU
-  cudaStreamSynchronize(stream);
+  detail::cuda_check<device_type::gpu>(cudaStreamSynchronize(stream));
 `#endif`
 }

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@cpp/include/nvforest/cuda_stream.hpp` around lines 16 - 20, The synchronize
function currently ignores the cudaStreamSynchronize return value, so CUDA
errors are dropped; update inline void synchronize(cuda_stream stream) to
capture the cudaError_t result from cudaStreamSynchronize, check it, and
propagate failures (either change the signature to return cudaError_t or throw a
std::runtime_error on non-success). Concretely, in synchronize call cudaError_t
err = cudaStreamSynchronize(stream); if (err != cudaSuccess) throw
std::runtime_error(std::string("cudaStreamSynchronize failed: ") +
cudaGetErrorString(err)); — reference symbols: synchronize,
cudaStreamSynchronize, cudaError_t, cudaGetErrorString.

cpp/include/nvforest/detail/exceptions.hpp (1)

8-56: ⚠️ Potential issue | 🟠 Major

Restore public exception names (nvforest::bad_cuda_call, etc.) or provide aliases in nvforest/exceptions.hpp.

cpp/include/nvforest/exceptions.hpp defines only nvforest::{model_import_error,type_error,traversal_exception,runtime_error} and does not re-export/alias nvforest::detail::{bad_cuda_call,out_of_bounds,wrong_device_type,mem_type_mismatch,wrong_device}. This makes the namespace move a downstream-breaking exception API change rather than a merge.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@cpp/include/nvforest/detail/exceptions.hpp` around lines 8 - 56, The public
API lost symbols when exception types were moved into nvforest::detail; restore
the original nvforest::* names by adding aliases in the public header
(nvforest/exceptions.hpp): create using declarations that map
nvforest::bad_cuda_call, out_of_bounds, wrong_device_type, mem_type_mismatch,
and wrong_device to their nvforest::detail:: counterparts (e.g., using
bad_cuda_call = detail::bad_cuda_call;), ensure the header includes or
forward-declares nvforest::detail types so the aliases compile, and keep the
original exception type names visible in the nvforest namespace to avoid
breaking downstream code.

cpp/include/nvforest/detail/infer/gpu.hpp (1)

13-24: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Add missing <type_traits> include for std::enable_if_t

cpp/include/nvforest/detail/infer/gpu.hpp uses std::enable_if_t in the infer template declaration but doesn’t include <type_traits>, relying on transitive includes.

Proposed fix

 `#include` <cstddef>
 `#include` <optional>
+#include <type_traits>

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@cpp/include/nvforest/detail/infer/gpu.hpp` around lines 13 - 24, The header
declares the infer template using std::enable_if_t (template signature with
device_type D and infer(...)) but omits the required <type_traits> include; add
`#include` <type_traits> near the other includes so std::enable_if_t is defined
(ensuring the declaration of template <device_type D, bool
has_categorical_nodes, typename forest_t, ...> std::enable_if_t<D ==
device_type::gpu, void> infer(...) compiles without relying on transitive
includes).

cpp/include/nvforest/handle.hpp (1)

16-29: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Reject null GPU raft_handle_ in nvforest::handle_t

In NVFOREST_ENABLE_GPU builds, nvforest::handle_t’s constructor defaults handle_ptr to nullptr, but get_next_usable_stream(), get_stream_pool_size(), and synchronize() dereference raft_handle_ unconditionally. forest_model::predict() calls handle.get_next_usable_stream() / handle.get_usable_stream_count() without guarding, so nvforest::handle_t{} will crash on the first inference. Tests use a real raft::handle_t, but the README shows auto handle = nvforest::handle_t{};, which is unsafe for GPU builds.
Proposed fix
+#include <stdexcept>
+
 struct handle_t {
-  handle_t(raft::handle_t const* handle_ptr = nullptr) : raft_handle_{handle_ptr} {}
+  handle_t() = delete;
+  explicit handle_t(raft::handle_t const* handle_ptr) : raft_handle_{handle_ptr}
+  {
+    if (handle_ptr == nullptr) { throw std::invalid_argument("handle_ptr must not be null"); }
+  }
   handle_t(raft::handle_t const& raft_handle) : raft_handle_{&raft_handle} {}
Also update the public docs/README to reflect GPU builds must construct nvforest::handle_t from a valid raft::handle_t.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@cpp/include/nvforest/handle.hpp` around lines 16 - 29, The constructor
handle_t(raft::handle_t const* handle_ptr = nullptr) currently allows a null
raft_handle_ but methods get_next_usable_stream, get_stream_pool_size,
get_usable_stream_count and synchronize dereference it unconditionally; change
the ctor to reject null (e.g., remove the default nullptr and/or throw
std::invalid_argument or assert when handle_ptr==nullptr) so a
nvforest::handle_t must be constructed from a valid raft::handle_t, and update
the public README to state GPU builds require constructing nvforest::handle_t
from an existing raft::handle_t (also audit usages of nvforest::handle_t{} in
tests/docs and replace with an explicit valid raft::handle_t where needed).

cpp/include/nvforest/README.md (1)

46-70: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Unify parameter naming for the device-type argument.

Line 46 labels the argument as mem_type, but Line 70 documents it as dev_type. Please use one name consistently in both the example and the argument docs.

Suggested doc tweak

-**dev_type**: This argument controls where the model will be executed. If `nvforest::device_type::gpu`, then it will be executed on GPU. If `nvforest::device_type::cpu`, then it will be executed on CPU.
+**mem_type**: This argument controls where the model will be executed. If `nvforest::device_type::gpu`, then it will be executed on GPU. If `nvforest::device_type::cpu`, then it will be executed on CPU.

As per coding guidelines, "Consistency: Version numbers, parameter types, and terminology match code".

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@cpp/include/nvforest/README.md` around lines 46 - 70, The docs use two
different names for the device argument—`mem_type` in the example call comment
and `dev_type` in the parameter description—so make them consistent: pick one
name (prefer `dev_type`) and update the example comment `// mem_type` to `//
dev_type` and any occurrences in the descriptive section (`mem_type` ->
`dev_type`), ensuring references like `nvforest::device_type::gpu` remain
unchanged and the parameter description that begins "dev_type: This argument
controls where the model will be executed..." matches the example and wording
elsewhere.

cpp/include/nvforest/treelite_importer.hpp (1)

515-529: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Guard against a null TreeliteModelHandle.

This wrapper dereferences tl_handle unconditionally. If a caller passes nullptr, the public API segfaults instead of raising a model import error.

Suggested fix

 inline auto import_from_treelite_handle(TreeliteModelHandle tl_handle,
                                         tree_layout layout     = preferred_tree_layout,
                                         index_type align_bytes = index_type{},
                                         std::optional<bool> use_double_precision = std::nullopt,
                                         nvforest::device_type dev_type = nvforest::device_type::cpu,
                                         int device                     = 0,
                                         nvforest::cuda_stream stream   = nvforest::cuda_stream{})
 {
+  if (tl_handle == nullptr) {
+    throw model_import_error{"Treelite model handle must not be null"};
+  }
   return import_from_treelite_model(*static_cast<treelite::Model*>(tl_handle),
                                     layout,
                                     align_bytes,
                                     use_double_precision,
                                     dev_type,
                                     device,
                                     stream);
 }

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@cpp/include/nvforest/treelite_importer.hpp` around lines 515 - 529, The
wrapper import_from_treelite_handle dereferences tl_handle without checking for
nullptr; add a null check at the start of import_from_treelite_handle and if
tl_handle is null throw a suitable exception (e.g. std::invalid_argument or a
library-specific import error) with a clear message like "null
TreeliteModelHandle"; only call
import_from_treelite_model(*static_cast<treelite::Model*>(tl_handle), ...) when
tl_handle is non-null to avoid segfaults.

cpp/include/nvforest/buffer.hpp (1)

329-339: ⚠️ Potential issue | 🔴 Critical | ⚡ Quick win

The checked copy guard underflows when an offset is already out of range.

These comparisons subtract size_t values before proving that src_offset <= src.size() and dst_offset <= dst.size(). If either offset is already too large, the subtraction wraps and the check passes, turning the “checked” overloads into out-of-bounds copies.

Suggested fix

   if constexpr (bounds_check) {
-    if (src.size() - src_offset < size || dst.size() - dst_offset < size) {
+    if (src_offset > src.size() || dst_offset > dst.size() ||
+        src.size() - src_offset < size || dst.size() - dst_offset < size) {
       throw detail::out_of_bounds("Attempted copy to or from buffer of inadequate size");
     }
   }

Apply the same guard to the rvalue-overload block below as well.

Also applies to: 361-372

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@cpp/include/nvforest/buffer.hpp` around lines 329 - 339, The bounds check in
the checked copy path incorrectly subtracts size_t values (src.size() -
src_offset and dst.size() - dst_offset) before verifying the offsets are in
range, which underflows when src_offset or dst_offset > src.size()/dst.size();
update the guard in the copy(...) path to first assert src_offset <= src.size()
and dst_offset <= dst.size() (or check offsets against sizes) and only then
compare remaining sizes against size, and apply the identical fix to the
rvalue-overload block (the subsequent copy(...) overload around the rvalue
overload lines) so both checked overloads prevent underflow.

cpp/include/nvforest/detail/infer/gpu.cuh (1)

153-163: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Handle the zero-shared-memory path before the occupancy math.

shared_mem_per_block can be 0 here when output falls back to global workspace and rows are not staged in shared memory. In that case ceildiv(max_shared_mem_per_sm, shared_mem_per_block) divides by zero, and the later "divide shared mem evenly" step has the same problem.

Suggested fix

-  auto resident_blocks_per_sm =
-    min(ceildiv(max_shared_mem_per_sm, shared_mem_per_block), max_resident_blocks);
+  auto resident_blocks_per_sm =
+    (shared_mem_per_block == 0)
+      ? max_resident_blocks
+      : min(ceildiv(max_shared_mem_per_sm, shared_mem_per_block), max_resident_blocks);
@@
-  shared_mem_per_block = std::min(
-    max_overall_shared_mem, max_shared_mem_per_sm / (max_shared_mem_per_sm / shared_mem_per_block));
+  if (shared_mem_per_block != 0) {
+    shared_mem_per_block = std::min(
+      max_overall_shared_mem,
+      max_shared_mem_per_sm / (max_shared_mem_per_sm / shared_mem_per_block));
+  }

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@cpp/include/nvforest/detail/infer/gpu.cuh` around lines 153 - 163,
shared_mem_per_block can be zero (when output_workspace_size_bytes and row
staging were cleared), so the occupancy math calling
ceildiv(max_shared_mem_per_sm, shared_mem_per_block) and the later "divide
shared mem evenly" step will divide by zero; add an early guard after computing
shared_mem_per_block to detect shared_mem_per_block == 0 and handle that path
explicitly (for example set resident_blocks_per_sm = max_resident_blocks or
another safe default and skip any further divisions that distribute shared
memory per block), and ensure any subsequent logic that divides by
shared_mem_per_block or expects nonzero shared bytes checks this flag; update
references around shared_mem_per_block, resident_blocks_per_sm, ceildiv, and the
later "divide shared mem evenly" calculations to use this guarded path.

🧹 Nitpick comments (3)

cpp/include/nvforest/device_type.hpp (1)

7-7: ⚡ Quick win

Add Doxygen docs for the new public device_type enum.

Line 7 introduces a public API type without a doc comment. Please add a brief Doxygen block describing valid values and intended usage.
Suggested update
 namespace nvforest {
+/**
+ * `@brief` Execution target for inference routines.
+ */
 enum class device_type { cpu, gpu };
 }  // namespace nvforest
As per coding guidelines, "For public header files (C++ API): Check if new public functions/classes have documentation comments (Doxygen format)".
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@cpp/include/nvforest/device_type.hpp` at line 7, Add a Doxygen comment block
immediately above the public enum class device_type describing that it
represents the execution device for NVForest operations, listing the valid enum
values (cpu and gpu) with a short description for each, and note intended usage
(e.g., selecting target device for computation or resource allocation). Use
brief `@enum/`@brief style tags and mention any default behavior or constraints if
applicable so the public API is documented for users.

cpp/include/nvforest/detail/device_setter.hpp (1)

6-10: 💤 Low value

Consider reordering includes to place dependencies first.

The include order places device_type.hpp after device_setter/base.hpp and device_setter/gpu.hpp. If these headers depend on device_type, it would be clearer to include device_type.hpp first.

♻️ Suggested include order

 `#pragma` once
+#include <nvforest/device_type.hpp>
 `#include` <nvforest/detail/device_setter/base.hpp>
 `#ifdef` NVFOREST_ENABLE_GPU
 `#include` <nvforest/detail/device_setter/gpu.hpp>
 `#endif`
-#include <nvforest/device_type.hpp>

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@cpp/include/nvforest/detail/device_setter.hpp` around lines 6 - 10, Reorder
the includes in device_setter.hpp so that the dependency header device_type.hpp
is included before device_setter/base.hpp and device_setter/gpu.hpp;
specifically, move the include for nvforest/device_type.hpp above the includes
for nvforest/detail/device_setter/base.hpp and (conditionally)
nvforest/detail/device_setter/gpu.hpp to ensure device_type is available to the
base and gpu headers (symbols: device_setter.hpp, device_setter::base,
device_setter::gpu, and device_type.hpp).

cpp/include/nvforest/detail/specializations/device_initialization_macros.hpp (1)

11-13: ⚡ Quick win

Fully qualify the generated nvforest types.

This macro now depends on the expansion site already being inside namespace nvforest. Emitting ::nvforest::device_type and ::nvforest::device_id keeps it stable if an instantiation moves or lives at global scope.

Proposed fix

 `#define` NVFOREST_INITIALIZE_DEVICE(template_type, variant_index)                          \
-  template_type void initialize_device<NVFOREST_FOREST(variant_index), device_type::gpu>( \
-    device_id<device_type::gpu>);
+  template_type void                                                                     \
+  initialize_device<NVFOREST_FOREST(variant_index), ::nvforest::device_type::gpu>(      \
+    ::nvforest::device_id<::nvforest::device_type::gpu>);

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@cpp/include/nvforest/detail/specializations/device_initialization_macros.hpp`
around lines 11 - 13, The macro NVFOREST_INITIALIZE_DEVICE emits types like
device_type and device_id without global qualification, which breaks if the
expansion isn't inside namespace nvforest; update the macro so generated symbols
in initialize_device<NVFOREST_FOREST(variant_index),
device_type::gpu>(device_id<device_type::gpu>) are fully qualified (e.g.,
::nvforest::device_type and ::nvforest::device_id and any other nvforest symbols
referenced) so the instantiation is stable regardless of the expansion scope.

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@cpp/CMakeLists.txt`:
- Line 126: The exported CMake target name changed and you need to restore a
deprecated compatibility alias so downstream projects linking
nvforest::nvforest++ keep working: after NVFOREST_CPP_TARGET is set and after
the existing add_library(nvforest::${NVFOREST_CPP_TARGET} ALIAS
${NVFOREST_CPP_TARGET}) line, create a second alias target mapping
nvforest::nvforest++ to the same implementation target (i.e., add an alias
nvforest::nvforest++ that points to ${NVFOREST_CPP_TARGET}) and ensure the alias
is included in the rapids_export/GLOBAL_TARGETS (or the package config) so the
installed package exports the old name as well; keep a clear deprecation comment
and only add the alias (no API changes).

In `@cpp/include/nvforest/decision_forest.hpp`:
- Around line 460-463: The categorical index-size check in
double_indexes_required uses (detail::ceildiv(max_num_categories,
max_local_categories) + 1 * num_categorical_nodes) which computes bins_per_node
+ num_categorical_nodes instead of (bins_per_node + 1) * num_categorical_nodes;
update the expression in the double_indexes_required calculation to multiply the
entire (bins_per_node + 1) factor by num_categorical_nodes (i.e. use
(detail::ceildiv(max_num_categories, max_local_categories) + 1) *
num_categorical_nodes) so the comparison against
std::numeric_limits<small_index_t>::max() correctly reflects required index
range for categorical storage.
- Around line 149-163: The decision_forest constructor currently accepts rvalue
references (nodes, root_node_indexes, node_id_mapping, bias,
optional<vector_output>, optional<categorical_storage>) but then copies them
because the member initializer uses the parameter names as lvalues; change the
initializer to transfer ownership by moving these parameters into the members
(e.g., use std::move(nodes), std::move(root_node_indexes),
std::move(node_id_mapping), std::move(bias) and std::move on the optionals like
vector_output and categorical_storage) or switch the signature to take by value
and move from those local variables into members; update the member initializers
in the decision_forest constructor accordingly so buffers are moved, not
deep-copied.

In `@cpp/include/nvforest/detail/device_initialization/gpu.cuh`:
- Around line 40-201: The repeated cudaFuncSetAttribute calls for many
infer_kernel specializations are brittle; create a compile-time helper (e.g., a
templated helper function named set_infer_kernel_max_shared<...>) that accepts
the same template parameters as infer_kernel and calls
cudaFuncSetAttribute(infer_kernel<...>,
cudaFuncAttributeMaxDynamicSharedMemorySize, max_shared_mem_per_block); then
replace the long list with a small set of invocations of that helper (or a
constexpr list/pack of kernel-parameter tuples expanded via fold/pack expansion)
for the combinations used. Refer to infer_kernel, cudaFuncSetAttribute,
cudaFuncAttributeMaxDynamicSharedMemorySize, and max_shared_mem_per_block when
implementing the templated set_infer_kernel_max_shared helper and the pack/array
expansion so behavior remains identical.

In `@cpp/include/nvforest/detail/infer/gpu.cuh`:
- Around line 190-193: The code computes num_blocks from row_count and proceeds
to allocate global_workspace and launch kernels even when row_count == 0; add an
early return guard in the function (in gpu.cuh around the block using
num_blocks/row_output_size/global_workspace) that checks if row_count == 0 (or
num_blocks == 0) and returns immediately before any buffer<output_t> allocation
or kernel launch; apply the same guard to the later region noted (lines
~229-320) so no CUDA allocations or kernel launches occur with zero grid/block
dimensions.

In `@cpp/include/nvforest/detail/specializations/forest_macros.hpp`:
- Around line 17-18: The macro NVFOREST_SPEC expands to an unqualified
specialization_variant which causes scope-dependent failures; update the macro
definition so it references the fully-qualified type (e.g.,
::nvforest::detail::specialization_variant) instead of the bare name so
expansions work from any namespace or translation unit; locate NVFOREST_SPEC in
this header and replace the unqualified specialization_variant with the
fully-qualified symbol.

In `@python/nvforest/nvforest/detail/forest_inference.pyx`:
- Around line 105-106: Validate the incoming device string before mapping to
nvforest_device_t: only accept "gpu" or "cpu" and if it's anything else raise a
clear Python error (e.g., ValueError) that includes the actual value and the
expected options; then set dev_type = nvforest_device_t.gpu if device == "gpu"
else nvforest_device_t.cpu. Update the mapping around the dev_type variable in
forest_inference.pyx (where device and nvforest_device_t are used) so invalid
inputs fail early instead of silently falling back to CPU and causing assertions
later in predict().

---

Outside diff comments:
In `@cpp/include/nvforest/buffer.hpp`:
- Around line 329-339: The bounds check in the checked copy path incorrectly
subtracts size_t values (src.size() - src_offset and dst.size() - dst_offset)
before verifying the offsets are in range, which underflows when src_offset or
dst_offset > src.size()/dst.size(); update the guard in the copy(...) path to
first assert src_offset <= src.size() and dst_offset <= dst.size() (or check
offsets against sizes) and only then compare remaining sizes against size, and
apply the identical fix to the rvalue-overload block (the subsequent copy(...)
overload around the rvalue overload lines) so both checked overloads prevent
underflow.

In `@cpp/include/nvforest/cuda_stream.hpp`:
- Around line 16-20: The synchronize function currently ignores the
cudaStreamSynchronize return value, so CUDA errors are dropped; update inline
void synchronize(cuda_stream stream) to capture the cudaError_t result from
cudaStreamSynchronize, check it, and propagate failures (either change the
signature to return cudaError_t or throw a std::runtime_error on non-success).
Concretely, in synchronize call cudaError_t err = cudaStreamSynchronize(stream);
if (err != cudaSuccess) throw
std::runtime_error(std::string("cudaStreamSynchronize failed: ") +
cudaGetErrorString(err)); — reference symbols: synchronize,
cudaStreamSynchronize, cudaError_t, cudaGetErrorString.

In `@cpp/include/nvforest/detail/exceptions.hpp`:
- Around line 8-56: The public API lost symbols when exception types were moved
into nvforest::detail; restore the original nvforest::* names by adding aliases
in the public header (nvforest/exceptions.hpp): create using declarations that
map nvforest::bad_cuda_call, out_of_bounds, wrong_device_type,
mem_type_mismatch, and wrong_device to their nvforest::detail:: counterparts
(e.g., using bad_cuda_call = detail::bad_cuda_call;), ensure the header includes
or forward-declares nvforest::detail types so the aliases compile, and keep the
original exception type names visible in the nvforest namespace to avoid
breaking downstream code.

In `@cpp/include/nvforest/detail/infer/gpu.cuh`:
- Around line 153-163: shared_mem_per_block can be zero (when
output_workspace_size_bytes and row staging were cleared), so the occupancy math
calling ceildiv(max_shared_mem_per_sm, shared_mem_per_block) and the later
"divide shared mem evenly" step will divide by zero; add an early guard after
computing shared_mem_per_block to detect shared_mem_per_block == 0 and handle
that path explicitly (for example set resident_blocks_per_sm =
max_resident_blocks or another safe default and skip any further divisions that
distribute shared memory per block), and ensure any subsequent logic that
divides by shared_mem_per_block or expects nonzero shared bytes checks this
flag; update references around shared_mem_per_block, resident_blocks_per_sm,
ceildiv, and the later "divide shared mem evenly" calculations to use this
guarded path.

In `@cpp/include/nvforest/detail/infer/gpu.hpp`:
- Around line 13-24: The header declares the infer template using
std::enable_if_t (template signature with device_type D and infer(...)) but
omits the required <type_traits> include; add `#include` <type_traits> near the
other includes so std::enable_if_t is defined (ensuring the declaration of
template <device_type D, bool has_categorical_nodes, typename forest_t, ...>
std::enable_if_t<D == device_type::gpu, void> infer(...) compiles without
relying on transitive includes).

In `@cpp/include/nvforest/handle.hpp`:
- Around line 16-29: The constructor handle_t(raft::handle_t const* handle_ptr =
nullptr) currently allows a null raft_handle_ but methods
get_next_usable_stream, get_stream_pool_size, get_usable_stream_count and
synchronize dereference it unconditionally; change the ctor to reject null
(e.g., remove the default nullptr and/or throw std::invalid_argument or assert
when handle_ptr==nullptr) so a nvforest::handle_t must be constructed from a
valid raft::handle_t, and update the public README to state GPU builds require
constructing nvforest::handle_t from an existing raft::handle_t (also audit
usages of nvforest::handle_t{} in tests/docs and replace with an explicit valid
raft::handle_t where needed).

In `@cpp/include/nvforest/README.md`:
- Around line 46-70: The docs use two different names for the device
argument—`mem_type` in the example call comment and `dev_type` in the parameter
description—so make them consistent: pick one name (prefer `dev_type`) and
update the example comment `// mem_type` to `// dev_type` and any occurrences in
the descriptive section (`mem_type` -> `dev_type`), ensuring references like
`nvforest::device_type::gpu` remain unchanged and the parameter description that
begins "dev_type: This argument controls where the model will be executed..."
matches the example and wording elsewhere.

In `@cpp/include/nvforest/treelite_importer.hpp`:
- Around line 515-529: The wrapper import_from_treelite_handle dereferences
tl_handle without checking for nullptr; add a null check at the start of
import_from_treelite_handle and if tl_handle is null throw a suitable exception
(e.g. std::invalid_argument or a library-specific import error) with a clear
message like "null TreeliteModelHandle"; only call
import_from_treelite_model(*static_cast<treelite::Model*>(tl_handle), ...) when
tl_handle is non-null to avoid segfaults.

---

Nitpick comments:
In `@cpp/include/nvforest/detail/device_setter.hpp`:
- Around line 6-10: Reorder the includes in device_setter.hpp so that the
dependency header device_type.hpp is included before device_setter/base.hpp and
device_setter/gpu.hpp; specifically, move the include for
nvforest/device_type.hpp above the includes for
nvforest/detail/device_setter/base.hpp and (conditionally)
nvforest/detail/device_setter/gpu.hpp to ensure device_type is available to the
base and gpu headers (symbols: device_setter.hpp, device_setter::base,
device_setter::gpu, and device_type.hpp).

In
`@cpp/include/nvforest/detail/specializations/device_initialization_macros.hpp`:
- Around line 11-13: The macro NVFOREST_INITIALIZE_DEVICE emits types like
device_type and device_id without global qualification, which breaks if the
expansion isn't inside namespace nvforest; update the macro so generated symbols
in initialize_device<NVFOREST_FOREST(variant_index),
device_type::gpu>(device_id<device_type::gpu>) are fully qualified (e.g.,
::nvforest::device_type and ::nvforest::device_id and any other nvforest symbols
referenced) so the instantiation is stable regardless of the expansion scope.

In `@cpp/include/nvforest/device_type.hpp`:
- Line 7: Add a Doxygen comment block immediately above the public enum class
device_type describing that it represents the execution device for NVForest
operations, listing the valid enum values (cpu and gpu) with a short description
for each, and note intended usage (e.g., selecting target device for computation
or resource allocation). Use brief `@enum/`@brief style tags and mention any
default behavior or constraints if applicable so the public API is documented
for users.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 5877b175-08a8-4c41-81f5-f28cab06a27f

📥 Commits

Reviewing files that changed from the base of the PR and between a9defae and 10b6f20.

📒 Files selected for processing (107)

build.sh
ci/build_wheel_nvforest.sh
conda/recipes/libnvforest/recipe.yaml
cpp/CMakeLists.txt
cpp/include/nvforest/Implementation.md
cpp/include/nvforest/README.md
cpp/include/nvforest/buffer.hpp
cpp/include/nvforest/cuda_stream.hpp
cpp/include/nvforest/decision_forest.hpp
cpp/include/nvforest/detail/bitset.hpp
cpp/include/nvforest/detail/ceildiv.hpp
cpp/include/nvforest/detail/const_agnostic.hpp
cpp/include/nvforest/detail/copy.hpp
cpp/include/nvforest/detail/copy/cpu.hpp
cpp/include/nvforest/detail/copy/gpu.hpp
cpp/include/nvforest/detail/cuda_check.hpp
cpp/include/nvforest/detail/cuda_check/base.hpp
cpp/include/nvforest/detail/cuda_check/gpu.hpp
cpp/include/nvforest/detail/decision_forest_builder.hpp
cpp/include/nvforest/detail/device_id.hpp
cpp/include/nvforest/detail/device_id/base.hpp
cpp/include/nvforest/detail/device_id/cpu.hpp
cpp/include/nvforest/detail/device_id/gpu.hpp
cpp/include/nvforest/detail/device_initialization.hpp
cpp/include/nvforest/detail/device_initialization/cpu.hpp
cpp/include/nvforest/detail/device_initialization/gpu.cuh
cpp/include/nvforest/detail/device_initialization/gpu.hpp
cpp/include/nvforest/detail/device_setter.hpp
cpp/include/nvforest/detail/device_setter/base.hpp
cpp/include/nvforest/detail/device_setter/gpu.hpp
cpp/include/nvforest/detail/evaluate_tree.hpp
cpp/include/nvforest/detail/exceptions.hpp
cpp/include/nvforest/detail/forest.hpp
cpp/include/nvforest/detail/gpu_introspection.hpp
cpp/include/nvforest/detail/gpu_support.hpp
cpp/include/nvforest/detail/host_only_throw.hpp
cpp/include/nvforest/detail/host_only_throw/base.hpp
cpp/include/nvforest/detail/host_only_throw/cpu.hpp
cpp/include/nvforest/detail/infer.hpp
cpp/include/nvforest/detail/infer/cpu.hpp
cpp/include/nvforest/detail/infer/gpu.cuh
cpp/include/nvforest/detail/infer/gpu.hpp
cpp/include/nvforest/detail/infer_kernel/cpu.hpp
cpp/include/nvforest/detail/infer_kernel/gpu.cuh
cpp/include/nvforest/detail/node.hpp
cpp/include/nvforest/detail/non_owning_buffer.hpp
cpp/include/nvforest/detail/non_owning_buffer/base.hpp
cpp/include/nvforest/detail/owning_buffer.hpp
cpp/include/nvforest/detail/owning_buffer/base.hpp
cpp/include/nvforest/detail/owning_buffer/cpu.hpp
cpp/include/nvforest/detail/owning_buffer/gpu.hpp
cpp/include/nvforest/detail/padding.hpp
cpp/include/nvforest/detail/postprocessor.hpp
cpp/include/nvforest/detail/raft_proto/cuda_check.hpp
cpp/include/nvforest/detail/raft_proto/detail/const_agnostic.hpp
cpp/include/nvforest/detail/raft_proto/detail/host_only_throw.hpp
cpp/include/nvforest/detail/raft_proto/detail/non_owning_buffer.hpp
cpp/include/nvforest/detail/raft_proto/detail/owning_buffer.hpp
cpp/include/nvforest/detail/raft_proto/device_id.hpp
cpp/include/nvforest/detail/raft_proto/device_setter.hpp
cpp/include/nvforest/detail/raft_proto/device_type.hpp
cpp/include/nvforest/detail/specializations/device_initialization_macros.hpp
cpp/include/nvforest/detail/specializations/forest_macros.hpp
cpp/include/nvforest/detail/specializations/infer_macros.hpp
cpp/include/nvforest/device_type.hpp
cpp/include/nvforest/forest_model.hpp
cpp/include/nvforest/handle.hpp
cpp/include/nvforest/treelite_importer.hpp
cpp/src/infer0.cpp
cpp/src/infer0.cu
cpp/src/infer1.cpp
cpp/src/infer1.cu
cpp/src/infer10.cpp
cpp/src/infer10.cu
cpp/src/infer11.cpp
cpp/src/infer11.cu
cpp/src/infer2.cpp
cpp/src/infer2.cu
cpp/src/infer3.cpp
cpp/src/infer3.cu
cpp/src/infer4.cpp
cpp/src/infer4.cu
cpp/src/infer5.cpp
cpp/src/infer5.cu
cpp/src/infer6.cpp
cpp/src/infer6.cu
cpp/src/infer7.cpp
cpp/src/infer7.cu
cpp/src/infer8.cpp
cpp/src/infer8.cu
cpp/src/infer9.cpp
cpp/src/infer9.cu
cpp/tests/CMakeLists.txt
cpp/tests/buffer/buffer.cpp
cpp/tests/buffer/buffer.cu
cpp/tests/treelite_importer.cpp
docs/source/getting_started.rst
python/libnvforest/CMakeLists.txt
python/libnvforest/libnvforest/load.py
python/nvforest/CMakeLists.txt
python/nvforest/nvforest/detail/cuda_stream.pxd
python/nvforest/nvforest/detail/device_type.pxd
python/nvforest/nvforest/detail/forest_inference.pyx
python/nvforest/nvforest/detail/handle.pxd
python/nvforest/nvforest/detail/raft_proto/__init__.py
python/nvforest/nvforest/detail/raft_proto/device_type.pxd
python/nvforest/nvforest/detail/raft_proto/optional.pxd

💤 Files with no reviewable changes (11)

cpp/include/nvforest/detail/raft_proto/cuda_check.hpp
cpp/include/nvforest/detail/raft_proto/device_setter.hpp
cpp/include/nvforest/detail/raft_proto/detail/host_only_throw.hpp
cpp/include/nvforest/detail/raft_proto/detail/const_agnostic.hpp
python/nvforest/nvforest/detail/raft_proto/device_type.pxd
cpp/include/nvforest/detail/raft_proto/device_id.hpp
python/nvforest/nvforest/detail/raft_proto/init.py
cpp/include/nvforest/detail/raft_proto/detail/owning_buffer.hpp
cpp/include/nvforest/detail/raft_proto/detail/non_owning_buffer.hpp
python/nvforest/nvforest/detail/raft_proto/optional.pxd
cpp/include/nvforest/detail/raft_proto/device_type.hpp

coderabbitai · 2026-05-25T03:43:30Z

+  decision_forest(
+    buffer<node_type>&& nodes,
+    buffer<index_type>&& root_node_indexes,
+    buffer<index_type>&& node_id_mapping,
+    buffer<io_type>&& bias,
+    index_type num_features,
+    index_type num_outputs                                                      = index_type{2},
+    bool has_categorical_nodes                                                  = false,
+    std::optional<buffer<io_type>>&& vector_output                              = std::nullopt,
+    std::optional<buffer<typename node_type::index_type>>&& categorical_storage = std::nullopt,
+    index_type leaf_size                                                        = index_type{1},
+    row_op row_postproc                                                         = row_op::disable,
+    element_op elem_postproc  = element_op::disable,
+    io_type average_factor    = io_type{1},
+    io_type postproc_constant = io_type{1})


⚠️ Potential issue | 🟠 Major | ⚡ Quick win

These && buffers are still copied, not moved.

The constructor advertises transfer of ownership, but the member initializers below use the named parameters as lvalues. That deep-copies every model buffer instead of moving it, which can double import-time allocation/copy cost and OOM large GPU models.

Suggested fix

decision_forest( buffer<node_type>&& nodes, buffer<index_type>&& root_node_indexes, buffer<index_type>&& node_id_mapping, buffer<io_type>&& bias, @@ index_type leaf_size = index_type{1}, row_op row_postproc = row_op::disable, element_op elem_postproc = element_op::disable, io_type average_factor = io_type{1}, io_type postproc_constant = io_type{1}) - : nodes_{nodes}, - root_node_indexes_{root_node_indexes}, - node_id_mapping_{node_id_mapping}, - bias_{bias}, - vector_output_{vector_output}, - categorical_storage_{categorical_storage}, + : nodes_{std::move(nodes)}, + root_node_indexes_{std::move(root_node_indexes)}, + node_id_mapping_{std::move(node_id_mapping)}, + bias_{std::move(bias)}, + vector_output_{std::move(vector_output)}, + categorical_storage_{std::move(categorical_storage)}, num_features_{num_features}, num_outputs_{num_outputs}, leaf_size_{leaf_size},

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@cpp/include/nvforest/decision_forest.hpp` around lines 149 - 163, The decision_forest constructor currently accepts rvalue references (nodes, root_node_indexes, node_id_mapping, bias, optional<vector_output>, optional<categorical_storage>) but then copies them because the member initializer uses the parameter names as lvalues; change the initializer to transfer ownership by moving these parameters into the members (e.g., use std::move(nodes), std::move(root_node_indexes), std::move(node_id_mapping), std::move(bias) and std::move on the optionals like vector_output and categorical_storage) or switch the signature to take by value and move from those local variables into members; update the member initializers in the decision_forest constructor accordingly so buffers are moved, not deep-copied.

coderabbitai · 2026-05-25T03:43:30Z

  auto double_indexes_required =
    (max_num_categories > max_local_categories &&
-     ((raft_proto::ceildiv(max_num_categories, max_local_categories) + 1 * num_categorical_nodes) >
+     ((detail::ceildiv(max_num_categories, max_local_categories) + 1 * num_categorical_nodes) >
      std::numeric_limits<small_index_t>::max())) ||


⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Fix the categorical-storage size formula.

detail::ceildiv(max_num_categories, max_local_categories) + 1 * num_categorical_nodes evaluates as bins_per_node + num_categorical_nodes, not (bins_per_node + 1) * num_categorical_nodes. That underestimates the required index range and can select the wrong forest specialization for large categorical models.

Suggested fix

- (max_num_categories > max_local_categories && - ((detail::ceildiv(max_num_categories, max_local_categories) + 1 * num_categorical_nodes) > + (max_num_categories > max_local_categories && + (((detail::ceildiv(max_num_categories, max_local_categories) + 1) * + num_categorical_nodes) > std::numeric_limits<small_index_t>::max())) ||

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

auto double_indexes_required =

(max_num_categories > max_local_categories &&

((raft_proto::ceildiv(max_num_categories, max_local_categories) + 1 * num_categorical_nodes) >

((detail::ceildiv(max_num_categories, max_local_categories) + 1 * num_categorical_nodes) >

std::numeric_limits<small_index_t>::max())) ||

auto double_indexes_required =

(max_num_categories > max_local_categories &&

(((detail::ceildiv(max_num_categories, max_local_categories) + 1) *

num_categorical_nodes) >

std::numeric_limits<small_index_t>::max())) ||

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@cpp/include/nvforest/decision_forest.hpp` around lines 460 - 463, The categorical index-size check in double_indexes_required uses (detail::ceildiv(max_num_categories, max_local_categories) + 1 * num_categorical_nodes) which computes bins_per_node + num_categorical_nodes instead of (bins_per_node + 1) * num_categorical_nodes; update the expression in the double_indexes_required calculation to multiply the entire (bins_per_node + 1) factor by num_categorical_nodes (i.e. use (detail::ceildiv(max_num_categories, max_local_categories) + 1) * num_categorical_nodes) so the comparison against std::numeric_limits<small_index_t>::max() correctly reflects required index range for categorical storage.

coderabbitai · 2026-05-25T03:43:30Z

+  auto num_blocks = std::min(ceildiv(row_count, rows_per_block_iteration), MAX_BLOCKS);
  if (row_output_size == 0) {
-    global_workspace = raft_proto::buffer<output_t>{
-      output_workspace_size * num_blocks, raft_proto::device_type::gpu, device.value(), stream};
+    global_workspace = buffer<output_t>{
+      output_workspace_size * num_blocks, device_type::gpu, device.value(), stream};


⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Return early for empty batches.

When row_count == 0, num_blocks becomes 0 and the kernel launches below become invalid CUDA launches. This path needs an early return before workspace allocation/launch selection.

Suggested fix

{ using output_t = typename forest_t::template raw_output_type<vector_output_t>; + + if (row_count == 0) { return; } auto sm_count = get_sm_count(device);

As per coding guidelines, Kernel launch with zero blocks/threads or invalid grid/block dimensions must be validated before kernel execution.

Also applies to: 229-320

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@cpp/include/nvforest/detail/infer/gpu.cuh` around lines 190 - 193, The code computes num_blocks from row_count and proceeds to allocate global_workspace and launch kernels even when row_count == 0; add an early return guard in the function (in gpu.cuh around the block using num_blocks/row_output_size/global_workspace) that checks if row_count == 0 (or num_blocks == 0) and returns immediately before any buffer<output_t> allocation or kernel launch; apply the same guard to the later region noted (lines ~229-320) so no CUDA allocations or kernel launches occur with zero grid/block dimensions.

coderabbitai · 2026-05-25T03:43:30Z

+        cdef nvforest_device_t dev_type
+        dev_type = nvforest_device_t.gpu if device == "gpu" else nvforest_device_t.cpu


⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Reject unknown device values here.

Anything other than "gpu" currently falls back to CPU, but predict() later assumes the only non-CPU value is "gpu". A typo like "cuda" therefore imports the model on CPU and fails later with an unhelpful assertion instead of a clear validation error.

Suggested fix

- dev_type = nvforest_device_t.gpu if device == "gpu" else nvforest_device_t.cpu + if device == "gpu": + dev_type = nvforest_device_t.gpu + elif device == "cpu": + dev_type = nvforest_device_t.cpu + else: + raise ValueError(f"device must be 'cpu' or 'gpu', got {device!r}")

As per coding guidelines, Error messages should be helpful and include expected vs actual values.

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

cdef nvforest_device_t dev_type

dev_type = nvforest_device_t.gpu if device == "gpu" else nvforest_device_t.cpu

cdef nvforest_device_t dev_type

if device == "gpu":

dev_type = nvforest_device_t.gpu

elif device == "cpu":

dev_type = nvforest_device_t.cpu

else:

raise ValueError(f"device must be 'cpu' or 'gpu', got {device!r}")

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@python/nvforest/nvforest/detail/forest_inference.pyx` around lines 105 - 106, Validate the incoming device string before mapping to nvforest_device_t: only accept "gpu" or "cpu" and if it's anything else raise a clear Python error (e.g., ValueError) that includes the actual value and the expected options; then set dev_type = nvforest_device_t.gpu if device == "gpu" else nvforest_device_t.cpu. Update the mapping around the dev_type variable in forest_inference.pyx (where device and nvforest_device_t are used) so invalid inputs fail early instead of silently falling back to CPU and causing assertions later in predict().

hcho3 · 2026-05-25T18:02:44Z

@coderabbitai Please create a new issue to remind us to address the unresolved comments here.

coderabbitai · 2026-05-25T18:07:59Z

Tip

For best results, initiate chat on the files or code changes.

@hcho3 I've created issue #133 to track all the unresolved review comments: #133

The issue captures the following items from the PR review:

Critical (1)

cpp/include/nvforest/buffer.hpp — size_t underflow in the bounds-checked copy overloads

Major (8)

cpp/include/nvforest/decision_forest.hpp — buffers deep-copied instead of moved in the constructor
cpp/include/nvforest/decision_forest.hpp — operator-precedence bug in the categorical storage size formula
cpp/include/nvforest/detail/infer/gpu.cuh — no early return when row_count == 0
cpp/include/nvforest/detail/infer/gpu.cuh — potential divide-by-zero when shared_mem_per_block == 0
cpp/include/nvforest/cuda_stream.hpp — cudaStreamSynchronize error silently dropped
cpp/include/nvforest/detail/exceptions.hpp — exception types in nvforest::detail not re-exported under nvforest::
cpp/include/nvforest/handle.hpp — null raft_handle_ causes crash in GPU builds
cpp/include/nvforest/treelite_importer.hpp — null TreeliteModelHandle not guarded
python/nvforest/nvforest/detail/forest_inference.pyx — unknown device strings silently fall back to CPU

Minor (2)

cpp/include/nvforest/detail/infer/gpu.hpp — missing <type_traits> include
cpp/include/nvforest/README.md — inconsistent parameter name (mem_type vs dev_type)

hcho3 added 6 commits May 22, 2026 16:19

Add missing include

fed8f19

Move C++ headers out of raft_proto/

511219e

Rename raft_proto namespace

523d8fd

Update the Python layer

4c341f5

Rename libnvforest++ -> libnvforest

4cb6fe5

Update docs

10b6f20

hcho3 requested review from a team as code owners May 25, 2026 03:30

hcho3 requested a review from KyleFromNVIDIA May 25, 2026 03:30

github-actions Bot added conda Relates to conda packaging ci Cython / Python CMake CUDA/C++ labels May 25, 2026

hcho3 added breaking Introduces a breaking change improvement Improves an existing functionality and removed conda Relates to conda packaging ci Cython / Python CMake CUDA/C++ labels May 25, 2026

coderabbitai Bot reviewed May 25, 2026

View reviewed changes

Respond to comment

72fee66

github-actions Bot added conda Relates to conda packaging ci Cython / Python CMake labels May 25, 2026

github-actions Bot added the CUDA/C++ label May 25, 2026

coderabbitai Bot mentioned this pull request May 25, 2026

Follow-up items from PR #132 (merge raft_proto into nvforest namespace) #133

Open

csadorf assigned hcho3 Jun 9, 2026

		cdef nvforest_device_t dev_type
		dev_type = nvforest_device_t.gpu if device == "gpu" else nvforest_device_t.cpu

Uh oh!

Conversation

hcho3 commented May 25, 2026

Uh oh!

coderabbitai Bot commented May 25, 2026

Summary by CodeRabbit

Walkthrough

Changes

Estimated code review effort

Suggested labels

Suggested reviewers

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

coderabbitai Bot May 25, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot May 25, 2026

Choose a reason for hiding this comment

Uh oh!

Uh oh!

coderabbitai Bot May 25, 2026

Choose a reason for hiding this comment

Uh oh!

Uh oh!

coderabbitai Bot May 25, 2026

Choose a reason for hiding this comment

Uh oh!

hcho3 commented May 25, 2026

Uh oh!

coderabbitai Bot commented May 25, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant