Commit a7d3550

Add support to ndarray for DLPack version 1
This commit adds support for the struct ``DLManagedTensorVersioned`` as defined by DLPack version 1. It also adds the ndarray framework ``nb::arrayapi``, which returns an object that provides the buffer interface and provides the two DLPack methods ``__dlpack__()`` and ``__dlpack_device__()``.
1 parent f2499d4 commit a7d3550

16 files changed

+1007 -380 lines changed

docs/api_extra.rst

Lines changed: 5 additions & 0 deletions
Original file line number | Diff line number | Diff line change
@@ -1108,6 +1108,11 @@ convert into an equivalent representation in one of the following frameworks:
11081108

11091109
Builtin Python ``memoryview`` for CPU-resident data.
11101110

1111+
.. cpp:class:: arrayapi
1112+
1113+
An object that implements the buffer protocol and also has the
1114+
``__dlpack__`` and ``__dlpack_device__`` attributes.
1115+
11111116
Eigen convenience type aliases
11121117
------------------------------
11131118
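
As a rough illustration of what the ``arrayapi`` entry above promises, the following pure-Python sketch checks whether an object supports the buffer protocol and exposes the two DLPack methods. The helper ``supports_exchange`` and the stand-in class ``FakeArrayApi`` are hypothetical names invented for this example; they are not part of nanobind.

```python
def supports_exchange(obj):
    """Return (has_buffer, has_dlpack) for an arbitrary Python object."""
    try:
        # memoryview() succeeds only for objects implementing the buffer protocol.
        memoryview(obj).release()
        has_buffer = True
    except TypeError:
        has_buffer = False
    # Both DLPack methods must be present and callable.
    has_dlpack = all(
        callable(getattr(obj, name, None))
        for name in ("__dlpack__", "__dlpack_device__")
    )
    return has_buffer, has_dlpack


class FakeArrayApi:
    """Hypothetical stand-in that exposes only the two DLPack method names."""

    def __dlpack__(self, **kwargs):
        raise NotImplementedError("illustration only")

    def __dlpack_device__(self):
        # (device type, device id); 1 is the CPU device code used in this commit.
        return (1, 0)
```

An object returned through ``nb::arrayapi`` would be expected to report ``True`` for both checks, whereas a plain ``bytes`` object only passes the first one.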

docs/changelog.rst

Lines changed: 8 additions & 0 deletions
@@ -22,6 +22,14 @@ Version TBD (not yet released)
2222
Clang-based Intel compiler). Continuous integration tests have been added to
2323
ensure compatibility with these compilers on an ongoing basis.
2424

25+
- The framework ``nb::arrayapi`` is now available to return an nd-array from
26+
C++ to Python as an object that supports both the Python buffer protocol and
27+
the DLPack methods ``__dlpack__`` and ``__dlpack_device__``.
28+
Nanobind now supports importing and exporting nd-arrays via capsules that
29+
contain the ``DLManagedTensorVersioned`` struct, which has a flag bit
30+
indicating the nd-array is read-only.
31+
(PR `#1175 <https://github.com/wjakob/nanobind/pull/1175>`__).
32+
2533
Version 2.9.2 (Sep 4, 2025)
2634
---------------------------
2735
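
The read-only flag bit mentioned in the changelog entry above can be tested with a simple mask. This is a minimal sketch that assumes the constant matches ``DLPACK_FLAG_BITMASK_READ_ONLY`` (bit 0) from the DLPack header; the helper ``is_read_only`` is a hypothetical name used only for illustration.

```python
# Assumed to mirror DLPACK_FLAG_BITMASK_READ_ONLY from dlpack.h (bit 0).
DLPACK_FLAG_BITMASK_READ_ONLY = 1 << 0


def is_read_only(flags: int) -> bool:
    """Report whether the read-only bit is set in a versioned tensor's flags."""
    return bool(flags & DLPACK_FLAG_BITMASK_READ_ONLY)
```

A consumer that sees this bit set should treat the tensor's data as immutable, even if other flag bits are also present.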

docs/ndarray.rst

Lines changed: 105 additions & 21 deletions
@@ -275,12 +275,19 @@ desired Python type.
275275
- :cpp:class:`nb::tensorflow <tensorflow>`: create a ``tensorflow.python.framework.ops.EagerTensor``.
276276
- :cpp:class:`nb::jax <jax>`: create a ``jaxlib.xla_extension.DeviceArray``.
277277
- :cpp:class:`nb::cupy <cupy>`: create a ``cupy.ndarray``.
278+
- :cpp:class:`nb::memview <memview>`: create a Python ``memoryview``.
279+
- :cpp:class:`nb::arrayapi <arrayapi>`: create an object that supports the
280+
Python buffer protocol (i.e., is accepted as an argument to ``memoryview()``)
281+
and also has the DLPack attributes ``__dlpack__`` and ``__dlpack_device__``
282+
(i.e., it is accepted as an argument to a framework's ``from_dlpack()``
283+
function).
278284
- No framework annotation. In this case, nanobind will create a raw Python
279285
``dltensor`` `capsule <https://docs.python.org/3/c-api/capsule.html>`__
280-
representing the `DLPack <https://github.com/dmlc/dlpack>`__ metadata.
286+
representing the `DLPack <https://github.com/dmlc/dlpack>`__ metadata of
287+
a ``DLManagedTensor``.
281288

282289
This annotation also affects the auto-generated docstring of the function,
283-
which in this case becomes:
290+
which in this example's case becomes:
284291

285292
.. code-block:: python
286293
@@ -458,6 +465,21 @@ interpreted as follows:
458465
- :cpp:enumerator:`rv_policy::move` is unsupported and demoted to
459466
:cpp:enumerator:`rv_policy::copy`.
460467

468+
Note that when a copy is returned, the copy is made by the framework, not by
469+
nanobind itself.
470+
For example, ``numpy.array()`` is passed the keyword argument ``copy`` with
471+
value ``True``, or the PyTorch tensor's ``clone()`` method is immediately
472+
called to create the copy.
473+
This design has a couple of advantages.
474+
First, nanobind does not have a build-time dependency on the libraries and
475+
frameworks (NumPy, PyTorch, CUDA, etc.) that would otherwise be necessary
476+
to perform the copy.
477+
Second, frameworks have the opportunity to optimize how the copy is created.
478+
The copy is owned by the framework, so the framework can choose to use a custom
479+
memory allocator, over-align the data, etc. based on the nd-array's size,
480+
the specific CPU, GPU, or memory types detected, etc.
481+
482+
461483
.. _ndarray-temporaries:
462484

463485
Returning temporaries
@@ -643,26 +665,92 @@ support inter-framework data exchange, custom array types should implement the
643665
- `__dlpack__ <https://data-apis.org/array-api/latest/API_specification/generated/array_api.array.__dlpack__.html#array_api.array.__dlpack__>`__ and
644666
- `__dlpack_device__ <https://data-apis.org/array-api/latest/API_specification/generated/array_api.array.__dlpack_device__.html#array_api.array.__dlpack_device__>`__
645667

646-
methods. This is easy thanks to the nd-array integration in nanobind. An example is shown below:
668+
methods.
669+
These, as well as the buffer protocol, are implemented in the object returned
670+
by nanobind when specifying :cpp:class:`nb::arrayapi <arrayapi>` as the
671+
framework template parameter.
672+
For example:
647673

648674
.. code-block:: cpp
649675
650-
nb::class_<MyArray>(m, "MyArray")
651-
// ...
652-
.def("__dlpack__", [](nb::kwargs kwargs) {
653-
return nb::ndarray<>( /* ... */);
654-
})
655-
.def("__dlpack_device__", []() {
656-
return std::make_pair(nb::device::cpu::value, 0);
657-
});
676+
class MyArray {
677+
double* d;
678+
public:
679+
MyArray() { d = new double[5] { 0.0, 1.0, 2.0, 3.0, 4.0 }; }
680+
~MyArray() { delete[] d; }
681+
double* data() const { return d; }
682+
};
683+
684+
nb::class_<MyArray>(m, "MyArray")
685+
.def(nb::init<>())
686+
.def("arrayapi", [](const MyArray& self) {
687+
return nb::ndarray<nb::arrayapi, double>(self.data(), {5});
688+
}, nb::rv_policy::reference_internal);
689+
690+
which can be used as follows:
691+
692+
.. code-block:: pycon
658693
659-
Returning a raw :cpp:class:`nb::ndarray <ndarray>` without framework annotation
660-
will produce a DLPack capsule, which is what the interface expects.
694+
>>> import my_extension
695+
>>> ma = my_extension.MyArray()
696+
>>> aa = ma.arrayapi()
697+
>>> aa.__dlpack_device__()
698+
(1, 0)
699+
>>> import numpy as np
700+
>>> x = np.from_dlpack(aa)
701+
>>> x
702+
array([0., 1., 2., 3., 4.])
703+
704+
The DLPack methods can also be provided for the class itself, by implementing
705+
``__dlpack__()`` as a wrapper function.
706+
For example, by adding the following lines to the binding:
707+
708+
.. code-block:: cpp
709+
710+
.def("__dlpack__", [](nb::pointer_and_handle<MyArray> self,
711+
nb::kwargs kwargs) {
712+
using arrayapi_t = nb::ndarray<nb::arrayapi, double>;
713+
nb::object aa = nb::cast(arrayapi_t(self.p->data(), {5}),
714+
nb::rv_policy::reference_internal,
715+
self.h);
716+
nb::object max = kwargs.get("max_version", nb::none());
717+
return aa.attr("__dlpack__")(nb::arg("max_version") = max);
718+
})
719+
.def("__dlpack_device__", [](nb::handle /*self*/) {
720+
return std::make_pair(nb::device::cpu::value, 0);
721+
})
722+
723+
the class can be used as follows:
724+
725+
.. code-block:: pycon
726+
727+
>>> import my_extension
728+
>>> ma = my_extension.MyArray()
729+
>>> ma.__dlpack_device__()
730+
(1, 0)
731+
>>> import numpy as np
732+
>>> y = np.from_dlpack(ma)
733+
>>> y
734+
array([0., 1., 2., 3., 4.])
735+
736+
737+
The ``kwargs`` argument in the implementation of ``__dlpack__`` above can be
738+
used to support additional parameters (e.g., to allow the caller to request a
739+
copy). Please see the DLPack documentation for details.
740+
741+
The caller may or may not supply the keyword argument ``max_version``.
742+
If it is not supplied or has the value ``None``, nanobind will return an
743+
unversioned ``DLManagedTensor`` in a capsule named ``dltensor``.
744+
If its value is a tuple of integers ``(major_version, minor_version)`` and the
745+
major version is at least 1, nanobind will return a ``DLManagedTensorVersioned``
746+
in a capsule named ``dltensor_versioned``.
747+
Nanobind ignores other keyword arguments.
748+
In particular, it cannot transfer the array's data to another device (such as
749+
a GPU), nor can it make a copy of the data.
750+
A custom class (such as ``MyArray`` above) could provide such functionality.
751+
Often, the calling framework takes care of copying and inter-device data
752+
transfer and does not ask the producer, ``MyArray``, to perform them.
661753

662-
The ``kwargs`` argument can be used to provide additional parameters (for
663-
example to request a copy), please see the DLPack documentation for details.
664-
Note that nanobind does not yet implement the versioned DLPack protocol. The
665-
version number should be ignored for now.
666754

667755
Frequently asked questions
668756
--------------------------
@@ -708,7 +796,3 @@ be more restrictive. Presently supported dtypes include signed/unsigned
708796
integers, floating point values, complex numbers, and boolean values. Some
709797
:ref:`nonstandard arithmetic types <ndarray-nonstandard>` can be supported as
710798
well.
711-
712-
Nanobind can receive and return *read-only* arrays via the buffer protocol when
713-
exhanging data with NumPy. The DLPack interface currently ignores this
714-
annotation.
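
The ``max_version`` rules documented in the docs/ndarray.rst changes above can be summarized in a few lines of Python. This is only a sketch of the documented behavior, not nanobind's implementation: the function name ``capsule_name`` is hypothetical, and the fallback for a major version below 1 is an assumption, since the documentation only describes the ``None`` case and the case where the major version is at least 1.

```python
def capsule_name(max_version=None):
    """Pick the capsule name implied by the caller's ``max_version`` argument."""
    if max_version is None:
        # Not supplied or None: unversioned DLManagedTensor.
        return "dltensor"
    major, _minor = max_version
    if major >= 1:
        # Caller understands DLPack 1.x: DLManagedTensorVersioned.
        return "dltensor_versioned"
    # Assumption: a major version of 0 falls back to the unversioned capsule.
    return "dltensor"
```

For instance, a consumer calling ``__dlpack__(max_version=(1, 0))`` would receive the versioned capsule under this reading, while a consumer that omits the argument would receive the legacy one.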

include/nanobind/nb_defs.h

Lines changed: 1 addition & 1 deletion
@@ -209,7 +209,7 @@
209209
X(const X &) = delete; \
210210
X &operator=(const X &) = delete;
211211

212-
#define NB_MOD_STATE_SIZE 80
212+
#define NB_MOD_STATE_SIZE 96
213213

214214
// Helper macros to ensure macro arguments are expanded before token pasting/stringification
215215
#define NB_MODULE_IMPL(name, variable) NB_MODULE_IMPL2(name, variable)

include/nanobind/nb_lib.h

Lines changed: 4 additions & 4 deletions
@@ -12,8 +12,8 @@ NAMESPACE_BEGIN(NB_NAMESPACE)
1212
NAMESPACE_BEGIN(dlpack)
1313

1414
// The version of DLPack that is supported by libnanobind
15-
static constexpr uint32_t major_version = 0;
16-
static constexpr uint32_t minor_version = 0;
15+
static constexpr uint32_t major_version = 1;
16+
static constexpr uint32_t minor_version = 1;
1717

1818
// Forward declarations for types in ndarray.h (1)
1919
struct dltensor;
@@ -289,7 +289,7 @@ NB_CORE PyObject *capsule_new(const void *ptr, const char *name,
289289
struct func_data_prelim_base;
290290

291291
/// Create a Python function object for the given function record
292-
NB_CORE PyObject *nb_func_new(const func_data_prelim_base *data) noexcept;
292+
NB_CORE PyObject *nb_func_new(const func_data_prelim_base *f) noexcept;
293293

294294
// ========================================================================
295295

@@ -481,7 +481,7 @@ NB_CORE ndarray_handle *ndarray_import(PyObject *o,
481481
cleanup_list *cleanup) noexcept;
482482

483483
// Describe a local ndarray object using a DLPack capsule
484-
NB_CORE ndarray_handle *ndarray_create(void *value, size_t ndim,
484+
NB_CORE ndarray_handle *ndarray_create(void *data, size_t ndim,
485485
const size_t *shape, PyObject *owner,
486486
const int64_t *strides,
487487
dlpack::dtype dtype, bool ro,

include/nanobind/ndarray.h

Lines changed: 8 additions & 2 deletions
@@ -18,11 +18,16 @@
1818

1919
NAMESPACE_BEGIN(NB_NAMESPACE)
2020

21-
/// dlpack API/ABI data structures are part of a separate namespace
21+
/// DLPack API/ABI data structures are part of a separate namespace.
2222
NAMESPACE_BEGIN(dlpack)
2323

2424
enum class dtype_code : uint8_t {
25-
Int = 0, UInt = 1, Float = 2, Bfloat = 4, Complex = 5, Bool = 6
25+
Int = 0, UInt = 1, Float = 2, Bfloat = 4, Complex = 5, Bool = 6,
26+
Float8_E3M4 = 7, Float8_E4M3 = 8, Float8_E4M3B11FNUZ = 9,
27+
Float8_E4M3FN = 10, Float8_E4M3FNUZ = 11, Float8_E5M2 = 12,
28+
Float8_E5M2FNUZ = 13, Float8_E8M0FNU = 14,
29+
Float6_E2M3FN = 15, Float6_E3M2FN = 16,
30+
Float4_E2M1FN = 17
2631
};
2732

2833
struct device {
@@ -86,6 +91,7 @@ NB_FRAMEWORK(tensorflow, 3, "tensorflow.python.framework.ops.EagerTensor");
8691
NB_FRAMEWORK(jax, 4, "jaxlib.xla_extension.DeviceArray");
8792
NB_FRAMEWORK(cupy, 5, "cupy.ndarray");
8893
NB_FRAMEWORK(memview, 6, "memoryview");
94+
NB_FRAMEWORK(arrayapi, 7, "ArrayLike");
8995

9096
NAMESPACE_BEGIN(device)
9197
NB_DEVICE(none, 0); NB_DEVICE(cpu, 1); NB_DEVICE(cuda, 2);

src/nb_internals.cpp

Lines changed: 2 additions & 0 deletions
@@ -168,6 +168,8 @@ PyTypeObject *nb_meta_cache = nullptr;
168168
static const char* interned_c_strs[pyobj_name::string_count] {
169169
"value",
170170
"copy",
171+
"clone",
172+
"array",
171173
"from_dlpack",
172174
"__dlpack__",
173175
"max_version",

src/nb_internals.h

Lines changed: 6 additions & 3 deletions
@@ -426,6 +426,8 @@ struct pyobj_name {
426426
enum : int {
427427
value_str = 0, // string "value"
428428
copy_str, // string "copy"
429+
clone_str, // string "clone"
430+
array_str, // string "array"
429431
from_dlpack_str, // string "from_dlpack"
430432
dunder_dlpack_str, // string "__dlpack__"
431433
max_version_str, // string "max_version"
@@ -490,11 +492,12 @@ inline void *inst_ptr(nb_inst *self) {
490492
}
491493

492494
template <typename T> struct scoped_pymalloc {
493-
scoped_pymalloc(size_t size = 1) {
494-
ptr = (T *) PyMem_Malloc(size * sizeof(T));
495+
scoped_pymalloc(size_t size = 1, size_t extra_bytes = 0) {
496+
// Tip: construct objects in the extra bytes using placement new.
497+
ptr = (T *) PyMem_Malloc(size * sizeof(T) + extra_bytes);
495498
if (!ptr)
496499
fail("scoped_pymalloc(): could not allocate %llu bytes of memory!",
497-
(unsigned long long) size);
500+
(unsigned long long) (size * sizeof(T) + extra_bytes));
498501
}
499502
~scoped_pymalloc() { PyMem_Free(ptr); }
500503
T *release() {
