Commit 87d278d

Add support to ndarray for DLPack version 1
1 parent f2499d4 commit 87d278d

16 files changed, +1005 -380 lines changed

docs/api_extra.rst

Lines changed: 5 additions & 0 deletions
@@ -1108,6 +1108,11 @@ convert into an equivalent representation in one of the following frameworks:
 
    Builtin Python ``memoryview`` for CPU-resident data.
 
+.. cpp:class:: arrayapi
+
+   An object that both implements the buffer protocol and also has the
+   ``__dlpack__`` and ``__dlpack_device__`` attributes.
+
 Eigen convenience type aliases
 ------------------------------
 
docs/changelog.rst

Lines changed: 8 additions & 0 deletions
@@ -22,6 +22,14 @@ Version TBD (not yet released)
   Clang-based Intel compiler). Continuous integration tests have been added to
   ensure compatibility with these compilers on an ongoing basis.
 
+- The framework ``nb::arrayapi`` is now available to return an nd-array from
+  C++ to Python as an object that supports both the Python buffer protocol
+  and the DLPack methods ``__dlpack__`` and ``__dlpack_device__``.
+  Nanobind now supports importing and exporting nd-arrays via capsules that
+  contain the ``DLManagedTensorVersioned`` struct, which has a flag bit
+  indicating the nd-array is read-only.
+  (PR `#1175 <https://github.com/wjakob/nanobind/pull/1175>`__).
+
 Version 2.9.2 (Sep 4, 2025)
 ---------------------------
 
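The read-only flag mentioned in the changelog entry above can be illustrated with a minimal, self-contained sketch. The constant and the reduced struct below are hypothetical stand-ins, not nanobind or DLPack declarations. The bit position (bit 0) is an assumption modeled on ``DLPACK_FLAG_BITMASK_READ_ONLY`` in the DLPack 1.x header and should be checked against the header actually in use.

```cpp
#include <cstdint>

// Hypothetical stand-ins for illustration only; not nanobind or DLPack code.
// Assumption: the read-only flag occupies bit 0 of a 64-bit flags field.
constexpr std::uint64_t kFlagReadOnly = std::uint64_t{1} << 0;

// Reduced header holding only the fields this sketch needs.
struct VersionedTensorSketch {
    std::uint32_t major_version;
    std::uint32_t minor_version;
    std::uint64_t flags;
};

// A producer sets the bit when it exports immutable data.
inline void mark_read_only(VersionedTensorSketch &t) {
    t.flags |= kFlagReadOnly;
}

// A consumer tests the bit before handing out a writable view.
inline bool is_read_only(const VersionedTensorSketch &t) {
    return (t.flags & kFlagReadOnly) != 0;
}
```

The unversioned ``DLManagedTensor`` has no flags field, so read-only status can only travel with the versioned capsule.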

docs/ndarray.rst

Lines changed: 103 additions & 21 deletions
@@ -275,12 +275,19 @@ desired Python type.
 - :cpp:class:`nb::tensorflow <tensorflow>`: create a ``tensorflow.python.framework.ops.EagerTensor``.
 - :cpp:class:`nb::jax <jax>`: create a ``jaxlib.xla_extension.DeviceArray``.
 - :cpp:class:`nb::cupy <cupy>`: create a ``cupy.ndarray``.
+- :cpp:class:`nb::memview <memview>`: create a Python ``memoryview``.
+- :cpp:class:`nb::arrayapi <arrayapi>`: create an object that supports the
+  Python buffer protocol (i.e., is accepted as an argument to ``memoryview()``)
+  and also has the DLPack attributes ``__dlpack__`` and ``__dlpack_device__``
+  (i.e., it is accepted as an argument to a framework's ``from_dlpack()``
+  function).
 - No framework annotation. In this case, nanobind will create a raw Python
   ``dltensor`` `capsule <https://docs.python.org/3/c-api/capsule.html>`__
-  representing the `DLPack <https://github.com/dmlc/dlpack>`__ metadata.
+  representing the `DLPack <https://github.com/dmlc/dlpack>`__ metadata of
+  a ``DLManagedTensor``.
 
 This annotation also affects the auto-generated docstring of the function,
-which in this case becomes:
+which in this example's case becomes:
 
 .. code-block:: python
 
@@ -458,6 +465,13 @@ interpreted as follows:
 - :cpp:enumerator:`rv_policy::move` is unsupported and demoted to
   :cpp:enumerator:`rv_policy::copy`.
 
+Note that when a copy is returned, the copy is made by the framework, not by
+nanobind itself.
+For example, ``numpy.array`` is passed the keyword argument ``copy`` with
+value ``True``, or the PyTorch ``clone`` member function is immediately
+called on the tensor to create the copy.
+
+
 .. _ndarray-temporaries:
 
 Returning temporaries
@@ -643,26 +657,98 @@ support inter-framework data exchange, custom array types should implement the
 - `__dlpack__ <https://data-apis.org/array-api/latest/API_specification/generated/array_api.array.__dlpack__.html#array_api.array.__dlpack__>`__ and
 - `__dlpack_device__ <https://data-apis.org/array-api/latest/API_specification/generated/array_api.array.__dlpack_device__.html#array_api.array.__dlpack_device__>`__
 
-methods. This is easy thanks to the nd-array integration in nanobind. An example is shown below:
+methods.
+These, as well as the buffer protocol, are implemented in the object returned
+by nanobind when specifying :cpp:class:`nb::arrayapi <arrayapi>` as the
+framework template parameter.
+For example:
 
 .. code-block:: cpp
 
-   nb::class_<MyArray>(m, "MyArray")
-       // ...
-       .def("__dlpack__", [](nb::kwargs kwargs) {
-           return nb::ndarray<>( /* ... */);
-       })
-       .def("__dlpack_device__", []() {
-           return std::make_pair(nb::device::cpu::value, 0);
-       });
+   class MyArray {
+       double* data;
+   public:
+       MyArray() { data = new double[5] { 0.0, 1.0, 2.0, 3.0, 4.0 }; }
+       ~MyArray() { delete[] data; }
+       auto arrayapi() {
+           return nb::ndarray<nb::arrayapi, double>(data, {5});
+       }
+   };
+
+   nb::class_<MyArray>(m, "MyArray")
+       .def(nb::init<>())
+       .def("arrayapi", &MyArray::arrayapi, nb::rv_policy::reference_internal);
+
+which can be used as follows:
+
+.. code-block:: pycon
+
+   >>> import my_extension
+   >>> ma = my_extension.MyArray()
+   >>> aa = ma.arrayapi()
+   >>> aa.__dlpack_device__()
+   (1, 0)
+   >>> import numpy as np
+   >>> x = np.from_dlpack(aa)
+   >>> x
+   array([0., 1., 2., 3., 4.])
+
+The DLPack methods can also be provided in the class itself, by implementing
+``__dlpack__()`` as a wrapper function.
+For example, add the following member functions to the ``MyArray`` class:
+
+.. code-block:: cpp
 
-Returning a raw :cpp:class:`nb::ndarray <ndarray>` without framework annotation
-will produce a DLPack capsule, which is what the interface expects.
+       auto dlpack(nb::kwargs kwargs) {
+           nb::object aa =
+               nb::cast(nb::ndarray<nb::arrayapi, double>(data, {5}),
+                        nb::rv_policy::reference_internal,
+                        nb::find(*this));
+           nb::object max_version = kwargs.get("max_version", nb::none());
+           return aa.attr("__dlpack__")(nb::arg("max_version") = max_version);
+       }
+       auto dlpack_device() {
+           return std::make_pair(nb::device::cpu::value, 0);
+       }
+
+and add the following two lines to the binding:
+
+.. code-block:: cpp
+
+       .def("__dlpack__", &MyArray::dlpack)
+       .def("__dlpack_device__", &MyArray::dlpack_device)
+
+Then the class can be used as follows:
+
+.. code-block:: pycon
+
+   >>> import my_extension
+   >>> ma = my_extension.MyArray()
+   >>> ma.__dlpack_device__()
+   (1, 0)
+   >>> import numpy as np
+   >>> y = np.from_dlpack(ma)
+   >>> y
+   array([0., 1., 2., 3., 4.])
+
+
+The ``kwargs`` argument in the implementation of ``__dlpack__`` above can be
+used to support additional parameters (e.g., to allow the caller to request a
+copy). Please see the DLPack documentation for details.
+
+The caller may or may not supply the keyword argument ``max_version``.
+If it is not supplied or has the value ``None``, nanobind will return an
+unversioned ``DLManagedTensor`` in a capsule named ``dltensor``.
+If its value is a tuple of integers ``(major_version, minor_version)`` and the
+major version is at least 1, nanobind will return a ``DLManagedTensorVersioned``
+in a capsule named ``dltensor_versioned``.
+Nanobind ignores other keyword arguments.
+In particular, it cannot transfer the array's data to another device (such as
+a GPU), nor can it make a copy of the data.
+A custom class (such as ``MyArray`` above) could provide such functionality.
+Often, the caller framework takes care of copying and inter-device data
+transfer and does not ask the producer, ``MyArray``, to perform them.
 
-The ``kwargs`` argument can be used to provide additional parameters (for
-example to request a copy), please see the DLPack documentation for details.
-Note that nanobind does not yet implement the versioned DLPack protocol. The
-version number should be ignored for now.
 
 Frequently asked questions
 --------------------------
@@ -708,7 +794,3 @@ be more restrictive. Presently supported dtypes include signed/unsigned
 integers, floating point values, complex numbers, and boolean values. Some
 :ref:`nonstandard arithmetic types <ndarray-nonstandard>` can be supported as
 well.
-
-Nanobind can receive and return *read-only* arrays via the buffer protocol when
-exchanging data with NumPy. The DLPack interface currently ignores this
-annotation.
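The ``max_version`` rule described in the new ``docs/ndarray.rst`` text can be sketched as a small, self-contained function. ``capsule_name_for`` and ``dlpack_version`` are hypothetical names introduced only for this illustration; nanobind's actual implementation may look different. The text does not spell out what happens for a major version below 1, so the sketch assumes that case falls back to the unversioned capsule.

```cpp
#include <optional>
#include <string_view>
#include <utility>

// Hypothetical helper for illustration; not part of nanobind.
using dlpack_version = std::pair<int, int>;  // (major_version, minor_version)

// Returns the capsule name a producer would use for a given max_version.
// std::nullopt models both "argument not supplied" and "max_version=None".
inline std::string_view
capsule_name_for(const std::optional<dlpack_version> &max_version) {
    if (max_version && max_version->first >= 1)
        return "dltensor_versioned";  // carries a DLManagedTensorVersioned
    // Assumption: a major version below 1 is treated like a missing argument.
    return "dltensor";                // carries an unversioned DLManagedTensor
}
```

A consumer that only understands the unversioned protocol simply omits ``max_version`` and keeps receiving ``dltensor`` capsules, so existing callers are unaffected.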

include/nanobind/nb_defs.h

Lines changed: 1 addition & 1 deletion
@@ -209,7 +209,7 @@
     X(const X &) = delete; \
     X &operator=(const X &) = delete;
 
-#define NB_MOD_STATE_SIZE 80
+#define NB_MOD_STATE_SIZE 96
 
 // Helper macros to ensure macro arguments are expanded before token pasting/stringification
 #define NB_MODULE_IMPL(name, variable) NB_MODULE_IMPL2(name, variable)

include/nanobind/nb_lib.h

Lines changed: 4 additions & 4 deletions
@@ -12,8 +12,8 @@ NAMESPACE_BEGIN(NB_NAMESPACE)
 NAMESPACE_BEGIN(dlpack)
 
 // The version of DLPack that is supported by libnanobind
-static constexpr uint32_t major_version = 0;
-static constexpr uint32_t minor_version = 0;
+static constexpr uint32_t major_version = 1;
+static constexpr uint32_t minor_version = 1;
 
 // Forward declarations for types in ndarray.h (1)
 struct dltensor;
@@ -289,7 +289,7 @@ NB_CORE PyObject *capsule_new(const void *ptr, const char *name,
 struct func_data_prelim_base;
 
 /// Create a Python function object for the given function record
-NB_CORE PyObject *nb_func_new(const func_data_prelim_base *data) noexcept;
+NB_CORE PyObject *nb_func_new(const func_data_prelim_base *f) noexcept;
 
 // ========================================================================
 
@@ -481,7 +481,7 @@ NB_CORE ndarray_handle *ndarray_import(PyObject *o,
                                        cleanup_list *cleanup) noexcept;
 
 // Describe a local ndarray object using a DLPack capsule
-NB_CORE ndarray_handle *ndarray_create(void *value, size_t ndim,
+NB_CORE ndarray_handle *ndarray_create(void *data, size_t ndim,
                                        const size_t *shape, PyObject *owner,
                                        const int64_t *strides,
                                        dlpack::dtype dtype, bool ro,

include/nanobind/ndarray.h

Lines changed: 8 additions & 2 deletions
@@ -18,11 +18,16 @@
 
 NAMESPACE_BEGIN(NB_NAMESPACE)
 
-/// dlpack API/ABI data structures are part of a separate namespace
+/// DLPack API/ABI data structures are part of a separate namespace.
 NAMESPACE_BEGIN(dlpack)
 
 enum class dtype_code : uint8_t {
-    Int = 0, UInt = 1, Float = 2, Bfloat = 4, Complex = 5, Bool = 6
+    Int = 0, UInt = 1, Float = 2, Bfloat = 4, Complex = 5, Bool = 6,
+    Float8_e3m4 = 7, Float8_e4m3 = 8, Float8_e4m3b11fnuz = 9,
+    Float8_e4m3fn = 10, Float8_e4m3fnuz = 11, Float8_e5m2 = 12,
+    Float8_e5m2fnuz = 13, Float8_e8m0fnu = 14,
+    Float6_e2m3fn = 15, Float6_e3m2fn = 16,
+    Float4_e2m1fn = 17
 };
 
 struct device {
@@ -86,6 +91,7 @@ NB_FRAMEWORK(tensorflow, 3, "tensorflow.python.framework.ops.EagerTensor");
 NB_FRAMEWORK(jax, 4, "jaxlib.xla_extension.DeviceArray");
 NB_FRAMEWORK(cupy, 5, "cupy.ndarray");
 NB_FRAMEWORK(memview, 6, "memoryview");
+NB_FRAMEWORK(arrayapi, 7, "ArrayLike");
 
 NAMESPACE_BEGIN(device)
 NB_DEVICE(none, 0); NB_DEVICE(cpu, 1); NB_DEVICE(cuda, 2);
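The ``(1, 0)`` tuple printed by ``__dlpack_device__()`` in the documentation examples is a pair of device type and device index. The sketch below decodes it using only the three codes visible in the ``NB_DEVICE`` context line of this hunk. ``device_type_sketch`` and ``describe_device`` are hypothetical names for illustration, and any code not shown in the hunk is reported as ``"other"`` rather than guessed.

```cpp
#include <string_view>
#include <utility>

// Device type codes copied from the NB_DEVICE line shown in this diff.
// The hunk does not show the remaining codes, so they are not listed here.
enum class device_type_sketch : int { none = 0, cpu = 1, cuda = 2 };

// Hypothetical helper: maps a (device_type, device_id) pair to a label.
inline std::string_view describe_device(std::pair<int, int> device) {
    switch (static_cast<device_type_sketch>(device.first)) {
        case device_type_sketch::none: return "none";
        case device_type_sketch::cpu:  return "cpu";
        case device_type_sketch::cuda: return "cuda";
    }
    return "other";  // a code that this sketch does not know about
}
```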

src/nb_internals.cpp

Lines changed: 2 additions & 0 deletions
@@ -168,6 +168,8 @@ PyTypeObject *nb_meta_cache = nullptr;
 static const char* interned_c_strs[pyobj_name::string_count] {
     "value",
     "copy",
+    "clone",
+    "array",
     "from_dlpack",
     "__dlpack__",
     "max_version",

src/nb_internals.h

Lines changed: 6 additions & 3 deletions
@@ -426,6 +426,8 @@ struct pyobj_name {
     enum : int {
         value_str = 0,     // string "value"
         copy_str,          // string "copy"
+        clone_str,         // string "clone"
+        array_str,         // string "array"
         from_dlpack_str,   // string "from_dlpack"
         dunder_dlpack_str, // string "__dlpack__"
         max_version_str,   // string "max_version"
@@ -490,11 +492,12 @@ inline void *inst_ptr(nb_inst *self) {
 }
 
 template <typename T> struct scoped_pymalloc {
-    scoped_pymalloc(size_t size = 1) {
-        ptr = (T *) PyMem_Malloc(size * sizeof(T));
+    scoped_pymalloc(size_t size = 1, size_t extra_bytes = 0) {
+        // Tip: construct objects in the extra bytes using placement new.
+        ptr = (T *) PyMem_Malloc(size * sizeof(T) + extra_bytes);
         if (!ptr)
             fail("scoped_pymalloc(): could not allocate %llu bytes of memory!",
-                 (unsigned long long) size);
+                 (unsigned long long) (size * sizeof(T) + extra_bytes));
     }
     ~scoped_pymalloc() { PyMem_Free(ptr); }
     T *release() {
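The "extra bytes" tip in the ``scoped_pymalloc`` change can be illustrated with a self-contained sketch. It is not nanobind code: ``scoped_malloc_sketch``, ``header_sketch`` and ``demo_extra_bytes`` are hypothetical names, ``std::malloc`` and ``std::free`` stand in for ``PyMem_Malloc`` and ``PyMem_Free`` so that the example builds without the Python headers, and the ``extra()`` accessor is an addition made only for this illustration.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <new>

// Stand-in for scoped_pymalloc: one allocation holds `size` objects of type T
// followed by `extra_bytes` of raw storage.
template <typename T> struct scoped_malloc_sketch {
    explicit scoped_malloc_sketch(std::size_t size = 1,
                                  std::size_t extra_bytes = 0) {
        ptr = static_cast<T *>(std::malloc(size * sizeof(T) + extra_bytes));
        if (!ptr)
            throw std::bad_alloc();
        tail = reinterpret_cast<unsigned char *>(ptr) + size * sizeof(T);
    }
    ~scoped_malloc_sketch() { std::free(ptr); }
    scoped_malloc_sketch(const scoped_malloc_sketch &) = delete;
    scoped_malloc_sketch &operator=(const scoped_malloc_sketch &) = delete;

    T *get() const { return ptr; }
    void *extra() const { return tail; }  // first byte after the T objects

private:
    T *ptr = nullptr;
    unsigned char *tail = nullptr;
};

struct header_sketch { std::int64_t ndim; };

// The trailing values start right after the header, so the header size must
// keep them suitably aligned.
static_assert(sizeof(header_sketch) % alignof(std::int64_t) == 0,
              "trailing int64_t values would be misaligned");

// Builds a header plus two trailing shape entries in a single allocation.
inline std::int64_t demo_extra_bytes() {
    scoped_malloc_sketch<header_sketch> block(1, 2 * sizeof(std::int64_t));
    header_sketch *h = new (block.get()) header_sketch{2};
    unsigned char *base = static_cast<unsigned char *>(block.extra());
    std::int64_t *d0 = new (base) std::int64_t(3);
    std::int64_t *d1 = new (base + sizeof(std::int64_t)) std::int64_t(4);
    return h->ndim * 100 + *d0 * 10 + *d1;
}
```

Keeping the header and its variable-length tail in one block means a single free releases both, which is presumably the motivation for the ``extra_bytes`` parameter, although the commit itself does not say so.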
