1 change: 1 addition & 0 deletions .codespell-ignore-lines
@@ -2,6 +2,7 @@ template <op_id id, op_type ot, typename L = undefined_t, typename R = undefined
template <typename ThisT>
auto &this_ = static_cast<ThisT &>(*this);
if (load_impl<ThisT>(temp, false)) {
return load_impl<ThisT>(src, false);
ssize_t nd = 0;
auto trivial = broadcast(buffers, nd, shape);
auto ndim = (size_t) nd;
88 changes: 85 additions & 3 deletions include/pybind11/cast.h
@@ -863,6 +863,71 @@ struct holder_helper {
     static auto get(const T &p) -> decltype(p.get()) { return p.get(); }
 };

+struct holder_caster_foreign_helpers {
+    struct py_deleter {
+        void operator()(const void *) const noexcept {
+            // Don't run the deleter if the interpreter has been shut down
+            if (Py_IsInitialized() == 0) {
Collaborator

For this PR: Could you please make this

            if (Py_IsInitialized() == 0 || _Py_IsFinalizing() != 0) {

?


Separately later: see draft PR #5864

Collaborator Author

Py_IsFinalizing remains true after Py_Finalize[Ex] has returned, so a true return from it does not imply it's safe to decref. Doing a decref after the interpreter has finalized will crash the program; skipping one during finalization (when it would still be OK) causes a memory leak. Of the two, I prefer the memory leak.

The more correct solution would be to have our own "interpreter has finalized" flag that we set from either the destructor of the internals capsule or a Py_AtExit callback. (Probably only the former is subinterpreter-safe.) Both of those occur very late in interpreter finalization, and in particular occur after the final GC pass.
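
A minimal sketch of the "interpreter has finalized" flag idea described above, using only the C++ standard library. `FakePyObject`, `interpreter_finalized()`, `guarded_deleter`, and `make_holder` are hypothetical names for illustration; none of them is pybind11 or CPython API, and the refcount arithmetic only stands in for `Py_INCREF`/`Py_DECREF` under the GIL. In the real proposal the flag would be set from the internals capsule destructor or a `Py_AtExit` callback; here it is set by hand.

```cpp
#include <atomic>
#include <cassert>
#include <memory>

// Hypothetical stand-in for a Python object; only a reference count is modeled.
struct FakePyObject {
    int refcount = 1;
};

// Hypothetical process-wide flag meaning "the interpreter is gone".
inline std::atomic<bool> &interpreter_finalized() {
    static std::atomic<bool> flag{false};
    return flag;
}

// Deleter that gives back one reference unless the interpreter has finalized.
struct guarded_deleter {
    FakePyObject *o;
    void operator()(const void *) const noexcept {
        if (interpreter_finalized().load(std::memory_order_acquire)) {
            return; // too late to touch the object: leak rather than crash
        }
        --o->refcount; // stands in for gil_scoped_acquire + Py_DECREF(o)
    }
};

// Builds a shared_ptr whose control block owns one reference to `obj`.
inline std::shared_ptr<int> make_holder(FakePyObject &obj, int *payload) {
    ++obj.refcount; // stands in for src.inc_ref()
    return std::shared_ptr<int>(payload, guarded_deleter{&obj});
}

// Self-check: a holder released before "finalization" returns its reference;
// one released afterwards deliberately leaks it.
inline void demo_guarded_deleter() {
    int payload = 42;
    FakePyObject early, late;
    {
        auto h = make_holder(early, &payload);
        assert(early.refcount == 2);
    }
    assert(early.refcount == 1);

    auto held = make_holder(late, &payload);
    interpreter_finalized().store(true, std::memory_order_release);
    held.reset();
    assert(late.refcount == 2); // reference was not released

    interpreter_finalized().store(false, std::memory_order_release); // restore for later use
}
```

The leak in the second case is the trade-off discussed in this thread: skipping a decref costs memory, while performing one after finalization can crash.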

Collaborator

Hm ...

> skipping one during finalization (when it would still be OK)

I assumed it's not OK. — How can we gain certainty?

Claude and ChatGPT both suggest that guarding with !Py_IsFinalizing() is important.

> Py_IsFinalizing remains true after Py_Finalize[Ex] has returned

I assumed Py_IsInitialized() will then be false.

We have this existing code:

#ifdef GRAALVM_PYTHON
if (!Py_IsInitialized() || _Py_IsFinalizing()) {
return;
}
#endif

That's GraalPy-specific, but I was thinking it's correct in general.

I also assumed _Py_IsFinalizing() still works with all Python versions, although only Py_IsFinalizing() is actually a public API, since Python 3.9; but we are still supporting Python 3.8.


Everything else in this PR looks great to me, it's just this one line that I'm still worried about.

Collaborator Author (@oremanj, Oct 13, 2025)

Oh, I misread your suggestion as trying to allow decref during finalization. As written, the IsFinalizing term is redundant because IsFinalizing becomes true exactly when IsInitialized becomes false. From cpython/Python/pylifecycle.c:

    _PyInterpreterState_SetFinalizing(tstate->interp, tstate);
    _PyRuntimeState_SetFinalizing(runtime, tstate);
    runtime->initialized = 0;
    runtime->core_initialized = 0;

IsFinalizing checks _PyRuntimeState_GetFinalizing while IsInitialized checks runtime->initialized.

> I assumed it's not OK. How can we gain certainty?

The best documentation I know of what happens during finalization is hudson-trading/pymetabind#2 (comment) -- this is not an API guarantee / could change, but as the comment itself notes, that doesn't mean we don't have to interact with it. Pablo is a CPython core developer who has done a lot of work on the finalization process.

The higher-level reason to believe it's OK to decref during finalization is that Python itself decrefs during finalization all over the place! Python actually goes to fairly great lengths to attempt to ensure that there are no objects left alive when the interpreter exits. That requires a lot of individual references to be broken. There is an entire GC pass that occurs after IsFinalizing becomes true.

We can assume that it's still OK to decref when our capsule destructor is invoked because our capsule destructor was invoked by a decref (the one that took the refcount of the capsule to zero).

Collaborator

I don't have time to comb through the CPython code, so I fell back to asking ChatGPT again (same URL, one extra prompt/response):

https://chatgpt.com/share/68ed515d-82e4-8008-8954-784843385bf9

It explained why it still recommends also checking Py_IsFinalizing(), but only after acquiring the GIL (it had suggested that order earlier, and I overlooked it).

Up to you. In my own code, I'd definitely use the suggested pattern; it's a very easy and inexpensive way to (maybe) err on the safer side.

Collaborator Author

ChatGPT is mostly wrong here. I will respond to its points in detail.

> The order in CPython is “mark finalizing” → “clear initialized.”

True.

> On a foreign thread without the GIL, you can observe finalizing == true while initialized == true for a moment.

True but irrelevant. When finalizing is set, non-daemon threads have already been joined, and daemon threads are detached (unable to execute Python code). If this is running from a daemon thread, it doesn't matter which value we observe for IsInitialized, because if we're finalizing then the thread will either freeze or exit when we attempt to acquire the GIL immediately below. (Since GIL acquisition can't fail, the thread freezes if it tries to acquire the GIL during finalization. This is documented and I believe there's a PEP that's trying to fix it.) We can only wind up actually executing the DECREF if we're not finalizing yet or if we're running in the main thread. But in the latter case, the race does not exist.

> Subinterpreters: Py_IsInitialized() is a process-wide flag; it says nothing about whether a particular interpreter (or the one owning your object) is already tearing down. Py_IsFinalizing() is also global today, so neither API tells you “this object’s interpreter is safe.” Treat them as coarse signals, not per-interpreter guarantees.

This does not point at any distinction between the validity of the logic with the IsFinalizing check vs without.

> CPython does a lot of decref/cleanup during shutdown, but under tight control:

This is FUD. It's a safe approximation but doesn't understand the details.

> It holds the GIL, knows exactly which objects it’s touching, and often orders teardown to avoid arbitrary callbacks (or at least accepts the risk in well-understood places).

The final GC pass executes well after IsFinalizing becomes true and can destroy arbitrary user-created objects that do arbitrary things.

> Your extension’s Py_DECREF from a foreign thread or a C++ destructor can invoke: [...]

The foreign-thread concern is specious, as explained above. Thus, any shared_ptr destruction that occurs during finalization must be indirectly executing from within a finalizer of a user-provided Python object, so it's OK for us to potentially invoke other finalizers of user-provided Python objects.

shared_ptr destruction after finalization is common if a shared_ptr is held in a C++-level global, and it is important that we not call into Python there. That's the main reason for the IsInitialized check.

> Capsule destructor case: the fact that a decref inside CPython dropped the capsule to zero (calling your destructor) doesn’t make your subsequent Py_DECREFs safe; your destructor may touch objects tied to interpreters or modules already mid-teardown.

This is confusing the question about whether we can decref an arbitrary object during finalization (answer: yes because we're only doing it from within another decref of an arbitrary object) with the question about what we can do in the capsule destructor. The capsule destructor executes much later - clearing the interpreter state dict is one of the last things done during finalization. We should not run arbitrary Python code there. But we're not: I'm just proposing we have a C++ function that sets a C++ flag.

Collaborator

You're light-years ahead of me in your understanding of the CPython internals, especially threading and cleanup mechanics. Thanks for the explanation!

+                return;
+            }
+            gil_scoped_acquire guard;
+            Py_DECREF(o);
+        }
+
+        PyObject *o;
+    };

+    template <typename type>
+    static auto set_via_shared_from_this(type *value, std::shared_ptr<type> *holder_out)
+        -> decltype(value->shared_from_this(), bool()) {
+        // object derives from enable_shared_from_this;
+        // try to reuse an existing shared_ptr if one is known
+        if (auto existing = try_get_shared_from_this(value)) {
+            *holder_out = std::static_pointer_cast<type>(existing);
+            return true;
+        }
+        return false;
+    }
+
+    template <typename type>
+    static bool set_via_shared_from_this(void *, std::shared_ptr<type> *) {
+        return false;
+    }
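
The two `set_via_shared_from_this` overloads above use expression SFINAE to detect `shared_from_this()`. A minimal standalone sketch of that dispatch follows, assuming nothing beyond the standard library. `Plain`, `Shareable`, and `reuse_existing` are hypothetical names, and the `try`/`catch` is a stand-in for pybind11's `try_get_shared_from_this` helper, whose implementation is not shown in this diff.

```cpp
#include <cassert>
#include <memory>

// Hypothetical example types (not part of pybind11).
struct Plain {
    int i = 0;
};
struct Shareable : std::enable_shared_from_this<Shareable> {
    int i = 0;
};

// Preferred overload. The trailing decltype makes it viable only when
// `value->shared_from_this()` is a valid expression for T.
template <typename T>
auto reuse_existing(T *value, std::shared_ptr<T> *out)
    -> decltype(value->shared_from_this(), bool()) {
    try {
        // Succeeds only if some shared_ptr already owns *value.
        *out = std::static_pointer_cast<T>(value->shared_from_this());
        return true;
    } catch (const std::bad_weak_ptr &) {
        return false; // no owner known (the throw is guaranteed since C++17)
    }
}

// Fallback overload. The T* -> void* conversion ranks below the exact match
// above, so this is chosen only when the first overload is removed by SFINAE.
template <typename T>
bool reuse_existing(void *, std::shared_ptr<T> *) {
    return false;
}

inline void demo_reuse_existing() {
    Plain plain;
    std::shared_ptr<Plain> plain_holder;
    assert(!reuse_existing(&plain, &plain_holder)); // fallback overload

    auto owner = std::make_shared<Shareable>();
    std::shared_ptr<Shareable> holder;
    assert(reuse_existing(owner.get(), &holder)); // SFINAE overload
    assert(holder == owner);
    assert(owner.use_count() == 2); // holder shares the existing control block
}
```

Reusing the existing control block, when there is one, avoids creating a second, unrelated owner for the same C++ object.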

+    template <typename type>
+    static bool set_foreign_holder(handle src, type *value, std::shared_ptr<type> *holder_out) {
+        // We only support using std::shared_ptr<T> for foreign T, and
+        // it's done by creating a new shared_ptr control block that
+        // owns a reference to the original Python object.
+        if (value == nullptr) {
+            *holder_out = {};
+            return true;
+        }
+        if (set_via_shared_from_this(value, holder_out)) {
+            return true;
+        }
+        *holder_out = std::shared_ptr<type>(value, py_deleter{src.inc_ref().ptr()});
+        return true;
+    }
+
+    template <typename type>
+    static bool
+    set_foreign_holder(handle src, const type *value, std::shared_ptr<const type> *holder_out) {
+        std::shared_ptr<type> holder_mut;
+        if (set_foreign_holder(src, const_cast<type *>(value), &holder_mut)) {
+            *holder_out = holder_mut;
+            return true;
+        }
+        return false;
+    }
+
+    template <typename type>
+    static bool set_foreign_holder(handle, type *, ...) {
+        throw cast_error("Unable to cast foreign type to held instance -- "
+                         "only std::shared_ptr<T> is supported in this case");
+    }
+};
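
The throwing `set_foreign_holder(handle, type *, ...)` overload above works as a catch-all because a C-style variadic parameter is the lowest-ranked match in overload resolution. A small sketch of that ranking, with the hypothetical name `set_holder` and `std::runtime_error` standing in for `cast_error`:

```cpp
#include <cassert>
#include <memory>
#include <stdexcept>

// Supported case: the holder is a std::shared_ptr<T>.
template <typename T>
bool set_holder(T *value, std::shared_ptr<T> *out) {
    *out = std::shared_ptr<T>(value, [](T *) {}); // non-owning, for the demo only
    return true;
}

// Catch-all: `...` is the worst possible conversion, so this overload is
// selected only when the shared_ptr overload cannot be deduced.
template <typename T>
bool set_holder(T *, ...) {
    throw std::runtime_error("only std::shared_ptr<T> is supported");
}

inline void demo_set_holder() {
    int value = 5;

    std::shared_ptr<int> shared;
    assert(set_holder(&value, &shared));
    assert(*shared == 5);

    std::unique_ptr<int> unique;
    bool threw = false;
    try {
        set_holder(&value, &unique); // deduction fails for overload 1
    } catch (const std::runtime_error &) {
        threw = true;
    }
    assert(threw);
}
```

The effect is that an unsupported holder type compiles but fails at load time with a descriptive error, instead of failing to compile.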

 // SMART_HOLDER_BAKEIN_FOLLOW_ON: Rewrite comment, with reference to shared_ptr specialization.
 /// Type caster for holder types like std::shared_ptr, etc.
 /// The SFINAE hook is provided to help work around the current lack of support
@@ -907,6 +972,10 @@ struct copyable_holder_caster : public type_caster_base<type> {
         }
     }

+    bool set_foreign_holder(handle src) {
+        return holder_caster_foreign_helpers::set_foreign_holder(src, (type *) value, &holder);
+    }
+
     void load_value(value_and_holder &&v_h) {
         if (v_h.holder_constructed()) {
             value = v_h.value_ptr();
@@ -977,22 +1046,22 @@ struct copyable_holder_caster<
     }

     explicit operator std::shared_ptr<type> *() {
-        if (typeinfo->holder_enum_v == detail::holder_enum_t::smart_holder) {
+        if (sh_load_helper.was_populated) {
             pybind11_fail("Passing `std::shared_ptr<T> *` from Python to C++ is not supported "
                           "(inherently unsafe).");
         }
         return std::addressof(shared_ptr_storage);
     }

     explicit operator std::shared_ptr<type> &() {
-        if (typeinfo->holder_enum_v == detail::holder_enum_t::smart_holder) {
+        if (sh_load_helper.was_populated) {
             shared_ptr_storage = sh_load_helper.load_as_shared_ptr(typeinfo, value);
         }
         return shared_ptr_storage;
     }

     std::weak_ptr<type> potentially_slicing_weak_ptr() {
-        if (typeinfo->holder_enum_v == detail::holder_enum_t::smart_holder) {
+        if (sh_load_helper.was_populated) {
             // Reusing shared_ptr code to minimize code complexity.
             shared_ptr_storage
                 = sh_load_helper.load_as_shared_ptr(typeinfo,
@@ -1041,6 +1110,11 @@ struct copyable_holder_caster<
         }
     }

+    bool set_foreign_holder(handle src) {
+        return holder_caster_foreign_helpers::set_foreign_holder(
+            src, (type *) value, &shared_ptr_storage);
+    }
+
     void load_value(value_and_holder &&v_h) {
         if (typeinfo->holder_enum_v == detail::holder_enum_t::smart_holder) {
             sh_load_helper.loaded_v_h = v_h;
@@ -1078,6 +1152,7 @@ struct copyable_holder_caster<
                 value = cast.second(sub_caster.value);
                 if (typeinfo->holder_enum_v == detail::holder_enum_t::smart_holder) {
                     sh_load_helper.loaded_v_h = sub_caster.sh_load_helper.loaded_v_h;
+                    sh_load_helper.was_populated = true;
                 } else {
                     shared_ptr_storage
                         = std::shared_ptr<type>(sub_caster.shared_ptr_storage, (type *) value);
@@ -1224,6 +1299,12 @@ struct move_only_holder_caster<
         return false;
     }

+    bool set_foreign_holder(handle) {
+        throw cast_error("Foreign instance cannot be converted to std::unique_ptr "
+                         "because we don't know how to make it relinquish "
+                         "ownership");
+    }
+
     void load_value(value_and_holder &&v_h) {
         if (typeinfo->holder_enum_v == detail::holder_enum_t::smart_holder) {
             sh_load_helper.loaded_v_h = v_h;
@@ -1282,6 +1363,7 @@ struct move_only_holder_caster<
                 value = cast.second(sub_caster.value);
                 if (typeinfo->holder_enum_v == detail::holder_enum_t::smart_holder) {
                     sh_load_helper.loaded_v_h = sub_caster.sh_load_helper.loaded_v_h;
+                    sh_load_helper.was_populated = true;
                 } else {
                     pybind11_fail("Expected to be UNREACHABLE: " __FILE__
                                   ":" PYBIND11_TOSTRING(__LINE__));
11 changes: 6 additions & 5 deletions include/pybind11/detail/type_caster_base.h
@@ -1058,6 +1058,7 @@ class type_caster_generic {
         return false;
     }
     void check_holder_compat() {}
+    bool set_foreign_holder(handle) { return true; }

     PYBIND11_NOINLINE static void *local_load(PyObject *src, const type_info *ti) {
         auto caster = type_caster_generic(ti);
@@ -1096,14 +1097,14 @@ class type_caster_generic {
     // logic (without having to resort to virtual inheritance).
     template <typename ThisT>
     PYBIND11_NOINLINE bool load_impl(handle src, bool convert) {
+        auto &this_ = static_cast<ThisT &>(*this);
         if (!src) {
             return false;
         }
         if (!typeinfo) {
-            return try_load_foreign_module_local(src);
+            return try_load_foreign_module_local(src) && this_.set_foreign_holder(src);
         }

-        auto &this_ = static_cast<ThisT &>(*this);
         this_.check_holder_compat();

         PyTypeObject *srctype = Py_TYPE(src.ptr());
@@ -1169,13 +1170,13 @@ class type_caster_generic {
         if (typeinfo->module_local) {
             if (auto *gtype = get_global_type_info(*typeinfo->cpptype)) {
                 typeinfo = gtype;
-                return load(src, false);
+                return load_impl<ThisT>(src, false);
             }
         }

         // Global typeinfo has precedence over foreign module_local
         if (try_load_foreign_module_local(src)) {
-            return true;
+            return this_.set_foreign_holder(src);
         }

         // Custom converters didn't take None, now we convert None to nullptr.
@@ -1189,7 +1190,7 @@ class type_caster_generic {
         }

         if (convert && cpptype && this_.try_cpp_conduit(src)) {
-            return true;
+            return this_.set_foreign_holder(src);
         }

         return false;
10 changes: 5 additions & 5 deletions tests/local_bindings.h
@@ -65,11 +65,11 @@ PYBIND11_MAKE_OPAQUE(NonLocalMap)
 PYBIND11_MAKE_OPAQUE(NonLocalMap2)

 // Simple bindings (used with the above):
-template <typename T, int Adjust = 0, typename... Args>
-py::class_<T> bind_local(Args &&...args) {
-    return py::class_<T>(std::forward<Args>(args)...).def(py::init<int>()).def("get", [](T &i) {
-        return i.i + Adjust;
-    });
+template <typename T, int Adjust = 0, typename Holder = std::unique_ptr<T>, typename... Args>
+py::class_<T, Holder> bind_local(Args &&...args) {
+    return py::class_<T, Holder>(std::forward<Args>(args)...)
+        .def(py::init<int>())
+        .def("get", [](T &i) { return i.i + Adjust; });
 }

 // Simulate a foreign library base class (to match the example in the docs):
5 changes: 3 additions & 2 deletions tests/pybind11_cross_module_tests.cpp
@@ -25,8 +25,9 @@ PYBIND11_MODULE(pybind11_cross_module_tests, m, py::mod_gil_not_used()) {
     // relevant pybind11_tests submodule from a test_whatever.py

     // test_load_external
-    bind_local<ExternalType1>(m, "ExternalType1", py::module_local());
-    bind_local<ExternalType2>(m, "ExternalType2", py::module_local());
+    bind_local<ExternalType1, 0, std::shared_ptr<ExternalType1>>(
+        m, "ExternalType1", py::module_local());
+    bind_local<ExternalType2, 0, py::smart_holder>(m, "ExternalType2", py::module_local());

     // test_exceptions.py
     py::register_local_exception<LocalSimpleException>(m, "LocalSimpleException");
23 changes: 23 additions & 0 deletions tests/test_local_bindings.cpp
@@ -22,6 +22,29 @@ TEST_SUBMODULE(local_bindings, m) {
     m.def("load_external1", [](ExternalType1 &e) { return e.i; });
     m.def("load_external2", [](ExternalType2 &e) { return e.i; });

+    struct SharedKeepAlive {
+        std::shared_ptr<int> contents;
+        int value() const { return contents ? *contents : -1; }
+        long use_count() const { return contents.use_count(); }
+    };
+    py::class_<SharedKeepAlive>(m, "SharedKeepAlive")
+        .def_property_readonly("value", &SharedKeepAlive::value)
+        .def_property_readonly("use_count", &SharedKeepAlive::use_count);
+    m.def("load_external1_shared", [](const std::shared_ptr<ExternalType1> &p) {
+        return SharedKeepAlive{std::shared_ptr<int>(p, &p->i)};
+    });
+    m.def("load_external2_shared", [](const std::shared_ptr<ExternalType2> &p) {
+        return SharedKeepAlive{std::shared_ptr<int>(p, &p->i)};
+    });
+    m.def("load_external2_unique", [](std::unique_ptr<ExternalType2> p) { return p->i; });
+
+    // Aspects of set_foreign_holder that are not covered:
+    // - loading a foreign instance into a custom holder should fail
+    // - we're only covering the case where the local module doesn't know
+    //   about the type; the paths where it does (e.g., if both global and
+    //   foreign-module-local bindings exist for the same type) should work
+    //   the same way (they use the same code so they very likely do)
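
`SharedKeepAlive` above is built with the `std::shared_ptr` aliasing constructor: the result points at the `i` member but shares ownership with the whole object. A minimal standard-library sketch of that behavior follows; `Widget` and `alias_member` are hypothetical names, not part of the test suite.

```cpp
#include <cassert>
#include <memory>

// Hypothetical stand-in for ExternalType1/ExternalType2.
struct Widget {
    int i;
};

// Returns a shared_ptr<int> that points at w->i but keeps *w alive.
inline std::shared_ptr<int> alias_member(const std::shared_ptr<Widget> &w) {
    return std::shared_ptr<int>(w, &w->i); // aliasing constructor
}

inline void demo_alias_member() {
    auto w = std::make_shared<Widget>(Widget{7});
    std::shared_ptr<int> field = alias_member(w);
    assert(w.use_count() == 2); // both pointers share one control block

    w.reset();                      // drop the original owner
    assert(field.use_count() == 1); // the alias is now the only owner
    assert(*field == 7);            // and the Widget is still alive
}
```

This mirrors what the Python test checks with `wrapper.use_count == 1`: once the caster's temporary `shared_ptr` is gone, the aliasing pointer is the sole owner keeping the foreign object alive.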

     // test_local_bindings
     // Register a class with py::module_local:
     bind_local<LocalType, -1>(m, "LocalType", py::module_local()).def("get3", [](LocalType &t) {
28 changes: 28 additions & 0 deletions tests/test_local_bindings.py
@@ -1,5 +1,8 @@
 from __future__ import annotations

+import sys
+from contextlib import suppress
+
 import pytest

 from pybind11_tests import local_bindings as m
@@ -20,6 +23,31 @@ def test_load_external():
         assert m.load_external1(cm.ExternalType2(12)) == 12
     assert "incompatible function arguments" in str(excinfo.value)
+
+    def test_shared(val, ctor, loader):
+        obj = ctor(val)
+        with suppress(AttributeError):  # non-cpython VMs don't have getrefcount
Collaborator

@msimacek I just looked:

$ cd tests/
$ git grep 'Cannot reliably trigger GC' | wc -l
48

Do you think we could enable more test coverage for GraalPy by adopting Joshua's approach here more widely?

Collaborator Author

The problem of gc.collect() not reliably GCing things is broader than the problem of not having refcounts. PyPy doesn't have refcounts but its gc.collect() works (if you call it a few times).

+            rc_before = sys.getrefcount(obj)
+        wrapper = loader(obj)
+        # wrapper holds a shared_ptr that keeps obj alive
+        assert wrapper.use_count == 1
+        assert wrapper.value == val
+        with suppress(AttributeError):
+            rc_after = sys.getrefcount(obj)
+            assert rc_after > rc_before
+
+    test_shared(110, cm.ExternalType1, m.load_external1_shared)
+    test_shared(220, cm.ExternalType2, m.load_external2_shared)
+
+    with pytest.raises(TypeError, match="incompatible function arguments"):
+        test_shared(210, cm.ExternalType1, m.load_external2_shared)
+    with pytest.raises(TypeError, match="incompatible function arguments"):
+        test_shared(120, cm.ExternalType2, m.load_external1_shared)
+
+    with pytest.raises(
+        RuntimeError, match="Foreign instance cannot be converted to std::unique_ptr"
+    ):
+        m.load_external2_unique(cm.ExternalType2(2200))


def test_local_bindings():
"""Tests that duplicate `py::module_local` class bindings work across modules"""