Add fast_type_map, use it authoritatively for local types and as a hint for global types (ABI breaking) #5842

swolchok · 2025-09-17T16:39:49Z

Description

nanobind has a similar two-level lookup strategy, added and explained by
wjakob/nanobind@b515b1f . In brief, fast_type_map should be faster because std::type_index identity is based on the name, whereas fast_type_map uses pointer equality.

In this PR I've ported this approach to pybind11.

Performance seems to have improved. Using my patched fork of pybind11_benchmark
(https://github.com/swolchok/pybind11_benchmark/tree/benchmark-updates, specifically commit hash b6613d12607104d547b1c10a8145d1b3e9937266), I run bench.py and observe the MyInt case. Each time, I do 3 runs and just report all 3.

master, Mac: 75.9, 76.9, 75.3 nsec/loop
this PR, Mac: 73.8, 73.8, 73.6 nsec/loop
master, Linux box: 188, 187, 188 nsec/loop
this PR, Linux box: 164, 165, 164 nsec/loop

Note that the "real" percentage improvement is larger than implied by the above because master does not yet include #5824.

Suggested changelog entry:

Improved performance of function dispatch and type casting by porting two-level type info lookup strategy from nanobind.

swolchok · 2025-09-17T16:40:19Z

include/pybind11/detail/internals.h

+// REVIEW: do we need to add a fancy hash for pointers or is the
+// possible identity hash function from the standard library (e.g.,
+// libstdc++) sufficient?


note that the Linux box I used for benchmarking has libstdc++.

Just to ack that I've seen this, but I don't know.

It'd be good to rewrite the comment before merging if nobody knows.

include/pybind11/detail/internals.h

…nt for global types nanobind has a similar two-level lookup strategy, added and explained by wjakob/nanobind@b515b1f In this PR I've ported this approach to pybind11. To avoid an ABI break, I've kept the fast maps to the `local_internals`. I think this should be safe because any particular module should see its `local_internals` reset at least as often as the global `internals`, and misses in the fast "hint" map for global types fall back to the global `internals`. Performance seems to have improved. Using my patched fork of pybind11_benchmark (https://github.com/swolchok/pybind11_benchmark/tree/benchmark-updates, specifically commit hash b6613d12607104d547b1c10a8145d1b3e9937266), I run bench.py and observe the MyInt case. Each time, I do 3 runs and just report all 3. master, Mac: 75.9, 76.9, 75.3 nsec/loop this PR, Mac: 73.8, 73.8, 73.6 nsec/loop master, Linux box: 188, 187, 188 nsec/loop this PR, Linux box: 164, 165, 164 nsec/loop Note that the "real" percentage improvement is larger than implied by the above because master does not yet include pybind#5824.

oremanj · 2025-09-17T18:50:30Z

tests/test_embed/external_module.cpp

+    py::detail::get_local_internals_pp_manager().unref();
+
+    // now we unref the static global singleton internals
+    py::detail::get_local_internals_pp_manager().unref();


is this intended to unref the non-local internals? looks redundant as written

whoops, butchered my copy/paste/modify job

oremanj · 2025-09-17T18:54:10Z

include/pybind11/detail/type_caster_base.h

-    auto it = locals.find(tp);
+inline detail::type_info *get_local_type_info(const std::type_info &tp,
+                                              const local_internals &local_internals) {
+    const auto &locals = local_internals.registered_types_cpp;


Not changed in this PR but I think this function needs a with_internals lock too. Multiple threads could run bindings from the same DSO simultaneously. local_internals doesn't have its own lock and is generally protected by the global internals lock.

You could optimize by making the versions with a local_internals parameter assume internals is already locked. Then get_type_info() could query both internals under a single lock instead of acquiring, releasing, and re-acquiring.

This seems clearly out of scope for this PR.

(Also, doesn't the GIL preclude this sort of thing when present?)

This seems clearly out of scope for this PR.

I figured since you were reworking these functions anyway, I would point out an easy-looking opportunity to improve their thread-safety while you're here. I agree the PR doesn't make current behavior worse and I wasn't trying enforce a wider scope.

(Also, doesn't the GIL preclude this sort of thing when present?)

Yes, but pybind11 supports free-threading mode.

local_internals doesn't have its own lock and is generally protected by the global internals lock.

function_record_ptr_from_PyObject also accesses local_internals and isn't called under the global internals lock.

@swolchok Could you please help with general code health by creating an issue pointing to the two missing with_internals you guys have discovered here? We can then tag free-threading folks to bring that to their attention. Please tag me when you create the issue (I don't systematically look at new issues).

@rwgk looks like @oremanj already opened #5799 and mentioned get_local_type_info. Added a mention of function_record_ptr_from_PyObject.

oremanj · 2025-09-17T19:04:14Z

include/pybind11/detail/class.h

+                local_internals.registered_types_cpp.erase(tinfo->cpptype);
            } else {
                internals.registered_types_cpp.erase(tindex);
+                local_internals.global_registered_types_cpp_fast.erase(tinfo->cpptype);


This will only erase the fast map entry for the DSO where the type was bound. Other DSOs will preserve their cached entries in the fast map, which will now dangle. Solving that with an ABI break is pretty easy - make a linked list of all the fast maps and have modifications to the slow map invalidate the corresponding entries in the fast maps. (Or since you're taking an ABI break, just put the fast map in global internals.) Solving it without an ABI break will be difficult. The best way I can think of is to accompany each entry you add to the fast map with a keep_alive that will remove it when the targeted type object is being destroyed. That's pretty heavyweight though.

Thanks for finding the hole in this strategy! I wonder if there's a way to accommodate developing things in anticipation of a future ABI break without just doubling CI costs.

without just doubling CI costs.

We used to have a few extra jobs that set PYBIND11_INTERNALS_VERSION=10000000 or similar.

Try git show 662a88cbc16be6ad1698ce73e7a30982119f2fca and look for -DPYBIND11_INTERNALS_VERSION=10000000.

We could do something surgical, like picking out one or a couple jobs each for Linux, macOS, Windows to override the internals version.

oremanj

FYI, wjakob/nanobind#1140 incidentally contains some rework of the nanobind fast type map in case you want to incorporate ideas from that. One fast map per thread (to avoid lock contention) + caching negative lookups. Note that caching negative lookups requires also remembering which type_info pointers you've seen for a given C++ type, so you can invalidate the cache when a new type is bound. (AFAICT, nothing you're doing here precludes getting fancier in that direction later.)

rwgk · 2025-09-22T19:12:33Z

I think this is fine, assuming we also add some extra test coverage.

@b-pass for visibility — This PR needs a git merge master before a final review though.

include/pybind11/detail/internals.h

b-pass

Looks good to me

swolchok · 2025-09-26T04:27:23Z

Note that the "real" percentage improvement is larger than implied by the above because master does not yet include #5824.

Updated benchmark runs:

master, Mac: 58.6, 58.8, 58.6 ns/op
this PR, Mac: 56.0, 56.9, 56.1 ns/op
master, Linux server: 158, 158, 159 ns/op
this PR, Linux server: 146, 144, 154 ns/op

By the way, were there any plans to adopt a faster map implementation such as the one used by nanobind? during an ABI break might be a good time.

rwgk · 2025-09-27T17:01:50Z

include/pybind11/detail/type_caster_base.h

-    auto it = locals.find(tp);
+inline detail::type_info *get_local_type_info(const std::type_info &tp,
+                                              const local_internals &local_internals) {
+    const auto &locals = local_internals.registered_types_cpp;


@swolchok Could you please help with general code health by creating an issue pointing to the two missing with_internals you guys have discovered here? We can then tag free-threading folks to bring that to their attention. Please tag me when you create the issue (I don't systematically look at new issues).

rwgk · 2025-09-27T17:33:22Z

include/pybind11/detail/internals.h

 #ifndef PYBIND11_INTERNALS_VERSION
-//   REMINDER for next version bump: remove loader_life_support_tls
-#    define PYBIND11_INTERNALS_VERSION 11
+#    if PY_VERSION_HEX >= 0x030E0000


Sorry I just realized this a bad idea and we shouldn't do this (although we've actually done this before and I'm not aware of complaints):

E.g. if someone builds extensions with Python 3.14 and pybind11 v3.0.0 or v3.0.1 (internals version 11), then other extensions with Python 3.14 and pybind11>3.0.1 (assuming internals version 12), they will not fully interoperate.

Preemptively bumping the internals version for Python 3.14 would be OK only if v3.0.0 or v3.0.1 didn't work with Python 3.14, but I believe they do (although I'm not sure, but I don't have time to find out for sure).

To avoid surprising users, could you please undo the PY_VERSION_HEX >= 0x030E0000 change here?

We need to make a couple surgical changes to ci.yml to get test coverage. See my older comment.

OK. I took a guess and used the inplace build for the smoke test

rwgk · 2025-09-27T17:44:47Z

include/pybind11/detail/internals.h

+// REVIEW: do we need to add a fancy hash for pointers or is the
+// possible identity hash function from the standard library (e.g.,
+// libstdc++) sufficient?


Just to ack that I've seen this, but I don't know.

It'd be good to rewrite the comment before merging if nobody knows.

rwgk · 2025-09-27T17:52:29Z

include/pybind11/detail/type_caster_base.h

 }

-inline detail::type_info *get_global_type_info(const std::type_index &tp) {
+inline detail::type_info *get_global_type_info(const std::type_info &tp) {


Could you please add a comment — somewhere between terse and concise — to explain that this function implements a two-level lookup? Maybe end with "For more details see PR #5842."

(I'd add the comment inside the function, at the top.)

Ideally make it play nicely with the comment you're adding in internals.h:

#if PYBIND11_INTERNALS_VERSION >= 12 // non-normative but fast "hint" for // registered_types_cpp. Successful lookups are correct; // unsuccessful lookups need to try registered_types_cpp and then // backfill this map if they find anything. fast_type_map<type_info *> registered_types_cpp_fast; #endif

Mentioning "two-level lookup" there, too, would be nice.

… comment, add two-level lookup comments. ci.yml coming separately

…y on a few platforms.

rwgk · 2025-10-05T18:06:19Z

@swolchok I added commit 08c09ae, to not under-sell the testing. I looked at the "log archive" for https://github.com/pybind/pybind11/actions/runs/18144031622, it's definitely full testing under Linux, macOS, Windows; I believe that alone will catch >90% of any oversights that could sneak in.

oremanj

Sorry for the post-commit review, but I noticed a couple other things you might want to account for - one very minor detail and one more substantial correctness issue.

oremanj · 2025-10-11T02:08:06Z

include/pybind11/detail/class.h

            auto tindex = std::type_index(*tinfo->cpptype);
            internals.direct_conversions.erase(tindex);

+            auto &local_internals = get_local_internals();


Here and in the similar logic for registration, you're fetching the local internals even if you don't need them. Possibly a regression due to the history where the fast type map was in local internals even for non-local types?

oremanj · 2025-10-11T02:11:56Z

include/pybind11/detail/type_caster_base.h

+        auto it = types.find(std::type_index(tp));
        if (it != types.end()) {
+#if PYBIND11_INTERNALS_VERSION >= 12
+            fast_types.emplace(&tp, it->second);


How are you going to cause this entry to be cleaned up when the type it points to is destroyed? Currently I think it will just dangle, which won't go well when you try to use the result.

nanobind solves this problem by having its equivalent of detail::type_info store a list of all the distinct std::type_info pointers that have been seen for the same type. This list is empty except in cross-DSO situations, which are relatively rare, so it's not very expensive to store. Destruction of the type iterates over the list and removes each alternate pointer from the fast type map.

I'm not sure I follow. Why does the fast map have different cleanup timing requirements than the slow map? We clean up both at the same time.

The slow map has one entry per C++ type. The fast map potentially has multiple. You’re cleaning up the fast map entry that was added by the module that defined the binding, but not the ones that may have been added by other DSOs (with different typeinfo pointers for that type) when they looked it up. I left this comment on the location in the code where those secondary entries are being added; I don’t see anywhere where they can be removed.

I don’t see anywhere where they can be removed.

Naive solution: when a type is deallocated, can't we just loop over the whole map and remove all entries with the same value as the one we want to remove?
Obvious problem: this would cause destruction of N types to take O(N^2) time in all cases.

The nanobind solution seems fine, but the first thing that needs to be done is writing a test that will expose this problem, since we clearly don't have one (or it's not a problem for some reason).

not 100% clear on how to write a test that will result in classes being destroyed, but it looks like test_class does it by registering into temporary modules so I'll try that. I think we want something like:

DSO1 registers a type, such as by py::class_

DSO2 looks up the same type (as defined by type name matching), such as by returning an instance of that type from a pybound function

DSO1 goes away

repeat (2)

theory is that currently we will see a crash; before this PR it should've worked?

Your summary looks good to me, except that Python can't really unload a whole DSO. Instead, you can del module.TheType from Python. As long as there are no other references to the type (such as via its instances), it will drop to zero references and be deallocated. Note that types are almost always in cycles, so you'll need a gc.collect() or two.

You can steal this function from the tests for #5800:

def delattr_and_ensure_destroyed(*specs): wrs = [] for mod, name in specs: wrs.append(weakref.ref(getattr(mod, name))) delattr(mod, name) for _ in range(5): gc.collect() if all(wr() is None for wr in wrs): break else: pytest.fail( f"Could not delete bindings such as {next(wr for wr in wrs if wr() is not None)!r}" )

After the type has been destroyed, you can bind it again with a new py::class_ call.

Be aware that extension types are immortal on non-CPython VMs (PyPy, GraalPy) and all types are immortal on free-threading CPython 3.13 (3.14+ have proper GC for them), so you'll want to disable the test for those platforms.

OK. I will try to get the tests + fix up today.

@oremanj

…ind#5842 @oremanj pointed out in a comment on pybind#5842 that I missed part of the nanobind PR I was porting in such a way that we could have dangling pointers in internals::registered_types_cpp_fast. This PR adds a test that reproed the bug and then fixes the test.

@oremanj

… (#5867) * Fix dangling pointer in internals::registered_types_cpp_fast from #5842 @oremanj pointed out in a comment on #5842 that I missed part of the nanobind PR I was porting in such a way that we could have dangling pointers in internals::registered_types_cpp_fast. This PR adds a test that reproed the bug and then fixes the test. * review feedback, attempt to fix -Werror in CI * use const ref, skip test on python 3.13 free-threaded * Skip test on 3.13t more robustly * style: pre-commit fixes * CI fix --------- Co-authored-by: Joshua Oreman <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

swolchok commented Sep 17, 2025

View reviewed changes

include/pybind11/detail/internals.h Outdated Show resolved Hide resolved

swolchok force-pushed the fast-local-maps branch from 7e11f11 to cc0b053 Compare September 17, 2025 17:10

oremanj reviewed Sep 17, 2025

View reviewed changes

simplify unsafe_reset_local_internals in test

1cd2b55

oremanj reviewed Sep 17, 2025

View reviewed changes

pre-implement PYBIND11_INTERNALS_VERSION 12

abf237f

swolchok changed the title ~~Add fast_type_map, use it authoritatively for local types and as a hint for global types~~ Add fast_type_map, use it authoritatively for local types and as a hint for global types (ABI breaking) Sep 22, 2025

rwgk reviewed Sep 22, 2025

View reviewed changes

include/pybind11/detail/internals.h Show resolved Hide resolved

b-pass approved these changes Sep 22, 2025

View reviewed changes

swolchok added 2 commits September 25, 2025 21:15

Merge branch 'master' into fast-local-maps

b87e762

use PYBIND11_INTERNALS_VERSION 12 on Python 3.14 per suggestion

c333e85

rwgk reviewed Sep 27, 2025

View reviewed changes

Merge branch 'master' into fast-local-maps

8651580

rwgk mentioned this pull request Sep 30, 2025

[BUG]: local_internals appear very thread-unsafe #5799

Closed

3 tasks

swolchok added 2 commits September 30, 2025 14:24

Implement reviewer comments: revert PY_VERSION_HEX change, fix REVIEW…

3ca29e8

… comment, add two-level lookup comments. ci.yml coming separately

Use the inplace build to smoke test ABI bump?

48f831b

swolchok requested a review from henryiii as a code owner September 30, 2025 21:30

[skip ci] Remove "smoke" from comment. This is full testing, just onl…

08c09ae

…y on a few platforms.

rwgk approved these changes Oct 5, 2025

View reviewed changes

rwgk merged commit 3262000 into pybind:master Oct 5, 2025
3 checks passed

github-actions bot added the needs changelog Possibly needs a changelog entry label Oct 5, 2025

oremanj reviewed Oct 11, 2025

View reviewed changes

swolchok mentioned this pull request Oct 14, 2025

Fix dangling pointer in internals::registered_types_cpp_fast from #5842 #5867

Merged

Uh oh!

Add fast_type_map, use it authoritatively for local types and as a hint for global types (ABI breaking) #5842

Add fast_type_map, use it authoritatively for local types and as a hint for global types (ABI breaking) #5842

Uh oh!

Conversation

swolchok commented Sep 17, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Description

Suggested changelog entry:

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

oremanj left a comment

Choose a reason for hiding this comment

Uh oh!

rwgk commented Sep 22, 2025

Uh oh!

Uh oh!

b-pass left a comment

Choose a reason for hiding this comment

Uh oh!

swolchok commented Sep 26, 2025

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

rwgk commented Oct 5, 2025

Uh oh!

Uh oh!

oremanj left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

swolchok Oct 11, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

swolchok commented Sep 17, 2025 •

edited

Loading

swolchok Oct 11, 2025 •

edited

Loading