Improve performance of enum_ operators by going back to specific implementation #5887
+217
−58
Description
This improves the performance of `enum_` operators by no longer attempting to funnel them all through a generic implementation, which caused additional overhead related to calling `int()`.
Benchmark results
Using https://github.com/swolchok/pybind11_benchmark/tree/8a6f19d17c362dc2060dd8461b502b98c3226a47 (the current tip of the benchmark-updates branch):
Enum equality comparison
Command:
python -m timeit --setup 'from pybind11_benchmark import MyEnum; x = MyEnum.ONE' 'x != x'
Times are in nsec/loop.
M4 Mac, before: 165, 167, 166, 164, 167
Mac, after: 78.9, 78.9, 79.7, 79.9, 80.5
Enum ordering comparison
Command:
python -m timeit --setup 'from pybind11_benchmark import MyEnum; x = MyEnum.ONE' 'x < x'
Mac, before: 170, 168, 168, 171, 168
Mac, after: 79.5, 78.8, 80.8, 81.3, 82.3
(i.e., no difference between `!=` and `<`)
Compare to the performance of calling a method of a simple pybind11-bound class:
Command:
python -m timeit --setup 'from pybind11_benchmark import MyInt; x = MyInt()' 'x.get()'
Mac: 54.6, 54.6, 54.9, 55.3, 55.3
Also compare to the performance of a `py::native_enum`:
Command:
python -m timeit --setup 'from pybind11_benchmark import MyNativeEnum; x = MyNativeEnum.THREE' 'x < x'
Mac: 9.12, 9.13, 9.2, 9.21, 9.34
(I note that the above benchmarks do have a tendency toward monotonically increasing times across runs, but that effect seems to be much smaller than the effect of the code changes.)
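To make the comparison easier to read, the timings above can be summarized with a short script. This is only a sketch: the numbers are copied from the runs listed above, while the dictionary keys and the `speedup` helper are illustrative names that do not appear in the PR or the benchmark repository.

```python
from statistics import mean

# Timings in nsec/loop, copied from the benchmark runs listed above.
# The labels are illustrative; they are not names used by the benchmark itself.
timings = {
    "enum_ != (before)": [165, 167, 166, 164, 167],
    "enum_ != (after)": [78.9, 78.9, 79.7, 79.9, 80.5],
    "enum_ < (before)": [170, 168, 168, 171, 168],
    "enum_ < (after)": [79.5, 78.8, 80.8, 81.3, 82.3],
    "MyInt.get()": [54.6, 54.6, 54.9, 55.3, 55.3],
    "native_enum <": [9.12, 9.13, 9.2, 9.21, 9.34],
}

# Average the five runs of each benchmark.
means = {name: mean(runs) for name, runs in timings.items()}


def speedup(slower, faster):
    """Ratio of the mean time of `slower` to the mean time of `faster`."""
    return means[slower] / means[faster]


if __name__ == "__main__":
    for name, value in means.items():
        print(f"{name:20s} {value:7.2f} nsec/loop")
    print(f"!= speedup from this PR: {speedup('enum_ != (before)', 'enum_ != (after)'):.2f}x")
    print(f"<  speedup from this PR: {speedup('enum_ < (before)', 'enum_ < (after)'):.2f}x")
    print(f"enum_ < (after) vs native_enum <: {speedup('enum_ < (after)', 'native_enum <'):.2f}x slower")
```

On these numbers, both operators come out roughly 2.1x faster after the change, while an `enum_` comparison remains roughly 8.8x slower than the same comparison on a `py::native_enum`.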
Code size:
The marginal cost of a `py::arithmetic` `enum_` before this PR, as measured on my Mac by adding an extra enum to pybind11_benchmark (specifically https://github.com/swolchok/pybind11_benchmark/tree/8a6f19d17c362dc2060dd8461b502b98c3226a47), was a little over 8 KiB of `__text`, plus about 1000 bytes of `__gcc_except_tab` and negligible amounts in other sections. After this PR, the marginal cost increases to a little over 17000 bytes of `__text`, almost 2000 bytes of `__gcc_except_tab`, and a few hundred bytes in other sections. I believe @Skylion007 previously mentioned that this seemed like a reasonable order of magnitude of marginal cost.
`__text` fell by about 12500 bytes and `__gcc_except_tab` fell by a little over 2000 bytes, though there were negligible size increases in other sections.
Suggested changelog entry:
`py::enum_`s, though `py::native_enum` is still much faster.