implement speedups with rust v2 #438

Open: wants to merge 16 commits into base: main

Conversation

carsonburr
Contributor

Running pip install -e . in a virtualenv should build src/markupsafe/_rust_speedups.???.so, assuming you have Rust installed.

Run python bench.py to run all benchmarks, Rust included. Here are the results on my machine:

$ python bench.py

short escape native: Mean +- std dev: 656 ns +- 12 ns
short escape speedups: Mean +- std dev: 417 ns +- 7 ns
short escape rust_speedups: Mean +- std dev: 522 ns +- 15 ns

long escape native: Mean +- std dev: 17.3 us +- 0.2 us
long escape speedups: Mean +- std dev: 7.79 us +- 0.14 us
long escape rust_speedups: Mean +- std dev: 6.71 us +- 0.08 us

short plain native: Mean +- std dev: 505 ns +- 9 ns
short plain speedups: Mean +- std dev: 349 ns +- 5 ns
short plain rust_speedups: Mean +- std dev: 401 ns +- 4 ns

long plain native: Mean +- std dev: 17.2 us +- 0.1 us
long plain speedups: Mean +- std dev: 7.77 us +- 0.10 us
long plain rust_speedups: Mean +- std dev: 6.73 us +- 0.15 us

long suffix native: Mean +- std dev: 134 us +- 1 us
long suffix speedups: Mean +- std dev: 131 us +- 1 us
long suffix rust_speedups: Mean +- std dev: 58.3 us +- 1.2 us

@davidism
Member

davidism commented Apr 23, 2024

I'll have to run it on my machine for an exact comparison, but those are good performance numbers compared to the C speedups on the long benchmarks.

@carsonburr
Contributor Author

Simplified the Rust speedups to use a lookup table instead of relying on auto-vectorized SIMD. The advantages are that it's less messy, doesn't use advanced Rust features, doesn't use unsafe*, and is slightly faster for the workloads in bench.py. I'm sure this could still be SIMD-accelerated, but not portably.

*At the cost of potentially paying for a conversion if the PyString is stored as UTF-16 or 32-bit Unicode.
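
For readers unfamiliar with the approach, here is a minimal sketch of what a lookup-table escape can look like in safe Rust. It is not the code from this PR: the names `build_table`, `TABLE`, `REPLACEMENTS`, and `escape_html` are hypothetical, it works on a plain `&str` rather than a PyString, and it assumes the five replacements MarkupSafe uses (`&amp;`, `&lt;`, `&gt;`, `&#34;`, `&#39;`).

```rust
// Hypothetical sketch of a lookup-table escape; not the implementation in this PR.

// Build a 256-entry table indexed by byte value.
// 0 means "copy the byte unchanged"; any other value indexes REPLACEMENTS.
const fn build_table() -> [u8; 256] {
    let mut table = [0u8; 256];
    table[b'&' as usize] = 1;
    table[b'<' as usize] = 2;
    table[b'>' as usize] = 3;
    table[b'"' as usize] = 4;
    table[b'\'' as usize] = 5;
    table
}

static TABLE: [u8; 256] = build_table();
static REPLACEMENTS: [&str; 6] = ["", "&amp;", "&lt;", "&gt;", "&#34;", "&#39;"];

/// Escape the five HTML-significant ASCII characters in `input`.
pub fn escape_html(input: &str) -> String {
    let mut out = String::with_capacity(input.len());
    // Start of the pending run of bytes that need no escaping.
    let mut start = 0;
    for (i, &b) in input.as_bytes().iter().enumerate() {
        let idx = TABLE[b as usize];
        if idx != 0 {
            // Flush the unescaped run, then append the replacement.
            // `i` always points at an ASCII byte, so both slice bounds
            // fall on UTF-8 character boundaries.
            out.push_str(&input[start..i]);
            out.push_str(REPLACEMENTS[idx as usize]);
            start = i + 1;
        }
    }
    // Flush whatever follows the last escaped character.
    out.push_str(&input[start..]);
    out
}
```

Because every byte of a multi-byte UTF-8 sequence is 0x80 or higher, none of them can match a table entry, which is why the sketch can scan bytes and still slice the `&str` without unsafe.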

src/rust/src/lib.rs (outdated)
debug = true

[dependencies]
pyo3 = "0.22.2"


@davidism I guess if you want abi3, you can just enable the feature:

Suggested change
- pyo3 = "0.22.2"
+ pyo3 = { version = "0.22.2", features = ["abi3"] }

@davidism
Member

davidism commented Oct 6, 2024

@davidhewitt thanks for looking at this. In #461 we're setting up wheel builds for 313 and 313t (free threading). Say we were to add the features = ["abi3"] suggestion you made above. How would we deal with the free threading build? Presumably, we'd offer an abi3 wheel, and then also a 313t wheel that's not abi3? How would we configure pyo3 and cibuildwheel to handle both of those?

@davidhewitt

For ease of use, at the moment if you have the abi3 feature set but you're building on free-threaded Python, we'll ignore it and build for the free-threaded ABI. I understand that in 3.14 there's a possibility of a new stable ABI which supports free threading, so that might change in future.

I am not super familiar with cibuildwheel configuration 🙈, though I assume that the Rust build would work the same way as building a freethreaded C wheel.

@davidhewitt

Though note also that PyO3's freethreaded support is not complete yet / keeping me up at night / should drop soon-ish in our 0.23 release.


3 participants