-
Notifications
You must be signed in to change notification settings - Fork 84
Replace internal maps with unsorted Vecs #192
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Replace internal maps with unsorted Vecs #192
Conversation
|
Note: despite the claimed advantages, I would 100% understand if this was rejected to avoid having to review and maintain the hand-rolled map implementation. But I was curious how much it help w.r.t. binary size, and once I had implemented it, I figured I might as well write a proper commit message and submit a PR in case it's of interest. |
ff75628 to
1a3150b
Compare
serial_lock and its dependency tree make it hard to keep tests working on Rust 1.40. In particular, all early versions of futures-util 0.3 with sufficiently low MSRV were yanked.
For some reason, the tool reports a "not found" error for these lib.rs links, but docs.rs links are fine.
|
Looking at it, it looks in a beneficial direction. Though I'd still have few suggestions, if I may:
|
I started out by modifying the places using collection APIs to open-code I can add some basic tests once the API surface is settled.
That was actually my first instinct, but my initial attempt at it became too hairy halfway through (not only due to open-coding). Then I took a step back and came to the conclusion that it's an unnecessary complication for no performance gain (see analysis in the commit message). But maybe the scales tip the other way if the API surface is trimmed down. On the other hand, if the entry API goes away, there's less need for the index-based |
This doesn't remove the `-A` parameter in the CI workflow, that's still needed to override the blanket `-D clippy::all`. But it fixes the false positive from normal `cargo clippy` invocations.
1a3150b to
97356b0
Compare
HashMap and BTreeMap are overkill in this context. Unsorted vectors are plenty fast enough and the necessary collection interfaces are straightforward to implement. This change has two benefits. First, it improves binary size. For the `print` example from signal-hook in release mode, the .text section shrinks by about 18 KiB and overall file size shrinks by about 30 KiB. That's roughly a 6% reduction in both metrics. Second, the simpler data structures make it more obvious that the signal handler only does async-signal-safe operations. In particular, the default HashMap has a `RandomState`, which can access TLS, do dlsym lookups, open and read from files, etc. depending on the platform. I don't think that's a problem for the hash table *lookup* done in the signal handler since that shouldn't touch the `RandomState`, but it's a bit subtle and the standard library doesn't make any guarantees about this. Avoiding hash maps entirely removes the need to think about it. Performance notes: * (Un-)registering actions does an insert/remove by ActionId, which is asymptotically slower with this PR. However, (un-)registering is a slow operation and should be done rarely. Besides locking, it always clones the entire `SignalData`, so it already takes O(n) time when there's n actions registered across all symbols. * The signal handler looks up the `Slot` by signal number, which is asymptotically slower with this PR. However, there's only a very small constant number of signals, so asymptotics don't matter. * After looking up the right `Slot`, the signal handler only iterates sequentially over the actions, it doesn't do any lookups by ActionId. * For a simple microbenchmark that registers one action each for 20 signals and then raises one signal 100k times, this implementation appears to be slightly *faster* regardless of which signal is raised.
97356b0 to
4d1be65
Compare
|
Updates:
|
HashMap and BTreeMap are overkill in this context. Unsorted vectors are plenty fast enough and the necessary collection interfaces are straightforward to implement. This change has two benefits.
First, it improves binary size. For the
printexample from signal-hook in release mode, the .text section shrinks by about 18 KiB and overall file size shrinks by about 30 KiB. That's roughly a 6% reduction in both metrics.Second, the simpler data structures make it more obvious that the signal handler only does async-signal-safe operations. In particular, the default HashMap has a
RandomState, which can access TLS, do dlsym lookups, open and read from files, etc. depending on the platform. I don't think that's a problem for the hash table lookup done in the signal handler since that shouldn't touch theRandomState, but it's a bit subtle and the standard library doesn't make any guarantees about this. Avoiding hash maps entirely removes the need to think about it.Performance notes:
SignalData, so it already takes O(n) time when there's n actions registered across all symbols.Slotby signal number, which is asymptotically slower with this PR. However, there's only a very small constant number of signals, so asymptotics don't matter.Slot, the signal handler only iterates sequentially over the actions, it doesn't do any lookups by ActionId.