Conversation

@hanna-kruppe
Contributor

HashMap and BTreeMap are overkill in this context. Unsorted vectors are plenty fast enough and the necessary collection interfaces are straightforward to implement. This change has two benefits.

First, it improves binary size. For the print example from signal-hook in release mode, the .text section shrinks by about 18 KiB and overall file size shrinks by about 30 KiB. That's roughly a 6% reduction in both metrics.

Second, the simpler data structures make it more obvious that the signal handler only does async-signal-safe operations. In particular, the default HashMap has a RandomState, which can access TLS, do dlsym lookups, open and read from files, etc. depending on the platform. I don't think that's a problem for the hash table lookup done in the signal handler since that shouldn't touch the RandomState, but it's a bit subtle and the standard library doesn't make any guarantees about this. Avoiding hash maps entirely removes the need to think about it.

Performance notes:

  • (Un-)registering actions does an insert/remove by ActionId, which is asymptotically slower with this PR. However, (un-)registering is a slow operation and should be done rarely. Besides locking, it always clones the entire SignalData, so it already takes O(n) time when there are n actions registered across all signals.
  • The signal handler looks up the Slot by signal number, which is asymptotically slower with this PR. However, there's only a very small constant number of signals, so asymptotics don't matter.
  • After looking up the right Slot, the signal handler only iterates sequentially over the actions, it doesn't do any lookups by ActionId.
  • For a simple microbenchmark that registers one action each for 20 signals and then raises one signal 100k times, this implementation appears to be slightly faster regardless of which signal is raised.
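To make the trade-off concrete, here is a minimal sketch of the kind of unsorted-Vec map the PR describes. This is hypothetical illustration code, not the PR's actual implementation; the type name `VecMap` and its method set are assumptions. Lookup, insert, and remove all scan linearly, which is fine when the number of entries is as small as the number of registered signals.

```rust
// Hypothetical sketch of a map backed by an unsorted Vec of (key, value)
// pairs. All operations are O(n), which is acceptable for small n such as
// the handful of signals a process registers handlers for.
#[derive(Clone, Default)]
struct VecMap<K, V> {
    entries: Vec<(K, V)>,
}

impl<K: PartialEq, V> VecMap<K, V> {
    fn get(&self, key: &K) -> Option<&V> {
        self.entries.iter().find(|(k, _)| k == key).map(|(_, v)| v)
    }

    // Insert or overwrite; returns the previous value, mirroring HashMap::insert.
    fn insert(&mut self, key: K, value: V) -> Option<V> {
        for (k, v) in self.entries.iter_mut() {
            if *k == key {
                return Some(std::mem::replace(v, value));
            }
        }
        self.entries.push((key, value));
        None
    }

    // swap_remove avoids shifting the tail; entry order is not preserved,
    // which an unsorted map does not care about.
    fn remove(&mut self, key: &K) -> Option<V> {
        let idx = self.entries.iter().position(|(k, _)| k == key)?;
        Some(self.entries.swap_remove(idx).1)
    }
}

fn main() {
    let mut m = VecMap::default();
    assert_eq!(m.insert(2u32, "SIGINT"), None);
    assert_eq!(m.insert(15u32, "SIGTERM"), None);
    assert_eq!(m.get(&2), Some(&"SIGINT"));
    assert_eq!(m.remove(&15), Some("SIGTERM"));
    assert_eq!(m.get(&15), None);
    println!("ok");
}
```

Because the backing storage is just a `Vec` of plain pairs, every operation is visibly free of hashing, allocation of hasher state, or anything else that might not be async-signal-safe.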

@hanna-kruppe
Contributor Author

Note: despite the claimed advantages, I would 100% understand if this were rejected to avoid having to review and maintain the hand-rolled map implementation. But I was curious how much it would help with binary size, and once I had implemented it, I figured I might as well write a proper commit message and submit a PR in case it's of interest.

@hanna-kruppe force-pushed the dumb-down-data-structures branch from ff75628 to 1a3150b on November 20, 2025 21:11
@hanna-kruppe changed the title from "Replace internal maps with unsorted Vec" to "Replace internal maps with unsorted Vecs" on Nov 20, 2025
serial_lock and its dependency tree make it hard to keep tests working
on Rust 1.40. In particular, all early versions of futures-util 0.3 with
sufficiently low MSRV were yanked.
For some reason, the tool reports a "not found" error for these lib.rs
links, but docs.rs links are fine.
@vorner
Owner

vorner commented Nov 28, 2025

Looking at it, this seems like a beneficial direction. Though I'd still have a few suggestions, if I may:

  • Do we need a full container-like API or are some of the methods unnecessary? The Entry API seems a bit heavy here (I know it's used, but maybe we can get away without?), I'm not sure if get_mut is necessary.
  • It would make sense to have at least few tests for the container.
  • I wonder if it would make sense to keep the keys sorted and use binary search for lookup.
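The sorted-keys idea in the last bullet could look roughly like the following. This is a hypothetical sketch of the suggestion, not code from the PR; the helper names `lookup` and `insert_sorted` are assumptions. The invariant is that the `Vec` stays sorted by key, so lookup drops from O(n) to O(log n) via `binary_search_by`.

```rust
// Sketch of the sorted-Vec alternative: keep (key, value) pairs ordered by
// key so that lookup can use binary search instead of a linear scan.
fn lookup<'a, K: Ord, V>(entries: &'a [(K, V)], key: &K) -> Option<&'a V> {
    entries
        .binary_search_by(|(k, _)| k.cmp(key))
        .ok()
        .map(|i| &entries[i].1)
}

fn insert_sorted<K: Ord, V>(entries: &mut Vec<(K, V)>, key: K, value: V) {
    match entries.binary_search_by(|(k, _)| k.cmp(&key)) {
        Ok(i) => entries[i].1 = value,             // overwrite existing key
        Err(i) => entries.insert(i, (key, value)), // insertion point keeps order
    }
}

fn main() {
    let mut v = Vec::new();
    insert_sorted(&mut v, 9u32, "SIGKILL");
    insert_sorted(&mut v, 2u32, "SIGINT");
    insert_sorted(&mut v, 15u32, "SIGTERM");
    assert_eq!(v.iter().map(|(k, _)| *k).collect::<Vec<_>>(), vec![2, 9, 15]);
    assert_eq!(lookup(&v, &9), Some(&"SIGKILL"));
    assert_eq!(lookup(&v, &3), None);
    println!("ok");
}
```

The cost is that insertion now shifts the tail of the `Vec` to keep it sorted, and the sortedness invariant has to be maintained everywhere the container is mutated.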

@hanna-kruppe
Contributor Author

Do we need a full container-like API or are some of the methods unnecessary? The Entry API seems a bit heavy here (I know it's used, but maybe we can get away without?), I'm not sure if get_mut is necessary.

I started out by modifying the places using collection APIs to open-code Vec wrangling, which turned into a big mess halfway through. It's quite possible that I over-corrected in the other direction! I'll take another look at the uses of Entry and get_mut.

I can add some basic tests once the API surface is settled.

I wonder if it would make sense to keep the keys sorted and use binary search for lookup.

That was actually my first instinct, but my initial attempt at it became too hairy halfway through (not only due to open-coding). Then I took a step back and came to the conclusion that it's an unnecessary complication for no performance gain (see analysis in the commit message). But maybe the scales tip the other way if the API surface is trimmed down. On the other hand, if the entry API goes away, there's less need for the index-based find helper function to work around borrow checker limitations... I'll think about it again.
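For context, the "index-based find helper" pattern mentioned above usually looks something like this. This is a generic illustration of the borrow-checker workaround, not the PR's actual code; `find_index` and `upsert` are made-up names. Returning an index instead of a `&mut` reference means no borrow of the `Vec` is held across the subsequent mutation.

```rust
// Hypothetical illustration: searching by index first sidesteps the borrow
// checker, because the shared borrow from the search ends before the caller
// mutates the Vec (NLL still rejects holding a find-returned &mut across a
// potential push in the other branch).
fn find_index<K: PartialEq, V>(entries: &[(K, V)], key: &K) -> Option<usize> {
    entries.iter().position(|(k, _)| k == key)
}

fn upsert(entries: &mut Vec<(u32, String)>, key: u32, value: String) {
    match find_index(entries, &key) {
        Some(i) => entries[i].1 = value,   // mutate in place via the index
        None => entries.push((key, value)), // or append a fresh entry
    }
}

fn main() {
    let mut v = vec![(2u32, String::from("old"))];
    upsert(&mut v, 2, String::from("new"));
    upsert(&mut v, 9, String::from("added"));
    assert_eq!(v, vec![(2, String::from("new")), (9, String::from("added"))]);
    println!("ok");
}
```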

This doesn't remove the `-A` parameter in the CI workflow, that's still
needed to override the blanket `-D clippy::all`. But it fixes the false
positive from normal `cargo clippy` invocations.
@hanna-kruppe force-pushed the dumb-down-data-structures branch from 1a3150b to 97356b0 on November 30, 2025 12:39
@hanna-kruppe force-pushed the dumb-down-data-structures branch from 97356b0 to 4d1be65 on November 30, 2025 12:45
@hanna-kruppe
Contributor Author

Updates:

  • Rebased on Various CI fixes #193 for now (got annoyed by clippy warnings while developing) and fixed a stray BTreeMap mention in cfg(windows) code that CI caught
  • Got rid of the entry API; it was only used in one place, and the borrowck-friendly "check if the key is present, then get_mut().unwrap()" dance is not too bad there.
  • I tried the "sorted Vec" approach in the last commit. With separate vectors for keys and values, it's tolerable because <[T]>::binary_search works as-is. But it doesn't seem clearly better either, and technically the invariants are more involved. What do you think?
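A minimal sketch of the parallel-vectors layout described in the last bullet, under the stated assumptions (this is illustration code, not the commit's actual implementation, and the type name `SortedMap` is made up). With keys in their own `Vec`, `<[T]>::binary_search` applies directly, and a found index addresses the values `Vec`; the invariants are that `keys` stays sorted and both vectors stay the same length.

```rust
// Hypothetical sketch: separate sorted keys and parallel values.
// Invariants: `keys` is sorted and `keys.len() == values.len()`.
struct SortedMap<K: Ord, V> {
    keys: Vec<K>,
    values: Vec<V>,
}

impl<K: Ord, V> SortedMap<K, V> {
    fn new() -> Self {
        SortedMap { keys: Vec::new(), values: Vec::new() }
    }

    fn get(&self, key: &K) -> Option<&V> {
        // <[T]>::binary_search works as-is on the keys slice.
        self.keys.binary_search(key).ok().map(|i| &self.values[i])
    }

    fn insert(&mut self, key: K, value: V) {
        match self.keys.binary_search(&key) {
            Ok(i) => self.values[i] = value, // overwrite existing entry
            Err(i) => {
                // Insert at the same index in both vectors to keep them parallel.
                self.keys.insert(i, key);
                self.values.insert(i, value);
            }
        }
    }
}

fn main() {
    let mut m = SortedMap::new();
    m.insert(15u32, "SIGTERM");
    m.insert(2u32, "SIGINT");
    assert_eq!(m.keys, vec![2, 15]);
    assert_eq!(m.get(&2), Some(&"SIGINT"));
    assert_eq!(m.get(&3), None);
    println!("ok");
}
```

The "more involved invariants" trade-off is visible here: every mutation must touch both vectors in lockstep, whereas a single `Vec` of pairs cannot get out of sync by construction.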

@hanna-kruppe hanna-kruppe marked this pull request as draft November 30, 2025 12:49
@hanna-kruppe hanna-kruppe marked this pull request as ready for review November 30, 2025 12:50