Conversation

@hanna-kruppe
Contributor

HashMap and BTreeMap are overkill in this context. Unsorted vectors are plenty fast enough and the necessary collection interfaces are straightforward to implement. This change has two benefits.

First, it improves binary size. For the print example from signal-hook in release mode, the .text section shrinks by about 18 KiB and overall file size shrinks by about 30 KiB. That's roughly a 6% reduction in both metrics.

Second, the simpler data structures make it more obvious that the signal handler only does async-signal-safe operations. In particular, the default HashMap has a RandomState, which can access TLS, do dlsym lookups, open and read from files, etc. depending on the platform. I don't think that's a problem for the hash table lookup done in the signal handler since that shouldn't touch the RandomState, but it's a bit subtle and the standard library doesn't make any guarantees about this. Avoiding hash maps entirely removes the need to think about it.

Performance notes:

  • (Un-)registering actions does an insert/remove by ActionId, which is asymptotically slower with this PR. However, (un-)registering is a slow operation and should be done rarely. Besides locking, it always clones the entire SignalData, so it already takes O(n) time when there are n actions registered across all signals.
  • The signal handler looks up the Slot by signal number, which is asymptotically slower with this PR. However, there's only a very small constant number of signals, so asymptotics don't matter.
  • After looking up the right Slot, the signal handler only iterates sequentially over the actions, it doesn't do any lookups by ActionId.
  • For a simple microbenchmark that registers one action each for 20 signals and then raises one signal 100k times, this implementation appears to be slightly faster regardless of which signal is raised.
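To make the trade-off concrete, here is a minimal sketch of the kind of unsorted-Vec map the PR describes. This is hypothetical illustration code, not the PR's actual implementation; the type name `VecMap` and its method set are assumptions. Lookup, insert, and remove all scan linearly, which is fine when the number of entries is as small as the number of registered signals.

```rust
// Hypothetical sketch of a map backed by an unsorted Vec of (key, value)
// pairs. All operations are O(n), which is acceptable for small n such as
// the handful of signals a process registers handlers for.
#[derive(Clone, Default)]
struct VecMap<K, V> {
    entries: Vec<(K, V)>,
}

impl<K: PartialEq, V> VecMap<K, V> {
    fn get(&self, key: &K) -> Option<&V> {
        self.entries.iter().find(|(k, _)| k == key).map(|(_, v)| v)
    }

    // Insert or overwrite; returns the previous value, mirroring HashMap::insert.
    fn insert(&mut self, key: K, value: V) -> Option<V> {
        for (k, v) in self.entries.iter_mut() {
            if *k == key {
                return Some(std::mem::replace(v, value));
            }
        }
        self.entries.push((key, value));
        None
    }

    // swap_remove avoids shifting the tail; entry order is not preserved,
    // which an unsorted map does not care about.
    fn remove(&mut self, key: &K) -> Option<V> {
        let idx = self.entries.iter().position(|(k, _)| k == key)?;
        Some(self.entries.swap_remove(idx).1)
    }
}

fn main() {
    let mut m = VecMap::default();
    assert_eq!(m.insert(2u32, "SIGINT"), None);
    assert_eq!(m.insert(15u32, "SIGTERM"), None);
    assert_eq!(m.get(&2), Some(&"SIGINT"));
    assert_eq!(m.remove(&15), Some("SIGTERM"));
    assert_eq!(m.get(&15), None);
    println!("ok");
}
```

Because the backing storage is just a `Vec` of plain pairs, every operation is visibly free of hashing, allocation of hasher state, or anything else that might not be async-signal-safe.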

@hanna-kruppe
Contributor Author

Note: despite the claimed advantages, I would 100% understand if this were rejected to avoid having to review and maintain the hand-rolled map implementation. But I was curious how much it would help with binary size, and once I had implemented it, I figured I might as well write a proper commit message and submit a PR in case it's of interest.

@hanna-kruppe force-pushed the dumb-down-data-structures branch from ff75628 to 1a3150b on November 20, 2025 21:11
@hanna-kruppe changed the title from "Replace internal maps with unsorted Vec" to "Replace internal maps with unsorted Vecs" on Nov 20, 2025
serial_lock and its dependency tree make it hard to keep tests working
on Rust 1.40. In particular, all early versions of futures-util 0.3 with
sufficiently low MSRV were yanked.
For some reason, the tool reports a "not found" error for these lib.rs
links, but docs.rs links are fine.
@vorner
Owner

vorner commented Nov 28, 2025

Looking at it, this seems like a beneficial direction. Though I'd still have a few suggestions, if I may:

  • Do we need a full container-like API or are some of the methods unnecessary? The Entry API seems a bit heavy here (I know it's used, but maybe we can get away without?), I'm not sure if get_mut is necessary.
  • It would make sense to have at least few tests for the container.
  • I wonder if it would make sense to keep the keys sorted and use binary search for lookup.
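The sorted-keys idea in the last bullet could look roughly like the following. This is a hypothetical sketch of the suggestion, not code from the PR; the helper names `lookup` and `insert_sorted` are assumptions. The invariant is that the `Vec` stays sorted by key, so lookup drops from O(n) to O(log n) via `binary_search_by`.

```rust
// Sketch of the sorted-Vec alternative: keep (key, value) pairs ordered by
// key so that lookup can use binary search instead of a linear scan.
fn lookup<'a, K: Ord, V>(entries: &'a [(K, V)], key: &K) -> Option<&'a V> {
    entries
        .binary_search_by(|(k, _)| k.cmp(key))
        .ok()
        .map(|i| &entries[i].1)
}

fn insert_sorted<K: Ord, V>(entries: &mut Vec<(K, V)>, key: K, value: V) {
    match entries.binary_search_by(|(k, _)| k.cmp(&key)) {
        Ok(i) => entries[i].1 = value,             // overwrite existing key
        Err(i) => entries.insert(i, (key, value)), // insertion point keeps order
    }
}

fn main() {
    let mut v = Vec::new();
    insert_sorted(&mut v, 9u32, "SIGKILL");
    insert_sorted(&mut v, 2u32, "SIGINT");
    insert_sorted(&mut v, 15u32, "SIGTERM");
    assert_eq!(v.iter().map(|(k, _)| *k).collect::<Vec<_>>(), vec![2, 9, 15]);
    assert_eq!(lookup(&v, &9), Some(&"SIGKILL"));
    assert_eq!(lookup(&v, &3), None);
    println!("ok");
}
```

The cost is that insertion now shifts the tail of the `Vec` to keep it sorted, and the sortedness invariant has to be maintained everywhere the container is mutated.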

@hanna-kruppe
Contributor Author

Do we need a full container-like API or are some of the methods unnecessary? The Entry API seems a bit heavy here (I know it's used, but maybe we can get away without?), I'm not sure if get_mut is necessary.

I started out by modifying the places using collection APIs to open-code Vec wrangling, which turned into a big mess halfway through. It's quite possible that I over-corrected in the other direction! I'll take another look at the uses of Entry and get_mut.

I can add some basic tests once the API surface is settled.

I wonder if it would make sense to keep the keys sorted and use binary search for lookup.

That was actually my first instinct, but my initial attempt at it became too hairy halfway through (not only due to open-coding). Then I took a step back and came to the conclusion that it's an unnecessary complication for no performance gain (see analysis in the commit message). But maybe the scales tip the other way if the API surface is trimmed down. On the other hand, if the entry API goes away, there's less need for the index-based find helper function to work around borrow checker limitations... I'll think about it again.
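For context, the "index-based find helper" pattern mentioned above usually looks something like this. This is a generic illustration of the borrow-checker workaround, not the PR's actual code; `find_index` and `upsert` are made-up names. Returning an index instead of a `&mut` reference means no borrow of the `Vec` is held across the subsequent mutation.

```rust
// Hypothetical illustration: searching by index first sidesteps the borrow
// checker, because the shared borrow from the search ends before the caller
// mutates the Vec (NLL still rejects holding a find-returned &mut across a
// potential push in the other branch).
fn find_index<K: PartialEq, V>(entries: &[(K, V)], key: &K) -> Option<usize> {
    entries.iter().position(|(k, _)| k == key)
}

fn upsert(entries: &mut Vec<(u32, String)>, key: u32, value: String) {
    match find_index(entries, &key) {
        Some(i) => entries[i].1 = value,   // mutate in place via the index
        None => entries.push((key, value)), // or append a fresh entry
    }
}

fn main() {
    let mut v = vec![(2u32, String::from("old"))];
    upsert(&mut v, 2, String::from("new"));
    upsert(&mut v, 9, String::from("added"));
    assert_eq!(v, vec![(2, String::from("new")), (9, String::from("added"))]);
    println!("ok");
}
```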

This doesn't remove the `-A` parameter in the CI workflow, that's still
needed to override the blanket `-D clippy::all`. But it fixes the false
positive from normal `cargo clippy` invocations.
@hanna-kruppe force-pushed the dumb-down-data-structures branch from 1a3150b to 97356b0 on November 30, 2025 12:39
@hanna-kruppe force-pushed the dumb-down-data-structures branch from 97356b0 to 4d1be65 on November 30, 2025 12:45
@hanna-kruppe
Contributor Author

Updates:

  • Rebased on Various CI fixes #193 for now (got annoyed by clippy warnings while developing) and fixed a stray BTreeMap mention in cfg(windows) code that CI caught
  • Got rid of the entry API; it was only used in one place, and the borrowck-friendly "check if the key is present, then get_mut().unwrap()" dance is not too bad there.
  • I tried the "sorted Vec" approach in the last commit. With separate vectors for keys and values, it's tolerable because <[T]>::binary_search works as-is. But it doesn't seem clearly better either, and technically the invariants are more involved. What do you think?
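A minimal sketch of the parallel-vectors layout described in the last bullet, under the stated assumptions (this is illustration code, not the commit's actual implementation, and the type name `SortedMap` is made up). With keys in their own `Vec`, `<[T]>::binary_search` applies directly, and a found index addresses the values `Vec`; the invariants are that `keys` stays sorted and both vectors stay the same length.

```rust
// Hypothetical sketch: separate sorted keys and parallel values.
// Invariants: `keys` is sorted and `keys.len() == values.len()`.
struct SortedMap<K: Ord, V> {
    keys: Vec<K>,
    values: Vec<V>,
}

impl<K: Ord, V> SortedMap<K, V> {
    fn new() -> Self {
        SortedMap { keys: Vec::new(), values: Vec::new() }
    }

    fn get(&self, key: &K) -> Option<&V> {
        // <[T]>::binary_search works as-is on the keys slice.
        self.keys.binary_search(key).ok().map(|i| &self.values[i])
    }

    fn insert(&mut self, key: K, value: V) {
        match self.keys.binary_search(&key) {
            Ok(i) => self.values[i] = value, // overwrite existing entry
            Err(i) => {
                // Insert at the same index in both vectors to keep them parallel.
                self.keys.insert(i, key);
                self.values.insert(i, value);
            }
        }
    }
}

fn main() {
    let mut m = SortedMap::new();
    m.insert(15u32, "SIGTERM");
    m.insert(2u32, "SIGINT");
    assert_eq!(m.keys, vec![2, 15]);
    assert_eq!(m.get(&2), Some(&"SIGINT"));
    assert_eq!(m.get(&3), None);
    println!("ok");
}
```

The "more involved invariants" trade-off is visible here: every mutation must touch both vectors in lockstep, whereas a single `Vec` of pairs cannot get out of sync by construction.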

@hanna-kruppe hanna-kruppe marked this pull request as draft November 30, 2025 12:49
@hanna-kruppe hanna-kruppe marked this pull request as ready for review November 30, 2025 12:50