Skip to content

Conversation

@Kijewski
Copy link
Member

@Kijewski Kijewski commented Jun 24, 2025

This change cuts 40 to 150 instructions per parsing function that generate an ErrorContext, and reduces the stack usage per function by 500 or more bytes.

Also, it's less code you have to read and type per error message.

Benchmark results (benches/from_str.rs)
librustdoc/all          time:   [177.24 µs 177.92 µs 178.56 µs]
                        thrpt:  [79.081 MiB/s 79.369 MiB/s 79.674 MiB/s]
                 change:
                        time:   [−2.3057% −2.0764% −1.8236%] (p = 0.00 < 0.05)
                        thrpt:  [+1.8575% +2.1204% +2.3601%]
                        Performance has improved.

librustdoc/item_info    time:   [3.1792 µs 3.1890 µs 3.1973 µs]
                        thrpt:  [49.216 MiB/s 49.344 MiB/s 49.495 MiB/s]
                 change:
                        time:   [−4.3188% −3.9734% −3.6065%] (p = 0.00 < 0.05)
                        thrpt:  [+3.7414% +4.1378% +4.5137%]
                        Performance has improved.

librustdoc/item_union   time:   [18.237 µs 18.297 µs 18.369 µs]
                        thrpt:  [53.734 MiB/s 53.945 MiB/s 54.123 MiB/s]
                 change:
                        time:   [−3.6142% −3.2857% −2.9748%] (p = 0.00 < 0.05)
                        thrpt:  [+3.0660% +3.3974% +3.7497%]
                        Performance has improved.

librustdoc/page         time:   [83.761 µs 84.102 µs 84.414 µs]
                        thrpt:  [73.355 MiB/s 73.627 MiB/s 73.927 MiB/s]
                 change:
                        time:   [−2.4929% −2.2196% −1.9439%] (p = 0.00 < 0.05)
                        thrpt:  [+1.9825% +2.2700% +2.5566%]
                        Performance has improved.

librustdoc/print_item   time:   [9.5466 µs 9.5528 µs 9.5603 µs]
                        thrpt:  [98.756 MiB/s 98.834 MiB/s 98.898 MiB/s]
                 change:
                        time:   [−1.4294% −1.2085% −0.9713%] (p = 0.00 < 0.05)
                        thrpt:  [+0.9808% +1.2233% +1.4501%]
                        Change within noise threshold.

librustdoc/short_item_info
                        time:   [8.8031 µs 8.8127 µs 8.8204 µs]
                        thrpt:  [102.72 MiB/s 102.80 MiB/s 102.92 MiB/s]
                 change:
                        time:   [−4.3207% −3.9712% −3.6035%] (p = 0.00 < 0.05)
                        thrpt:  [+3.7382% +4.1355% +4.5158%]
                        Performance has improved.

librustdoc/sidebar      time:   [20.181 µs 20.218 µs 20.253 µs]
                        thrpt:  [60.933 MiB/s 61.036 MiB/s 61.150 MiB/s]
                 change:
                        time:   [−2.5827% −2.3381% −2.0839%] (p = 0.00 < 0.05)
                        thrpt:  [+2.1283% +2.3940% +2.6512%]
                        Performance has improved.

librustdoc/source       time:   [7.2338 µs 7.2377 µs 7.2423 µs]
                        thrpt:  [101.79 MiB/s 101.85 MiB/s 101.91 MiB/s]
                 change:
                        time:   [−3.1453% −2.8930% −2.6317%] (p = 0.00 < 0.05)
                        thrpt:  [+2.7028% +2.9792% +3.2475%]
                        Performance has improved.

librustdoc/type_layout_size
                        time:   [4.3755 µs 4.3774 µs 4.3792 µs]
                        thrpt:  [61.847 MiB/s 61.874 MiB/s 61.900 MiB/s]
                 change:
                        time:   [−6.6822% −6.5162% −6.3593%] (p = 0.00 < 0.05)
                        thrpt:  [+6.7912% +6.9704% +7.1607%]
                        Performance has improved.

librustdoc/type_layout  time:   [14.297 µs 14.316 µs 14.331 µs]
                        thrpt:  [187.86 MiB/s 188.06 MiB/s 188.30 MiB/s]
                 change:
                        time:   [−3.2479% −2.8916% −2.6244%] (p = 0.00 < 0.05)
                        thrpt:  [+2.6952% +2.9777% +3.3569%]
                        Performance has improved.

Kijewski added 2 commits June 24, 2025 17:37
This change cuts 40 to 150 instructions per parsing function that
generates a `ErrorContext`, and reduces the stack usage per function 500
or more bytes.

Also, it's less code you have to read and type per error message.

<details>

<summary>Benchmark results (benches/from_str.rs)</summary>

```text
librustdoc/all          time:   [177.24 µs 177.92 µs 178.56 µs]
                        thrpt:  [79.081 MiB/s 79.369 MiB/s 79.674 MiB/s]
                 change:
                        time:   [−2.3057% −2.0764% −1.8236%] (p = 0.00 < 0.05)
                        thrpt:  [+1.8575% +2.1204% +2.3601%]
                        Performance has improved.

librustdoc/item_info    time:   [3.1792 µs 3.1890 µs 3.1973 µs]
                        thrpt:  [49.216 MiB/s 49.344 MiB/s 49.495 MiB/s]
                 change:
                        time:   [−4.3188% −3.9734% −3.6065%] (p = 0.00 < 0.05)
                        thrpt:  [+3.7414% +4.1378% +4.5137%]
                        Performance has improved.

librustdoc/item_union   time:   [18.237 µs 18.297 µs 18.369 µs]
                        thrpt:  [53.734 MiB/s 53.945 MiB/s 54.123 MiB/s]
                 change:
                        time:   [−3.6142% −3.2857% −2.9748%] (p = 0.00 < 0.05)
                        thrpt:  [+3.0660% +3.3974% +3.7497%]
                        Performance has improved.

librustdoc/page         time:   [83.761 µs 84.102 µs 84.414 µs]
                        thrpt:  [73.355 MiB/s 73.627 MiB/s 73.927 MiB/s]
                 change:
                        time:   [−2.4929% −2.2196% −1.9439%] (p = 0.00 < 0.05)
                        thrpt:  [+1.9825% +2.2700% +2.5566%]
                        Performance has improved.

librustdoc/print_item   time:   [9.5466 µs 9.5528 µs 9.5603 µs]
                        thrpt:  [98.756 MiB/s 98.834 MiB/s 98.898 MiB/s]
                 change:
                        time:   [−1.4294% −1.2085% −0.9713%] (p = 0.00 < 0.05)
                        thrpt:  [+0.9808% +1.2233% +1.4501%]
                        Change within noise threshold.

librustdoc/short_item_info
                        time:   [8.8031 µs 8.8127 µs 8.8204 µs]
                        thrpt:  [102.72 MiB/s 102.80 MiB/s 102.92 MiB/s]
                 change:
                        time:   [−4.3207% −3.9712% −3.6035%] (p = 0.00 < 0.05)
                        thrpt:  [+3.7382% +4.1355% +4.5158%]
                        Performance has improved.

librustdoc/sidebar      time:   [20.181 µs 20.218 µs 20.253 µs]
                        thrpt:  [60.933 MiB/s 61.036 MiB/s 61.150 MiB/s]
                 change:
                        time:   [−2.5827% −2.3381% −2.0839%] (p = 0.00 < 0.05)
                        thrpt:  [+2.1283% +2.3940% +2.6512%]
                        Performance has improved.

librustdoc/source       time:   [7.2338 µs 7.2377 µs 7.2423 µs]
                        thrpt:  [101.79 MiB/s 101.85 MiB/s 101.91 MiB/s]
                 change:
                        time:   [−3.1453% −2.8930% −2.6317%] (p = 0.00 < 0.05)
                        thrpt:  [+2.7028% +2.9792% +3.2475%]
                        Performance has improved.

librustdoc/type_layout_size
                        time:   [4.3755 µs 4.3774 µs 4.3792 µs]
                        thrpt:  [61.847 MiB/s 61.874 MiB/s 61.900 MiB/s]
                 change:
                        time:   [−6.6822% −6.5162% −6.3593%] (p = 0.00 < 0.05)
                        thrpt:  [+6.7912% +6.9704% +7.1607%]
                        Performance has improved.

librustdoc/type_layout  time:   [14.297 µs 14.316 µs 14.331 µs]
                        thrpt:  [187.86 MiB/s 188.06 MiB/s 188.30 MiB/s]
                 change:
                        time:   [−3.2479% −2.8916% −2.6244%] (p = 0.00 < 0.05)
                        thrpt:  [+2.6952% +2.9777% +3.3569%]
                        Performance has improved.
```

</details>
@Kijewski Kijewski added the parser Related to the parser label Jun 24, 2025
@GuillaumeGomez
Copy link
Collaborator

Inlining and its random performance changes. 😆 Sorry, laughing because that's what I spent the last days working on: rust-lang/literal-escaper#16

We had different performance changes based on the architecture. Super weird.

@GuillaumeGomez GuillaumeGomez merged commit 73bbfa9 into askama-rs:master Jun 24, 2025
42 checks passed
@GuillaumeGomez
Copy link
Collaborator

Thanks for the PR, very good catch!

@Kijewski Kijewski deleted the pr-context_error branch June 24, 2025 21:39
@Kijewski
Copy link
Member Author

In one project everything got ~10% slower when I added zerocopy to the project. Without even using it. There mere line use zerocopy as _; made everything slow. Maybe there is an AI embedded in LLVM without anyone knowing it, and that intelligence hates us humans. :D

Error messages are only produced when the error is not recoverable. I would have no reservations making the error path 50% slower if that makes the successful path 5% faster. But I guess the compiler cannot make this decision on its own.

Kijewski added a commit to Kijewski/askama that referenced this pull request Jul 23, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

parser Related to the parser

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants