Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Optimize CodedBufferWriter._copyInto to memcpy #885

Merged
merged 1 commit into from
Oct 23, 2023

Conversation

osa1
Copy link
Member

@osa1 osa1 commented Oct 23, 2023

dart2wasm currently can't optimize loops into memcpy, however setRange methods have type tests to generate array.copy (Wasm's memcpy).

Replacing the loops in CodedBufferWriter._copyInto with setRange improves an internal benchmark extracted from a real use case significantly in all targets:

Before After Diff
AOT 127,587 us 95,634 us -31,953 us, -25.0%
JIT 106,880 us 92,800 us -14,080 us, -13.1%
dart2js -O4 285,587 us 262,222 us -23,365 us, -8.1%
dart2wasm --omit-type-checks 337,000 us 236,100 us -100,900 us, -29.9%

cl/575887404

dart2wasm currently can't optimize loops into memcpy, however `setRange`
methods have type tests to generate `array.copy` (Wasm's `memcpy`).

Replacing the loops in `CodedBufferWriter._copyInto` with `setRange`
improves an internal benchmark extracted from a real use case
significantly in all targets:

|                              | Before     | After      | Diff                |
|------------------------------|------------|------------|---------------------|
| AOT                          | 127,587 us | 95,634 us  | -31,953 us,  -25.0% |
| JIT                          | 106,880 us | 92,800 us  | -14,080 us,  -13.1% |
| dart2js -O4                  | 285,587 us | 262,222 us | -23,365 us,  -8.1%  |
| dart2wasm --omit-type-checks | 337,000 us | 236,100 us | -100,900 us, -29.9% |
@osa1 osa1 requested a review from mkustermann October 23, 2023 08:37
Copy link
Collaborator

@mkustermann mkustermann left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice \o/

@osa1 osa1 merged commit 050c162 into google:master Oct 23, 2023
@osa1 osa1 deleted the osa1/optimizeCopyInto branch October 23, 2023 08:51
osa1 added a commit to osa1/protobuf.dart that referenced this pull request Oct 24, 2023
By avoiding allocating empty `Uint8List`s this saves a few percent in
dart2js -O4 in the benchmark reported in google#885.

|                              | Before     | After      | Diff                |
|------------------------------|------------|------------|---------------------|
| AOT                          | 122,917 us | 122,741 us |                     |
| JIT                          |  93,376 us |  94,880 us |                     |
| dart2js -O4                  | 271,111 us | 258,250 us | -12,861 us, -4.7%   |
| dart2wasm --omit-type-checks | 196,454 us | 195,300 us |                     |

Number of splices in the benchmark before this change: 21,123
After: 20,857
osa1 added a commit to osa1/protobuf.dart that referenced this pull request Oct 24, 2023
…edBufferWriter`

Similar to google#885, this optimizes some more buffer copying to `memcpy`.

Becuase the chunks can be very small, we use a simple loop as before on small
chunks and `setRange` on large chunks.

The chunk size to check for when to use a loop is somewhat arbitrarily chosen
as 20. In many benchmarks the chunks are either too small (less than 10 bytes)
or a more than a hundred bytes, so in the benchmarks I've tried it doesn't
matter whether the threshold is 20, 30, or 40.

The slowness in dart2js is probably caused by the `Uint8List` allocation, to be
passed to `setRange` as the source.

Results from the same benchmark in google#885. Before numbers are using the changes
in google#886.

|                              | Before     | After      | Diff                |
|------------------------------|------------|------------|---------------------|
| AOT                          | 122,741 us | 114,582 us | - 8,159 us, -6.6%   |
| JIT                          |  94,880 us |  92,317 us | - 2,483 us, -2.6%   |
| dart2js -O4                  | 258,250 us | 266,000 us | + 7,750 us, +3.0%   |
| dart2wasm --omit-type-checks | 195,300 us | 169,166 us | -26,134 us, -13.3%  |

AOT and JIT tested on x64.
osa1 added a commit that referenced this pull request Oct 24, 2023
…edBufferWriter` (#887)

Similar to #885, this optimizes some more buffer copying to `memcpy`.

Results from the same benchmark in #885:

|                              | Before     | After      | Diff                |
|------------------------------|------------|------------|---------------------|
| AOT                          | 114,713 us | 109,838 us | - 4,875 us, -4.2%   |
| JIT                          |  91,960 us |  92,887 us | +   927 us, +1.0%   |
| dart2js -O4                  | 259,125 us | 257,000 us | - 2,125 us, -0.8%   |
| dart2wasm --omit-type-checks | 196,909 us | 182,333 us | -14,576 us, -7.4%   |

AOT and JIT tested on x64.
osa1 added a commit that referenced this pull request Oct 26, 2023
By avoiding allocating empty `Uint8List`s this saves a few percent in dart2js
-O4 in the benchmark reported in #885.

|                              | Before     | After      | Diff                |
|------------------------------|------------|------------|---------------------|
| AOT                          | 122,917 us | 122,741 us |                     |
| JIT                          |  93,376 us |  94,880 us |                     |
| dart2js -O4                  | 271,111 us | 258,250 us | -12,861 us, -4.7%   |
| dart2wasm --omit-type-checks | 196,454 us | 195,300 us |                     |

Number of splices in the benchmark before this change: 21,123
After: 20,857
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants