
Inline storage for instance fields and frame locals #44

Open: wants to merge 16 commits into master from feat/inline-fields


@Hirevo (Owner) commented May 8, 2024

Currently (before this PR), the bytecode interpreter stores instance fields and frame locals in vectors (as Vec<Value>).

This means that reaching them requires dereferencing two pointers.

For example, to reach a field of a class instance (stored as SOMRef<Instance>), one needs to:

  • dereference once to get to the instance struct.
  • dereference again to get to the field value within the vector of fields.

Because the vector is a separate allocation, this second dereference is likely to cause cache misses, slowing down the program.
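To make the two dereferences concrete, here is a minimal sketch of the pre-PR layout. It assumes SOMRef<T> is an alias for Rc<RefCell<T>>, and the `Value` and `Instance` types are simplified, hypothetical stand-ins rather than som-rs's actual definitions.

```rust
use std::cell::RefCell;
use std::rc::Rc;

// Assumption: SOMRef<T> modeled as a reference-counted, runtime-borrowed pointer.
type SOMRef<T> = Rc<RefCell<T>>;

// Hypothetical, simplified interpreter value.
#[derive(Clone, Copy, Debug, PartialEq)]
enum Value {
    Nil,
    Integer(i64),
}

// "Before" layout: the field values live in a separate heap buffer owned by the Vec.
struct Instance {
    fields: Vec<Value>,
}

fn lookup_field(instance: &SOMRef<Instance>, idx: usize) -> Value {
    // Dereference 1: follow the Rc pointer to the instance struct.
    let inst = instance.borrow();
    // Dereference 2: follow the Vec's pointer into its separately allocated buffer.
    inst.fields[idx]
}

fn main() {
    let instance: SOMRef<Instance> = Rc::new(RefCell::new(Instance {
        fields: vec![Value::Nil, Value::Integer(42)],
    }));
    println!("{:?}", lookup_field(&instance, 1)); // prints "Integer(42)"
}
```

The Vec's buffer is allocated independently of the Rc allocation holding the Instance, so the two reads can land on unrelated cache lines.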

This PR uses raw pointer arithmetic, a bit of unsafe code, and coordination with the GC to allocate additional space right next to the class instance (or stack frame) and store the fields (or locals) there, rather than in a separate vector. Accesses are then implemented as pointer offsets, which hopefully improves locality and therefore cache hit rates.
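The inline approach can be sketched with the standard allocator: compute one `Layout` covering a header followed by the field array, then reach each field at a fixed byte offset from the base pointer. This is a hypothetical, simplified sketch: `InlineInstance`, `InstanceHeader` and their methods are illustrative names, and it allocates through `std::alloc` directly, whereas the PR coordinates with the interpreter's GC. The same idea would apply to frame locals stored after a frame header.

```rust
use std::alloc::{alloc, dealloc, handle_alloc_error, Layout};
use std::ptr::{self, NonNull};

// Hypothetical, simplified interpreter value.
#[derive(Clone, Copy, Debug, PartialEq)]
enum Value {
    Nil,
    Integer(i64),
}

// Hypothetical header; the fields are stored right after it, in the same allocation.
struct InstanceHeader {
    field_count: usize,
}

/// One allocation laid out as [InstanceHeader][Value; field_count].
struct InlineInstance {
    ptr: NonNull<u8>,
    layout: Layout,
    fields_offset: usize,
}

impl InlineInstance {
    fn new(field_count: usize) -> Self {
        // `extend` returns the combined layout plus the (alignment-padded)
        // byte offset at which the trailing field array starts.
        let fields_layout = Layout::array::<Value>(field_count).expect("layout overflow");
        let (layout, fields_offset) = Layout::new::<InstanceHeader>()
            .extend(fields_layout)
            .expect("layout overflow");
        let layout = layout.pad_to_align();
        // SAFETY: `layout` has non-zero size because the header itself is non-empty.
        let raw = unsafe { alloc(layout) };
        let ptr = NonNull::new(raw).unwrap_or_else(|| handle_alloc_error(layout));
        unsafe {
            ptr::write(ptr.as_ptr().cast::<InstanceHeader>(), InstanceHeader { field_count });
            let fields = ptr.as_ptr().add(fields_offset).cast::<Value>();
            for i in 0..field_count {
                // Every field starts out as nil, as in SOM.
                ptr::write(fields.add(i), Value::Nil);
            }
        }
        InlineInstance { ptr, layout, fields_offset }
    }

    fn field_count(&self) -> usize {
        // SAFETY: the header was written at offset 0 in `new`.
        unsafe { (*self.ptr.as_ptr().cast::<InstanceHeader>()).field_count }
    }

    fn field_ptr(&self, idx: usize) -> *mut Value {
        assert!(idx < self.field_count(), "field index out of bounds");
        // Base pointer + constant offset + index: no second heap buffer to chase.
        unsafe { self.ptr.as_ptr().add(self.fields_offset).cast::<Value>().add(idx) }
    }

    fn lookup_field(&self, idx: usize) -> Value {
        unsafe { *self.field_ptr(idx) }
    }

    fn assign_field(&mut self, idx: usize, value: Value) {
        unsafe { *self.field_ptr(idx) = value }
    }
}

impl Drop for InlineInstance {
    fn drop(&mut self) {
        // The header and `Value` need no drop glue here, so freeing the block is enough.
        unsafe { dealloc(self.ptr.as_ptr(), self.layout) }
    }
}

fn main() {
    let mut instance = InlineInstance::new(3);
    instance.assign_field(1, Value::Integer(42));
    println!("{:?} {:?}", instance.lookup_field(0), instance.lookup_field(1)); // prints "Nil Integer(42)"
}
```

Because `Layout::extend` reports the padded offset of the trailing array, the sketch never hard-codes alignment. A GC-managed version would presumably also need the collector to know the full allocation size so it can trace and free the inline values.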

Depends on #33.

@Hirevo added the labels C-enhancement (Category: Enhancements), M-interpreter (Module: Interpreter), P-medium (Priority: Medium) and C-performance (Category: Performance improvements) on May 8, 2024
@Hirevo self-assigned this on May 8, 2024
@som-rs-benchmarker (bot) commented May 8, 2024

Here are the benchmark results for feat/inline-fields (commit: f17df09):

AST interpreter
+-----------------+----------------------------------------+---------------------------+
| Benchmark       | master (base)                          | feat/inline-fields (head) |
+-----------------+----------------------------------------+---------------------------+
| Bounce          | 191.37 ms ± 15.42 (172.72..222.50)     | 1.05x ± 0.10 (0.93..1.09) |
| BubbleSort      | 249.42 ms ± 12.09 (238.96..275.60)     | 0.99x ± 0.05 (0.95..1.01) |
| DeltaBlue       | 162.23 ms ± 17.85 (142.58..192.92)     | 1.10x ± 0.13 (1.01..1.14) |
| Dispatch        | 188.68 ms ± 17.95 (173.00..224.17)     | 1.09x ± 0.12 (0.99..1.16) |
| Fannkuch        | 120.60 ms ± 10.79 (112.11..139.36)     | 0.97x ± 0.11 (0.85..1.03) |
| Fibonacci       | 348.72 ms ± 10.75 (340.14..376.46)     | 0.94x ± 0.07 (0.81..1.01) |
| FieldLoop       | 314.12 ms ± 13.66 (301.09..345.94)     | 1.06x ± 0.05 (1.03..1.08) |
| GraphSearch     | 82.45 ms ± 7.89 (75.89..101.25)        | 1.04x ± 0.10 (0.97..1.09) |
| IntegerLoop     | 327.72 ms ± 37.34 (295.60..422.43)     | 1.07x ± 0.13 (1.01..1.13) |
| JsonSmall       | 184.87 ms ± 4.98 (178.46..194.98)      | 0.96x ± 0.06 (0.87..1.02) |
| List            | 221.42 ms ± 6.70 (211.53..234.01)      | 0.99x ± 0.05 (0.92..1.05) |
| Loop            | 419.24 ms ± 30.02 (386.30..472.87)     | 1.04x ± 0.09 (0.96..1.09) |
| Mandelbrot      | 247.61 ms ± 22.79 (232.95..308.84)     | 0.98x ± 0.10 (0.89..1.03) |
| NBody           | 194.21 ms ± 3.62 (189.57..199.52)      | 0.94x ± 0.07 (0.85..1.01) |
| PageRank        | 278.19 ms ± 11.07 (263.22..302.83)     | 0.99x ± 0.05 (0.93..1.03) |
| Permute         | 304.78 ms ± 22.92 (280.50..360.63)     | 1.08x ± 0.09 (0.98..1.12) |
| Queens          | 237.88 ms ± 30.12 (216.11..319.32)     | 1.05x ± 0.15 (0.88..1.12) |
| QuickSort       | 71.62 ms ± 2.22 (69.61..77.28)         | 0.98x ± 0.12 (0.81..1.07) |
| Recurse         | 271.86 ms ± 15.12 (248.81..301.77)     | 1.05x ± 0.07 (0.98..1.08) |
| Richards        | 3987.78 ms ± 116.71 (3883.22..4193.07) | 1.03x ± 0.04 (0.96..1.05) |
| Sieve           | 410.50 ms ± 22.74 (376.53..457.70)     | 1.04x ± 0.07 (0.97..1.09) |
| Storage         | 80.77 ms ± 1.85 (78.41..84.34)         | 0.93x ± 0.16 (0.67..1.06) |
| Sum             | 173.83 ms ± 22.03 (146.29..208.79)     | 1.11x ± 0.15 (1.03..1.18) |
| Towers          | 308.83 ms ± 23.34 (282.98..359.38)     | 0.99x ± 0.09 (0.88..1.05) |
| TreeSort        | 147.51 ms ± 5.01 (142.78..159.36)      | 0.97x ± 0.08 (0.85..1.04) |
| WhileLoop       | 340.21 ms ± 19.45 (318.81..381.32)     | 0.93x ± 0.07 (0.86..1.02) |
|                 |                                        |                           |
| Average Speedup |               (baseline)               | 1.01x ± 0.02 (0.93..1.11) |
+-----------------+----------------------------------------+---------------------------+

The raw ReBench data files are available for download here: baseline and head

Bytecode interpreter

The raw ReBench data files are available for download here: baseline and head

The benchmarks were run using ReBench v1.2.0
The statistical analysis was done using rebench-tabler v0.1.0

For more details about the setup, the source code of this benchmark runner is available as a GitHub Gist.

@som-rs-benchmarker (bot) commented May 15, 2024

Here are the benchmark results for feat/inline-fields (commit: 659a218):

AST interpreter
+-----------------+----------------------------------------+---------------------------+
| Benchmark       | master (base)                          | feat/inline-fields (head) |
+-----------------+----------------------------------------+---------------------------+
| Bounce          | 193.56 ms ± 6.78 (184.02..203.93)      | 0.89x ± 0.10 (0.78..0.99) |
| BubbleSort      | 274.03 ms ± 14.72 (252.09..293.55)     | 0.94x ± 0.14 (0.69..1.06) |
| DeltaBlue       | 170.21 ms ± 24.76 (149.46..227.88)     | 1.00x ± 0.18 (0.81..1.12) |
| Dispatch        | 190.68 ms ± 13.78 (176.11..217.45)     | 0.97x ± 0.09 (0.87..1.03) |
| Fannkuch        | 132.53 ms ± 18.65 (113.06..166.66)     | 1.05x ± 0.15 (1.00..1.15) |
| Fibonacci       | 388.18 ms ± 21.44 (366.46..426.65)     | 1.05x ± 0.07 (0.97..1.08) |
| FieldLoop       | 346.85 ms ± 23.86 (304.41..378.93)     | 0.98x ± 0.09 (0.84..1.05) |
| GraphSearch     | 81.46 ms ± 3.98 (77.34..88.81)         | 0.98x ± 0.07 (0.90..1.04) |
| IntegerLoop     | 344.56 ms ± 35.82 (304.82..436.43)     | 1.05x ± 0.12 (0.98..1.12) |
| JsonSmall       | 211.13 ms ± 29.43 (189.15..282.89)     | 0.97x ± 0.16 (0.83..1.08) |
| List            | 245.51 ms ± 13.28 (228.22..265.10)     | 1.02x ± 0.09 (0.89..1.12) |
| Loop            | 417.66 ms ± 17.49 (389.47..442.49)     | 0.92x ± 0.08 (0.80..1.05) |
| Mandelbrot      | 272.77 ms ± 18.55 (247.10..304.97)     | 0.97x ± 0.17 (0.67..1.07) |
| NBody           | 229.51 ms ± 21.41 (188.84..268.65)     | 0.91x ± 0.13 (0.77..1.07) |
| PageRank        | 308.44 ms ± 24.98 (278.38..354.15)     | 1.01x ± 0.11 (0.86..1.09) |
| Permute         | 326.18 ms ± 26.02 (281.97..364.30)     | 0.96x ± 0.14 (0.78..1.12) |
| Queens          | 260.12 ms ± 29.42 (225.41..324.86)     | 1.09x ± 0.13 (1.03..1.17) |
| QuickSort       | 74.60 ms ± 5.46 (69.01..86.30)         | 0.93x ± 0.12 (0.78..1.08) |
| Recurse         | 285.19 ms ± 13.45 (262.92..311.85)     | 1.01x ± 0.08 (0.92..1.11) |
| Richards        | 4389.05 ms ± 180.56 (4054.82..4598.70) | 1.04x ± 0.06 (0.98..1.09) |
| Sieve           | 462.54 ms ± 27.60 (423.33..504.71)     | 1.10x ± 0.08 (1.05..1.17) |
| Storage         | 94.77 ms ± 10.98 (85.43..113.54)       | 1.06x ± 0.15 (0.90..1.17) |
| Sum             | 185.48 ms ± 18.92 (157.21..215.67)     | 1.08x ± 0.17 (0.86..1.25) |
| Towers          | 327.16 ms ± 33.24 (297.54..408.98)     | 1.01x ± 0.12 (0.93..1.08) |
| TreeSort        | 168.61 ms ± 19.48 (148.06..219.50)     | 1.03x ± 0.14 (0.89..1.15) |
| WhileLoop       | 406.59 ms ± 34.09 (370.11..458.88)     | 1.09x ± 0.11 (0.97..1.17) |
|                 |                                        |                           |
| Average Speedup |               (baseline)               | 1.01x ± 0.02 (0.89..1.10) |
+-----------------+----------------------------------------+---------------------------+

The raw ReBench data files are available for download here: baseline and head

Bytecode interpreter
+-----------------+---------------------------------------+---------------------------+
| Benchmark       | master (base)                         | feat/inline-fields (head) |
+-----------------+---------------------------------------+---------------------------+
| Bounce          | 77.34 ms ± 3.57 (71.93..83.15)        | 0.63x ± 0.09 (0.48..0.77) |
| BubbleSort      | 103.36 ms ± 3.07 (98.04..108.06)      | 0.60x ± 0.05 (0.54..0.66) |
| DeltaBlue       | 66.80 ms ± 8.67 (57.24..87.60)        | 0.65x ± 0.11 (0.55..0.72) |
| Dispatch        | 77.99 ms ± 2.64 (75.08..81.27)        | 0.59x ± 0.08 (0.46..0.69) |
| Fannkuch        | 49.18 ms ± 3.45 (45.28..57.06)        | 0.65x ± 0.08 (0.53..0.76) |
| Fibonacci       | 143.38 ms ± 7.77 (134.52..155.29)     | 0.74x ± 0.06 (0.66..0.79) |
| FieldLoop       | 166.73 ms ± 13.36 (151.33..191.09)    | 1.02x ± 0.09 (0.98..1.11) |
| GraphSearch     | 34.48 ms ± 4.82 (30.50..45.95)        | 0.57x ± 0.14 (0.40..0.75) |
| IntegerLoop     | 146.47 ms ± 13.85 (133.43..179.00)    | 0.65x ± 0.08 (0.55..0.73) |
| JsonSmall       | 101.89 ms ± 18.32 (80.95..133.50)     | 0.81x ± 0.22 (0.61..1.03) |
| List            | 110.44 ms ± 8.94 (96.51..124.43)      | 0.64x ± 0.07 (0.57..0.70) |
| Loop            | 190.00 ms ± 7.89 (182.64..208.73)     | 0.69x ± 0.06 (0.62..0.76) |
| Mandelbrot      | 116.59 ms ± 11.24 (104.09..138.99)    | 0.85x ± 0.11 (0.75..0.94) |
| NBody           | 84.22 ms ± 6.07 (79.27..99.11)        | 0.67x ± 0.08 (0.58..0.77) |
| PageRank        | 135.32 ms ± 20.33 (113.04..172.63)    | 0.70x ± 0.13 (0.61..0.84) |
| Permute         | 114.25 ms ± 8.34 (106.52..134.45)     | 0.71x ± 0.06 (0.67..0.79) |
| Queens          | 104.35 ms ± 17.33 (88.45..135.14)     | 0.73x ± 0.15 (0.65..0.88) |
| QuickSort       | 28.93 ms ± 0.83 (27.74..30.82)        | 0.71x ± 0.06 (0.64..0.80) |
| Recurse         | 122.14 ms ± 19.56 (109.85..172.76)    | 0.82x ± 0.14 (0.72..0.87) |
| Richards        | 1486.01 ms ± 80.72 (1385.83..1628.36) | 0.57x ± 0.04 (0.54..0.62) |
| Sieve           | 176.72 ms ± 13.92 (160.91..204.20)    | 0.69x ± 0.08 (0.59..0.80) |
| Storage         | 33.96 ms ± 1.56 (31.90..36.95)        | 0.71x ± 0.09 (0.58..0.90) |
| Sum             | 72.34 ms ± 7.30 (65.42..87.66)        | 0.65x ± 0.10 (0.53..0.82) |
| Towers          | 134.51 ms ± 21.65 (113.67..188.97)    | 0.68x ± 0.13 (0.58..0.77) |
| TreeSort        | 55.40 ms ± 8.94 (46.92..72.11)        | 0.69x ± 0.13 (0.61..0.76) |
| WhileLoop       | 193.77 ms ± 23.45 (169.07..231.40)    | 1.07x ± 0.17 (0.86..1.17) |
|                 |                                       |                           |
| Average Speedup |              (baseline)               | 0.71x ± 0.02 (0.57..1.07) |
+-----------------+---------------------------------------+---------------------------+

The raw ReBench data files are available for download here: baseline and head

The benchmarks were run using ReBench v1.2.0
The statistical analysis was done using rebench-tabler v0.1.0

For more details about the setup, the source code of this benchmark runner is available as a GitHub Gist.

@OctaveLarose (Contributor) commented:

Current results on our machines: https://rebench.stefan-marr.de/som-rs/compare/aa52b413ea453d496882546092419c839d56633f..a596c147c4b629fa1feaa89be2767744fb34e134

Assuming I didn't mess up the comparison (I don't think I did), it's a big slowdown, though not on FieldLoop for some reason?
