@vext01 vext01 commented Aug 4, 2025

FORLOOP (which yklua already recognises) is for numeric looping, e.g.:

for i = 1, 10 do ...

TFORLOOP is used for other loops, e.g.:

for line in io.lines("fasta1000000.txt") do ...
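
To make the distinction concrete, here is a minimal sketch of both loop forms. In stock Lua 5.4 the numeric form compiles to FORPREP/FORLOOP and the generic form to TFORPREP/TFORCALL/TFORLOOP (this can be checked with `luac -l`). The `count_to` iterator is a hypothetical stand-in for `io.lines` so that the example needs no input file; it is not taken from this PR.

```lua
-- Numeric for: a counter stepped by the VM itself (FORLOOP).
local sum = 0
for i = 1, 10 do
  sum = sum + i
end

-- Hypothetical iterator factory: returns a closure that yields 1..n.
local function count_to(n)
  local i = 0
  return function()
    i = i + 1
    if i <= n then
      return i
    end
    -- Falling through returns nothing, i.e. nil, which ends the loop.
  end
end

-- Generic for: the iterator function is called once per iteration (TFORLOOP).
local seen = {}
for v in count_to(3) do
  seen[#seen + 1] = v
end

print(sum, table.concat(seen, ","))
```

Because the generic form makes a function call on every iteration, it is plausible that iterator-heavy benchmarks are the ones that gain most once such loops are recognised.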

Performance impact (vs. main, based on 3 invocations of 10 iterations):

 Benchmark                  Datum0 (ms)  Datum1 (ms)  Ratio  Summary
 revcomp/YkLua/default      2458         1850         0.75   24.75% faster
 knucleotide/YkLua/default  1846         1594         0.86   13.66% faster
 Richards/YkLua/100         4471         4147         0.93   7.24% faster
 LuLPeg/YkLua/default       3357         3252         0.97   3.10% faster
 Json/YkLua/100             2549         2489         0.98   2.37% faster
 Queens/YkLua/1000          504          493          0.98   2.24% faster
 HashIds/YkLua/6000         2873         2829         0.98   1.56% faster
 binarytrees/YkLua/15       3993         3979         1.00   0.36% faster
 Storage/YkLua/1000         9815         9791         1.00   0.24% faster
 CD/YkLua/250               8969         8955         1.00   0.16% faster
 Heightmap/YkLua/2000       749          749          1.00   0.01% faster
 fasta/YkLua/500000         841          841          1.00   0.00% slower
 NBody/YkLua/250000         488          488          1.00   0.04% slower
 spectralnorm/YkLua/1000    902          907          1.01   0.51% slower
 Mandelbrot/YkLua/500       129          130          1.01   0.65% slower
 Sieve/YkLua/3000           468          472          1.01   0.84% slower
 Towers/YkLua/600           908          918          1.01   1.12% slower
 BigLoop/YkLua/1000000000   1617         1641         1.02   1.50% slower
 Havlak/YkLua/1500          18926        19245        1.02   1.69% slower
 Permute/YkLua/1000         790          804          1.02   1.75% slower
 fannkuchredux/YkLua/10     1223         1250         1.02   2.24% slower
 DeltaBlue/YkLua/12000      1949         2000         1.03   2.61% slower
 List/YkLua/1500            832          859          1.03   3.33% slower
 Bounce/YkLua/1500          1142         1186         1.04   3.88% slower
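
For readers checking the table, the Ratio column is consistent with Datum1 divided by Datum0, and the Summary column with the distance of that ratio from 1. The sketch below recomputes both under that assumption. The helper name `compare` is hypothetical, and because the printed timings are rounded to whole milliseconds, the recomputed percentages can differ from the table in the second decimal place.

```lua
-- Assumption: ratio = Datum1 / Datum0; below 1 means Datum1 is faster.
local function compare(datum0_ms, datum1_ms)
  local ratio = datum1_ms / datum0_ms
  local pct = math.abs(1 - ratio) * 100
  local direction = ratio < 1 and "faster" or "slower"
  return ratio, pct, direction
end

-- Rows taken from the table above: revcomp and Bounce.
local r_ratio, r_pct, r_dir = compare(2458, 1850)
local b_ratio, b_pct, b_dir = compare(1142, 1186)

print(string.format("revcomp  %.2f  %.2f%% %s", r_ratio, r_pct, r_dir))
print(string.format("Bounce   %.2f  %.2f%% %s", b_ratio, b_pct, b_dir))
```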


ltratt commented Aug 4, 2025

Why are the slower ones slower?

@ltratt ltratt added this pull request to the merge queue Aug 4, 2025
@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to failed status checks Aug 4, 2025

vext01 commented Aug 4, 2025

FWIW, the CI failure is a todo!() in the trace optimiser:

14:21:21 + YKD_SERIALISE_COMPILATION=1 ../src/lua -e_U=true all.lua
14:21:21 
14:21:21 	Starting Tests
14:21:21 random seeds: 1754313681, 574726824
14:21:21 current path:
14:21:21 ****/usr/local/share/lua/5.4/?.lua;/usr/local/share/lua/5.4/?/init.lua;/usr/local/lib/lua/5.4/?.lua;/usr/local/lib/lua/5.4/?/init.lua;./?.lua;./?/init.lua****
14:21:21 
14:21:21     ---- total memory: 50.6K, max memory: 50.6K ----
14:21:21 
14:21:21 time: 5.8e-05 (+5.8e-05)
14:21:21 
14:21:21 ***** FILE 'main.lua'*****
14:21:21 
14:21:21 ***** FILE 'gc.lua'*****
14:21:21 ..testing incremental garbage collection
14:21:21 ..................................................................................................................creating many objects
14:21:21 ....................................................functions with errors
14:21:21 long strings
14:21:21 ..........................steps
14:21:21 steps (2)
14:21:21 ....clearing tables
14:21:21 .................weak tables
14:21:22 ..............................+
14:21:22 .........self-referenced threads
14:21:22 ..........OK
14:21:22     ---- total memory: 1.4M, max memory: 1.4M ----
14:21:22 
14:21:22 time: 0.758009 (+0.757951)
14:21:22 
14:21:22 ***** FILE 'db.lua'*****
14:21:22 testing debug library and debug information
14:21:22 ..............................................................................................................................................
14:21:22 thread '<unnamed>' panicked at ykrt/src/compile/jitc_yk/opt/analyse.rs:73:29:
14:21:22 not yet implemented
14:21:22 note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
14:21:22 
14:21:22 thread '<unnamed>' panicked at ykrt/src/job_queue.rs:122:44:
14:21:22 called `Result::unwrap()` on an `Err` value: Any { .. }
14:21:22 fatal runtime error: failed to initiate panic, error 5, aborting
14:21:22 Aborted (core dumped)

I will look into the slowdowns.


ltratt commented Aug 4, 2025

You can comment that todo out and turn it into a FIXME. A "proper" fix would improve optimisations, but I don't want to go down that route right now.


vext01 commented Aug 4, 2025

> Why are the slower ones slower?

Looking at Bounce, which shows the largest slowdown: when using multitime instead of rebench, it's difficult to claim that there's a significant performance difference.

One thing that is evident is that there is more variation at the process execution level after my change:

 1: /home/vext01/research/yklua/src/lua harness.lua bounce 10 1500
             Mean        Std.Dev.    Min         Median      Max
-real        11.460      0.152       11.269      11.409      11.692
-user        11.787      0.157       11.578      11.753      12.023
-sys         0.202       0.021       0.160       0.208       0.220
+real        11.471      0.516       10.978      11.151      12.330
+user        11.787      0.512       11.286      11.475      12.628
+sys         0.197       0.012       0.180       0.192       0.216
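
One way to read these numbers is to compare the shift in the mean against the spread. The sketch below does that for the `real` rows, taking the `-` lines as before the change and the `+` lines as after it. The coefficient of variation (standard deviation divided by mean) is an illustrative choice of metric here, not something multitime reports.

```lua
-- `real` rows from the multitime output above (seconds).
local before = { mean = 11.460, stddev = 0.152 }
local after  = { mean = 11.471, stddev = 0.516 }

-- Coefficient of variation as a percentage of the mean.
local function cv_percent(s)
  return s.stddev / s.mean * 100
end

local mean_diff = after.mean - before.mean

print(string.format("CV before: %.2f%%, after: %.2f%%",
                    cv_percent(before), cv_percent(after)))
print(string.format("mean difference: %.3f s", mean_diff))
```

The difference between the means is far smaller than either standard deviation, while the relative spread is more than three times larger after the change, which matches the observation that the variation, rather than the mean, is what moved.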

I'll now fix the trace optimiser todo!

@ltratt ltratt added this pull request to the merge queue Aug 4, 2025
@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to failed status checks Aug 4, 2025
@ltratt ltratt added this pull request to the merge queue Aug 4, 2025
@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to failed status checks Aug 4, 2025

vext01 commented Aug 4, 2025

16:57:15 ***** FILE 'math.lua'*****
16:57:15 .testing numbers and math lib
16:57:15 64-bit integers, 53-bit (mantissa) floats
16:57:16 
16:57:16 thread '<unnamed>' panicked at ykrt/src/compile/jitc_yk/opt/mod.rs:758:17:
16:57:16 assertion `left == right` failed
16:57:16   left: 1
16:57:16  right: 64
16:57:16 note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace

Will investigate tomorrow.

@ltratt ltratt added this pull request to the merge queue Aug 5, 2025
Merged via the queue into ykjit:main with commit d50c434 Aug 5, 2025
2 checks passed
@vext01 vext01 deleted the tforloop branch August 5, 2025 10:28