Question about MMX/SSE (possible issue) #1170

zb3 · 2024-10-04T00:21:36Z

I've observed (though possibly incorrectly) that code compiled with MMX/SSE instructions runs slower than the same code targeted at older CPUs.
Looking into the v86 code, I could ~~not understand anything~~ only see that the JIT code for MMX/SSE instructions emits wasm calls to functions in the main module (which do use wasm SIMD opcodes) to execute each instruction. I doubt this alone could cause the slowdown, but I don't know much about this area, so maybe there are other causes?

So in general, is it possible that code using MMX/SSE instructions runs slower on v86 than equivalent code that doesn't use them? Could this indirection be the cause?

(Filing this as an issue because, if the slowdown is real, rewriting that part could be a "trackable issue".)

copy · 2024-10-04T05:39:41Z

It depends:

- Many of the less common instructions (especially MMX) are implemented as calls, which does make them slower than non-SIMD instructions.
- Register moves and shuffles are slower because v86 stores MMX/SSE registers in memory rather than in wasm locals (as it does for general-purpose registers).
- Arithmetic instructions are probably slower for the same reason. v86 doesn't generate wasm SIMD instructions at the moment.
- 64/128-bit memory instructions are significantly faster, because they require fewer TLB accesses and use 64-bit writes.
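The TLB point in the last bullet can be illustrated with a toy cost model. This is a hypothetical C sketch, not v86's code: it assumes every guest load or store needs one TLB lookup per 4 KiB page it touches, and the names `lookups_for_access` and `tlb_lookups_for_copy` are made up for illustration.

```c
#include <stddef.h>
#include <stdint.h>

#define PAGE_SIZE 4096u

/* Toy model (hypothetical): one TLB lookup per 4 KiB page that a single
   guest access of `width` bytes starting at `addr` touches. */
static unsigned lookups_for_access(uint32_t addr, unsigned width)
{
    uint32_t first_page = addr / PAGE_SIZE;
    uint32_t last_page = (addr + width - 1) / PAGE_SIZE;
    return (unsigned)(last_page - first_page + 1);
}

/* Total lookups needed to copy `len` bytes from `src` to `dst` using one
   load and one store of `width` bytes per step (len assumed to be a
   multiple of width). */
unsigned long tlb_lookups_for_copy(uint32_t src, uint32_t dst, size_t len, unsigned width)
{
    unsigned long total = 0;
    for (size_t off = 0; off < len; off += width) {
        total += lookups_for_access(src + (uint32_t)off, width); /* load  */
        total += lookups_for_access(dst + (uint32_t)off, width); /* store */
    }
    return total;
}
```

For an aligned 4096-byte copy, this model gives 2048 lookups with 4-byte accesses and 512 with 16-byte accesses. Only an unaligned access that straddles a page boundary costs an extra lookup. A real emulator's TLB and fast paths are more involved than this.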

There are many situations where the difference is not obvious without benchmarking. For example, regular arithmetic instructions need to generate extra code for EFLAGS updates, which is not as optimised as it could be. Meanwhile, all MMX/SSE instructions are preceded by code that checks CR0.TS, which could be optimised away in many cases.
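To illustrate the trade-off described above, here is a hypothetical C sketch (not v86's actual implementation; all function and type names are made up). A scalar 32-bit ADD has to produce six status flags. PADDD produces no flags but must first check CR0.TS, which raises #NM when set. Real hardware also checks other conditions (for example CR0.EM); this sketch models only the TS check mentioned in the comment.

```c
#include <stdbool.h>
#include <stdint.h>

/* x86 EFLAGS bit positions updated by ADD */
#define FLAG_CF (1u << 0)
#define FLAG_PF (1u << 2)
#define FLAG_AF (1u << 4)
#define FLAG_ZF (1u << 6)
#define FLAG_SF (1u << 7)
#define FLAG_OF (1u << 11)
/* CR0.TS (task switched) is bit 3 of CR0 */
#define CR0_TS  (1u << 3)

/* Scalar ADD r32, r32: besides the sum, compute CF, PF, AF, ZF, SF and OF. */
uint32_t emulate_add32(uint32_t a, uint32_t b, uint32_t *eflags)
{
    uint32_t r = a + b;
    uint32_t f = *eflags & ~(FLAG_CF | FLAG_PF | FLAG_AF | FLAG_ZF | FLAG_SF | FLAG_OF);
    if (r < a) f |= FLAG_CF;                            /* unsigned carry out */
    unsigned ones = 0;
    for (uint32_t low = r & 0xFFu; low; low >>= 1) ones += low & 1u;
    if ((ones & 1u) == 0) f |= FLAG_PF;                 /* even parity of low byte */
    if ((a ^ b ^ r) & 0x10u) f |= FLAG_AF;              /* carry out of bit 3 */
    if (r == 0) f |= FLAG_ZF;
    if (r & 0x80000000u) f |= FLAG_SF;
    if (~(a ^ b) & (a ^ r) & 0x80000000u) f |= FLAG_OF; /* signed overflow */
    *eflags = f;
    return r;
}

typedef enum { SSE_OK, SSE_FAULT_NM } sse_status;

/* Guard before an MMX/SSE instruction: with CR0.TS set, the CPU raises
   #NM (device not available) so the OS can lazily restore FPU/SSE state. */
static bool sse_allowed(uint32_t cr0) { return (cr0 & CR0_TS) == 0; }

/* PADDD xmm, xmm: four independent wrapping 32-bit adds, no EFLAGS update. */
sse_status emulate_paddd(uint32_t cr0, uint32_t dst[4], const uint32_t src[4])
{
    if (!sse_allowed(cr0)) return SSE_FAULT_NM;
    for (int i = 0; i < 4; i++) dst[i] += src[i];
    return SSE_OK;
}
```

The point is not the exact code but where the overhead lives. The scalar path pays for flag computation on every instruction, unless the emulator computes flags lazily. The SIMD path pays for a TS check that, as noted above, could often be hoisted or skipped.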

Overall, I tend to focus on optimising non-SIMD code, as realistically it makes up more than 99% of the code executed in v86. However, if you have a minimal testcase worth optimising, feel free to post it here and I (or someone else) might look into the details.

zb3 · 2024-10-05T01:14:19Z

Thanks for the detailed response!

Today I wanted to measure the impact, but then I realized that the slowdown that made me report this issue was not related to MMX/SSE instructions, or even to v86, at all!
It turns out I had compared two different programs (more precisely, the JVM client and server versions, but that's irrelevant here), and the difference could be seen in QEMU too. Apologies for this!

Regarding the topic, I made several comparisons:

- I ran Windows XP with CPUID patched to not report MMX/SSE support, and startup took a few seconds longer on average.
- I wrote a simple add/copy test, which showed that MMX/SSE instructions made the code run significantly faster!
- I also checked the performance of programs like sha256sum.

The results were clear: MMX/SSE instructions don't cause any slowdown; they make the code faster, just as they should :)
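For anyone who wants to reproduce this kind of comparison, a minimal add test might look like the hypothetical C sketch below. It is not the test used above, and the function names are made up. It pairs a scalar loop with an SSE2 version that uses 128-bit loads and stores plus `_mm_add_epi32` (PADDD) from `<emmintrin.h>`.

```c
#include <emmintrin.h>  /* SSE2 intrinsics */
#include <stddef.h>
#include <stdint.h>

/* Scalar version: one 32-bit load/add/store per element. */
void add_scalar(uint32_t *dst, const uint32_t *a, const uint32_t *b, size_t n)
{
    for (size_t i = 0; i < n; i++)
        dst[i] = a[i] + b[i];
}

/* SSE2 version: 128-bit unaligned loads/stores and PADDD, four elements
   per iteration, then a scalar tail for the remaining 0-3 elements. */
void add_sse2(uint32_t *dst, const uint32_t *a, const uint32_t *b, size_t n)
{
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        __m128i va = _mm_loadu_si128((const __m128i *)(a + i));
        __m128i vb = _mm_loadu_si128((const __m128i *)(b + i));
        _mm_storeu_si128((__m128i *)(dst + i), _mm_add_epi32(va, vb));
    }
    for (; i < n; i++)
        dst[i] = a[i] + b[i];
}
```

To time both inside the guest, call each on large buffers many times. When building for 32-bit x86, `-msse2` is needed for the intrinsics. Also check that the compiler did not auto-vectorize the scalar loop (with GCC, `-fno-tree-vectorize` prevents that), otherwise both versions end up using SSE.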

So I guess the indirection itself isn't slowing things down much (after all, it's wasm code calling wasm; JS is not involved).

zb3 closed this as completed Oct 5, 2024
