Question about MMX/SSE (possible issue) #1170

Closed
zb3 opened this issue Oct 4, 2024 · 2 comments

Comments

zb3 commented Oct 4, 2024

I've observed (though possibly incorrectly) that code compiled with MMX/SSE instructions runs slower than the same code targeted at older CPUs.
Looking into the v86 code, I couldn't understand much, only that the JIT code for MMX/SSE functions produces wasm opcodes that call functions from the main module (which do utilise wasm SIMD opcodes) to execute the instruction. I doubt that this could actually cause the slowdown, but I don't know much about this stuff, so maybe there could be other causes?

So generally, is it possible that the code that uses MMX/SSE instructions could run slower on v86 than its equivalent that doesn't use them? Could this indirection be the cause?

(Filing this as an issue because, if the slowdown were real, rewriting that could be a "trackable issue".)

Owner

copy commented Oct 4, 2024

It depends:

  • Many of the less common instructions (especially mmx) are implemented as calls, which indeed makes them slower than non-simd instructions
  • Register moves and shuffles are slower because v86 stores mmx/sse registers in memory, rather than wasm locals (like general purpose registers)
  • Arithmetic instructions are probably slower for the same reason. v86 doesn't generate wasm simd instructions at the moment
  • 64/128-bit memory instructions are significantly faster, because they require fewer tlb accesses and use 64-bit writes
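To make the last point concrete, here is a toy model (not v86 code) that copies a buffer through a software MMU and counts address translations. The `ToyMMU` class, the identity mapping, and the "one lookup per access" rule are all assumptions made for illustration; the only claim taken from the list above is that wider accesses need fewer tlb lookups.

```python
class ToyMMU:
    """Hypothetical software MMU that counts address translations."""

    def __init__(self, size):
        self.memory = bytearray(size)
        self.tlb_lookups = 0

    def translate(self, addr):
        # Identity mapping; a real emulator would consult its TLB here.
        self.tlb_lookups += 1
        return addr

    def read(self, addr, width):
        # Assumption: one translation per access (no page-crossing accesses).
        phys = self.translate(addr)
        return bytes(self.memory[phys:phys + width])

    def write(self, addr, data):
        phys = self.translate(addr)
        self.memory[phys:phys + len(data)] = data


def copy(mmu, src, dst, length, width):
    """Copy `length` bytes using accesses that are `width` bytes wide."""
    for offset in range(0, length, width):
        mmu.write(dst + offset, mmu.read(src + offset, width))


def count_lookups(length, width):
    """Copy a buffer of `length` bytes and return the number of translations."""
    mmu = ToyMMU(2 * length)
    mmu.memory[:length] = bytes(i % 256 for i in range(length))
    copy(mmu, 0, length, length, width)
    assert mmu.memory[length:] == mmu.memory[:length]  # the copy is correct
    return mmu.tlb_lookups


print(count_lookups(4096, 4))   # 32-bit accesses
print(count_lookups(4096, 16))  # 128-bit accesses
```

In this model, copying one 4 KiB page with 16-byte accesses needs a quarter of the translations that 4-byte accesses need. How much that matters in practice depends on how expensive each lookup is relative to the rest of the generated code.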

There are many situations where the difference is not obvious without benchmarking. For example, regular arithmetic instructions need to generate extra code for EFLAGS updates, which is not as optimised as it could be. Meanwhile all mmx/sse instructions are prefixed by some code checking cr0.TS, which could be optimised away in many cases.
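As a rough illustration of the cr0.TS point: on x86, executing an mmx/sse instruction while CR0.TS (bit 3) is set raises a device-not-available fault (#NM), so an emulator has to check that bit somewhere. The sketch below is not v86's code generator; the pseudo-op names and the `emit_naive`/`emit_hoisted` functions are hypothetical, and hoisting the check is only one possible way it "could be optimised away".

```python
CR0_TS = 1 << 3  # CR0 bit 3: task switched

SIMD_OPS = {"movaps", "paddd", "pxor"}     # illustrative subset only
CR0_WRITERS = {"mov_cr0", "clts", "lmsw"}  # instructions that can change CR0.TS


class DeviceNotAvailable(Exception):
    """Models the x86 #NM fault."""


def check_cr0_ts(cr0):
    """What the emitted check would do at run time."""
    if cr0 & CR0_TS:
        raise DeviceNotAvailable()


def emit_naive(block):
    """Emit a TS check in front of every SIMD instruction."""
    out = []
    for insn in block:
        if insn in SIMD_OPS:
            out.append("check_cr0_ts")
        out.append(insn)
    return out


def emit_hoisted(block):
    """Emit a TS check only when TS is not already known to be clear."""
    out = []
    ts_known_clear = False
    for insn in block:
        if insn in SIMD_OPS and not ts_known_clear:
            out.append("check_cr0_ts")
            ts_known_clear = True
        out.append(insn)
        if insn in CR0_WRITERS:
            ts_known_clear = False  # CR0 may have changed; re-check later
    return out


block = ["movaps", "paddd", "pxor", "movaps"]
print(emit_naive(block).count("check_cr0_ts"))    # one check per SIMD op
print(emit_hoisted(block).count("check_cr0_ts"))  # one check for the block
```

A real implementation would also have to account for anything else that can change CR0 between two instructions (task switches, for example), which this sketch ignores.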

Overall, I tend to focus on optimising non-simd code, as it realistically makes up more than 99% of the code executed in v86. However, if you have a minimal testcase worth optimising, feel free to post it here and I (or someone else) might look into the details.

Author

zb3 commented Oct 5, 2024

Thanks for the detailed response!

Today I wanted to measure the impact, but then I realized that the slowdown that made me report this issue was actually not related to mmx/sse instructions, or even to v86, at all!
It turns out I compared two different programs (more precisely, the JVM client and server versions, but that's unrelated here), and the difference could be seen in QEMU too. Apologies for this!

Regarding the topic, I made several comparisons:

  • I ran windows xp with cpuid patched so as to not report mmx/sse support, and the startup time was a few seconds longer on average
  • I wrote a simple add/copy test which showed that mmx/sse instructions made the code run significantly faster!
  • I also checked performance of programs like sha256sum
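For readers wondering what "cpuid patched" in the first comparison could look like: the comment doesn't say how the patch was done, so the following is only a sketch of the general idea, clearing the MMX/SSE feature bits that CPUID leaf 1 reports. The bit positions follow the Intel manual; the `hide_simd` helper is hypothetical and only covers MMX, SSE, SSE2 and SSE3.

```python
# CPUID leaf 1 feature bits
EDX_MMX = 1 << 23
EDX_SSE = 1 << 25
EDX_SSE2 = 1 << 26
ECX_SSE3 = 1 << 0


def hide_simd(ecx, edx):
    """Return (ecx, edx) with the MMX/SSE/SSE2/SSE3 feature bits cleared."""
    edx &= ~(EDX_MMX | EDX_SSE | EDX_SSE2) & 0xFFFFFFFF
    ecx &= ~ECX_SSE3 & 0xFFFFFFFF
    return ecx, edx


# Example: a guest that would otherwise see FPU (EDX bit 0) plus MMX/SSE/SSE2.
ecx, edx = hide_simd(ECX_SSE3, EDX_MMX | EDX_SSE | EDX_SSE2 | 1)
print(hex(ecx), hex(edx))
```

Software that checks CPUID before choosing a code path would then fall back to its non-simd implementation, which is what makes this kind of A/B comparison possible on the same emulator build.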

The results were clear - mmx/sse instructions do not cause any slowdowns, they make the code faster - just as it should be :)

So I guess that the indirection itself isn't really slowing it down much (after all, it's wasm code calling wasm, and js is not involved).

zb3 closed this as completed Oct 5, 2024