[WIP] Avoid computing flags even lazy #466
I added a new optimization, unrelated to the old one, to be able to see the cumulative effect. Please let me know if I should split the PR. The new benchmarks are:
Very nice contribution, thanks! I'll need some time to review this in detail, but here are some initial thoughts:
Unfortunately, any instruction that may trigger a fault, including all memory accesses, may read flags, so in the following sequence flags for
We may later have another optimisation that moves the flags computation into the slow path of the memory access.
Yes, please. Your second optimisation seems useful on its own and splitting it out will make this PR easier to review.
I reverted the commit with the mem fast path optimization and created a new PR.
Re: exceptions, agreed. Although it should not be an issue for "well behaved code", in general it is. You can get deterministic behavior by generating exceptions on purpose and reading the flags, so they should be computed properly.

For the general case we cannot optimize it back in the mem access slow path:

1 add eax,ebx

With the current optimization, information about how to compute flags is not saved for instr 1, since 5 completely rewrites the flags. If 4 triggers an exception we need the flags, and without the saved state (which is now optimized away) we cannot compute them.

We could partially get back some perf by looking at instructions at analysis time and still optimizing away the saving of state if the operations between 1 and 4 allow reconstruction of the necessary state. For instance, this could still be optimized, since subsequent operations do not destroy the required state and the information could be recovered in the slow path:
Still, it gets complicated and I am not sure it's worth implementing this. Maybe a flag to ignore corner cases (like enable_optimistic_optimizations) might add more value for people looking to run "well behaved code". I'm curious what you think of this. LE: edited the 2nd example to be more relevant.
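The trade-off discussed above can be sketched as a small liveness pass. This is only an illustration under assumptions, not v86's actual code: the `Instr` type, its fields and the `respect_faults` switch are hypothetical, and `respect_faults = false` stands in for the proposed (not implemented) enable_optimistic_optimizations mode. An instruction that may fault is modelled as reading every flag, because a fault handler could observe them.

```rust
// Hypothetical mask covering the six arithmetic flags; layout is illustrative.
const ALL_FLAGS: u8 = 0b11_1111;

/// Hypothetical per-instruction metadata.
#[derive(Clone, Copy)]
struct Instr {
    tested: u8,      // flags the instruction reads
    modified: u8,    // flags the instruction overwrites
    may_fault: bool, // e.g. any memory access
}

/// For each instruction, report whether its flag computation could be skipped.
/// With `respect_faults`, a possibly faulting instruction keeps all flags live.
fn skippable(block: &[Instr], respect_faults: bool) -> Vec<bool> {
    let mut skip = vec![false; block.len()];
    // Conservatively assume every flag is needed after the block.
    let mut live = ALL_FLAGS;
    for (i, ins) in block.iter().enumerate().rev() {
        // Skippable only if the instruction writes flags and none are needed later.
        if ins.modified != 0 && (ins.modified & live) == 0 {
            skip[i] = true;
        }
        // A fault is raised before the instruction completes, so the handler
        // would see the flags as they were on entry to the instruction.
        let tested = if respect_faults && ins.may_fault {
            ALL_FLAGS
        } else {
            ins.tested
        };
        live = (live & !ins.modified) | tested;
    }
    skip
}
```

For a register add, followed by a memory load, followed by a register sub, the strict mode keeps the flags of the first add because the load might fault, while the optimistic mode drops them.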
Thanks!
Good analysis, I believe you're correct. Tracking which registers contain which values from a previous arithmetic instruction could also help optimise jmpcc, setcc and cmov, which currently only handle cases where the two instructions are next to each other (see
Not a fan of this. I don't know if I would enable it for copy.sh/v86, because whenever some OS doesn't work, I would need to turn off the flag to verify. I think the aforementioned profiling, to determine how often a memory instruction prevents flags from being optimised, could help inform the decision.
This PR tries to optimize lazy flags computation. In the current implementation, information about the operation and operands is stored and is in some cases overwritten by subsequent operations. In other cases operations force generation of particular flags since they only do partial lazy computation.
The proposed optimization adds metadata to the analyzer that identifies which flags are tested and which are modified by each instruction. Using this information, each jitted BB is analyzed for potential optimizations. Instructions that generate flags which are completely overwritten by subsequent instructions, and thus never used, are replaced with alternate implementations that do no flags computation (this includes not storing anything for lazy computation).
This improves performance significantly in some benchmarks, such as running a tight loop that does mostly register operations.
Since it was not clear to me what the correct metadata is for some instructions, an instruction that does not have the metadata disables the optimization from that point on until the end of the BB.
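The analysis described above can be sketched as a backward pass over one basic block. This is a minimal illustration under assumptions, not the code in this PR: `FlagInfo`, the flag masks and `flag_computation_skippable` are hypothetical names, and all flags are assumed to be live at the end of the block. An instruction without metadata (`None`) stops the optimization from that point to the end of the block.

```rust
// Hypothetical flag bit masks; the names and layout are illustrative only.
const CF: u8 = 1 << 0;
const PF: u8 = 1 << 1;
const AF: u8 = 1 << 2;
const ZF: u8 = 1 << 3;
const SF: u8 = 1 << 4;
const OF: u8 = 1 << 5;
const ALL_FLAGS: u8 = CF | PF | AF | ZF | SF | OF;

/// Per-instruction metadata: which flags it reads and which it overwrites.
#[derive(Clone, Copy)]
struct FlagInfo {
    tested: u8,
    modified: u8,
}

/// `None` means the analyzer has no flag metadata for this instruction.
type Instr = Option<FlagInfo>;

/// For each instruction, report whether its flag computation is never observed
/// and could be replaced by a variant that does not touch flags.
fn flag_computation_skippable(block: &[Instr]) -> Vec<bool> {
    let mut skip = vec![false; block.len()];
    // Only the prefix before the first instruction without metadata is analyzed.
    let end = block.iter().position(|i| i.is_none()).unwrap_or(block.len());
    // Conservatively assume every flag is needed after the analyzed prefix.
    let mut live = ALL_FLAGS;
    for i in (0..end).rev() {
        let info = block[i].expect("metadata is present before `end`");
        // Dead write: the instruction sets flags, but none of them are read
        // before being overwritten.
        if info.modified != 0 && (info.modified & live) == 0 {
            skip[i] = true;
        }
        // Flags needed before this instruction: those still needed afterwards
        // that it does not overwrite, plus the ones it reads itself.
        live = (live & !info.modified) | info.tested;
    }
    skip
}
```

With a block such as add, mov, add, jz, only the first add is reported as skippable. With add, inc, jc nothing is skippable, since inc leaves CF untouched and jc still reads the CF produced by the add.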
Still to do:
Challenges:
Initial benchmarks: