llvm-disasm is noticeably slower than its LLVM counterpart, llvm-dis. This issue tracks data about this performance gap, as well as attempts to improve the situation.
This is the current flame graph from running llvm-disasm on a ~30MB PX4 bitcode file generated by Clang 12:
The time is split almost evenly between parsing and pretty-printing.
Parsing
About one third of the parsing time seems to be spent consuming bytes from the input stream. It may be worth checking whether the way we read input is inherently inefficient.
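One thing to compare against is a reader that extracts each fixed-width field from a strict ByteString in a single step, rather than bit by bit. This is a minimal sketch that assumes the whole file is already in memory; BitCursor and readFixed are hypothetical names, not the parser's actual API, and the sketch makes no claim about how the current reader works. Fields are read least-significant bit first, which is the bit order used by LLVM bitstreams.

```haskell
import qualified Data.ByteString as BS
import Data.Bits (shiftL, shiftR, (.&.), (.|.))
import Data.Word (Word64)

-- Hypothetical cursor: the whole input plus an absolute bit offset.
data BitCursor = BitCursor !BS.ByteString !Int

-- Read an n-bit field (0 <= n <= 57), least-significant bit first.
-- The limit guarantees that the covering bytes fit in one Word64.
readFixed :: Int -> BitCursor -> Maybe (Word64, BitCursor)
readFixed n (BitCursor bs pos)
  | n < 0 || n > 57            = Nothing
  | pos + n > 8 * BS.length bs = Nothing  -- not enough bits left
  | otherwise                  = Just (value, BitCursor bs (pos + n))
  where
    (byteIx, bitIx) = pos `divMod` 8
    -- Number of bytes that contain the requested bits.
    nBytes = (bitIx + n + 7) `div` 8
    -- Assemble those bytes into one little-endian word.
    word = foldr (\i acc -> (acc `shiftL` 8) .|. fromIntegral (BS.index bs (byteIx + i)))
                 0 [0 .. nBytes - 1]
    -- Drop the bits before the cursor and mask off the bits after the field.
    value = (word `shiftR` bitIx) .&. ((1 `shiftL` n) - 1)
```

Reading 3 bits and then 6 bits from the bytes 0xB5 0x01 yields 5 and then 54. Wider fields, if they were ever needed, would have to be split into several reads.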
The other two thirds of the parsing time seem to be dominated by parseModuleBlock:
From left to right, the columns correspond to Data.LLVM.BitCode.IR.Function.finalizeStmt, Data.Generics.Uniplate.Internal.Data.descendBiData, and Data.LLVM.BitCode.IR.Function.parseFunctionBlockEntry.
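descendBiData belongs to uniplate's Data-based traversal code. To illustrate why generic traversals can be costly, the sketch below contrasts a naive traversal built on Data.Data, which visits every node of every type (down to each Char of each String), with a hand-written traversal that only follows constructors that can contain the target. The types are a hypothetical miniature IR rather than llvm-pretty's, the generic function is not uniplate's actual (more sophisticated) implementation, and whether the uniplate call in the parser can practically be replaced has not been checked.

```haskell
{-# LANGUAGE DeriveDataTypeable #-}
import Data.Data (Data, cast, gmapQ)

-- Hypothetical miniature IR standing in for the real types.
data Value
  = ValIdent String
  | ValInteger Integer
  | ValAdd Value Value
  deriving (Show, Eq, Data)

data Stmt
  = Assign String Value
  | Ret Value
  deriving (Show, Eq, Data)

-- Naive generic traversal: collects every Value anywhere inside x, in
-- pre-order, by recursing into every child of every type via gmapQ.
genericValues :: Data a => a -> [Value]
genericValues x =
  let below = concat (gmapQ genericValues x)
  in case cast x of
       Just v  -> v : below
       Nothing -> below

-- Hand-written traversal: same pre-order result, but it never inspects
-- the String or Integer fields.
valueUniverse :: Value -> [Value]
valueUniverse v = v : case v of
  ValAdd a b -> valueUniverse a ++ valueUniverse b
  _          -> []

stmtValues :: Stmt -> [Value]
stmtValues (Assign _ v) = valueUniverse v
stmtValues (Ret v)      = valueUniverse v
```

For the program [Assign "x" (ValAdd (ValIdent "a") (ValInteger 1)), Ret (ValIdent "x")], both genericValues and concatMap stmtValues return the same four values.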
Pretty-printing
(NOTE: this graph was obtained on a branch where I replaced the pretty package with prettyprinter, to see whether it made a difference. On its own, the switch does not help much.)
On the pretty-printing side, we only see details of two cost centres: Prettyprinter.Render.Text.renderIO and Text.LLVM.PP.ppModule.
The former seems unavoidable; the latter breaks down as follows:
I haven't looked into this yet, but ppDebugLoc' and ppDebugInfo' certainly slow things down, and there may be something worth investigating there.
Going back up a level, the pretty-printing flame graph also has a large unlabeled portion. My guess is that it corresponds to Prettyprinter.layoutPretty, since that function is definitely called, does not appear elsewhere in the graph, and is likely computationally heavy. I'm not sure why it isn't reported, though.
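In GHC profiles, work done by code that has no cost centre of its own is normally charged to the enclosing cost centre, so one way to give the layout work a label might be an explicit SCC annotation around the call in our own code. With prettyprinter, the annotation would wrap the layoutPretty call. The self-contained sketch below uses the pretty package's render instead; renderModule, exampleDoc and the "pp_layout" label are hypothetical. Building dependencies with more detailed profiling (for example via cabal's profiling-detail setting) is another option worth trying.

```haskell
import Text.PrettyPrint (Doc, nest, render, text, vcat)

-- Hypothetical stand-in for the document produced by ppModule.
exampleDoc :: Doc
exampleDoc = vcat [text "define i32 @main() {", nest 2 (text "ret i32 0"), text "}"]

-- The SCC pragma introduces a cost centre named "pp_layout". When built with
-- profiling, rendering work started under this call that has no cost centre
-- of its own should be reported under that name; thunks created while
-- building the Doc stay charged to where they were created. Without
-- profiling, the pragma is ignored.
renderModule :: Doc -> String
renderModule doc = {-# SCC "pp_layout" #-} render doc
```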
As mentioned, I tried replacing pretty with prettyprinter, and it is not much faster, even with an unbounded-width layout. It is definitely faster with renderCompact, but that produces very ugly output. I wonder whether there is a middle ground, given that our output involves barely any interesting pretty-printing.
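Since the output is essentially line-oriented, one possible middle ground is a deliberately restricted document type without group or alternative layouts: only text, hard line breaks, concatenation and nesting. Rendering then becomes a single left-to-right pass that never has to decide whether something fits, while still producing indentation. This is a swapped-in design, not prettyprinter's API; Doc, vsep, block and renderString are hypothetical names, and the sketch assumes the llvm-pretty printers could be expressed without group, which has not been checked.

```haskell
import qualified Data.ByteString.Builder as B
import qualified Data.ByteString.Lazy.Char8 as BL
import Data.List (intersperse)

-- Hypothetical restricted document type: no 'group' and no alternative
-- layouts, so rendering never has to check whether something fits.
data Doc
  = Empty
  | Text String   -- assumed not to contain newlines
  | Line          -- hard line break, followed by the current indentation
  | Nest Int Doc  -- increase the indentation of line breaks inside
  | Cat Doc Doc

-- Put documents on separate lines.
vsep :: [Doc] -> Doc
vsep = foldr Cat Empty . intersperse Line

-- An opening line, a body indented by two columns, and a closing line.
block :: String -> [Doc] -> String -> Doc
block open body close =
  Text open `Cat` Nest 2 (Line `Cat` vsep body) `Cat` Line `Cat` Text close

-- Single left-to-right pass; the Int is the current indentation.
render :: Doc -> B.Builder
render = go 0
  where
    go _ Empty      = mempty
    go _ (Text s)   = B.stringUtf8 s
    go i Line       = B.char7 '\n' <> B.string7 (replicate i ' ')
    go i (Nest n d) = go (i + n) d
    go i (Cat a b)  = go i a <> go i b

-- Convenience for inspecting ASCII output (Char8 unpacking is byte-wise).
renderString :: Doc -> String
renderString = BL.unpack . B.toLazyByteString . render
```

For example, block "define i32 @main() {" [Text "%1 = add i32 1, 2", Text "ret i32 %1"] "}" renders the two instructions indented by two spaces between the opening and closing lines. In a real renderer, hPutBuilder from Data.ByteString.Builder could stream the result directly to a handle instead of building a lazy ByteString.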