Performance analysis

`llvm-disasm` is noticeably slower than its LLVM counterpart, `llvm-dis`. This issue tracks data on this performance gap, along with attempts to improve the situation.

This is the current flame graph of running `llvm-disasm` on a ~30MB PX4 bitcode file generated by Clang 12:

![image](https://user-images.githubusercontent.com/478606/139138484-5afa5208-6735-407a-9af0-88e111495403.png)

Time is split almost evenly between parsing and pretty-printing.

Parsing
=====

![image](https://user-images.githubusercontent.com/478606/139139069-a74d0405-95bb-4d1b-958f-e2e7cc0d5c1b.png)

It seems like one third of the time is spent gobbling bytes from the input stream. Perhaps one could check whether there is something inherently inefficient in the way we consume input.
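To make that question concrete, here is a minimal sketch of reading a fixed-width field straight out of a strict `ByteString` at a bit offset, without allocating intermediate byte lists. It assumes LLVM's bitstream convention of packing fields least-significant bit first (my understanding of the format). `readBits` is a hypothetical helper and makes no claim about how the current parser represents its input.

```haskell
import Data.Bits (shiftL, shiftR, (.&.), (.|.))
import qualified Data.ByteString as BS
import qualified Data.ByteString.Unsafe as BSU
import Data.Word (Word64)

-- | Hypothetical helper: read a @width@-bit field starting at absolute bit
-- offset @off@, with bits numbered least-significant first within each byte
-- (assumed LLVM bitstream convention). Widths up to 56 bits are supported so
-- that all bytes touched by the field always fit in one Word64.
readBits :: BS.ByteString -> Int -> Int -> Maybe Word64
readBits bs off width
  | off < 0 || width < 0 || width > 56 = Nothing
  | off + width > 8 * BS.length bs = Nothing -- field runs past the end
  | otherwise = Just ((window `shiftR` bitOff) .&. mask)
  where
    firstByte = off `shiftR` 3 -- index of the byte holding the first bit
    bitOff = off .&. 7 -- position of the first bit inside that byte
    nBytes = (bitOff + width + 7) `shiftR` 3 -- number of bytes the field touches
    -- Assemble the touched bytes little-endian; indices were bounds-checked above.
    window :: Word64
    window = foldr step 0 [0 .. nBytes - 1]
    step :: Int -> Word64 -> Word64
    step i acc = (acc `shiftL` 8) .|. fromIntegral (BSU.unsafeIndex bs (firstByte + i))
    mask :: Word64
    mask = (1 `shiftL` width) - 1
```

For example, `readBits (BS.pack [0xAB, 0xCD]) 4 8` gives `Just 0xDA`. Profiling a reader along these lines against the current input path could help tell whether the overhead lies in the input representation or elsewhere.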

The other two thirds seem dominated by `parseModuleBlock`:

![image](https://user-images.githubusercontent.com/478606/139139598-d15e1841-50df-4930-aa26-15a05e8d9ef6.png)

From left to right, the columns correspond to `Data.LLVM.BitCode.IR.Function.finalizeStmt`, `Data.Generics.Uniplate.Internal.Data.descendBiData`, and `Data.LLVM.BitCode.IR.Function.parseFunctionBlockEntry`.
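`descendBiData` belongs to uniplate's traversals driven by `Data` instances, which are generic over the shape of the AST. A common alternative is a hand-written traversal that only visits the constructors known to contain the target type. The sketch below uses hypothetical, heavily simplified `ToyStmt`/`ToyValue` types rather than the real AST, and I haven't checked which uniplate combinator produces these `descendBiData` frames.

```haskell
-- Hypothetical, heavily simplified stand-ins for the real AST types.
data ToyValue
  = ValIdent String
  | ValInteger Integer
  | ValAdd ToyValue ToyValue
  deriving (Eq, Show)

data ToyStmt
  = Result String ToyValue -- named result
  | Effect ToyValue -- no result
  deriving (Eq, Show)

-- With uniplate this might be written as @transformBi f@ from
-- Data.Generics.Uniplate.Data; here the recursion is spelled out by hand,
-- so only constructors that can contain a ToyValue are visited.
renameStmt :: (String -> String) -> ToyStmt -> ToyStmt
renameStmt f (Result r v) = Result r (renameValue f v)
renameStmt f (Effect v) = Effect (renameValue f v)

-- Apply the renaming to every identifier inside a value.
renameValue :: (String -> String) -> ToyValue -> ToyValue
renameValue f = go
  where
    go (ValIdent n) = ValIdent (f n)
    go v@(ValInteger _) = v
    go (ValAdd a b) = ValAdd (go a) (go b)
```

The trade-off is more boilerplate to keep in sync with the AST whenever it changes.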

Pretty-printing
=========

(NOTE: this graph was obtained on a branch where I replaced the `pretty` package with `prettyprinter`, to see if it made a difference. On its own, the switch does not help much.)

![image](https://user-images.githubusercontent.com/478606/139139982-96d606dc-f87f-495a-9f64-a99733e0dd39.png)

On the pretty-printing side, we only see details of two cost centres: `Prettyprinter.Render.Text.renderIO`, and `Text.LLVM.PP.ppModule`.

The former seems unavoidable; the latter breaks down as follows:

![image](https://user-images.githubusercontent.com/478606/139140395-619a0357-1f6b-4d34-9eb4-95dc8bee368c.png)

I haven't looked into this yet, but those `ppDebugLoc'` and `ppDebugInfo'` calls sure make things slow; I wonder if there's something to look into there.

But going back up a level, the pretty-printing flame graph also has a huge unlabeled portion. My guess is that it corresponds to `Prettyprinter.layoutPretty`, since that function is definitely called, doesn't appear elsewhere in the graph, and is likely computation-heavy. I'm not sure why it isn't reported, though.
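One way to test that guess is to give the layout step an explicit cost centre with GHC's `SCC` pragma. The sketch below does this around the `pretty` package's `render`; on the `prettyprinter` branch the same pragma could wrap the `layoutPretty` call instead. `toyFunction` and `renderDoc` are hypothetical names, and whether the unlabeled block actually moves under the new cost centre is something to verify in a profiled build.

```haskell
import Text.PrettyPrint (Doc, nest, render, text, vcat)

-- Hypothetical, tiny stand-in for what a module pretty-printer produces.
toyFunction :: String -> Doc
toyFunction name =
  vcat
    [ text ("define void @" ++ name ++ "() {")
    , nest 2 (text "ret void")
    , text "}"
    ]

-- The SCC pragma gives layout and rendering their own named cost centre
-- in a profiled build; it has no effect when profiling is off.
renderDoc :: Doc -> String
renderDoc doc = {-# SCC "render_layout" #-} render doc
```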

As mentioned, I tried replacing `pretty` with `prettyprinter`, and it's not much faster, even with an unbounded-width layout. It is definitely faster with `renderCompact`, but that produces very ugly output. I wonder if there's a middle ground, given how little actual pretty-printing our output needs.
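One possible middle ground is to skip the layout engine for output that never needs line-breaking decisions and emit text directly with a `ByteString` `Builder`, using fixed indentation. This is a minimal sketch with a hypothetical, simplified `Fun` type, not a drop-in replacement for `ppModule`.

```haskell
import qualified Data.ByteString.Builder as B
import Data.List (intersperse)
import System.IO (BufferMode (..), hSetBinaryMode, hSetBuffering, stdout)

-- Hypothetical, heavily simplified stand-in for a function definition.
data Fun = Fun
  { funName :: String
  , funArgs :: [String]
  , funBody :: [String]
  }

-- Emit a function with fixed two-space indentation: no width measurement
-- and no search over alternative layouts.
ppFun :: Fun -> B.Builder
ppFun f =
  B.string7 "define void @" <> B.stringUtf8 (funName f)
    <> B.char7 '('
    <> mconcat (intersperse (B.string7 ", ") (map B.stringUtf8 (funArgs f)))
    <> B.string7 ") {\n"
    <> foldMap (\s -> B.string7 "  " <> B.stringUtf8 s <> B.char7 '\n') (funBody f)
    <> B.string7 "}\n"

-- Write all functions to stdout; hPutBuilder fills the handle's buffer directly.
emit :: [Fun] -> IO ()
emit fs = do
  hSetBinaryMode stdout True -- bytes are already UTF-8 encoded
  hSetBuffering stdout (BlockBuffering Nothing)
  B.hPutBuilder stdout (foldMap ppFun fs)
```

Since nothing here measures widths or tries alternative layouts, the cost should be roughly linear in output size. The price is giving up automatic line breaking, which only matters if some constructs genuinely need it.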
