Check walkdir performance under windows #9847

matklad · 2021-08-10T14:27:13Z

Someone on reddit reports slow startup time on winrows with HDD:

https://old.reddit.com/r/rust/comments/p0z0fa/rustanalyzer_changelog_89/h8dt110/

I don't think we should optimize for HDD, but it makes sense to make sure that we don't do something stupid on windows, as its file system is known to be exciting.

The main disk IO should happen in this function:

https://github.com/matklad/rust-analyzer/blob/5d5d5182c137df293e983d3c3d48869d1c21092c/crates/vfs-notify/src/lib.rs#L175-L213

curiously, we don't do any parallelism whatsoever o_O ?

matklad · 2021-08-10T14:27:31Z

cc @Veykril , you use windows IIRC, you might want to take a look

Veykril · 2021-08-10T15:14:01Z

How would one even go about checking the i/o performance here? 🤔

matklad · 2021-08-10T15:19:13Z

analysis-stats report time to load a project, and then sticking profile call there might help. We might also just to optimize first, and profile later -- the optimization should be simple, just pull the actual reading into the outer loop, and .par_iter() it.

eminence · 2021-08-10T19:04:17Z

I've duplicated below a comment I made on reddit. Unfortunately I can't give access to this particular machine, but I'm happy to run any tests or development code if that's at all helpful.

In my experience, rust performance (and rust-analyzer performance to some extent) is predominantly impacted by disk IO speed, more than CPU speed.

For example, the sandbox repo linked above:

On a Windows machine with a fast local SSD and 8 cores, it takes 46 seconds to do a clean cargo check and about 16 seconds until RA finishes the "indexing" phase.

On a Linux machine with a very slow network file system and 12 cores, it takes 229 seconds to do a clean cargo check, and about 202 seconds until RA finishes indexing.

matklad · 2021-08-16T10:26:56Z

I've now realized that reading files sequentially is pretty bad architecturally, and that we might want to fix this regardless of win performance. There are cases where readling N files sequentially is very slow, while reading them in parallel is very fast.

Consider a distributed file system, where the set of files is spread across a fleet of the machines. Sequential read means N round-trip, while a fully parallel read would be just one roundtrip.

lnicola · 2021-08-16T11:28:00Z

There's a jwalk crate for parallel directory iteration, but I'm not sure we can use it (because we need to support exclusions and we do our own recursion anyway).

matklad · 2021-08-30T14:21:25Z

Tried to access severity here. I've run cargo run --release -p rust-analyzer -- -q analysis-stats . on my laptop on both windows and linux:

Linux
Database loaded:     849.92ms, 291minstr
  crates: 38, mods: 770, decls: 17143, fns: 12804
Item Collection:     14.43s, 82ginstr
  exprs: 351982, ??ty: 353 (0%), ?ty: 223 (0%), !ty: 143
Inference:           56.38s, 280ginstr
Total:               70.81s, 363ginstr

Windows
Database loaded:     2.61s
  crates: 38, mods: 770, decls: 17149, fns: 12803
Item Collection:     18.88s
  exprs: 352110, ??ty: 359 (0%), ?ty: 223 (0%), !ty: 143
Inference:           64.14s
Total:               83.03s

Note that exprs differ because the target is different. I think winapi might actually significantly affect the overall runtime.

So, it is true that database loading phase is 3x-ish slower on Windows. It's unclear which part of that is "cargo on windows is slower" and which is our reading of all source files from disk.

Next steps here:

change our analysis stats to split load time into (cargo metadata, cargo check, vfs)
profile and optimize! I won't be surprised if we can shave off a second of loading time, which is pretty big

This would be low priority for me personally, as I don't have a convenient windows dev setup. But folks running windows might want to look into this (link to the code in the top comment). cc @rylev 😉

Removing broken windows label, as it is at worst a moderate perf regression.

lnicola · 2021-08-30T14:26:43Z

I wouldn't read too much into the "Database loaded" time. I often see 2-3 seconds on Linux.

matklad · 2021-08-30T14:31:25Z

Yeah, that's the first part: today it often includes "run cargo check", which shouldn't be fast. The results above are after warm-up runs and perty stable though, so I am fairly certain that there's a measurable perf gap for rust-analyzer between windows and linux. What is not clear to me is:

is the pert due to our io, or cargo io (I think database load time includes cargo stating all files in the workspace, for example)
whether we can close the gap by using windows APIs more efficiently, or whether this is an inherent "IO is slower on windows".

lnicola · 2021-08-30T14:35:53Z

On a warm cache, I'd expect Windows to be closer to Linux. The (worse) problems are on a cold cache, network drive etc. I still suspect parallel file reading and directory iteration are worth it, but it's not a big problem in common cases.

lnicola added A-perf performance issues Broken Window Bugs / technical debt to be addressed immediately S-actionable Someone could pick this issue up and work on it right now labels Aug 16, 2021

matklad added E-has-instructions Issue has some instructions and pointers to code to get started and removed Broken Window Bugs / technical debt to be addressed immediately labels Aug 30, 2021

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Check walkdir performance under windows #9847

Check walkdir performance under windows #9847

matklad commented Aug 10, 2021

matklad commented Aug 10, 2021

Veykril commented Aug 10, 2021

matklad commented Aug 10, 2021

eminence commented Aug 10, 2021

matklad commented Aug 16, 2021

lnicola commented Aug 16, 2021 •

edited

Loading

matklad commented Aug 30, 2021 •

edited

Loading

lnicola commented Aug 30, 2021

matklad commented Aug 30, 2021

lnicola commented Aug 30, 2021

Check walkdir performance under windows #9847

Check walkdir performance under windows #9847

Comments

matklad commented Aug 10, 2021

matklad commented Aug 10, 2021

Veykril commented Aug 10, 2021

matklad commented Aug 10, 2021

eminence commented Aug 10, 2021

matklad commented Aug 16, 2021

lnicola commented Aug 16, 2021 • edited Loading

matklad commented Aug 30, 2021 • edited Loading

lnicola commented Aug 30, 2021

matklad commented Aug 30, 2021

lnicola commented Aug 30, 2021

lnicola commented Aug 16, 2021 •

edited

Loading

matklad commented Aug 30, 2021 •

edited

Loading