Skip to content

Check walkdir performance under windows #9847

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
matklad opened this issue Aug 10, 2021 · 10 comments
Open

Check walkdir performance under windows #9847

matklad opened this issue Aug 10, 2021 · 10 comments
Labels
A-perf performance issues E-has-instructions Issue has some instructions and pointers to code to get started S-actionable Someone could pick this issue up and work on it right now

Comments

@matklad
Copy link
Member

matklad commented Aug 10, 2021

Someone on reddit reports slow startup time on winrows with HDD:

https://old.reddit.com/r/rust/comments/p0z0fa/rustanalyzer_changelog_89/h8dt110/

I don't think we should optimize for HDD, but it makes sense to make sure that we don't do something stupid on windows, as its file system is known to be exciting.

The main disk IO should happen in this function:

https://github.com/matklad/rust-analyzer/blob/5d5d5182c137df293e983d3c3d48869d1c21092c/crates/vfs-notify/src/lib.rs#L175-L213

curiously, we don't do any parallelism whatsoever o_O ?

@matklad
Copy link
Member Author

matklad commented Aug 10, 2021

cc @Veykril , you use windows IIRC, you might want to take a look

@Veykril
Copy link
Member

Veykril commented Aug 10, 2021

How would one even go about checking the i/o performance here? 🤔

@matklad
Copy link
Member Author

matklad commented Aug 10, 2021

analysis-stats report time to load a project, and then sticking profile call there might help. We might also just to optimize first, and profile later -- the optimization should be simple, just pull the actual reading into the outer loop, and .par_iter() it.

@eminence
Copy link
Contributor

I've duplicated below a comment I made on reddit. Unfortunately I can't give access to this particular machine, but I'm happy to run any tests or development code if that's at all helpful.


In my experience, rust performance (and rust-analyzer performance to some extent) is predominantly impacted by disk IO speed, more than CPU speed.

For example, the sandbox repo linked above:

On a Windows machine with a fast local SSD and 8 cores, it takes 46 seconds to do a clean cargo check and about 16 seconds until RA finishes the "indexing" phase.

On a Linux machine with a very slow network file system and 12 cores, it takes 229 seconds to do a clean cargo check, and about 202 seconds until RA finishes indexing.

@matklad
Copy link
Member Author

matklad commented Aug 16, 2021

I've now realized that reading files sequentially is pretty bad architecturally, and that we might want to fix this regardless of win performance. There are cases where readling N files sequentially is very slow, while reading them in parallel is very fast.

Consider a distributed file system, where the set of files is spread across a fleet of the machines. Sequential read means N round-trip, while a fully parallel read would be just one roundtrip.

@lnicola
Copy link
Member

lnicola commented Aug 16, 2021

There's a jwalk crate for parallel directory iteration, but I'm not sure we can use it (because we need to support exclusions and we do our own recursion anyway).

@lnicola lnicola added A-perf performance issues Broken Window Bugs / technical debt to be addressed immediately S-actionable Someone could pick this issue up and work on it right now labels Aug 16, 2021
@matklad
Copy link
Member Author

matklad commented Aug 30, 2021

Tried to access severity here. I've run cargo run --release -p rust-analyzer -- -q analysis-stats . on my laptop on both windows and linux:

Linux
Database loaded:     849.92ms, 291minstr
  crates: 38, mods: 770, decls: 17143, fns: 12804
Item Collection:     14.43s, 82ginstr
  exprs: 351982, ??ty: 353 (0%), ?ty: 223 (0%), !ty: 143
Inference:           56.38s, 280ginstr
Total:               70.81s, 363ginstr

Windows
Database loaded:     2.61s
  crates: 38, mods: 770, decls: 17149, fns: 12803
Item Collection:     18.88s
  exprs: 352110, ??ty: 359 (0%), ?ty: 223 (0%), !ty: 143
Inference:           64.14s
Total:               83.03s

Note that exprs differ because the target is different. I think winapi might actually significantly affect the overall runtime.

So, it is true that database loading phase is 3x-ish slower on Windows. It's unclear which part of that is "cargo on windows is slower" and which is our reading of all source files from disk.

Next steps here:

  • change our analysis stats to split load time into (cargo metadata, cargo check, vfs)
  • profile and optimize! I won't be surprised if we can shave off a second of loading time, which is pretty big

This would be low priority for me personally, as I don't have a convenient windows dev setup. But folks running windows might want to look into this (link to the code in the top comment). cc @rylev 😉

Removing broken windows label, as it is at worst a moderate perf regression.

@matklad matklad added E-has-instructions Issue has some instructions and pointers to code to get started and removed Broken Window Bugs / technical debt to be addressed immediately labels Aug 30, 2021
@lnicola
Copy link
Member

lnicola commented Aug 30, 2021

I wouldn't read too much into the "Database loaded" time. I often see 2-3 seconds on Linux.

@matklad
Copy link
Member Author

matklad commented Aug 30, 2021

Yeah, that's the first part: today it often includes "run cargo check", which shouldn't be fast. The results above are after warm-up runs and perty stable though, so I am fairly certain that there's a measurable perf gap for rust-analyzer between windows and linux. What is not clear to me is:

  • is the pert due to our io, or cargo io (I think database load time includes cargo stating all files in the workspace, for example)
  • whether we can close the gap by using windows APIs more efficiently, or whether this is an inherent "IO is slower on windows".

@lnicola
Copy link
Member

lnicola commented Aug 30, 2021

On a warm cache, I'd expect Windows to be closer to Linux. The (worse) problems are on a cold cache, network drive etc. I still suspect parallel file reading and directory iteration are worth it, but it's not a big problem in common cases.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-perf performance issues E-has-instructions Issue has some instructions and pointers to code to get started S-actionable Someone could pick this issue up and work on it right now
Projects
None yet
Development

No branches or pull requests

4 participants