Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve create_input and create_output #115

Merged
merged 31 commits into from
Aug 26, 2024

Conversation

kylewlacy
Copy link
Member

Resolves #111

This PR overhauls the implementation of create_input. Previously, it was implemented with a very straightforward recursive function, which basically walked a directory and created Artifacts for each entry. The trouble comes with packed executables, which include extra resources. In a large directory structure, multiple files can reference the same resources multiple times (e.g. the interpreter ld-linux.so, which will be referenced by every dynamic executable). Due to the recursive nature, this meant we ended up calling create_input repeatedly for each resource, even if it was already processed.

The solution I ended up with was to first build a graph from the directory structure, then to traverse the graph to build the artifacts. I originally just tried to memoize the recursive function, but found it really tricky to structure in a way where the memoization would actually help, so I found doing it in two passes using a graph to ultimately be easier.

Also as part of this PR, I did some refactoring around tests, added more benchmarks, added a new profiling Cargo profile, and also improved create_output (create_output should be faster now, but I think it may also need to be revisited).

The original cause of this issue was an experimental change I was making on the std package, which added lots of resources for lots of scripts. This PR makes the call of create_input finish in a relatively reasonable amount of time (~90 seconds), but create_output is still too slow (I let it run for 40 minutes, and it still hadn't finished). Even 90 seconds is pretty painful though, so even if the create_output side were also about as fast, I want to revisit the approach to the changes I made to cut back on how many resources get used.

One downside with the new implementation is that it's slower in simple cases (i.e. without any resources), but it's still much faster in complex cases. See below for performance numbers and details.

Performance

Here are the results of running cargo bench "test_input" on my machine, both with the original implementation and the new implementation.

New implementation

Input benchmarks, with the median values 1.966s, 2.022s, 2.005s, 2.009s, 42.72ms, and 48.39ms

Original implementation

Input benchmark with the median values 1.07s, 1.108s, 1.697s, 1.753s, 109.2ms, and 119.2ms

As you can see, several of the benchmarks are twice as slow for the new implementation. However, the benchmarks under bench_input_with_shared_resources are about twice as fast. It's this latter case that tests the edge case that this PR fixes, where create_input is called with a directory structure with lots of shared resources. I'm hoping that the slower benchmarks can be made to be faster over time, but I think fixing the extreme slowness caused by the edge case is worth the trade-off for the time being.

@kylewlacy kylewlacy merged commit 0b8cbd6 into main Aug 26, 2024
5 checks passed
@kylewlacy kylewlacy deleted the improve-create-input-and-create-output branch August 26, 2024 08:06
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Fix slowness in create_input when processing lots of resources
1 participant