-
Notifications
You must be signed in to change notification settings - Fork 5
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Improve create_input
and create_output
#115
Merged
Merged
Conversation
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Resolves #111
This PR overhauls the implementation of
create_input
. Previously, it was implemented with a very straightforward recursive function, which basically walked a directory and createdArtifact
s for each entry. The trouble comes with packed executables, which include extra resources. In a large directory structure, multiple files can reference the same resources multiple times (e.g. the interpreterld-linux.so
, which will be referenced by every dynamic executable). Due to the recursive nature, this meant we ended up callingcreate_input
repeatedly for each resource, even if it was already processed.The solution I ended up with was to first build a graph from the directory structure, then to traverse the graph to build the artifacts. I originally just tried to memoize the recursive function, but found it really tricky to structure in a way where the memoization would actually help, so I found doing it in two passes using a graph to ultimately be easier.
Also as part of this PR, I did some refactoring around tests, added more benchmarks, added a new
profiling
Cargo profile, and also improvedcreate_output
(create_output
should be faster now, but I think it may also need to be revisited).The original cause of this issue was an experimental change I was making on the
std
package, which added lots of resources for lots of scripts. This PR makes the call ofcreate_input
finish in a relatively reasonable amount of time (~90 seconds), butcreate_output
is still too slow (I let it run for 40 minutes, and it still hadn't finished). Even 90 seconds is pretty painful though, so even if thecreate_output
side were also about as fast, I want to revisit the approach to the changes I made to cut back on how many resources get used.One downside with the new implementation is that it's slower in simple cases (i.e. without any resources), but it's still much faster in complex cases. See below for performance numbers and details.
Performance
Here are the results of running
cargo bench "test_input"
on my machine, both with the original implementation and the new implementation.New implementation
Original implementation
As you can see, several of the benchmarks are twice as slow for the new implementation. However, the benchmarks under
bench_input_with_shared_resources
are about twice as fast. It's this latter case that tests the edge case that this PR fixes, wherecreate_input
is called with a directory structure with lots of shared resources. I'm hoping that the slower benchmarks can be made to be faster over time, but I think fixing the extreme slowness caused by the edge case is worth the trade-off for the time being.