Proof-of-Concept: Fast path for search #9
Draft
larry-the-table-guy wants to merge 4 commits into AnnikaCodes:main from
Conversation
Load small prefix of file and attempt to use that to reject files
Would behave incorrectly if a string ended with a comma
Also fix 2 clippy warnings for redundant refs
In its infinite wisdom, clippy won't let me push my changes until this unrelated lint is fixed.
TL;DR: In practice, this just reduces CPU usage slightly for typical datasets.
Main idea in this PR is that most files can be rejected by just looking at the first few bytes (it's in the server's own interest to not break this pattern).
For the sake of keeping this PR small, I assumed `p1team` appears after `p1`, `p2`. This could be done in a less hacky manner by searching for `"p1":` and `"p2":`, then parsing out the string values.

The overwhelming bottleneck is how the battle logs are stored - many small JSON files. Storage devices are just very bad at small random reads; even SSDs much prefer long sequential reads.
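To make the idea concrete, here's a minimal sketch of the fast-path check (function and constant names are illustrative, not the actual code in this PR). It reads only a small prefix of each file and rejects the file early when the target player ID can't occur in the `"p1"`/`"p2"` fields, assuming the server writes those fields near the start of the log:

```rust
use std::fs::File;
use std::io::Read;
use std::path::Path;

/// Hypothetical prefix length; the real value would be tuned to cover the
/// "p1"/"p2" fields for typical logs.
const PREFIX_LEN: usize = 512;

/// Returns Ok(false) when the file definitely can't match `player_id`,
/// Ok(true) when a full parse is still needed.
fn prefix_may_match(path: &Path, player_id: &str) -> std::io::Result<bool> {
    let mut buf = [0u8; PREFIX_LEN];
    let mut file = File::open(path)?;
    // read() may return fewer bytes than requested; that's fine for a
    // prefix check - we just scan whatever we got.
    let n = file.read(&mut buf)?;
    let prefix = &buf[..n];
    // Conservative substring scan: if the player ID bytes don't occur in
    // the prefix, the file can't match; otherwise fall through to the
    // existing full-parse path.
    Ok(prefix
        .windows(player_id.len())
        .any(|w| w == player_id.as_bytes()))
}
```

The check only ever produces false positives (a prefix hit that the full parse later rejects), never false negatives, so correctness is preserved as long as the assumed field ordering holds.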
Anyway, this was a quick thing I thought I'd share, maybe it gives some ideas for future problems.
cargo bench stats

This is slower on files with matching player IDs, and faster on files that don't match (the common case).
I added a bench case to help demonstrate that.
Most of the overhead added to the former case could be avoided by reorganizing the code so that the file is only opened once, but that's an invasive change and I just wanted to demonstrate the core idea here.
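A rough sketch of that single-open reorganization, under the same assumptions as above (names are placeholders, not the PR's actual code): do the prefix scan and, only on a possible match, keep reading the remainder from the same handle instead of reopening the file.

```rust
use std::fs::File;
use std::io::Read;
use std::path::Path;

/// Open the file once: scan a small prefix, and only read the rest of the
/// file (from the same handle) if the prefix might contain `player_id`.
/// Returns Ok(None) on a fast reject, Ok(Some(bytes)) with the full
/// contents otherwise.
fn read_if_may_match(path: &Path, player_id: &str) -> std::io::Result<Option<Vec<u8>>> {
    let mut file = File::open(path)?;
    let mut buf = vec![0u8; 512];
    let n = file.read(&mut buf)?;
    buf.truncate(n);
    let hit = buf
        .windows(player_id.len())
        .any(|w| w == player_id.as_bytes());
    if !hit {
        // Fast reject: no second open(), no full read.
        return Ok(None);
    }
    // Possible match: the handle's cursor is already past the prefix, so
    // read_to_end() appends only the remaining bytes.
    file.read_to_end(&mut buf)?;
    Ok(Some(buf))
}
```

This keeps the matching-file path at one `open()` and one sequential read, which is where most of the added overhead in the current version comes from.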
Before
After
perf stats

Constructed a directory with 500K files, 8GB total.
Uncached
The target use case.
Dominated by FS and disk seeks.
Used `echo 3 | sudo tee /proc/sys/vm/drop_caches` to clear the OS's file cache between each run. My drive doesn't have its own cache.

Before
After
Cached
Not a realistic scenario, but better isolates the CPU-intensive portion.
Before
After