🎉 Introducing gix index entries 🎉
#978
Byron
started this conversation in
Show and tell
Replies: 1 comment 6 replies
-
|
Just tried this on the linux kernel repo and I see no performance gains over git ls-files:
time git ls-files | wc
85802 85805 3371727
git ls-files 0.03s user 0.01s system 91% cpu 0.036 total
wc 0.01s user 0.00s system 41% cpu 0.036 total
time gix index entries --no-attributes | wc
85802 85805 3371727
gix index entries --no-attributes 0.03s user 0.01s system 73% cpu 0.053 total
wc 0.01s user 0.00s system 22% cpu 0.053 total
Important
What follows is not an apples-to-apples comparison: in this version of gix, the index's hash is not validated. Doing so costs significant amounts of time, to the point where gix is only 20% faster than git when reading big index files. However, when index.skipHash is enabled and neither validates the hash, gix is still 2.17x faster.

gix index entries is gitoxide's version of git ls-files, with some added features that I always wanted to have. As such, it mostly prints files that are in the index, i.e. under version control, but with a twist:

By default, it also displays all attributes added via .gitattributes and .gitignore files, sometimes with surprising revelations when one realises that some ignored files have been checked in and are marked with a big red cross.

That alone makes it quite useful, but with the right pathspec it's also possible to drill in.

What are all the files that need git-lfs, one might ask, with an incantation like gix index entries ':(attr:filter=lfs)':

The wonky expression :(attr:filter=lfs) is actually just one way of writing a pathspec.

About git-pathspecs
When interacting with git commands that take a file path, that innocent-looking path is actually a full-blown pathspec. Most of us won't ever realise this, as expressions like * are typically expanded by the shell anyway, and whatever we do with paths just works.

Or does it?
On a case-insensitive filesystem, as found by default on Windows or macOS, one could try the following:
Indeed, the commit failed because the file File wasn't added, even though git add didn't fail. What failed was git commit, which didn't have anything to add to the commit.

With a trick, we can lure out the pathspec-ish nature of the innocent file path whose first letter we mistyped.

But once the case is corrected, it does work:
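A minimal sketch of the same effect that does not depend on the filesystem: in a throwaway repository, git ls-files only lists File when the pathspec matches its case, unless the icase magic is used. The repository location and the file content are made up for illustration.

```shell
set -eu

# Hypothetical throwaway repository holding a single tracked file named "File".
repo=$(mktemp -d)
cd "$repo"
git init -q .
printf 'hello\n' > File
git add File                      # correct case, so the entry lands in the index

# A pathspec with the wrong case selects nothing from the index.
wrong_case=$(git ls-files -- file)
# The correctly cased pathspec selects the entry.
right_case=$(git ls-files -- File)
# The icase magic explicitly asks for case-insensitive matching.
icase=$(git ls-files -- ':(icase)file')

printf 'file           -> [%s]\n' "$wrong_case"
printf 'File           -> [%s]\n' "$right_case"
printf ':(icase)file   -> [%s]\n' "$icase"
```

The icase magic is one of the pathspec signatures listed in the pathspec entry of the git glossary, next to others such as exclude, glob and attr.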
Pathspecs are case-sensitive by default even if the underlying filesystem is not.

However, they can also do additional tricks, which are described quite concisely in the git-glossary; I wasn't able to find a dedicated chapter akin to the one for git-attributes, for example. To wrap up with something useful, here is how you can get everything under tests/ while ignoring shell scripts:

Performance
Supporting pathspecs is great, but will it be fast enough? For those who want to reproduce it on their machine, there is the folded "Data" section at the bottom of the document. Note that for each run we made sure that the output of git and gix match perfectly.

Before we start, note that r2k is the gitoxide repo with a mere 1994 files, while r370k is the WebKit repository with ~370,000 of them.

baseline - no pathspec
This is to see how fast it can be at best by merely dumping the index paths to standard out without doing any extra processing.
hyperfine --warmup 3 'git ls-files' 'gix index entries --no-attributes' -N

It looks like gix gets better the more work there is, taking only half the time git takes to output 370k paths.

single attribute lookup
Attribute lookup is expensive as many paths have to be matched against many globs which are on top of that dependent on the input path.
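To make that cost concrete, here is a small sketch of what a single attribute pathspec has to resolve. It uses plain git ls-files in a throwaway repository. The file names and attribute rules are made up for illustration and are not taken from either benchmark repository. Each candidate path is matched against the globs of every .gitattributes file that applies to its directory, so two files with the same extension can end up with different results.

```shell
set -eu

# Hypothetical layout: two Markdown files in different directories.
repo=$(mktemp -d)
cd "$repo"
git init -q .
mkdir -p docs src
printf 'x\n' > docs/guide.md
printf 'x\n' > src/notes.md
printf 'x\n' > src/main.rs

# The top-level rule marks every *.md as export-ignore ...
printf '*.md export-ignore\n' > .gitattributes
# ... while a rule in src/ unsets the attribute again for paths below it.
printf '*.md -export-ignore\n' > src/.gitattributes
git add .

# Only paths for which export-ignore ends up set are selected.
selected=$(git ls-files -- ':(attr:export-ignore)')
printf '%s\n' "$selected"
```

The rule in src/ wins for src/notes.md because attribute files closer to a path take precedence, which is why the matcher cannot decide on the glob alone.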
r2k = hyperfine --warmup 1 "git ls-files ':(attr:filter=lfs)'" "gix index entries --no-attributes ':(attr:filter=lfs)'" -N

r370k = hyperfine --warmup 1 "git ls-files ':(attr:export-ignore)'" "gix index entries --no-attributes ':(attr:export-ignore)'" -N

It's strange to see that git is that much slower in the r370k case, as it's unlikely that gix's matching engine is this much faster. Apparently it manages to do way less work for the same result.

single attribute lookup - more fair
Since the algorithm used by gix in the case above, with --no-attributes, is a bit different, maybe some special optimization sneaks in. Now we leave --no-attributes out, which means that gix will use worktree attributes first, forcing it to touch disk each time the directory of a path to match changes (this turned out to be worth 70ms). Further, much like git, gix will now query all attributes instead of just a single one.

hyperfine --warmup 1 "git ls-files ':(attr:export-ignore)'" "gix index entries ':(attr:export-ignore)'" -N

As expected, less optimal attribute matching bears a cost, but gix is still significantly faster at that. It's worth noting that gix now also outputs more information, which represents another cost that git doesn't have, even though it's probably quite minor.

single glob
Much more common than attribute pathspecs are certainly those with shell-globs, which is also the default globbing mode in pathspecs.

hyperfine --warmup 1 "git ls-files 'Web*'" "gix index entries --no-attributes 'Web*'" -N

It's interesting that gix is still faster; it must be a more optimal implementation with fewer losses, as both most definitely do the same amount of work.

trivial prefix
Special optimisations are possible if the pathspecs have a common prefix. With it, one is able to work only on a subset of the entries in the index which can lead to significant speedups.
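As a small illustration of what such a prefix selects, the sketch below compares a plain directory pathspec with filtering the full listing by hand. The directory layout is made up, and WebDriverTestsExtra is a hypothetical sibling added to show that the prefix has to end at a directory boundary. The sketch only shows the resulting subset. It says nothing about how git or gix locate that subset internally.

```shell
set -eu

# Hypothetical miniature layout with three top-level directories.
repo=$(mktemp -d)
cd "$repo"
git init -q .
mkdir -p WebDriverTests/imported WebDriverTestsExtra WebCore
printf 'x\n' > WebDriverTests/run.py
printf 'x\n' > WebDriverTests/imported/a.py
printf 'x\n' > WebDriverTestsExtra/b.py
printf 'x\n' > WebCore/dom.cpp
git add .

# A plain pathspec naming a directory selects everything below it ...
by_prefix=$(git ls-files -- WebDriverTests)
# ... which is the same set one gets by filtering the complete listing.
by_filter=$(git ls-files | grep '^WebDriverTests/')

printf '%s\n' "$by_prefix"
```

Because the entries in the index are sorted by path, all matches for such a prefix sit next to each other, which is what makes it possible to skip the rest.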
hyperfine --warmup 1 "git ls-files WebDriverTests" "gix index entries --no-attributes WebDriverTests" -N

In this case, the set of paths to work on is reduced to 828, and to my mind it's surprising it still takes that long. A lot of time is spent, of course, reading the entire 53MB index even though most of it remains unused.
Bonus: --recurse-submodules on Rust @ a39bdb1d6b9eaf23f2636baee0949d67890abcd8

With gix v0.28.0-233-gec1e5506e on the Rust repository we see what happens if --recurse-submodules is used, also in comparison with invocations that don't recurse into submodules.

It's interesting that there is overhead in dealing with recursion into submodules, which seems to weigh significantly enough to make gix lose some of its performance headroom. One may wonder if this trend continues to break-even, or if the no-recurse version of git just has a disadvantage related to getting started.

Conclusion
pathspecs are a powerful and probably undervalued feature, which is now fully supported by gitoxide to enable a variety of features that build on it. gix index entries [PATHSPEC ...] is the first of many more gix commands to come, showing how much performance we can expect to gain when using it.

Q&A

Q: What's the deal with --no-attributes?

This is necessary to make gix index entries comparable to git ls-files. Without it, gix will do a lot of extra work which will be noticeable in the very large repositories we look at here.

Data
Programs
Datasets
gitoxide (r2k)
WebKit (r370k)
Runs
Baseline - no pathspec
r2k
r370k
Single attribute lookup
r2k
r370k
Single attribute lookup - more fair
r370k
Different output due to attribute display. This prints more, too.
Removal of --no-attributes matches all attributes for each path, and uses worktree attributes by default.

Simple single-glob
Trivial Prefix
r370k
Bonus: --recurse-submodules on Rust repo
Apples vs Apples
When neither computes the index hash, gix is still a lot faster.