-
-
Notifications
You must be signed in to change notification settings - Fork 353
Add rename tracking to blame #2022
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Add rename tracking to blame #2022
Conversation
838da39
to
a55f9bd
Compare
@Byron I’ve found and fixed a bug in the initial implementation and marked the commit as ready for review again. I had initially found the bug running |
First of all, apologies for the late response. Maybe there should be a "later" tag to indicate I am delaying the review to when I think I have enough time.
These are very existing indeed! Do you think the code for producing them should live somewhere near
I can only imagine how hard it must be to deduce the source of a bug or of inconsistencies with Git by merely comparing the outcome of Similar extensions I could imagine to add to In any case, I hope to get to give the PR a first look soon. |
No worries, that’s absolutely fine (though a tag would indeed be very helpful for planning)!
Currently, the code can be found in a separate repository: https://github.com/cruessler/gix-scripts (plus a couple of local changes that I haven’t pushed yet). I’m very open to moving it somewhere in
That’s an interesting idea! Do you mean not using
That also seems like a good idea! Is there anything similar in |
I’ve pushed a couple of commits to https://github.com/cruessler/gix-scripts so that it now also counts matching and non-matching lines and the percentage of matching lines. (This is now the version I used to calculate the numbers in the description.) |
Since this is still quite rare, I think I will go with posting a message to indicate such plans. Let's also say that I want to finish the review this week(end) :)!
Indeed, that's what I thought. Now, it's very dangerous to break isolation like that, but I'd think it can also be done well enough to avoid running in an unsuitable parent repository. Alternatively, we could find good ways to clone the whole repository on demand - in theory, a simple read-only sell script would do as part of the existing fixtures. That would be efficient as well except for on CI, and maybe there could even be a special case for that to reuse full clones of the parent repository if available. One can certainly get creative here.
This sounds more like stress-testing or continuous exploration of possible bugs, a bit like fuzzing. The system would have to be able to spend compute and collect data for later analysis. That also does sound like
Yes, exactly that. |
Works for me! :-)
I’m going to try and can come up with something. :-)
Then I’m going to prepare for a chase and start experimenting in |
pretty_assertion::assert_equal
gix-blame
This PR adds rename tracking to
gix-blame
. Since the numbers below seem to be very promising, I’m going to mark this PR as ready!There’s probably ways to make some of the code more efficient, but I think those would be micro-optimizations at this point. I compared this branch and
main
runninggix blame CREDITS
in the Linux kernel and if there is a difference, it is hard to distinguish from noise.2025-05-14 Some numbers
I just ran some benchmarks on how the resulting blames compare to
git
. I ran the comparison on my copy of thegitoxide
repository which, at the time of the benchmark, contained 2040 files in total tracked bygit
.HEAD
was at005e8fec0f1665afc42bda632f140d381eca9fe8
when I first ran the comparison. When I re-ran it on 2025-05-28,HEAD
was at5950c5886941ff3adf326985e2f0e6731e07f2a5
.main
Of 2040 files, 493 had identical blames while 1547 had blames with at least one line differing from
git blame
.In the files that did not fully match
git blame
, 178,068/320,282 lines (55.60 %) matchedgit blame
.In all files, 229,618/371,832 lines (61.75 %) matched
git blame
.Update 2025-05-28: In all files, 229,648/371,861 (61.76 %) matched
git blame
.This branch
Of 2040 files, 1023 had identical blames while 1017 had blames with at least one line differing from
git blame
.In the files that did not fully match
git blame
, 266,382/293,968 lines (90.62 %) matchedgit blame
.In all files, 343,595/371,181 lines (92.57 %) matched
git blame
.Update 2025-05-28: Investigating the difference mentioned below, I found a bug in the initial implementation. After fixing it, about 600 more lines got attributed to the same commit
git
attributed them to. In all files, 344,215/371,861 (92.57 %) matchedgit blame
.This means that this PR attributes about 110,000 lines more to the same commits as
git blame
, compared to currentgitoxide
main
.I think I’m quite pleased with these numbers. 😄
There is a difference in total line numbers between both measurements that I’ll investigate, though I don’t think it changes the general picture. Update 2025-05-28: the difference pointed to a bug that I’ve fixed in a separate commit.