Awesome work! 😍 #1
Comments
Thanks for your comment! Interesting information. I wrote GitReader to remove libgit2sharp from RelaxVersioner, but of course it is meant for general-purpose use. From your code, I've tried to map the functionality you want GitReader to have:
|
Happy to know you have great plans for your lib. We use libGit2Sharp for other things, and totally replacing it with GitReader is probably a long shot (our application is basically a full-featured git GUI). We use git.exe for commands (including fetch and status) and libGit2Sharp to introspect the repository, build the graph, compute diffs between commits, edit remotes... |
Well, I have often wanted to analyze Git commit graphs (I don't do it full time now, but in the past I have held both CI and progress analysis maintainer roles). One of my motivations is that it would be useful to have such a library as an infrastructure that can be easily handled for such purposes ;) |
@jairbubbles 0.10.0 released. After the merge, I did some tweaking for consistency. If you have any problems, please throw them here or create a separate issue if appropriate. |
@kekyo Cool! I have started some work to benchmark GitReader vs LibGit2Sharp. In a nutshell, what I see is that it's faster when using the "primitive" open (but we're not getting all the info that LibGit2Sharp provides), but it's a lot slower when using the "structure" open, which is too bad as its data structures are a lot more user friendly. |
|
Primitive access has been able to reduce latency more than expected 😄 The difficulty is that when we open the repository in Structures interface, it is loading packed indexes, branches, tags, stashes, and many other things... At first, I thought about designing a method like |
Yes, but in my benchmark I'm also getting the references in the "primitive" mode. My guess is that it's the commit resolving which is taking a lot of time. I'm wondering if a lazy evaluation approach like in LibGit2Sharp wouldn't be better. |
I quickly tested removing Commit / Tag resolving for each reference and, as expected, the Primitive / Structured modes have similar performance:
|
For example, even in the Structures interface, we may be able to stop reading Branches, Tags, etc. in bulk when the repository is opened, and instead have callers invoke an asynchronous method that reads them explicitly. Suppose we could control what information to read with flags:
[Flags]
enum FillFlags
{
None = 0x00,
Branches = 0x01,
RemoteBranches = 0x02,
Tags = 0x04,
Stashes = 0x08,
All = 0x0f,
}
// (Defaulted: FillFlags.All)
using var repository = await Repository.Factory.OpenStructureAsync(FillFlags.None);
// All references are NOT loaded.
Trace.Assert(repository.Branches.Count == 0);
Trace.Assert(repository.RemoteBranches.Count == 0);
Trace.Assert(repository.Tags.Count == 0);
Trace.Assert(repository.Stashes.Count == 0);
// The commit doesn't fix up any additional information.
var commit = await repository.GetCommitAsync("....");
Trace.Assert(commit.Branches.Count == 0);
Trace.Assert(commit.RemoteBranches.Count == 0);
Trace.Assert(commit.Tags.Count == 0);
// After delayed but explicitly reading:
await repository.FillImmediateAsync(FillFlags.Branches | FillFlags.Tags);
Trace.Assert(repository.Branches.Count >= 1);
Trace.Assert(repository.Tags.Count >= 1);
// (this may require careful implementation of the process in Commit to make this possible)
Trace.Assert(commit.Branches.Count >= 1);
Trace.Assert(commit.Tags.Count >= 1);
By explicitly calling |
@kekyo I agree that we need control, but reading references is not really slow when they are packed. We also need to control commit / tag resolving; it would be some kind of prefetch option. Do you want to pay the price of resolving right away when you open the repository, or when you access objects later on? For instance, if you have 490 packed branches and 10 branches in refs/heads/, the cost would be in the commit resolving, as you'll have to resolve 500 commits. Moreover, I feel like it's mostly useless to resolve all branches or tags; it's unlikely that you need that info for all of them, at least in the most common scenarios. As for controlling reference retrieval, why not expose the methods directly on the repository?
// Method for each type?
public class Repository
{
    Task<IReadOnlyDictionary<string, Branch>> GetBranchesAsync(ResolvingFlags ...)
    Task<IReadOnlyDictionary<string, Branch>> GetRemoteBranchesAsync(ResolvingFlags ...)
    Task<IReadOnlyCollection<Stash>> GetStashesAsync(ResolvingFlags ...)
    ...
}
// Or more generic?
public class Repository
{
    Task<IReadOnlyDictionary<string, Branch>> GetReferencesAsync(ReferenceTypes...)
    ...
}
public enum ReferenceTypes
{
Branch,
RemoteBranch,
Stash,
Tag
}
If we provide enough control through these methods, we wouldn't need structures vs primitives anymore, which would make the code a lot simpler / easier to consume, and it would cover a lot of different use cases. |
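The lazy-evaluation idea discussed above (pay for commit resolving only when a reference is actually used) could look roughly like the sketch below. `LazyReference<TCommit>` and the `resolve` delegate are hypothetical names made up for illustration; they are not part of GitReader or LibGit2Sharp.

```csharp
using System;
using System.Threading.Tasks;

// Hypothetical type for illustration only; not GitReader's API.
public sealed class LazyReference<TCommit>
{
    // Lazy<T> runs the factory once, on first access to Value.
    private readonly Lazy<Task<TCommit>> commit;

    public LazyReference(string name, string targetHash, Func<string, Task<TCommit>> resolve)
    {
        Name = name;
        TargetHash = targetHash;
        commit = new Lazy<Task<TCommit>>(() => resolve(targetHash));
    }

    public string Name { get; }

    // Cheap to read: it comes straight from the reference file or packed-refs.
    public string TargetHash { get; }

    // Expensive part, deferred until the caller asks for it.
    public Task<TCommit> GetCommitAsync() => commit.Value;
}
```

With 500 such references, opening the repository would only read names and hashes; resolving happens for the handful of branches the caller actually inspects.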
In my use case, we open the repository to get its info as soon as the file watcher detects a change, so we want this to be as fast as possible. I was thinking that it would be interesting to be able to keep the object cache across several repository openings.
// Persistent cache that would be kept in memory
static ObjectsCache cache = new ObjectsCache();
// When we refresh we would pass the cache
var repository = Factory.OpenRepository(cache);
var branch = await repository.GetHeadBranchAsync();
var commit = await branch.GetCommitAsync(); // If the commit didn't change, we don't pay the price of looking it up on disk; it's already in the cache |
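A minimal sketch of such a cache, assuming objects are keyed by their hash. Git objects are immutable for a given hash, so an entry never needs invalidation. The name `ObjectsCache` comes from the comment above; the generic parameter, `GetOrLoadAsync`, and the `loadFromDisk` delegate are assumptions for illustration, not GitReader's API.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading.Tasks;

// Hypothetical sketch; the members below are assumptions, not GitReader's API.
public sealed class ObjectsCache<T>
{
    // Lazy<Task<T>> ensures the loader runs at most once per hash,
    // even when several callers ask for the same object concurrently.
    private readonly ConcurrentDictionary<string, Lazy<Task<T>>> entries =
        new ConcurrentDictionary<string, Lazy<Task<T>>>();

    public Task<T> GetOrLoadAsync(string hash, Func<string, Task<T>> loadFromDisk) =>
        entries.GetOrAdd(hash, h => new Lazy<Task<T>>(() => loadFromDisk(h))).Value;
}
```

Because the cache lives outside the repository object, it can be kept in a static field and passed to each new open triggered by the file watcher. A real implementation would probably also need a size limit or eviction policy, which this sketch leaves out.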
I see, so you are saying that you would eliminate property accesses such as … An immediate example is the test result of … Since this example is test code for GitReader, it is fine to write labor-intensive asynchronous method call code, but it is easy to imagine that this kind of labor would be required in general use. Since the Structures interface is a high-level interface, I thought it would be desirable to make it easier to use, even at the cost of some performance. (I think it would be better if there was something in between the Structures interface and the Primitive interface, but I also think that having too many options is a problem...) |
How so? I mean, you have one method call for each thing you need; it's pretty straightforward.
Well, for Verify, creating a wrapper class is probably the best approach; it gives you control over what you want to test:
internal class RepositoryWrapper
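{
    // Opens the repository and loads only the references the test needs.
    public static async Task<RepositoryWrapper> InitAsync(string gitPath)
    {
        var repository = Factory.OpenRepository(gitPath);
        return new RepositoryWrapper
        {
            Branches = await repository.GetBranchesAsync(),
            RemoteBranches = await repository.GetRemoteBranchesAsync(),
        };
    }
    public IReadOnlyDictionary<string, Branch> Branches { get; private set; }
    public IReadOnlyDictionary<string, Branch> RemoteBranches { get; private set; }
    ...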
}
I feel like the high-level interface should:
Having a low-level API is also super interesting for more advanced scenarios, but I would expose things like:
It wouldn't expose a Repository class for that API; it could be only static methods that take a .git path. |
In #3 I'm no longer resolving the commits. It's still slower because of the tag resolving (I have many tags in the repo I'm benchmarking). I see a possible optimisation: using the info about peeled tags in packed-refs, which is currently ignored.
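For context, a packed-refs file lists one `<hash> <refname>` pair per line, and for an annotated tag the next line may start with `^` followed by the hash of the object the tag ultimately points to (the peeled value). Reading that line avoids opening the tag object just to find its commit. The parser below is a minimal sketch of that idea; `PackedRef` and `PackedRefsParser` are hypothetical names, not GitReader's implementation.

```csharp
#nullable enable
using System;
using System.Collections.Generic;

// Hypothetical types for illustration only; not GitReader's API.
public sealed class PackedRef
{
    public PackedRef(string name, string target) { Name = name; Target = target; }
    public string Name { get; }
    public string Target { get; }          // hash stored for the ref itself
    public string? Peeled { get; set; }    // hash from a following "^" line, if any
}

public static class PackedRefsParser
{
    public static List<PackedRef> Parse(IEnumerable<string> lines)
    {
        var refs = new List<PackedRef>();
        foreach (var raw in lines)
        {
            var line = raw.Trim();
            if (line.Length == 0 || line[0] == '#')
                continue; // blank line or "# pack-refs with: ..." header

            if (line[0] == '^')
            {
                // Peeled line: belongs to the ref on the previous line.
                if (refs.Count > 0)
                    refs[refs.Count - 1].Peeled = line.Substring(1);
                continue;
            }

            var space = line.IndexOf(' ');
            if (space < 0)
                continue; // malformed line, skipped in this sketch

            refs.Add(new PackedRef(line.Substring(space + 1), line.Substring(0, space)));
        }
        return refs;
    }
}
```

When `Peeled` is set, a caller that only needs the tagged commit can use it directly instead of reading and parsing the tag object.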
|
Thank you again for your suggestions. GitReader has reached 1.0.0! This issue is closed; please open a new issue whenever you want to. |
Having a managed library to read .git repository is a plus for the .NET ecosystem.
I quickly looked at how libGit2Sharp is used in our application, and I'm wondering whether you have in mind to support more things in this library or to keep it super minimalistic. Here is the main struct that we fill when we inspect a repo:
Your lib already seems to support the main things but is lacking things like remotes or stashes.
Let me know, I'd be very interested in contributing. Cheers!