I got good feedback from @apparentlymart regarding an optimization we might be able to make in the CAS system we're introducing in #3929.
To my understanding, it boils down to preserving a single bare Git repository as an object database and fetching new commits into it from the remote, so that only the objects we don't already have travel over the network when fetching Git content.
I believe this optimization wouldn't remove the need to preserve the existing CAS store, which lets repositories be recreated using hard links to a central store. It would, however, mean that a cache miss no longer requires a full shallow bare clone of the repository; instead we'd fetch only the relevant missing objects from the remote.
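To make the idea concrete, here's a minimal sketch of what the cache-miss path could look like if we shell out to the git CLI. The DB path, remote URL, and ref names are illustrative placeholders, not anything decided in #3929:

```go
// A minimal sketch of the cache-miss path, assuming we shell out to the
// git CLI. Paths, the remote URL, and ref names are illustrative.
package main

import (
	"os"
	"os/exec"
)

// git runs a git subcommand and surfaces its output for debugging.
func git(args ...string) error {
	cmd := exec.Command("git", args...)
	cmd.Stdout = os.Stdout
	cmd.Stderr = os.Stderr
	return cmd.Run()
}

func main() {
	dbPath := "/var/cache/tofu/git-db.git"               // hypothetical shared DB location
	remoteURL := "https://github.com/example/module.git" // hypothetical remote

	// Create the shared bare repository once; it persists across runs and
	// accumulates objects from every remote we fetch.
	if _, err := os.Stat(dbPath); os.IsNotExist(err) {
		if err := git("init", "--bare", dbPath); err != nil {
			panic(err)
		}
	}

	// On a cache miss, fetch only the objects we don't already have into a
	// namespaced local ref, instead of doing a fresh shallow bare clone.
	err := git("--git-dir", dbPath, "fetch", "--no-tags", remoteURL,
		"+refs/heads/main:refs/cas/example-module/main")
	if err != nil {
		panic(err)
	}
}
```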
Adding support for this might require some sort of locking to prevent concurrent updates to the shared Git database, though I'm not sure it actually does. On the surface, the content in the DB should be immutable (Git objects are content-addressed) and therefore safe to access concurrently, but testing would be needed to validate that.
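If locking does turn out to be necessary, one cheap option would be advisory file locking around writes to the shared DB. A minimal, Unix-only sketch using flock(2); the lock-file convention and helper name are hypothetical:

```go
// A minimal sketch of advisory locking, assuming it proves necessary.
// Unix-only: it uses flock(2) on a sidecar lock file next to the shared DB.
package main

import (
	"os"
	"syscall"
)

// withDBLock holds an exclusive flock on <dbPath>.lock while fn runs, so
// only one process mutates the shared git database at a time. Readers
// could take syscall.LOCK_SH instead if shared reads prove safe.
func withDBLock(dbPath string, fn func() error) error {
	f, err := os.OpenFile(dbPath+".lock", os.O_CREATE|os.O_RDWR, 0o644)
	if err != nil {
		return err
	}
	defer f.Close()
	if err := syscall.Flock(int(f.Fd()), syscall.LOCK_EX); err != nil {
		return err
	}
	defer syscall.Flock(int(f.Fd()), syscall.LOCK_UN)
	return fn()
}

func main() {
	dbPath := "/var/cache/tofu/git-db.git" // hypothetical shared DB location
	_ = withDBLock(dbPath, func() error {
		// ... perform the fetch into the shared DB here ...
		return nil
	})
}
```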
In the little rough sketch I shared with you, I think the main point of concurrency contention was the use of the FETCH_HEAD special ref as a representation of "the commit we most recently fetched", since concurrent processes doing that fetch-and-update-ref step would of course clobber each other's FETCH_HEAD.
You might be able to mitigate even that by peeling off one abstraction layer and using git fetch-pack instead, but I've not experimented with that.
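For what it's worth, here's a rough sketch of that direction, still shelling out to git. As I understand fetch-pack's output, it stores the received objects in the repository's object database and prints the SHA-1 and name of each fetched ref to stdout, so each process can read the resolved commit from its own output rather than from a shared FETCH_HEAD. The helper and paths below are hypothetical:

```go
// A rough sketch of the fetch-pack alternative, assuming we shell out to
// git. fetch-pack writes the received objects into the repository's object
// database and reports "<sha> <refname>" pairs on stdout, so each process
// reads the fetched commit ID from its own pipe rather than from the
// shared FETCH_HEAD file.
package main

import (
	"fmt"
	"os/exec"
	"strings"
)

// fetchCommit fetches ref from remoteURL into the bare repo at dbPath and
// returns the commit ID it resolved to, without updating any shared refs.
func fetchCommit(dbPath, remoteURL, ref string) (string, error) {
	out, err := exec.Command("git", "--git-dir", dbPath,
		"fetch-pack", remoteURL, ref).Output()
	if err != nil {
		return "", err
	}
	// Scan stdout for the "<sha> <refname>" line matching our ref.
	for _, line := range strings.Split(string(out), "\n") {
		if fields := strings.Fields(line); len(fields) == 2 && fields[1] == ref {
			return fields[0], nil
		}
	}
	return "", fmt.Errorf("ref %q not in fetch-pack output", ref)
}

func main() {
	id, err := fetchCommit("/var/cache/tofu/git-db.git",
		"https://github.com/example/module.git", "refs/heads/main")
	if err != nil {
		panic(err)
	}
	fmt.Println("fetched commit:", id)
}
```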