Optimize Git usage in CAS #3976

Open
yhakbar opened this issue Mar 6, 2025 · 1 comment
Labels
enhancement (New feature or request), preserved (Preserved issues never go stale)

Comments

@yhakbar
Collaborator

yhakbar commented Mar 6, 2025

I got good feedback from @apparentlymart regarding an optimization we might be able to make in the CAS system we're introducing in #3929.

To my understanding, it boils down to keeping a single bare Git repository as a local object database and fetching references into it, so that only objects the database doesn't already have need to travel over the network when fetching Git content.
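A minimal sketch of the "bare repository as a database" idea, assuming the CAS drives the Git CLI through subprocess. The database path, the destination ref name, and the helper names are hypothetical, not part of the CAS in #3929. The fetch uses `--depth=1` to mirror the shallow clones the CAS makes today, and an explicit refspec so the fetched commit lands under a named ref inside the shared database; the intent is that only objects missing from that database cross the network.

```python
import os
import subprocess


def init_db_argv(db_path: str) -> list[str]:
    # `git init --bare` creates the shared object database (no working tree).
    return ["git", "init", "--bare", "--quiet", db_path]


def fetch_into_db_argv(db_path: str, url: str, ref: str, dest_ref: str) -> list[str]:
    # Fetch a single ref from `url` into the shared bare repository and store
    # it under `dest_ref` (a hypothetical naming scheme, e.g. refs/cas/...).
    # `--depth=1` mirrors the current shallow-clone behaviour (assumption).
    return [
        "git", f"--git-dir={db_path}", "fetch",
        "--depth=1", "--no-tags", url,
        f"+{ref}:{dest_ref}",
    ]


def ensure_db(db_path: str) -> None:
    # Initialise the database once; every later fetch reuses it.
    if not os.path.isdir(db_path):
        subprocess.run(init_db_argv(db_path), check=True)
```

A caller would run `ensure_db(db)` once and then `subprocess.run(fetch_into_db_argv(db, url, ref, dest), check=True)` per request, instead of cloning into a fresh bare repository each time.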

I believe this optimization wouldn't remove the need to keep the existing CAS store, which lets repositories be recreated using hard links to a central store. What it would change is the cost of a cache miss: instead of requiring a full shallow bare clone of the repository, a miss would only fetch the missing objects from the remote.
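A sketch of that hit/miss split, assuming the caller already knows the commit SHA it needs (for example from a pinned ref). The presence check uses `git cat-file -e`, which exits 0 only when the object exists; on a miss the sketch fetches just the requested ref into the existing database and pins it under a hypothetical `refs/cas/<sha>` ref. The runner is injected so the control flow can be tested without a real repository; all helper names are illustrative.

```python
import subprocess
from typing import Callable

# A runner takes an argv list and returns the process exit code.
# It is injectable so the hit/miss logic can be tested with a fake.
Runner = Callable[[list[str]], int]


def default_runner(argv: list[str]) -> int:
    return subprocess.run(argv, capture_output=True).returncode


def has_commit(db_path: str, sha: str, run: Runner) -> bool:
    # `git cat-file -e` exits 0 only if the object exists and is valid;
    # the `^{commit}` suffix additionally requires it to peel to a commit.
    return run(["git", f"--git-dir={db_path}", "cat-file", "-e", f"{sha}^{{commit}}"]) == 0


def ensure_commit(db_path: str, url: str, ref: str, sha: str,
                  run: Runner = default_runner) -> bool:
    """Return True on a cache hit, False if a fetch was needed."""
    if has_commit(db_path, sha, run):
        return True  # hit: nothing goes over the network
    # Miss: fetch only this ref into the existing database rather than
    # making a fresh shallow bare clone. refs/cas/<sha> is hypothetical.
    rc = run(["git", f"--git-dir={db_path}", "fetch", "--depth=1", "--no-tags",
              url, f"+{ref}:refs/cas/{sha}"])
    if rc != 0 or not has_commit(db_path, sha, run):
        # The ref may have moved, or the fetch itself failed.
        raise RuntimeError(f"could not fetch {sha} from {url}")
    return False
```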

Supporting this might require some locking to prevent concurrent updates to the shared Git database, but I'm not sure it does. At a surface level, the content in the database should be immutable and therefore safe to access concurrently, but that would need testing to validate.
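A minimal sketch of the locking option, assuming the shared database lives on a local Unix filesystem. Objects are content-addressed, so reads plausibly need no lock, which matches the intuition in the comment. Writes are the open question: as far as we know, Git guards files such as `shallow` and individual refs with its own `.lock` files and makes a second concurrent writer fail rather than wait. The sketch therefore serializes only the fetch step with an advisory `flock()` on a hypothetical `<db>.lock` file next to the database; `db_write_lock` is an illustrative name.

```python
import fcntl
import os
from contextlib import contextmanager


@contextmanager
def db_write_lock(db_path: str):
    # Hypothetical lock file next to (not inside) the bare repository, so it
    # cannot collide with Git's own *.lock files. flock() is advisory and
    # Unix-only; LOCK_EX blocks until no other holder remains.
    lock_path = db_path + ".lock"
    fd = os.open(lock_path, os.O_RDWR | os.O_CREAT, 0o644)
    try:
        fcntl.flock(fd, fcntl.LOCK_EX)
        yield lock_path
    finally:
        fcntl.flock(fd, fcntl.LOCK_UN)
        os.close(fd)
```

A caller would wrap only the `git fetch` invocation in `with db_write_lock(db):` and read objects outside the lock. Whether even this is necessary is exactly what the testing mentioned above would need to settle.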

yhakbar added the enhancement and preserved labels Mar 6, 2025
@apparentlymart

In the little rough sketch I shared with you, I think the main point of concurrency contention was the use of the FETCH_HEAD ref to represent "the commit we most recently fetched", since of course concurrent processes doing that fetch-and-update-ref step would clobber each other's FETCH_HEAD.
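One way to avoid the shared FETCH_HEAD, sketched here under our own assumptions rather than as the approach from that rough sketch: give every fetch its own uniquely named destination ref and pass `--no-write-fetch-head` (available since Git 2.29) so FETCH_HEAD is not written at all. The `refs/cas/tmp/...` and `refs/cas/commits/...` namespaces and the helper names are hypothetical; the runner is injectable for testing.

```python
import subprocess
import uuid
from typing import Callable

# A runner takes argv and returns stdout as text, raising on failure.
Runner = Callable[[list[str]], str]


def default_runner(argv: list[str]) -> str:
    return subprocess.run(argv, check=True, capture_output=True, text=True).stdout


def fetch_commit(db_path: str, url: str, ref: str, run: Runner = default_runner) -> str:
    """Fetch `ref` into the shared database and return the fetched commit SHA."""
    git = ["git", f"--git-dir={db_path}"]
    # A per-fetch ref replaces FETCH_HEAD as the record of "what we just
    # fetched", so concurrent fetches cannot clobber each other's result.
    tmp_ref = f"refs/cas/tmp/{uuid.uuid4().hex}"
    run(git + ["fetch", "--no-write-fetch-head", "--depth=1", "--no-tags",
               url, f"+{ref}:{tmp_ref}"])
    try:
        sha = run(git + ["rev-parse", "--verify", f"{tmp_ref}^{{commit}}"]).strip()
        # Pin the commit under a content-addressed name so `git gc` does not
        # treat its objects as unreachable once the temporary ref is gone.
        run(git + ["update-ref", f"refs/cas/commits/{sha}", sha])
        return sha
    finally:
        # Remove the temporary ref whether or not resolution succeeded.
        run(git + ["update-ref", "-d", tmp_ref])
```

This removes the FETCH_HEAD race specifically; other shared files touched by a fetch could still need the kind of serialization discussed in the issue.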

You might be able to mitigate even that by peeling off one layer of abstraction and using git fetch-pack instead, but I've not experimented with that.
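An untested sketch of the fetch-pack direction, which the comment itself flags as unexplored. As we understand `git fetch-pack`, it downloads objects into the repository and prints one `<sha> <refname>` line per fetched ref on stdout, without writing FETCH_HEAD or updating any local ref, so each caller would read its result from its own stdout. The helper names are hypothetical, and because no ref is updated, the caller would still need to pin the commit (for example with `git update-ref`) before a `git gc` could prune it.

```python
import re
import subprocess

# A SHA-1 (40 hex) or SHA-256 (64 hex) object ID.
_OID = re.compile(r"^[0-9a-f]{40}(?:[0-9a-f]{24})?$")


def fetch_pack_argv(db_path: str, url: str, refs: list[str]) -> list[str]:
    # Plumbing-level fetch: objects land in the database, but no local ref
    # and no FETCH_HEAD are written (per our reading of git-fetch-pack).
    return ["git", f"--git-dir={db_path}", "fetch-pack", "--depth=1", url, *refs]


def parse_fetch_pack_output(stdout: str) -> dict[str, str]:
    """Map each fetched refname to its object ID from `<oid> <refname>` lines."""
    result: dict[str, str] = {}
    for line in stdout.splitlines():
        oid, _, refname = line.strip().partition(" ")
        # To be safe, skip any line whose first field is not an object ID.
        if _OID.match(oid) and refname:
            result[refname] = oid
    return result


def fetch_pack(db_path: str, url: str, refs: list[str]) -> dict[str, str]:
    out = subprocess.run(fetch_pack_argv(db_path, url, refs),
                         check=True, capture_output=True, text=True).stdout
    return parse_fetch_pack_output(out)
```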
