
Crash when running --blob-callback on blobs larger than ~600,000,000 bytes #616

Open
relgukxilef opened this issue Dec 5, 2024 · 1 comment

@relgukxilef

Hello, I'm trying to convert certain files in my repository from one format to another. I wrote some Python code to accomplish this and am passing it to git-filter-repo's --blob-callback argument.
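
For reference, the invocation has roughly this shape (convert_to_new_format is just a stand-in for my actual conversion code):

  git filter-repo --blob-callback '
    # blob.data is the raw file content as bytes; reassigning it
    # rewrites the blob in the filtered history
    blob.data = convert_to_new_format(blob.data)
  '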

This works for a few thousand commits, but then fast-import crashes with the message fatal: cannot truncate pack to skip duplicate: Invalid argument and writes a fast_import_crash file.

I've tried this multiple times: with different callbacks, with the repository filtered to different paths, and with filtering by path first and then applying my blob callback in a second run of git-filter-repo. The exact blob at which it stops differs between runs, but it is always a blob much larger than the others, above 600,000,000 bytes. Perhaps this is a known or intentional limitation of git fast-import or git-filter-repo.

  get-mark :655640
  blob
  mark :655641
  data 1507406
  blob
  mark :655642
  data 1558
  blob
  mark :655643
  data 865875
  blob
  mark :655644
* data 684724504

I have attached one such fast_import_crash file, but I have removed file and branch names, as this is a company repository.
fast_import_crash_30556.zip

@relgukxilef (Author)

I have tried running git-filter-repo with --blob-callback return, and this finishes without issues. I have also tried returning conditionally when either the input or the output of the conversion is larger than 1 MB (far less than the last blob listed in fast_import_crash), but it still crashes at a 600 MB blob, even though the callback shouldn't do anything with it. Perhaps, having already updated some blobs, it fails to handle the large blob even when just passing it along?
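
The conditional version looked roughly like this (again with convert_to_new_format standing in for the real conversion):

  git filter-repo --blob-callback '
    # skip the conversion entirely when either the original blob or the
    # converted result would be larger than ~1 MB
    if len(blob.data) > 1000000:
      return
    new_data = convert_to_new_format(blob.data)
    if len(new_data) > 1000000:
      return
    blob.data = new_data
  '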
