Perform action only on deleted artifacts #444

Open
ianwilliams1 opened this issue Mar 29, 2023 · 1 comment

Comments

@ianwilliams1

Love the tool, so convenient, but asking for a little more convenience...

Feature request: perform an action only on deleted artifacts: --deleted-only

I would like a way to purge unwanted artifacts from history, but only those that have already been deleted, e.g.:

git-filter-repo --deleted-only --invert-paths --path-regex '.*\.(class|[ejw]ar|zip|z|gz)'
or:
git-filter-repo --deleted-only --strip-blobs-bigger-than 10M

I'm sure this use case is not unusual. As a Lead / Admin of a large group of developers with mixed experience, I often find that a misconstructed ignore file has let unwanted artifacts get committed, resulting in repo bloat.

The documentation reads:

Similarly, you could use --paths-from-file to delete many files. For example, you could run git filter-repo --analyze to get reports, look in one such as .git/filter-repo/analysis/path-deleted-sizes.txt and copy all the filenames into a file such as /tmp/files-i-dont-want-anymore.txt and then run

git filter-repo --invert-paths --paths-from-file /tmp/files-i-dont-want-anymore.txt

to delete them all.

But that means I must process the 'path-deleted-sizes.txt' through a regex, create the /tmp file and process again.
I'd like the convenience of a one-shot command, but with the safety net of knowing I am applying my criteria (regex, size, etc.) only to files that have already been deleted.
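For reference, the two-step workaround described above can be scripted. Below is a minimal sketch, not part of git-filter-repo: the function name `select_deleted_paths` is hypothetical, and it assumes each data row of path-deleted-sizes.txt has the form `unpacked size, packed size, date deleted, path`, with header lines above the rows. Check your own report before relying on that layout.

```python
import re


def select_deleted_paths(report_lines, pattern, min_packed_size=0):
    """Return paths from a deleted-sizes report that match the given criteria.

    Assumed row layout (verify against your own report):
        <unpacked size> <packed size> <date deleted> <path>
    """
    regex = re.compile(pattern)
    selected = []
    for line in report_lines:
        # Split into at most four fields so paths containing spaces stay whole.
        fields = line.rstrip("\n").split(None, 3)
        # Skip header lines and anything that does not look like a data row.
        if len(fields) != 4 or not (fields[0].isdigit() and fields[1].isdigit()):
            continue
        _unpacked, packed, _date_deleted, path = fields
        if int(packed) >= min_packed_size and regex.search(path):
            selected.append(path)
    return selected
```

Writing the returned paths one per line to a file such as /tmp/files-i-dont-want-anymore.txt gives the input for `git filter-repo --invert-paths --paths-from-file`. Paths with unusual characters may need extra handling that this sketch does not attempt.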

Hopefully the explanation (and contrived examples) is clear.

@newren
Owner

newren commented Apr 11, 2023

It's an interesting idea, and might make sense for someone to create a contrib script for.

It would not make sense as part of the main tool because:

  • The output files from --analyze are really only meant as guiding points, not as Truth. In particular, if the repo has some ancient branch still open that just hasn't been updated in years, it may be that some long-deleted file is still present within that branch. And thus, the file will not show as being deleted in the --analyze reports, because it still exists on some branch.
  • The tool uses fast-export and fast-import and is thought of as fast-filter. Any kind of pre-processing that involves walking the entire history of the repository as part of the filtering is going to be horrendously slow on big repositories, at least for getting started. I'd rather anything that took that kind of start-up time go in the contrib scripts.
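To illustrate the first point, a contrib-style script could check whether a supposedly deleted path still exists at the tip of some branch before purging it. The following is only a sketch under assumptions: the helper names are hypothetical, only `refs/heads` and `refs/remotes` are inspected, and paths are assumed to decode as text.

```python
import subprocess


def list_ref_names(repo="."):
    """List local and remote-tracking branch refs in the repository."""
    out = subprocess.run(
        ["git", "-C", repo, "for-each-ref", "--format=%(refname)",
         "refs/heads", "refs/remotes"],
        capture_output=True, text=True, check=True,
    ).stdout
    return out.split()


def paths_at_ref(ref, repo="."):
    """Return the set of file paths present in the tree at the given ref."""
    out = subprocess.run(
        ["git", "-C", repo, "ls-tree", "-r", "-z", "--name-only", ref],
        capture_output=True, text=True, check=True,
    ).stdout
    # With -z, entries are NUL-separated; drop the empty trailing piece.
    return {p for p in out.split("\0") if p}


def still_present(candidates, paths_by_ref):
    """Map each candidate path to the sorted refs whose tip still contains it."""
    found = {}
    for ref, paths in paths_by_ref.items():
        for path in candidates:
            if path in paths:
                found.setdefault(path, []).append(ref)
    return {path: sorted(refs) for path, refs in found.items()}
```

A caller would build the mapping with `{ref: paths_at_ref(ref) for ref in list_ref_names()}` and drop any candidate that `still_present` reports. Listing every branch tip in full is the kind of start-up cost the second point warns about, so this may be slow on big repositories.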
