# Frequently Answered Questions

## Table of Contents

* [Why did it rewrite commit hashes?](#why-did-it-rewrite-commit-hashes)
* [Why did it rewrite more commit hashes than I expected?](#why-did-it-rewrite-more-commit-hashes-than-i-expected)
* [Why did it rewrite other branches too?](#why-did-it-rewrite-other-branches-too)
* [Help! Can I recover or undo the filtering?](#help-can-i-recover-or-undo-the-filtering)
* [Can you change git-filter-repo to allow future folks to recover from `--force`'d rewrites?](#can-you-change-git-filter-repo-to-allow-future-folks-to-recover-from---forced-rewrites)
* [What kinds of problems does git-filter-repo not try to solve?](#what-kinds-of-problems-does-git-filter-repo-not-try-to-solve)

## Why did it rewrite commit hashes?

This is fundamental to how Git operates. In more detail...

Each commit in Git is identified by a hash of its contents. Those
contents include the commit message, the author (name, email, and time
authored), the committer (name, email, and time committed), the
toplevel tree hash, and the parent(s) of the commit. This means that if
any of the commit fields change, including the tree hash or the hash of
the parent(s) of the commit, then the hash for the commit will change.

(The same is true for files ("blobs") and trees stored in Git as well;
each is identified by a hash of its contents. Changing a file changes
its blob hash, which changes the hash of every tree containing it, which
changes the commit's toplevel tree hash, and so the commit hash changes
too.)

If you attempt to write commit (or tree or blob) objects with an
incorrect hash, Git will reject them as corrupt.
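
To make this concrete, here is a minimal Python sketch (not part of
git-filter-repo) of how Git computes an object ID in a SHA-1 repository:
it hashes a header of the form `<type> <size>` plus a NUL byte, followed
by the raw object contents. The helpers `git_object_id` and
`commit_content`, and the sample identity and parent IDs, are
hypothetical illustrations; the point is only that changing a single
field, here the parent, yields a different commit ID.

```python
import hashlib


def git_object_id(obj_type: str, content: bytes) -> str:
    """Return the SHA-1 object ID Git would assign to this object."""
    # Git hashes a header "<type> <size>" plus a NUL byte, followed by the
    # raw content. This assumes a SHA-1 (not SHA-256) repository.
    header = f"{obj_type} {len(content)}\0".encode()
    return hashlib.sha1(header + content).hexdigest()


def commit_content(tree: str, parents: list, author: str,
                   committer: str, message: str) -> bytes:
    """Serialize commit fields in the layout of a raw Git commit object."""
    lines = [f"tree {tree}"]
    lines += [f"parent {p}" for p in parents]
    lines.append(f"author {author}")
    lines.append(f"committer {committer}")
    # A blank line separates the headers from the commit message.
    return ("\n".join(lines) + "\n\n" + message).encode()


# Hypothetical identity and parent IDs, purely for illustration.
ident = "A U Thor <author@example.com> 1700000000 +0000"
tree = "4b825dc642cb6eb9a060e54bf8d69288fbee4904"  # Git's empty tree
parent_a = "1" * 40
parent_b = "2" * 40

original = git_object_id(
    "commit", commit_content(tree, [parent_a], ident, ident, "Add feature\n"))
reparented = git_object_id(
    "commit", commit_content(tree, [parent_b], ident, ident, "Add feature\n"))

# Same ID that `echo hello | git hash-object --stdin` prints.
print(git_object_id("blob", b"hello\n"))  # ce013625030ba8dba906f756967f9e9ca394464a
print(original != reparented)  # True: only the parent changed, yet the ID differs
```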

## Why did it rewrite more commit hashes than I expected?

There are two aspects to this:
* Why did commits newer than the ones I expected have their hash change?
* Why did commits older than the ones I expected have their hash change?

For the first question, see [why filter-repo rewrites commit
hashes](#why-did-it-rewrite-commit-hashes), and note that if you
modify some old commit to, for example, remove a file, then obviously
that commit's hash must change. Further, since that commit will have
a new hash, any commit that has it as a parent also needs a new hash,
and so on, all the way to the most recent commits in history. This is
fundamental to Git and there is nothing you can do to change it.
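
The cascade can be illustrated with a small model. The sketch below is
a hypothetical simplification, not how git-filter-repo works internally:
each "commit" records only its tree ID, its parent's ID, and a message,
hashed with the same `<type> <size>` plus NUL framing Git uses. Changing
the tree of the second commit (as if a file were removed from it)
changes that commit's ID and the ID of every descendant, while the first
commit keeps its ID.

```python
import hashlib


def object_id(obj_type: str, content: bytes) -> str:
    # Same framing Git uses: "<type> <size>", a NUL byte, then the content.
    header = f"{obj_type} {len(content)}\0".encode()
    return hashlib.sha1(header + content).hexdigest()


def build_history(trees: list) -> list:
    """Return commit IDs for a linear history, oldest first.

    Simplified model (no author/committer lines): each commit records its
    tree, its parent (if any), and a message.
    """
    ids = []
    for i, tree in enumerate(trees):
        body = f"tree {tree}\n"
        if ids:
            body += f"parent {ids[-1]}\n"
        body += f"\ncommit {i}\n"
        ids.append(object_id("commit", body.encode()))
    return ids


# Hypothetical tree IDs for a five-commit history.
trees = [object_id("tree", f"tree-{i}".encode()) for i in range(5)]
before = build_history(trees)

# "Remove a file" from the second commit: only its tree changes.
rewritten_trees = list(trees)
rewritten_trees[1] = object_id("tree", b"tree-1-without-secret")
after = build_history(rewritten_trees)

for old, new in zip(before, after):
    print(old[:8], "->", new[:8], "(changed)" if old != new else "(same)")
```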
|
||
For the second question, if you are sure the filter you specified | ||
would not apply to the older commits, then the issue is probably that | ||
git-fast-export and git-fast-import (both of which git-filter-repo | ||
uses) canonicalize history in various ways. This means that even if | ||
you have no filter, these tools sometimes change commit hashes. This | ||
can happen in any of these cases: | ||
|
||
* If you have signed commits, the signatures will be stripped | ||
* If you have commits with extended headers, the extended headers will | ||
be stripped (signed commits are actually a special case of this) | ||
* If you have commits in an encoding other than UTF-8, they will by | ||
default be re-encoded into UTF-8 | ||
* If you have a commit without an author, one will be added that | ||
matches the committer. | ||
* If you have trees that are not canonical (e.g. incorrect sorting | ||
order), they will be canonicalized | ||
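
As an illustration of the first two cases, the sketch below hashes the
same hypothetical commit twice: once with a `gpgsig` extended header and
once without it, as happens when the signature is stripped. The identity
and the placeholder signature text are made up; real signature headers
span multiple lines, with continuation lines starting with a space,
which the sketch imitates.

```python
import hashlib


def object_id(obj_type: str, content: bytes) -> str:
    # Git's framing: "<type> <size>", a NUL byte, then the raw content.
    header = f"{obj_type} {len(content)}\0".encode()
    return hashlib.sha1(header + content).hexdigest()


ident = "A U Thor <author@example.com> 1700000000 +0000"  # hypothetical
headers = (
    "tree 4b825dc642cb6eb9a060e54bf8d69288fbee4904\n"
    f"author {ident}\n"
    f"committer {ident}\n"
)
# Placeholder signature; continuation lines of a header start with a space.
gpgsig = (
    "gpgsig -----BEGIN PGP SIGNATURE-----\n"
    " \n"
    " placeholder-signature-data\n"
    " -----END PGP SIGNATURE-----\n"
)
# A blank line separates the headers from the message.
message = "\nInitial commit\n"

signed = object_id("commit", (headers + gpgsig + message).encode())
stripped = object_id("commit", (headers + message).encode())

print(signed != stripped)  # True: dropping the header alone changes the ID
```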

If this affects you and you really only want to rewrite newer commits in
history, you can use the `--refs` argument to git-filter-repo to specify
a range of history that you want rewritten.

(For those attempting to be clever and use `--refs` for the first
question: Note that if you attempt to only rewrite a few old commits,
then all you'll succeed in doing is adding new commits that won't be
part of any branch and will be subject to garbage collection. The
branches will still hold on to the unrewritten versions of the commits.
Thus, you have to rewrite all the way to the branch tip for the rewrite
to be meaningful. Said another way, the `--refs` trick is only useful
for restricting the rewrite to newer commits, never for restricting the
rewrite to older commits.)
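
The parenthetical above is about reachability: Git keeps objects that
some ref can reach and eventually garbage-collects the rest. The
hypothetical model below uses plain dictionaries, not Git's actual data
structures. It "rewrites" only the two oldest commits of a branch;
because the branch tip's ancestry still points at the originals, the
rewritten copies are unreachable.

```python
def reachable(tip: str, parents: dict) -> set:
    """Return every commit reachable from `tip` by following parent links."""
    seen = set()
    stack = [tip]
    while stack:
        commit = stack.pop()
        if commit in seen:
            continue
        seen.add(commit)
        stack.extend(parents.get(commit, []))
    return seen


# Hypothetical linear history A <- B <- C <- D, with the branch at D.
parents = {"A": [], "B": ["A"], "C": ["B"], "D": ["C"]}
refs = {"refs/heads/main": "D"}

# "Rewrite" only the old commits A and B into new commits A' and B'.
# Nothing updates C's parent or the branch, so they hang off nothing.
parents["A'"] = []
parents["B'"] = ["A'"]

live = set()
for tip in refs.values():
    live |= reachable(tip, parents)

garbage = set(parents) - live
print(sorted(garbage))  # ["A'", "B'"]: the rewritten commits are unreachable
```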

## Why did it rewrite other branches too?

git-filter-repo's name is git-filter-*repo*: by default it rewrites the
whole repository, not just the current branch.

It can restrict its rewriting to a subset of history, such as a single
branch, using the `--refs` option. However, doing so comes with the
risk that one branch now has a different version of some commits than
other branches do; usually, when you rewrite history, you want all
branches that depended on what you are rewriting to be updated.

## Help! Can I recover or undo the filtering?

Sure, _if_ you followed the instructions. The instructions told you to
make a fresh clone before running git-filter-repo. If you did that, you
can just throw away your clone with the flubbed rewrite and make a new
clone.

If you didn't make a fresh clone, and you didn't run with `--force`, you
would have seen the following warning:
```
Aborting: Refusing to destructively overwrite repo history since
this does not look like a fresh clone.
[...]
Please operate on a fresh clone instead. If you want to proceed
anyway, use --force.
```
If you then added `--force`, well, you were warned.

If you didn't make a fresh clone, and you ran with `--force`, and you
didn't think to read the description of the `--force` option:
```
Ignore fresh clone checks and rewrite history (an irreversible
operation, especially since it by default ends with an
immediate pruning of reflogs and old objects).
```
and you didn't read even the beginning of the manual:
```
git-filter-repo destructively rewrites history
```
and you think it's okay to run a command with `--force` in it on something
you don't have a backup of, then now is the time to reassess your life
choices. `--force` should be a pretty clear warning sign.

See also the next question.


## Can you change git-filter-repo to allow future folks to recover from `--force`'d rewrites?

This will never be supported.

* Providing an alternate method to restore would require storing both
  the original history and the new history, meaning that those who are
  trying to shrink their repository size instead see it grow and have to
  figure out extra steps to expunge the old history to see the actual
  size savings. Experience with other tools showed that this was
  frustrating and difficult to figure out for many users. It would also
  mean that users who are trying to purge sensitive data from their
  repository still find the sensitive data after the rewrite, because it
  hasn't actually been purged. To actually purge it, they have to take
  extra steps, which again has made things difficult for users of other
  tools in the past.

* Providing an alternate method to restore would also mean trying to
  figure out what should be backed up and how. The obvious choices used
  by previous tools only actually provided partial backups (reflogs
  would be ignored, for example, as would uncommitted changes, whether
  staged or not). The only reasonable full backup mechanism is making a
  separate clone, which is both expensive and something the user can and
  should understand how to do on their own.

* Providing an alternate method to restore would also mean providing
  documentation on how to restore. Experience with past methods from
  other tools in the history rewriting space suggested that they were
  rather difficult for users to figure out. Difficult enough, in fact,
  that users simply never used them. They instead made a separate clone
  before rewriting history and, if they didn't like the rewrite, they
  just blew it away and made a new clone to work with. Since that was
  observed to be the easy restoration method, I simply enforced it with
  this tool, requiring users who look like they might not be operating
  on a fresh clone to use the `--force` flag.

But more than all that, if there were an alternate method to restore,
why would you have needed to specify the `--force` flag? Doesn't its
existence (and the wording of its documentation) make it pretty clear on
its own that there isn't going to be a way to restore?


## What kinds of problems does git-filter-repo not try to solve?

* Filtering history but magically keeping the same commit IDs
* Bidirectional development between filtered and unfiltered repositories (josh)
* Filtering based on the difference (a.k.a. patch or change) between commits (rebase)
* Conversion between different version control systems (reposurgeon)
* Having two people filter their clone of the repository (with the same
  filtering command) and getting the same new commit IDs

## Help! Can I recover or undo the filtering?

* https://github.com/newren/git-filter-repo/issues/606

## Why did it rewrite other branches too?

* https://github.com/newren/git-filter-repo/issues/527

How do I see what was removed?

* Give answer in terms of `git rev-list --objects --all` in both a
  separate fresh clone from before the rewrite and in the repo where
  the rewrite was done. Then find the objects that exist in the old
  but not the new.
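
The approach in the note above could be scripted roughly as follows.
This is a sketch, not part of git-filter-repo: it assumes `git` is on
your `PATH`, that the first argument is a fresh clone made before the
rewrite, and that the second is the repository where the rewrite was
done (both paths are hypothetical). It runs `git rev-list --objects
--all` in each and reports object IDs present only in the old one,
together with the path Git printed for them.

```python
import subprocess
import sys


def parse_rev_list_objects(output: str) -> dict:
    """Map object ID -> path from `git rev-list --objects` output.

    Commits are printed as a bare ID; trees and blobs are followed by a
    space and a path (empty for root trees).
    """
    objects = {}
    for line in output.splitlines():
        if not line.strip():
            continue
        oid, _, path = line.partition(" ")
        objects[oid] = path
    return objects


def rev_list_objects(repo: str) -> dict:
    """Run `git rev-list --objects --all` inside `repo` and parse it."""
    result = subprocess.run(
        ["git", "-C", repo, "rev-list", "--objects", "--all"],
        check=True, capture_output=True, text=True,
    )
    return parse_rev_list_objects(result.stdout)


def removed_objects(old: dict, new: dict) -> dict:
    """Objects reachable in the old repository but not in the new one."""
    return {oid: path for oid, path in old.items() if oid not in new}


if __name__ == "__main__" and len(sys.argv) == 3:
    old_repo, new_repo = sys.argv[1], sys.argv[2]  # hypothetical paths
    gone = removed_objects(rev_list_objects(old_repo),
                           rev_list_objects(new_repo))
    for oid, path in sorted(gone.items(), key=lambda item: item[1]):
        print(oid, path)
```

Expect the output to include every commit and tree whose ID changed,
not only removed files; to narrow it down to blobs, the IDs could be fed
to `git cat-file --batch-check` in the old clone, which reports each
object's type.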

Why are the commit hashes changing?

* https://github.com/newren/git-filter-repo/issues/607
* Multiple reasons:
  * Any commit you change must get a new hash
  * Any commit that has a changed commit as a parent or earlier ancestor
    must change
  * Commit and tag signatures
  * git-fast-export and git-fast-import canonicalize history in various
    ways...

Handling corruption?

If `git fsck` throws warnings/errors, git-filter-repo may not parse the objects...