Commit

Permalink
wip
newren committed Nov 24, 2024
1 parent 9ae4ae6 commit 0d05d7e
Showing 1 changed file with 309 additions and 0 deletions.
309 changes: 309 additions & 0 deletions Documentation/curated-examples-from-issues.md
@@ -0,0 +1,309 @@
# Curated examples from issues

Lots of people have filed issues against git-filter-repo, and many of them
boil down to questions of "How do I?" or "Why doesn't this work?"

I thought I'd collect a bunch of these as example repository filterings
that others may be interested in.

## Table of Contents

* [Adding files to root commits](#adding-files-to-root-commits)
* [Purge a large list of files](#purge-a-large-list-of-files)

## Adding files to root commits

<!-- https://github.com/newren/git-filter-repo/issues/21 -->

Here's an example that will take `/path/to/existing/README.md` and
store it as `README.md` in the repository, and take
`/home/myusers/mymodule.gitignore` and store it as `src/.gitignore` in
the repository:

```
git filter-repo --commit-callback "if not commit.parents: commit.file_changes += [
    FileChange(b'M', b'README.md', b'$(git hash-object -w '/path/to/existing/README.md')', b'100644'),
    FileChange(b'M', b'src/.gitignore', b'$(git hash-object -w '/home/myusers/mymodule.gitignore')', b'100644')]"
```

Alternatively, you could also use the [insert-beginning contrib script](../contrib/filter-repo-demos/insert-beginning).

## Purge a large list of files

<!-- https://github.com/newren/git-filter-repo/issues/63 -->

List the paths to delete in some file (one per line),
e.g. ../DELETED_FILENAMES.txt, and then run

```
git filter-repo --invert-paths --paths-from-file ../DELETED_FILENAMES.txt
```

## Extracting a library to a separate repo

<!-- https://github.com/newren/git-filter-repo/issues/80 -->

```
git filter-repo \
--path src/some-folder/some-feature \
--path-rename src/some-folder/some-feature/:src/
```
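To reason about which paths survive and where they end up, here is a simplified Python model of the two options combined. It is only a sketch: `keep_and_rename` is a hypothetical helper written for illustration, not part of git-filter-repo, and it assumes prefix-style matching on directory boundaries.

```python
def keep_and_rename(filename: bytes, keep: bytes, old: bytes, new: bytes):
    """Hypothetical model of `--path KEEP --path-rename OLD:NEW` for one filename.

    Returns the new filename, or None if the file would be dropped.
    """
    # Keep only the path itself or anything beneath it as a directory.
    inside = filename == keep or filename.startswith(keep.rstrip(b"/") + b"/")
    if not inside:
        return None
    # Rewrite the leading OLD prefix to NEW.
    if filename.startswith(old):
        return new + filename[len(old):]
    return filename


KEEP = b"src/some-folder/some-feature"
OLD, NEW = b"src/some-folder/some-feature/", b"src/"

moved = keep_and_rename(b"src/some-folder/some-feature/api.c", KEEP, OLD, NEW)
dropped = keep_and_rename(b"src/some-folder/other/x.c", KEEP, OLD, NEW)
sibling = keep_and_rename(b"src/some-folder/some-feature-old/x.c", KEEP, OLD, NEW)
print(moved, dropped, sibling)
```

Running a model like this over a list of existing paths is a cheap way to spot unexpected drops or collisions before rewriting history.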

## Replace words in all commit messages

<!-- https://github.com/newren/git-filter-repo/issues/83 -->

```
git filter-repo --message-callback 'return message.replace(b"stuff", b"task")'
```

## Only keep files from two branches

<!-- https://github.com/newren/git-filter-repo/issues/91 -->

Let's say you know that the files currently present on two branches
are the only files that matter. Files that used to exist in either of
these branches, or files that only exist on some other branch, should
all be deleted from all versions of history. This can be accomplished
by getting a list of files from each branch, combining them, sorting
the list and picking out just the unique entries, then passing to
`--paths-from-file`:

```
git ls-tree -r --name-only ${BRANCH1} >../my-files
git ls-tree -r --name-only ${BRANCH2} >>../my-files
sort ../my-files | uniq >../my-relevant-files
git filter-repo --paths-from-file ../my-relevant-files
```

## Renormalize end-of-line characters and add a .gitattributes

<!-- https://github.com/newren/git-filter-repo/issues/122 -->

```
contrib/filter-repo-demos/lint-history dos2unix
[edit .gitattributes]
contrib/filter-repo-demos/insert-beginning .gitattributes
```

## Remove spaces at the end of lines

<!-- https://github.com/newren/git-filter-repo/issues/145 -->

Remove all trailing whitespace (spaces and tabs) from lines of non-binary
files, also stripping trailing carriage returns:

```
git filter-repo --replace-text <(echo 'regex:[\r\t ]+(\n|$)==>\n')
```
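To check what the expression does before rewriting history, the same pattern can be tried in plain Python. This is a sketch that assumes the `regex:` rule is applied to raw file contents as bytes with Python's `re` module; `strip_trailing_whitespace` is a name made up for this illustration.

```python
import re

# Same pattern and replacement as in the expressions file above.
TRAILING_WS = re.compile(rb"[\r\t ]+(\n|$)")


def strip_trailing_whitespace(data: bytes) -> bytes:
    """Replace trailing spaces, tabs, and carriage returns with a bare newline."""
    return TRAILING_WS.sub(rb"\n", data)


sample = b"first line  \r\nsecond line\t\nthird line\n"
cleaned = strip_trailing_whitespace(sample)
print(cleaned)

# Trailing whitespace at the very end of the data, with no final newline,
# is also matched (via `$`) and replaced by a newline.
no_final_newline = strip_trailing_whitespace(b"end  ")
print(no_final_newline)
```

Note the side effect shown by the second call: data that ends in trailing whitespace without a final newline gains one.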

## Having both exclude and include rules for filenames

<!-- https://github.com/newren/git-filter-repo/issues/230 -->

If you want to have rules to both include and exclude filenames, you
can simply invoke `git filter-repo` multiple times. Alternatively,
you can dispense with `--path` arguments and instead use the more
generic `--filename-callback`. For example to include all files under
`src/` except for `src/README.md`:

```
git filter-repo --filename-callback '
  if filename == b"src/README.md":
    return None
  if filename.startswith(b"src/"):
    return filename
  return None'
```
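Because a filename callback is just a Python function body that takes a bytestring and returns either a filename or `None`, the rule can be dry-run on a list of paths first. The sketch below wraps the same logic in a standalone function; the sample paths are invented for illustration.

```python
def filename_callback(filename: bytes):
    """Standalone copy of the rule above: keep src/ except src/README.md."""
    if filename == b"src/README.md":
        return None      # None means "drop this path"
    if filename.startswith(b"src/"):
        return filename  # keep the path unchanged
    return None          # everything outside src/ is dropped


# Hypothetical sample paths; in practice these could come from `git ls-files`.
paths = [b"src/main.c", b"src/README.md", b"docs/guide.md", b"src/lib/util.c"]
kept = [p for p in paths if filename_callback(p) is not None]
print(kept)
```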

## Removing paths with a certain extension

<!-- https://github.com/newren/git-filter-repo/issues/274 -->

```
git filter-repo --invert-paths --path-glob '*.xsa'
```

or

```
git filter-repo --filename-callback '
  if filename.endswith(b".xsa"):
    return None
  return filename'
```

## Removing a directory

<!-- https://github.com/newren/git-filter-repo/issues/278 -->

```
git filter-repo --path node_modules/electron/dist/ --invert-paths
```

## Convert from NFD filenames to NFC

<!-- https://github.com/newren/git-filter-repo/issues/296 -->

Given that Mac does utf-8 normalization of filenames, and has
historically switched which kind of normalization it does, users may
have committed files with alternative normalizations to their
repository. If someone wants to convert filenames in NFD form to NFC,
they could run

```
git filter-repo --filename-callback '
  try:
    return subprocess.check_output("iconv -f utf-8-mac -t utf-8".split(),
                                   input=filename)
  except:
    return filename
'
```

or

```
git filter-repo --filename-callback '
  import unicodedata
  try:
    return unicodedata.normalize("NFC", filename.decode("utf-8")).encode("utf-8")
  except UnicodeDecodeError:
    return filename
'
```
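The difference between the two forms is easy to see in plain Python. The following sketch uses a made-up filename and a hypothetical helper `to_nfc` that mirrors the `unicodedata` approach; names that are not valid UTF-8 are returned untouched.

```python
import unicodedata


def to_nfc(filename: bytes) -> bytes:
    """Return the NFC form of a UTF-8 filename; leave undecodable names alone."""
    try:
        return unicodedata.normalize("NFC", filename.decode("utf-8")).encode("utf-8")
    except UnicodeDecodeError:
        return filename


# NFD stores "é" as "e" followed by a combining acute accent (U+0301);
# NFC stores it as the single precomposed character U+00E9.
nfd_name = "cafe\u0301.txt".encode("utf-8")
nfc_name = "caf\u00e9.txt".encode("utf-8")

print(nfd_name, len(nfd_name))
print(nfc_name, len(nfc_name))
print(to_nfc(nfd_name) == nfc_name)
```

Both byte strings display as the same name in most tools, which is why the mismatch tends to go unnoticed until the two forms end up side by side in a repository.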

## Set the committer of the last few commits to myself

<!-- https://github.com/newren/git-filter-repo/issues/379 -->

```
git filter-repo --refs main~5..main --commit-callback '
  commit.committer_name = b"My Wonderful Self"
  commit.committer_email = b"[email protected]"
'
```

## Handling special characters, e.g. accents in names

<!-- https://github.com/newren/git-filter-repo/issues/383 -->

Since characters like ë and á are multi-byte characters and Python
won't allow you to directly place those in a bytestring
(e.g. b"Raphaël González" would result in a `SyntaxError: bytes can
only contain ASCII literal characters` error from Python), you just
need to make a normal string and then convert to a bytestring to
handle these. For example, changing the author name and email where
the author email is currently `[email protected]`:

```
git filter-repo --refs main~5..main --commit-callback '
  if commit.author_email == b"[email protected]":
    commit.author_name = "Raphaël González".encode()
    commit.author_email = b"[email protected]"
'
```
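The behavior described above can be confirmed in a plain Python session. This short sketch shows that the bytes literal is rejected at compile time, while encoding a normal string yields the UTF-8 bytes that the callback needs.

```python
# A bytes literal containing non-ASCII characters does not compile.
try:
    compile('b"Raphaël González"', "<example>", "eval")
    literal_rejected = False
except SyntaxError as err:
    literal_rejected = True
    print("SyntaxError:", err.msg)

# Encoding a normal string works; str.encode() defaults to UTF-8,
# so "ë" and "á" each become two bytes.
name = "Raphaël González".encode()
print(name)
print(len("Raphaël González"), "characters,", len(name), "bytes")
```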

## Handling repository corruption

<!-- https://github.com/newren/git-filter-repo/issues/420 -->

First, run fsck to get a list of the corrupt objects, e.g.:
```
$ git fsck
error in commit 166f57b3fbe31257100361ecaf735f305b533b21: missingSpaceBeforeDate: invalid author/committer line - missing space before date
Checking object directories: 100% (256/256), done.
```

Then print out that object literally to a temporary file:
```
$ git cat-file -p 166f57b3fbe31257100361ecaf735f305b533b21 >tmp
```

Taking a look at the file would show, for example:
```
$ cat tmp
tree e1d871155fce791680ec899fe7869067f2b4ffd2
author My Name <[email protected]>1673287380 -0800
committer My Name <[email protected]> 1673287380 -0800

Initial
```

Edit that file to fix the error (in this case, the missing space
between author email and author date):

```
tree e1d871155fce791680ec899fe7869067f2b4ffd2
author My Name <[email protected]> 1673287380 -0800
committer My Name <[email protected]> 1673287380 -0800

Initial
```

Save the updated file, then use `git-replace` to make a replace reference
for it.
```
$ git replace -f 166f57b3fbe31257100361ecaf735f305b533b21 $(git hash-object -t commit -w tmp)
```

Then remove the temporary file `tmp` and run `filter-repo` to consume
the replace reference and make it permanent:

```
$ rm tmp
$ git filter-repo --proceed
```

Note that if you have multiple corrupt objects, you only need to run
filter-repo once; just wait to do that step until you have all the
replacements in place.
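If many commits have the same `missingSpaceBeforeDate` problem, the manual edit of `tmp` could be scripted. The sketch below is one possible approach, not something git-filter-repo provides: `fix_missing_space` is a hypothetical helper, the address `me@example.com` is a placeholder, and only the header part of the commit (everything before the first blank line) is touched.

```python
import re

# An author/committer header where the timestamp follows ">" with no space.
MISSING_SPACE = re.compile(
    rb"^((?:author|committer) [^\n]*>)(\d+ [+-]\d{4})$", re.MULTILINE
)


def fix_missing_space(raw_commit: bytes) -> bytes:
    """Insert the missing space before the date in the commit headers."""
    # Headers end at the first blank line; leave the message untouched.
    headers, sep, message = raw_commit.partition(b"\n\n")
    return MISSING_SPACE.sub(rb"\1 \2", headers) + sep + message


raw = (
    b"tree e1d871155fce791680ec899fe7869067f2b4ffd2\n"
    b"author My Name <me@example.com>1673287380 -0800\n"
    b"committer My Name <me@example.com> 1673287380 -0800\n"
    b"\n"
    b"Initial\n"
)
fixed = fix_missing_space(raw)
print(fixed.decode())
```

The fixed bytes would then be written back to `tmp` before running the `git replace` step shown earlier.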

## Removing all files with a backslash in them

<!-- https://github.com/newren/git-filter-repo/issues/427 -->

```
git filter-repo --filename-callback 'return None if b"\\" in filename else filename'
```

## Replace a binary blob in history

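Use a blob callback that matches the bad object by its original hash and
swaps in the contents of a replacement file:

```
git filter-repo --blob-callback '
  if blob.original_id == b"<hash of the bad object>":
    blob.data = open("<path to the replacement file>", "rb").read()
'
```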



Examples still to be written up:

<!-- https://github.com/newren/git-filter-repo/issues/436 -->
* Replacing a binary blob in history

<!-- https://github.com/newren/git-filter-repo/pull/542 -->
* A callback for lint-history

<!-- https://github.com/newren/git-filter-repo/issues/300 -->
* Using replace refs to delete old history

<!-- https://github.com/newren/git-filter-repo/issues/492 -->
* Replacing PNGs with a compressed alternative
  (#537 also used `change.blob_id`)

<!-- https://github.com/newren/git-filter-repo/issues/490 -->
<!-- https://github.com/newren/git-filter-repo/issues/504 -->
* The need for multi-step filtering to avoid path collisions or ordering issues

<!-- https://lore.kernel.org/git/CABPp-BFqbiS8xsbLouNB41QTc5p0hEOy-EoV0Sjnp=xJEShkTw@mail.gmail.com/ -->
* Using `textwrap.dedent`
* An easier example of using git-filter-repo as a library
