Filter branch (git filter-branch)
git filter-branch is a powerful history-rewriting tool in Git that allows deep modifications to a repository's commit history. It can batch-modify file contents, author information, commit messages, and more across many commits. Use it with caution: every rewritten commit gets a new hash, which disrupts collaboration on any history that has already been shared.
Basic Concepts and Uses
The core function of git filter-branch is to traverse the selected commits and rewrite history according to the filters you specify. Common scenarios include:
- Permanently deleting sensitive files (e.g., passwords, keys) from history
- Batch-modifying author/committer information
- Extracting a subdirectory as the root of a new repository
- Rewriting paths or metadata so that histories do not clash when merging multiple repositories
# Basic syntax structure
git filter-branch [<filter options>] [--] [<rev-list options>]
Typical Use Cases
Deleting Files from History
To delete a specific file (e.g., credentials.txt) from all commits:
git filter-branch --force --index-filter \
'git rm --cached --ignore-unmatch credentials.txt' \
--prune-empty --tag-name-filter cat -- --all
- --index-filter is faster than --tree-filter (does not check out files)
- --prune-empty automatically removes resulting empty commits
- --tag-name-filter cat preserves tag names
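One way to check that the file is really gone is to rehearse the command in a throwaway repository first. The sketch below is only an illustration: the scratch directory, the "Demo User" identity, and the file contents are made up, and FILTER_BRANCH_SQUELCH_WARNING is set to skip the warning and delay that recent Git versions print before filter-branch starts.

```shell
#!/bin/sh
# Rehearsal in a throwaway repository (all names here are hypothetical).
set -e
export FILTER_BRANCH_SQUELCH_WARNING=1   # skip filter-branch's startup warning
demo=$(mktemp -d)
cd "$demo"
git init -q .
git config user.name "Demo User"
git config user.email "demo@example.com"

echo "hello" > README.txt
git add README.txt
git commit -q -m "Add README"

echo "password=hunter2" > credentials.txt
git add credentials.txt
git commit -q -m "Add credentials by mistake"

echo "more text" >> README.txt
git commit -q -am "Update README"

before=$(git rev-list --count HEAD)

git filter-branch --force --index-filter \
  'git rm --cached --ignore-unmatch credentials.txt' \
  --prune-empty --tag-name-filter cat -- --all >/dev/null 2>&1

after=$(git rev-list --count HEAD)
# Commits on the rewritten branch that still touch the file.
leftover=$(git log --format=%H HEAD -- credentials.txt)

echo "commits before: $before, after: $after"
echo "commits still touching credentials.txt: ${leftover:-none}"
```

The check deliberately inspects HEAD rather than --all, because the backup refs under refs/original still point at the old history until they are cleaned up.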
Modifying Commit Information
Batch-modifying author email addresses:
git filter-branch --env-filter '
if [ "$GIT_AUTHOR_EMAIL" = "old@example.com" ]; then
GIT_AUTHOR_EMAIL="new@example.com";
fi
' --tag-name-filter cat -- --all
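Git stores the committer identity separately from the author identity, so a rewrite that only touches GIT_AUTHOR_EMAIL can leave the old address in the committer field. The following sketch, run in a throwaway repository with made-up identities, extends the same env-filter to GIT_COMMITTER_EMAIL and then counts the remaining occurrences of the old address.

```shell
#!/bin/sh
# Rehearsal in a throwaway repository (identities and files are hypothetical).
set -e
export FILTER_BRANCH_SQUELCH_WARNING=1   # skip filter-branch's startup warning
demo=$(mktemp -d)
cd "$demo"
git init -q .
git config user.name "Demo User"
git config user.email "old@example.com"
echo one > a.txt
git add a.txt
git commit -q -m "First commit, old address"

git config user.email "other@example.com"
echo two > b.txt
git add b.txt
git commit -q -m "Second commit, unrelated address"

git filter-branch --env-filter '
if [ "$GIT_AUTHOR_EMAIL" = "old@example.com" ]; then
    GIT_AUTHOR_EMAIL="new@example.com"
fi
if [ "$GIT_COMMITTER_EMAIL" = "old@example.com" ]; then
    GIT_COMMITTER_EMAIL="new@example.com"
fi
' --tag-name-filter cat -- --all >/dev/null 2>&1

# Count log lines that still mention the old address in either field.
stale=$(git log --format='%ae %ce' HEAD | grep -c 'old@example.com' || true)
authors=$(git log --format=%ae HEAD | sort -u | tr '\n' ' ')

echo "lines still mentioning old@example.com: $stale"
echo "author emails now: $authors"
```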
Extracting a Subdirectory
Promoting subdir/ to the repository root:
git filter-branch --subdirectory-filter subdir -- --all
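A small rehearsal can make the effect visible: files from subdir/ end up at the top level, and commits that never touched subdir/ drop out of the history. The repository layout below (subdir/lib.txt, other/misc.txt) and the identity are invented for the illustration.

```shell
#!/bin/sh
# Rehearsal in a throwaway repository (layout and identity are hypothetical).
set -e
export FILTER_BRANCH_SQUELCH_WARNING=1   # skip filter-branch's startup warning
demo=$(mktemp -d)
cd "$demo"
git init -q .
git config user.name "Demo User"
git config user.email "demo@example.com"

mkdir subdir other
echo lib > subdir/lib.txt
echo misc > other/misc.txt
git add .
git commit -q -m "Add subdir and other"

echo more >> other/misc.txt
git commit -q -am "Touch only other/"

git filter-branch --subdirectory-filter subdir -- --all >/dev/null 2>&1

# List the files recorded in the rewritten HEAD and count its commits.
files=$(git ls-tree -r --name-only HEAD)
count=$(git rev-list --count HEAD)

echo "files at HEAD: $files"
echo "commits kept: $count"
```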
Advanced Usage Examples
Conditionally Modifying File Content
Using --tree-filter to modify text in all .js files:
# \b limits the match to the whole word "var"; \b and in-place -i as written assume GNU sed
git filter-branch --tree-filter "
find . -name '*.js' -exec sed -i 's/\bvar\b/const/g' {} +
" HEAD~10..HEAD
Combining Multiple Conditions
Modifying author information and deleting files simultaneously:
git filter-branch --env-filter '
# Modify author
export GIT_AUTHOR_NAME="New Name"
export GIT_AUTHOR_EMAIL="new@email.com"
' --index-filter '
# Delete file
git rm --cached --ignore-unmatch secret.txt
' --prune-empty -- --all
Performance Optimization Tips
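- Limit scope: Add a commit range (e.g., HEAD~20..HEAD) so that only those commits are rewritten
- Use index filtering: --index-filter is typically much faster than --tree-filter because it does not check out a working tree for each commit
- Disable garbage collection: Temporarily disable automatic GC
git -c gc.auto=0 filter-branch ...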
Risks and Considerations
- Hash changes: Every rewritten commit and all of its descendants get new hashes, so collaborators must re-clone or rebase their work onto the rewritten history
- Backup required: Always create a repository backup before proceeding
git clone --mirror original.git backup.git
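- Clean up backup refs: After verifying the result, delete the refs/original backups that filter-branch leaves behind, expire the reflogs, and garbage-collect so the old objects are actually removed
git for-each-ref --format="delete %(refname)" refs/original | git update-ref --stdin
git reflog expire --expire=now --all
git gc --prune=now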
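The cleanup can be rehearsed in a throwaway repository to see what it changes. This sketch uses invented file names and an invented identity. It counts the backup refs under refs/original before and after the cleanup, then asks Git whether the pre-rewrite commit object still exists.

```shell
#!/bin/sh
# Rehearsal in a throwaway repository (files and identity are hypothetical).
set -e
export FILTER_BRANCH_SQUELCH_WARNING=1   # skip filter-branch's startup warning
demo=$(mktemp -d)
cd "$demo"
git init -q .
git config user.name "Demo User"
git config user.email "demo@example.com"
echo keep > keep.txt
echo secret > secret.txt
git add .
git commit -q -m "Initial commit"
old_commit=$(git rev-parse HEAD)

git filter-branch --index-filter \
  'git rm --cached --ignore-unmatch secret.txt' -- --all >/dev/null 2>&1

backups_before=$(git for-each-ref --format='%(refname)' refs/original | wc -l | tr -d ' ')

# The three cleanup steps, one command per line.
git for-each-ref --format='delete %(refname)' refs/original | git update-ref --stdin
git reflog expire --expire=now --all
git gc --quiet --prune=now

backups_after=$(git for-each-ref --format='%(refname)' refs/original | wc -l | tr -d ' ')

# Probe whether the pre-rewrite commit object is still in the object store.
if git cat-file -e "$old_commit" 2>/dev/null; then
    old_commit_state="still present"
else
    old_commit_state="gone"
fi

echo "backup refs before cleanup: $backups_before, after: $backups_after"
echo "pre-rewrite commit $old_commit is $old_commit_state"
```

Note that this only cleans the local repository. Copies of the old history on a remote or in other clones are unaffected.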
Alternative Tool Comparison
For large repositories, consider more efficient tools:
- git-filter-repo (Python implementation, faster)
- BFG Repo-Cleaner (runs on the JVM, optimized for deleting large files)
# Example of using git-filter-repo to delete files
git filter-repo --path credentials.txt --invert-paths
Practical Example: Project Migration
Assume you need to extract the project-v1/ directory as a new repository and modify all committer information:
# Step 1: Clone the original repository
git clone https://example.com/original.git
cd original
# Step 2: Extract the subdirectory
git filter-branch --subdirectory-filter project-v1 -- --all
# Step 3: Modify commit information
# (--force is needed because step 2 already left a backup in refs/original)
git filter-branch --force --env-filter '
export GIT_COMMITTER_NAME="Team"
export GIT_COMMITTER_EMAIL="team@company.com"
' -- --all
# Step 4: Push to the new repository
git remote set-url origin https://example.com/new-repo.git
git push --force --all
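Running filter-branch twice in the same clone has one catch: the first run leaves a backup under refs/original, and a second run refuses to start unless --force is passed. The sketch below rehearses the two passes locally with an invented repository layout and identity, without any remote or push, and records whether the second pass is accepted without --force.

```shell
#!/bin/sh
# Local rehearsal of the two-pass migration (layout and identity are hypothetical).
set -e
export FILTER_BRANCH_SQUELCH_WARNING=1   # skip filter-branch's startup warning
demo=$(mktemp -d)
cd "$demo"
git init -q .
git config user.name "Demo User"
git config user.email "demo@example.com"
mkdir project-v1
echo code > project-v1/main.txt
echo notes > notes.txt
git add .
git commit -q -m "Initial import"

# First pass: extract the subdirectory (leaves a backup under refs/original).
git filter-branch --subdirectory-filter project-v1 -- --all >/dev/null 2>&1

# Second pass without --force: record whether Git accepts or refuses it.
if git filter-branch --env-filter 'GIT_COMMITTER_NAME="Team"' -- --all >/dev/null 2>&1; then
    second_pass="accepted"
else
    second_pass="refused"
fi

# Second pass with --force: overwrites the old backup and proceeds.
git filter-branch --force --env-filter '
GIT_COMMITTER_NAME="Team"
GIT_COMMITTER_EMAIL="team@company.com"
' -- --all >/dev/null 2>&1

committer=$(git log -1 --format='%cn <%ce>' HEAD)
files=$(git ls-tree -r --name-only HEAD)

echo "second pass without --force: $second_pass"
echo "committer: $committer"
echo "files: $files"
```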