In this post I describe how I used git-filter-repo to rewrite the history of a git repository to move files into a subfolder.
Background: rewriting git history
As a git user, I like to rebase. I like to make lots of small commits and tidy them up later using interactive rebase, and to rewrite my PRs to make them easier to understand (and review). I use git push origin --force-with-lease so much that I have it aliased as git pof.
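For reference, an alias like that can be set up with git config. This is a minimal sketch that assumes you want the alias in your global config; drop --global to define it for a single repository instead:

```shell
# Sketch: define "git pof" as shorthand for a force-with-lease push to origin.
# Using --global is an assumption; it writes to your user-level git config.
git config --global alias.pof "push origin --force-with-lease"

# Confirm what the alias expands to
git config --global --get alias.pof
```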
What I don't do is rewrite the history of my main/master branch. There's a whole world of pain there, as other people will likely have started branches from it, and they can easily end up in a complete mess.
However, sometimes it makes sense.
I was working on a small side project the other day, when I realised it would really make sense for it to effectively be a "monorepo". So rather than having all the existing code in the root directory, I wanted to move it to a child directory.
So I started with a directory that looked like this:
And I wanted a directory that looked like this:
The notable points here are:
- Everything has been moved to an engine subfolder
- Except the .gitattributes and .gitignore files, which are still at the top level.
The simplest way to do this is to just move all the files, and create a new commit with the changes, job done. The downside to that is that while git itself is ok at tracking file moves (it sometimes gets things wrong), it can cause some other issues.
For example, if you're looking at a file on GitHub, and you want to see what it looks like at a particular commit, then you can use the branch selector to change it. However, if the file has moved, you'll get a 404. Not a great experience.
If the odd file has moved, that's not a big deal, but if literally every file has moved, it becomes a real nuisance.
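To see how git itself copes with a move on the command line, here is a small sketch using a throwaway repository. The file name, the demo identity, and the commit messages are all made up for illustration and are not taken from my project:

```shell
# Sketch: a throwaway repo showing how git tracks a moved file.
# All names here (robots.txt, the Demo identity) are illustrative.
demo=$(mktemp -d)
cd "$demo"
git init -q .
git config user.name "Demo"
git config user.email "demo@example.com"

printf 'User-agent: *\nDisallow:\n' > robots.txt
git add robots.txt
git commit -q -m "Add robots.txt"

mkdir engine
git mv robots.txt engine/robots.txt
git commit -q -m "Move robots.txt into engine/"

# Without --follow, the log for the new path stops at the move
plain_count=$(git log --oneline -- engine/robots.txt | wc -l | tr -d ' ')
# With --follow, git uses rename detection to continue past the move
follow_count=$(git log --oneline --follow -- engine/robots.txt | wc -l | tr -d ' ')
echo "plain: $plain_count, follow: $follow_count"
```

So the history is still reachable locally if you ask for it, but tools that look a path up at a specific commit have nothing to find at the new path before the move.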
So what's the alternative? Rewriting history!
Rewriting history: the options
With rewriting history, we update the git branches to make it look like all the files were originally committed to the engine subfolder. There's no "sudden move". The history shows them as always having been in the engine folder.
This sort of wholesale rewriting of your main/master branch is definitely not advisable if you are sharing the repo publicly. You will likely break all sorts of people's work!
Normally when I'm rewriting history I use git rebase -i in combination with git reset HEAD~. This lets me squash commits together, pause to split them apart, reorder them, or remove them entirely. That's great for when you're massaging a PR, but it's really not designed for wholesale rewriting of an entire repository.
For those scenarios, git filter-branch is a better option. This is a complex git command that, frankly, scares me. I have used it, on occasion, but the syntax is janky, you typically have to incorporate a lot of bash, it's often slow, and you could mess up your whole repository. Yay!
Just take a look at this Stack Overflow question, which is about a similar requirement but in reverse: moving from the engine folder to the root. One of the answers suggests running the following command:
git filter-branch -f --index-filter 'PATHS=`git ls-files -s | sed "s/^engine//"`; \
GIT_INDEX_FILE=$GIT_INDEX_FILE.new; \
echo -n "$PATHS" | \
git update-index --index-info \
&& if [ -e "$GIT_INDEX_FILE.new" ]; \
then mv "$GIT_INDEX_FILE.new" "$GIT_INDEX_FILE"; \
fi' -- --all
That's definitely something. Does it work? Probably. Would you want to write your own? Almost certainly not.
So instead of trying to figure out how to mangle git filter-branch to my liking, I decided to look at a suggestion I saw elsewhere: git-filter-repo.
"Installing" git-filter-repo using Docker
git-filter-repo isn't built into git itself. In fact, it's a single Python file, but it's written to feel like a git plugin. And the best part is that the API is so much nicer. That whole git filter-branch expression in the previous section could be rewritten with git-filter-repo to be something like this:
git filter-repo --path-rename engine/:
I think you'll agree that's much clearer! The manual is also very good, with lots of examples.
The only problem from my point of view is that git-filter-repo is a Python module. Python on Windows can be problematic (even the install instructions make that clear) and while you can install Python from the Microsoft Store, I really didn't want to go through that. Docker to the rescue!
This is a great use case for Docker: I want to quickly try a tool without risking messing up my machine. Instead of installing Python, I'll run a Docker image that already has Python installed, map the drive to my project, and work inside the container!
git-filter-repo requires Python 3.5+, so I searched for Python on Docker Hub and found the official images. The python:3 image is a bullseye (Debian 11) image, with Python 3.10 installed, which would do nicely.
I ran the following command from inside my app directory. It pulls and runs the Docker image, maps the current directory to the /app directory inside the container, sets the working directory to /app, and starts a bash shell.
docker run --rm -it -v ${PWD}:/app -w /app python:3 /bin/bash
I now have a running Python container, but I don't have the git-filter-repo tool installed yet. The python:3 image is based on Debian 11, and according to the git-filter-repo install instructions, I needed to use the "backports" repository to install it via apt-get.
A repository in this context refers to the server containing all the packages used by apt for installation into a Linux machine. It is separate from the concept of a "git repository".
Unfortunately, the backports repository isn't enabled by default in Debian 11, so I followed the instructions from the backports website to add it to the sources list, and installed the git-filter-repo package:
# Add the backports repo to sources.list
echo 'deb http://deb.debian.org/debian bullseye-backports main' > /etc/apt/sources.list.d/backports.list
# Update the list of available packages
apt-get update
# Install git-filter-repo, adding the required /bullseye-backports suffix
apt-get install -y git-filter-repo/bullseye-backports
The logs indicated this had installed correctly, so I was ready to take it for a spin!
Using git-filter-repo to move files into a subdirectory
My first attempt to use git-filter-repo wasn't very successful. I tried running:
git filter-repo --to-subdirectory-filter engine/
which seemed like it would do most of what I wanted, but I was presented with the following:
> git filter-repo --to-subdirectory-filter engine/
Aborting: Refusing to destructively overwrite repo history since
this does not look like a fresh clone.
(expected freshly packed repo)
Please operate on a fresh clone instead. If you want to proceed
anyway, use --force.
This is very interesting! Rewriting history is obviously a very destructive process in which you can lose work, and git-filter-repo is doing its best to make sure you don't hurt yourself. As long as you have your work pushed to a remote git repository you should be fine, but to be safe, git-filter-repo requires you to work in a fresh clone by default.
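If the copy you want to clone lives on the same machine, one way to get a fresh clone is to pass --no-local to git clone, which, as I understand the git-filter-repo documentation, is what it recommends for local paths. A minimal sketch, where the original and fresh directory names and the demo commit are placeholders:

```shell
# Sketch: make a fresh clone of a local repository before rewriting it.
# The directory names (original, fresh) and the demo commit are placeholders.
work=$(mktemp -d)
git init -q "$work/original"
git -C "$work/original" -c user.name="Demo" -c user.email="demo@example.com" \
    commit -q --allow-empty -m "Initial commit"

# --no-local makes git use its normal transport instead of
# hardlinking or copying the object files from the source repo
git clone -q --no-local "$work/original" "$work/fresh"

# The untouched original doubles as a backup if the rewrite goes wrong
fresh_commits=$(git -C "$work/fresh" rev-list --count HEAD)
echo "fresh clone has $fresh_commits commit(s)"
```

Cloning from a remote URL, as I do in the summary at the end of this post, avoids the question entirely.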
This seemed very sensible to me, so I did as it asked, created a fresh clone, and tried again:
> git filter-repo --to-subdirectory-filter engine/
Parsed 24 commits
New history written in 2.37 seconds; now repacking/cleaning...
Repacking your repo and cleaning out old unneeded objects
HEAD is now at 547b073 Use alternate robots.txt
Enumerating objects: 375, done.
Counting objects: 100% (375/375), done.
Delta compression using up to 4 threads
Compressing objects: 100% (161/161), done.
Writing objects: 100% (375/375), done.
Total 375 (delta 189), reused 327 (delta 189), pack-reused 0
Completely finished after 6.32 seconds.
That's much better! As you can see from the logs, git-filter-repo was very busy, rewriting the commits. Taking a look at the results afterwards, everything except the .git folder had been moved to the engine subfolder, and the history (viewed with gitk) shows that the original commits were all made to the engine folder.
This is almost exactly what I want, except I wanted the .gitignore and .gitattributes to remain at the top level.
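If you would rather check the result from the command line than in gitk, something like the following sketch lists every top-level path that appears in any commit. The top_level_paths function is a hypothetical helper of my own, not part of git or git-filter-repo:

```shell
# Sketch: list every top-level path that appears in any commit on any ref.
# top_level_paths is a hypothetical helper name, not a git command.
# Argument 1 is the repository directory (defaults to the current one).
top_level_paths() {
    git -C "${1:-.}" rev-list --all |
        while read -r commit; do
            # ls-tree without -r only lists the entries at the root of the tree
            git -C "${1:-.}" ls-tree --name-only "$commit"
        done | sort -u
}

# Usage, from inside the rewritten clone:
#   top_level_paths .
```

At this point in the process I would expect it to print only engine, since every commit has been rewritten to put its files there.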
I'll come back to those strange replace/* tags in the gitk image shortly.
The easiest way to fix the .gitignore location was more rewriting! I ran the following command to move the .gitignore and .gitattributes files back up to the root folder:
> git filter-repo \
--path-rename engine/.gitattributes:.gitattributes \
--path-rename engine/.gitignore:.gitignore
Parsed 24 commits
New history written in 1.35 seconds; now repacking/cleaning...
Repacking your repo and cleaning out old unneeded objects
HEAD is now at f554e31 Use alternate robots.txt
fatal: replace depth too high for object 8027f9f8670e3da4762099d39e733bcfa44fea39
fatal: failed to run pack-refs
Completely finished after 2.45 seconds.
That appeared to work, as I now had the folder structure I wanted. But there were two slightly worrying fatal error messages in the logs 🤔 On top of that, when I tried opening gitk, it failed with an error message.
That's a bit concerning 😟 Luckily, after a bit of Googling, I found I could fix the issue by running:
> git replace -d 8027f9f8670e3da4762099d39e733bcfa44fea39
Deleted replace ref '8027f9f8670e3da4762099d39e733bcfa44fea39'
After that, I could successfully open gitk, and could see that the .gitignore and .gitattributes files were again in the root, with everything else in the engine folder.
So with that, my work was pretty much done. But that fatal error was bugging me, as were all those extraneous replace/ refs.
It took me a little while to work out what those refs even were, but eventually I pinned it down to a git feature called git-replace. That feature is worth a whole blog post on its own, so for now I'll just point you to the docs if you're interested, and I'll walk through the feature in a subsequent post.
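In the meantime, here is a rough sketch of how replace refs can be listed and removed by hand, using a throwaway repository. The repository, identity, and commit messages are made up for illustration; only the git replace and git for-each-ref commands themselves are standard git:

```shell
# Sketch: inspect and clear git's replace refs in a throwaway repo.
# The repo, identity, and commit messages are invented for this demo.
repo=$(mktemp -d)
git init -q "$repo"
cd "$repo"
git config user.name "Demo"
git config user.email "demo@example.com"
git commit -q --allow-empty -m "Old commit"
old=$(git rev-parse HEAD)

# Create a second, parentless commit with the same tree to act as the replacement
new=$(git commit-tree -m "Rewritten commit" "$(git rev-parse "$old^{tree}")")

# Tell git to substitute $new whenever $old is looked up
git replace "$old" "$new"

# Replace refs live under refs/replace/, named after the replaced object
refs_before=$(git for-each-ref --format='%(refname)' refs/replace/)
echo "$refs_before"

# Delete every replace ref (git replace -l lists the replaced object IDs)
git replace -l | while read -r id; do git replace -d "$id"; done
refs_after=$(git for-each-ref refs/replace/ | wc -l | tr -d ' ')
echo "replace refs remaining: $refs_after"
```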
I decided to start again, and this time I told git-filter-repo I didn't need the extra replace/ references by passing --replace-refs delete-no-add:
# Move everything to the engine/ subfolder
git filter-repo --replace-refs delete-no-add --to-subdirectory-filter engine/
# Move .gitignore and .gitattributes back to the root
git filter-repo --replace-refs delete-no-add \
--path-rename engine/.gitattributes:.gitattributes \
--path-rename engine/.gitignore:.gitignore
This time there were no fatal errors in the logs, gitk opened without any errors, and all the replace/ references were gone. Success! With that I could exit the Docker container, double check everything was correct, and do a git push origin --force-with-lease of my newly rewritten repo!
All in all, I'm very impressed with git-filter-repo, and using it inside the Docker container is clean and painless, so I'd definitely recommend it!
Summary
In this post I described a scenario where I wanted to rewrite the history of a git repository to make it appear as though some files were originally created in a subfolder instead of the root folder. I described how to run a python:3 Docker container, how to install git-filter-repo, and the commands required to move all the files except .gitattributes and .gitignore to an engine subfolder. To make it simpler, I've reproduced the main steps here:
- Create a fresh clone of your repository, and cd to the clone directory
# Clone my/repo to output_directory
git clone https://github.com/my/repo output_directory
cd output_directory
- Run a python:3 Docker container interactively, and install git-filter-repo inside it
# run the Docker container
docker run --rm -it -v ${PWD}:/app -w /app python:3 /bin/bash
# inside the container, install git-filter-repo
# Add the backports repo to sources.list
echo 'deb http://deb.debian.org/debian bullseye-backports main' > /etc/apt/sources.list.d/backports.list
# Update the list of available packages
apt-get update
# Install git-filter-repo, adding the required /bullseye-backports suffix
apt-get install -y git-filter-repo/bullseye-backports
- Run the git-filter-repo commands to move all the files to the engine subdirectory, and then move the .gitignore and .gitattributes files back. Don't create replace/ refs.
# Move everything to the engine/ subfolder
git filter-repo --replace-refs delete-no-add --to-subdirectory-filter engine/
# Move .gitignore and .gitattributes back to the root
git filter-repo --replace-refs delete-no-add \
--path-rename engine/.gitattributes:.gitattributes \
--path-rename engine/.gitignore:.gitignore