
In this post I look at the git range-diff feature, available from git 2.19. I describe how it is meant to work, explain the output format, and demonstrate my attempt to test it with a small app.
What is git range-diff?
I'm a big fan of git rebase for moving commits around, for cleaning up iterative work, for working with stacked branches, and for including recent changes on main into my feature branch.
However, there's no denying that things can sometimes get confusing with git rebase. It's true that pretty much everything you do with git rebase can be reversed if you make a mistake, but it's not always obvious whether there is a mistake. This is particularly true when you've done a lot of interactive rebasing, when the branch you're rebasing onto has changed a lot, or when you have to resolve a bunch of merge conflicts.
There are various sub-optimal approaches you could use to try to understand if a rebase is "correct", but from my experience none of these work very well, either because they're confusing, hard to do, or simply don't produce an easy to understand result.
git range-diff is meant to help with exactly this scenario. It can be used to compare two ranges of commits (in contrast to git diff, which compares the state at two different commits directly). It's best to think of git range-diff as performing a diff of two git diffs, because that's literally how it works behind the scenes!
In theory, this makes it possible to compare a stack of commits prior to rebasing with the stack of commits after rebasing and to show the differences between them. If the rebase was simply rearranging and squashing commits then you would expect the diffs to be identical, and the diff of diffs would show that.
On the other hand, if you had to handle merge conflicts as part of the rebase, or if you rebased onto a different commit, then you might expect there to be changes, and these would be shown by git range-diff.
Using git range-diff
At the end of the post I'll show a real worked example of git range-diff, but the basic syntax (in the most useful form IMO) is as follows:
git range-diff base1..head1 base2..head2
where you have a git commit tree that looks something like the following:
                  h-i-head2
                 /
a-b-c-base1-d-e-base2
         \
          f-g-head1
So for the example above, git range-diff essentially does the following:
- Perform a git diff base1..head1 (i.e. the changes made by f, g, and head1 on top of base1), and generate a "patch"
- Perform a git diff base2..head2 (i.e. the changes made by h, i, and head2 on top of base2), and generate a "patch"
- Perform a git diff between the two patches
git range-diff compares the content of the files, but it also compares the order of the commits and the metadata in those commits, such as the commit messages. The resulting output is, frankly, quite confusing, so I'll walk through what it means in the next section.
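To make the "diff of diffs" idea concrete, here is a minimal Python sketch using the standard library's difflib. It is not how git implements range-diff (the real command works commit by commit, matches commits between the two ranges, and includes commit messages); the helper names make_patch and diff_of_diffs and the hello.c-style contents are assumptions made purely for illustration.

```python
import difflib


def make_patch(before, after):
    """Return the unified diff between two versions of a file, without file headers."""
    # unified_diff yields the "---"/"+++" file headers first; drop them so that
    # only the hunk header and the prefixed content lines remain.
    return list(difflib.unified_diff(before, after, lineterm=""))[2:]


def diff_of_diffs(patch_left, patch_right):
    """Diff two patches against each other, producing doubly-prefixed lines."""
    return list(difflib.unified_diff(patch_left, patch_right, lineterm=""))[2:]


# Hypothetical file contents: one shared starting point and two different fixes.
base = ["int main(void)", "{", '    printf("Helo world");', "    return 0;", "}"]
head1 = base[:2] + ['    printf("Hello world");'] + base[3:]
head2 = base[:2] + ['    printf("Hello world\\n");'] + base[3:]

result = diff_of_diffs(make_patch(base, head1), make_patch(base, head2))
print("\n".join(result))
```

Each printed line carries two prefix columns: the outer one comes from comparing the two patches, and the inner one is the prefix the line already had inside its own patch.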
Understanding the output format
As mentioned above, the output format of git range-diff can be pretty hard to understand in my experience. The image below is taken from a GitHub blog post and shows a relatively simple example of the expected output:
The output shows the series of commits that are being compared, and indicates whether commits have been reordered, and shows any difference in the commits between the two ranges. We'll dig more into the "diff" part shortly, but first let's just look at the commit list itself. I've added a header so we can describe each section subsequently:
1|    2    |3|4|    5    |  6
------------------------------
2:  8d6b31f = 1:  d672a8f add README.md
1:  3386b9a = 2:  02c0d21 add hello/goodbye world
3:  bc293cc ! 3:  251b232 hello: fix typo
-:  ------- > 4:  a835e18 goodbye: add missing newline
This output shows how the commits in the two branches have changed, been reordered, been added or removed. From left-to-right, these sections are as follows:
- The position of the left-hand side commit in the commit range being compared. For example, 2 indicates it was the second commit in the commit list. A "-" indicates no matching commit was found in the left-hand side commit range.
- The short commit hash for the left-hand commit.
- The equality of the commits being compared. "=" indicates they are equal, "!" means they were different, ">" means it was only in the right-hand side, and "<" means it was only in the left-hand side.
- The position of the right-hand side commit in the commit range being compared. In the example above, you can see that the second commit on the left was matched to the first commit on the right.
- The short commit hash for the right-hand commit. Even for "equal" commits, these hashes are likely to differ as the commits may have different parents.
- The commit message for the commits. If the commits differ, the right-hand-side message is shown and a diff for the commit message is shown.
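As a way of checking your understanding of the six sections, the following Python sketch splits one line of the commit list into named fields. The regular expression, the SummaryLine type, and parse_summary_line are hypothetical helpers written for this post's example output; they are not part of git and assume the line layout shown above.

```python
import re
from typing import NamedTuple, Optional

# One summary line: "<pos>: <hash> <status> <pos>: <hash> <subject>",
# where a missing side is written as "-:" followed by a run of dashes.
SUMMARY_LINE = re.compile(
    r"^\s*(\d+|-):\s+([0-9a-f]+|-+)\s+([=!<>])\s+(\d+|-):\s+([0-9a-f]+|-+)\s+(.*)$"
)

STATUS_MEANING = {
    "=": "equal",
    "!": "matched, but different",
    ">": "only in the right-hand range",
    "<": "only in the left-hand range",
}


class SummaryLine(NamedTuple):
    left_position: Optional[int]
    left_hash: Optional[str]
    status: str
    right_position: Optional[int]
    right_hash: Optional[str]
    subject: str


def parse_summary_line(line):
    """Parse one commit-list line; a side with no matching commit becomes None."""
    match = SUMMARY_LINE.match(line)
    if match is None:
        raise ValueError(f"not a range-diff summary line: {line!r}")
    left_pos, left_hash, status, right_pos, right_hash, subject = match.groups()
    return SummaryLine(
        None if left_pos == "-" else int(left_pos),
        None if left_pos == "-" else left_hash,
        status,
        None if right_pos == "-" else int(right_pos),
        None if right_pos == "-" else right_hash,
        subject,
    )


swapped = parse_summary_line("2:  8d6b31f = 1:  d672a8f add README.md")
added = parse_summary_line("-:  ------- > 4:  a835e18 goodbye: add missing newline")
print(swapped, STATUS_MEANING[swapped.status])
print(added, STATUS_MEANING[added.status])
```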
In the image above, we can see by comparing the commit positions that the order of the first two commits has been swapped. The fourth commit on the right-hand side was not present in the left-hand side range.
The third commit is where things get interesting. The commit list shows that the commits on the left and right were matched, but not equal (indicated by !). Whenever you have inequality like this, git range-diff provides a diff of what changed.
The first part of the difference was in the commit message, which shows:
@@ -2,7 +2,7 @@
hello: fix typo
- "Hello" has two l's.
+ "Hello" has two l's. Let's also fix the missing newline.
This is a pretty standard git diff format between the left- and right-hand side commits. It indicates that we added the text "Let's also fix the missing newline." to the commit message.
Where things get more complicated is in comparing the diffs of the commit. Remember, git range-diff is a "diff of diffs", and diffs already contain - and + prefixes at the start of the line. So git range-diff has double prefixes: 😅
diff --git a/hello.c b/hello.c
--- a/hello.c
@@ -12,6 +12,6 @@
int main(void)
{
- printf("Helo world");
-+ printf("Hello world");
++ printf("Hello world\n");
return 0;
}
Let's think about what this means. Only the left-most column, i.e. the left-most prefix, indicates a difference between the two sides. The first line with any prefix only has a single - character:
- printf("Helo world");
That means that both commits removed this line, so there's no difference between the commit ranges for this line. In contrast, the next two lines have two prefix symbols:
-+ printf("Hello world");
++ printf("Hello world\n");
The first of these lines starts with -+, which means the left-hand side commit was adding this line, but the right-hand side no longer is. Conversely, the ++ indicates the right-hand side is newly adding this line.
Remembering these prefixes is difficult, so the following table describes what the symbols mean:
Prefix symbol | Left-hand commit | Right-hand commit |
---|---|---|
(None) | No change | No change |
- | Removed the line | Removed the line |
+ | Added the line | Added the line |
+- | No change | Removes the line |
++ | No change | Adds the line |
-- | Removed the line | "Removes the removal", so no longer removes the line. i.e. it adds the line back that the left hand side removed. |
-+ | Added the line | "Removes the add", so no longer adds the line. i.e. it removes the line that the left hand side added. |
I think it's quite difficult to intuitively remember the behaviours here. The best I can achieve is working through the logic of the above table 😅
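If you'd rather look the prefixes up than memorise them, the table above can be written as a small Python lookup. This is only a sketch: PREFIX_MEANINGS and explain are hypothetical names, and it assumes each line keeps both prefix columns, with a blank outer column when the two sides agree. Anything outside the table (hunk headers, or context lines that differ between the two patches) is rejected instead of guessed at.

```python
# (outer, inner) prefix -> (what the left-hand commit did, what the right-hand commit does)
PREFIX_MEANINGS = {
    "  ": ("no change", "no change"),
    " -": ("removed the line", "removed the line"),
    " +": ("added the line", "added the line"),
    "+-": ("no change", "removes the line"),
    "++": ("no change", "adds the line"),
    "--": ("removed the line", "no longer removes the line"),
    "-+": ("added the line", "no longer adds the line"),
}


def explain(line):
    """Return (left, right) descriptions for one doubly-prefixed diff line."""
    prefix = line[:2].ljust(2)  # pad so a bare "+" or "-" still forms a two-column key
    if prefix not in PREFIX_MEANINGS:
        raise ValueError(f"prefix {prefix!r} is not covered by the table")
    return PREFIX_MEANINGS[prefix]


for example in (' - printf("Helo world");',
                '-+ printf("Hello world");',
                '++ printf("Hello world\\n");'):
    left, right = explain(example)
    print(f"{example!r}: left {left}; right {right}")
```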
The git range-diff output does try to guide your understanding by only highlighting the first character in the prefix, i.e. lines that are actually different between left and right, as opposed to changes that appear in both diffs.
Note that if you only want to see the changes in the commit list, and don't want to see the full diff, then you can use the -s or --no-patch arguments, using something like the following:
git range-diff --no-patch base1..head1 base2..head2
# or
git range-diff -s base1..head1 base2..head2
In the next section I show what I found when testing out git range-diff on a small sample.
Trying it out in a small sample
Whenever I discover a new git feature, I like to try it out in small samples to try to get an initial feeling for it. I then expand to larger projects later once I have a handle on how the feature works. Unfortunately, for git range-diff I was not very impressed with what I found. I think that's partly just because the output of git range-diff is relatively hard to parse. But there are also some aspects that seem generally ill-suited to the very small diffs I've used in this example.
Setting up a simple test app
The test app I used is a very simple minimal API project. I created the initial project as follows:
git init
dotnet new web
dotnet new gitignore
git add .
git commit -m "Initial Commit"
This generates the default minimal API app, as follows:
var builder = WebApplication.CreateBuilder(args);
var app = builder.Build();
app.MapGet("/", () => "Hello world!");
app.Run();
From there, I simulated work occurring both in a feature branch and directly on main.
Parallel work streams on my_feature and main
In general I'm pretty comfortable using git rebase to interactively squash, rearrange, and split commits. I use it daily to tidy up PRs before pushing them for review, and when working with stacked PRs. But one area where I'm often lacking confidence is after rebasing onto main and having to deal with merge conflicts. That seems like the really killer app for git range-diff in theory, so I set out to see what that would look like.
I started by creating a branch called my_feature. I then made 5 commits to this branch making trivial changes:
- Added an Example(string, string) record in the Example.cs file.
- Added an /example endpoint.
- Updated the /example endpoint to return an Example instance.
- Updated the / (hello world) endpoint to return an Example instance instead of "Hello world!".
- Reverted the previous change to return "Hello World" (note the missing !).
After all those commits (which were intentionally circuitous), Program.cs looks like this:
var builder = WebApplication.CreateBuilder(args);
var app = builder.Build();
app.MapGet("/", () => "Hello World");
app.MapGet("/example", () => new Example("Example 1", "The first example"));
app.Run();
I then switched back to main, and added an endpoint that lets you post a name to add to a dictionary. The / (hello world) endpoint was then updated to say Hello to each of the names:
using System.Collections.Concurrent;
var builder = WebApplication.CreateBuilder(args);
var app = builder.Build();
var names = new ConcurrentDictionary<string, string>();
app.MapGet("/", () => names.Keys.Select(n => $"Hello {n}!"));
app.MapPost("/{name}", (string name) => names[name] = name);
app.Run();
Functionality-wise, the changes on main mean you can make requests like this:
POST http://localhost:5116/James
and then when you hit / you get a result like this:
["Hello James!","Hello Andrew!","Hello Chris!","Hello David!"]
At this point, our commit graph looks like the following:
Rebasing the feature branch
We now want to rebase the my_feature branch on top of main instead of base. We start by creating a "backup" branch, which makes it easy to revert if we run into any issues:
git checkout my_feature
git branch my_feature_bak # Create a backup pointing to the same location
For the rebase, we can run the following:
git rebase base --onto main --no-update-refs
Note the use of --no-update-refs here so that we don't accidentally rebase my_feature_bak at the same time. This is only necessary if you enable --update-refs by default.
Unfortunately, we have a bunch of merge conflicts to contend with. There are the easy conflicts, where we're adding logically distinct endpoints but in conflicting locations in the file. Then there are the difficult merge conflicts, where we actually have modified the same logical code. That's primarily the Hello world endpoint, which we initially modified in the my_feature branch and then (partially) reverted.
Once we've fixed all the conflicts, we'll have a commit tree that looks something like the following:
The final code in my_feature looks like this:
using System.Collections.Concurrent;
var builder = WebApplication.CreateBuilder(args);
var app = builder.Build();
var names = new ConcurrentDictionary<string, string>();
app.MapGet("/", () => names.Keys.Select(n => $"Hello {n}!"));
app.MapPost("/{name}", (string name) => names[name] = name);
app.MapGet("/example", () => new Example("Example 1", "The first example"));
app.Run();
So now we come to the important part: what does git range-diff make of it?
Verifying the merge conflict resolution with git range-diff
To compare the branch prior to the rebase with the branch after the rebase, we can use a command like the following:
git range-diff base..my_feature_bak main..my_feature
This produces an output like the following:
1: 3070585 = 1: ebd4946 Commit 1
2: 76df723 < -: ------- Commit 2
3: e526ca2 < -: ------- Commit 3
4: 64856f3 < -: ------- Commit 4
5: c96df0b < -: ------- Commit 5
-: ------- > 2: edbe245 Commit 2
-: ------- > 3: e06f56e Commit 5
or as a colourised image:
The result was somewhat surprising to me. The only commit that git thinks is "equal" is Commit 1, which adds the Example record in a separate file. It's particularly interesting that Commit 2 is not recognized as matching, given that the diff prior to the rebase was:
var app = builder.Build();
app.MapGet("/", () => "Hello World!");
+app.MapGet("/example", () => "Example 1");
app.Run();
while post rebase it was:
app.MapGet("/", () => names.Keys.Select(n => $"Hello {n}!"));
app.MapPost("/{name}", (string name) => names[name] = name);
+app.MapGet("/example", () => "Example 1");
app.Run();
Pretty similar, no!?
The important thing to understand about range-diff is that it doesn't just diff the -/+ lines, it's diffing the whole diff patch, including the context lines. If you look back at the previous two diffs, you can see that the existing changes on main mean that the unchanged lines in the diffs look significantly different from one another.
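To see how much the context lines matter, the following Python sketch compares the two Commit 2 patches shown above, first as whole patches and then using only their +/- lines. The similarity measure here is difflib's SequenceMatcher ratio, chosen purely for illustration; it is an assumption of this sketch and not the cost function git uses to match commits.

```python
import difflib

# The Commit 2 patch before the rebase (context lines start with a space).
patch_before = [
    ' var app = builder.Build();',
    ' app.MapGet("/", () => "Hello World!");',
    '+app.MapGet("/example", () => "Example 1");',
    ' app.Run();',
]

# The Commit 2 patch after the rebase onto main.
patch_after = [
    ' app.MapGet("/", () => names.Keys.Select(n => $"Hello {n}!"));',
    ' app.MapPost("/{name}", (string name) => names[name] = name);',
    '+app.MapGet("/example", () => "Example 1");',
    ' app.Run();',
]


def similarity(left, right):
    """Line-based similarity between two patches, from 0.0 (disjoint) to 1.0 (identical)."""
    return difflib.SequenceMatcher(None, left, right).ratio()


def changed_lines(patch):
    """Keep only the lines a patch adds or removes, dropping the context lines."""
    return [line for line in patch if line.startswith(("+", "-"))]


whole_patch_similarity = similarity(patch_before, patch_after)
changed_only_similarity = similarity(changed_lines(patch_before), changed_lines(patch_after))

print(f"whole patches:      {whole_patch_similarity:.2f}")
print(f"changed lines only: {changed_only_similarity:.2f}")
```

The added line is identical in both patches, yet half of the lines in each patch are context that exists on only one side, which is why the whole-patch comparison scores so much lower.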
The algorithm git uses has a "fudge factor" to determine how similar two diffs must be for them to be considered "equal". You can tweak this value with the --creation-factor option, which must be a value between 0-100 (default is 60). The higher the value, the more likely git is to find matches.
For example, if we run the same comparison with --creation-factor=90, we get a very different range-diff (note that I'm hiding the diff patches here for brevity, the subsequent image shows the diff in all its glory):
> git range-diff base..my_feature_bak main..my_feature --creation-factor=90 -s
1: 3070585 = 1: ebd4946 Commit 1
2: 76df723 ! 2: edbe245 Commit 2
3: e526ca2 < -: ------- Commit 3
4: 64856f3 < -: ------- Commit 4
5: c96df0b ! 3: e06f56e Commit 5
The results here look much closer to what we actually expect; Commit 2 is matched in both branches, for example. I think the full diff is still somewhat confusing however:
Take a moment to try to parse this output. Both of the commit diffs highlight that there are differences related to the / and /{name} endpoints. That's expected, because those changes were introduced in the main branch, and so are present in the rebased branch, but not in the "prior" scenario.
Things look a bit strange on the first lines of the diff:
-@@ Program.cs: var builder = WebApplication.CreateBuilder(args);
- var app = builder.Build();
+@@ Program.cs: var names = new ConcurrentDictionary<string, string>();
+ app.MapGet("/", () => names.Keys.Select(n => $"Hello {n}!"));
When I first looked at that diff, it seemed to be saying that the lines var app = builder.Build(); etc. have been removed from the rebased commit stack, which would obviously be an error. However, it's more complicated than that. Remember git does a very crude git diff of the commit patch, which means it's also diffing the context. That's what we have here: differences in the context.
Due to the changes in main, the context around the changed lines (/example) is fundamentally different, and that's what's showing up here.
Unfortunately, I'm not sure there's a good fix for that. This is integral to how git range-diff works, so I think the only answer is getting comfortable with the confusing output. The main thing I wonder is whether I'd be able to spot a genuine merge-conflict error amongst this noise. I guess time will tell, as I try this in realistic scenarios. Given that the Linux kernel uses it, I think it's safe to say it's certainly possible to use it successfully!
Summary
In this post I looked at the git range-diff feature. I discussed the scenarios it's designed to help with, and how it works as a diff-of-diffs. Next I explained the output format it uses, which can be difficult to parse because it shows a diff of diff patches. Finally I tried out the feature on a small toy sample in which I rebased a branch and resolved merge conflicts. In my test example, the result was very dependent on the "fudge factor" parameter, and it was difficult to distinguish genuine changes from changes in the surrounding context. I suspect it may be primarily a case of needing practice to read the output on my part.