Verifying tricky git rebases with git range-diff

In this post I look at the git range-diff feature, available from git 2.19. I describe how it is meant to work, explain the output format, and demonstrate my attempt to test it with a small app.

What is `git range-diff`?

I'm a big fan of git rebase for moving commits around, for cleaning up iterative work, for working with stacked branches, and for including recent changes on main into my feature branch.

However, there's no denying that things can sometimes get confusing with git rebase. It's true that pretty much everything you do with git rebase can be reversed if you make a mistake, but it's not always obvious whether there is a mistake. This is particularly true when you've done a lot of interactive rebasing, when the branch you're rebasing onto has changed a lot, or when you have to resolve a bunch of merge conflicts.

There are various sub-optimal approaches you could use to try to understand if a rebase is "correct", but in my experience none of these work very well, either because they're confusing, hard to do, or simply don't produce an easy-to-understand result.

git range-diff is meant to help with exactly this scenario. It compares two ranges of commits (as opposed to git diff, which compares the state at two different commits directly). It's best to think of git range-diff as performing a diff of two git diffs, because that's literally how it works behind the scenes!

In theory, this makes it possible to compare a stack of commits prior to rebasing with the stack of commits after rebasing and to show the differences between them. If the rebase was simply rearranging and squashing commits then you would expect the diffs to be identical, and the diff of diffs would show that.

On the other hand, if you had to handle merge conflicts as part of the rebase, or if you rebased onto a different commit, then you might expect there to be changes, and these would be shown by git range-diff.

Using `git range-diff`

At the end of the post I'll show a real worked example of git range-diff, but the basic syntax (in the most useful form IMO) is as follows:

git range-diff base1..head1 base2..head2

where you have a git commit tree that looks something like the following:

                    h-i-head2
                   /
a-b-c-base1-d-e-base2
        \
         f-g-head1

So for the example above, git range-diff essentially does the following:

Perform a git diff base1..head1 (i.e. the changes made by f, g, and head1 on top of base1), and generate a "patch"
Perform a git diff base2..head2 (i.e. the changes made by h, i, and head2 on top of base2), and generate a "patch"
Perform a git diff between the two patches
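
To make the "diff of diffs" idea concrete, here's a rough sketch that performs those three steps by hand in a throwaway repository. The repository, the file names, and the base1/head1/base2/head2 refs are all made up for illustration. As far as I understand it, the real git range-diff compares the ranges commit by commit rather than as one big patch, so treat this as an approximation of the idea, not what git actually runs:

```shell
set -eu
repo=$(mktemp -d)   # throwaway repository, hypothetical contents
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main
git config user.name "Example"
git config user.email "example@example.com"

printf '%s\n' one two three > notes.txt
git add notes.txt
git commit -q -m "base"
git tag base1

# "Before" range: a single commit on top of base1
git checkout -q -b head1
printf '%s\n' one two three four > notes.txt
git commit -q -am "add four"

# main moves on, then a slightly different version of the same commit
git checkout -q main
echo "readme" > README.txt
git add README.txt
git commit -q -m "add readme"
git tag base2
git checkout -q -b head2
printf '%s\n' one two three FOUR > notes.txt
git commit -q -am "add four"

# Steps 1 and 2: generate a patch for each range
git diff base1..head1 > before.patch
git diff base2..head2 > after.patch

# Step 3: diff the two patches (exit code 1 just means "differences found")
git diff --no-index before.patch after.patch > patches.diff || true
cat patches.diff
```

The lines of patches.diff that start with two symbols (such as -+four and ++FOUR) are the places where the two patches disagree.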

git range-diff compares the content of the files, but then also compares the order of the commits, and the metadata in those commits, such as the commit messages. The resulting output is, frankly, quite confusing, so I'll walk through what it means in the next section.

Understanding the output format

As mentioned above, the output format of git range-diff can be pretty hard to understand in my experience. The image below is taken from a GitHub blog post and shows a relatively simple example of the expected output:

A git range-diff output

The output shows the series of commits that are being compared, and indicates whether commits have been reordered, and shows any difference in the commits between the two ranges. We'll dig more into the "diff" part shortly, but first let's just look at the commit list itself. I've added a header so we can describe each section subsequently:

1|     2  |3|4|     5   |    6
------------------------------
2: 8d6b31f = 1:  d672a8f add README.md
1: 3386b9a = 2:  02c0d21 add hello/goodbye world
3: bc293cc ! 3:  251b232 hello: fix typo
-: ------- > 4:  a835e18 goodbye: add missing newline

This output shows how the commits in the two branches have changed, been reordered, or been added or removed. From left to right, the sections are as follows:

The position of the left-hand side commit in the commit range being compared. For example, 2 indicates it was the second commit in the commit list. - indicates no matching commit was found in the left hand side commit range.
The short commit hash for the left-hand commit.
The equality of the commits being compared. = indicates they are equal, ! means they were different, > means it was only in the right-hand side, < means it was only on the left-hand side.
The position of the right-hand side commit in the commit range being compared. In the example above, you can see that the second commit on the left was matched to the first commit on the right.
The short commit hash for the right-hand commit. Even for "equal" commits, these hashes are likely to differ as the commits may have different parents.
The commit message for the commits. If the commits differ, the right-hand-side message is shown and a diff for the commit message is shown.

By comparing the commit positions in the image above, we can see that the first two commits have been swapped. The fourth commit on the right-hand side was not present in the left-hand side range.
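
If you want to see the position columns change for yourself, the following sketch builds two ranges containing the same two commits in opposite orders and prints only the commit list. The branch names (before and after) and the files are hypothetical, chosen just for this illustration:

```shell
set -eu
repo=$(mktemp -d)   # throwaway repository, hypothetical contents
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main
git config user.name "Example"
git config user.email "example@example.com"
git commit -q --allow-empty -m "base"

# Left-hand range: README first, then hello.txt
git checkout -q -b before
echo "docs" > README.md
git add README.md
git commit -q -m "add README.md"
echo "hello" > hello.txt
git add hello.txt
git commit -q -m "add hello"

# Right-hand range: the same two commits, applied in the opposite order
git checkout -q -b after main
git cherry-pick before > /dev/null     # "add hello"
git cherry-pick before~1 > /dev/null   # "add README.md"

# -s hides the patches so only the commit list is shown
out=$(git range-diff -s main..before main..after)
echo "$out"
```

Both commits should be reported as equal (=), with the left-hand positions listed as 2 then 1, because the output is ordered by the right-hand range.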

The third commit is where things get interesting. The commit list shows that the commit on the left and right were matched, but not equal (indicated by !). Whenever you have inequality like this, git range-diff provides a diff of what changed.

The first part of the difference was in the commit message, which shows:

@@ -2,7 +2,7 @@
     hello: fix typo

-   "Hello" has two l's.
+   "Hello" has two l's. Let's also fix the missing newline.

This is a pretty standard git diff format between the left- and right-hand side commits. It indicates that we added the text Let's also fix the missing newline. to the commit message.

Where things get more complicated is in comparing the diffs of the commit. Remember, git range-diff is a "diff of diffs", and diffs already contain - and + prefixes at the start of the line. So git range-diff has double prefixes: 😅

diff --git a/hello.c b/hello.c
--- a/hello.c
@@ -12,6 +12,6 @@
    int main(void)
    {
 -      printf("Helo world");
-+      printf("Hello world");
++      printf("Hello world\n");
        return 0;
    }

Let's think about what this means. Only the left-most column, i.e. the first prefix character, indicates a difference between the two sides. The first line with any prefix only has a single - character:

 -      printf("Helo world");

That means that both commits removed this line, so there's no difference between the commit ranges for this line. In contrast, the next two lines have two prefix symbols:

-+      printf("Hello world");
++      printf("Hello world\n");

The first of these lines starts with -+, which means the left-hand side commit was adding this line, but the right-hand side no longer is. Conversely, the ++ indicates the right-hand side is newly adding this line.

Remembering these prefixes is difficult, so the following table describes what the symbols mean:

Prefix symbol	Left-hand commit	Right-hand commit
(None)	No change	No change
`-`	Removed the line	Removed the line
`+`	Added the line	Added the line
`+-`	No change	Removes the line
`++`	No change	Adds the line
`--`	Removed the line	"Removes the removal", so no longer removes the line. i.e. it adds the line back that the left hand side removed.
`-+`	Added the line	"Removes the add", so no longer adds the line. i.e. it removes the line that the left hand side added.

I think it's quite difficult to intuitively remember the behaviours here. The best I can achieve is working through the logic of the above table 😅

The git range-diff output does try to help guide your understanding by only highlighting the first character in the prefix, i.e. lines that are actually different between left and right, as opposed to changes that appear in both diffs.
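
One way to get comfortable with the double prefixes is to generate them yourself. This sketch recreates something close to the "fix typo" example above, using a plain text file instead of hello.c. The branch names left and right, the greeting.txt file, and the write_greeting helper are all hypothetical:

```shell
set -eu
repo=$(mktemp -d)   # throwaway repository, hypothetical contents
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main
git config user.name "Example"
git config user.email "example@example.com"

# Hypothetical helper: rewrites greeting.txt with a given middle line
write_greeting() {
  printf '%s\n' "line 1" "line 2" "line 3" "$1" "line 5" "line 6" "line 7" > greeting.txt
}

write_greeting "Helo world"
git add greeting.txt
git commit -q -m "base"

# Left-hand side: fixes the typo
git checkout -q -b left
write_greeting "Hello world"
git commit -q -am "fix typo"

# Right-hand side: fixes the typo and also adds an exclamation mark
git checkout -q -b right main
write_greeting "Hello world!"
git commit -q -am "fix typo"

out=$(git range-diff main..left main..right)
echo "$out"
```

In the output, the removal of "Helo world" has a single - (both sides do it), whereas the two "Hello world" lines carry -+ and ++ because that's where the sides differ.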

Note that if you only want to see the changes in the commit list, and don't want to see the full diff, then you can use the -s or --no-patch arguments, using something like the following:

git range-diff --no-patch base1..head1 base2..head2
# or
git range-diff -s base1..head1 base2..head2

In the next section I show the experience I found testing out git range-diff on a small sample.

Trying it out in a small sample

Whenever I discover a new git feature, I like to try it out in small samples to try to get an initial feeling for it. I then expand to larger projects later once I have a handle on how the feature works. Unfortunately, for git range-diff I was not very impressed with what I found. I think that's partly just because the output of git range-diff is relatively hard to parse. But there are also some aspects that seem generally ill-suited to the very small diffs I've used in this example.

Setting up a simple test app

The test app I used is a very simple minimal API project. I created the initial project as follows:

git init
dotnet new web
dotnet new gitignore
git add .
git commit -m "Initial Commit"

This generates the default minimal API app, as follows:

var builder = WebApplication.CreateBuilder(args);
var app = builder.Build();

app.MapGet("/", () => "Hello World!");

app.Run();

From there, I simulated work occurring both in a feature branch and directly on main.

Parallel work streams on `my_feature` and `main`

In general I'm pretty comfortable using git rebase to interactively squash, rearrange, and split commits. I use it daily to tidy up PRs before pushing them for review, and when working with stacked PRs. But one area where I'm often lacking confidence is after rebasing onto main and having to deal with merge conflicts. That seems like the really killer app for git range-diff in theory, so I set out to see what that would look like.

I started by creating a branch called my_feature. I then made 5 commits to this branch making trivial changes:

Added an Example(string, string) record in the Example.cs file.
Added an /example endpoint.
Updated the /example endpoint to return an Example instance.
Updated the / (hello world) endpoint to return an Example instance instead of "Hello World!".
Reverted the previous change to return "Hello World" (note the missing !)

After all those commits (which were intentionally circuitous), Program.cs looks like this:

var builder = WebApplication.CreateBuilder(args);
var app = builder.Build();

app.MapGet("/", () => "Hello World");
app.MapGet("/example", () => new Example("Example 1", "The first example"));

app.Run();

I then switched back to main, and added an endpoint that lets you post a name to add to a dictionary. The / (hello world) endpoint was then updated to say Hello to each of the names:

using System.Collections.Concurrent;

var builder = WebApplication.CreateBuilder(args);
var app = builder.Build();
var names = new ConcurrentDictionary<string, string>();

app.MapGet("/", () => names.Keys.Select(n => $"Hello {n}!"));

app.MapPost("/{name}", (string name) => names[name] = name);

app.Run();

Functionality-wise, the changes on main mean you can make requests like this:

POST http://localhost:5116/James

and then when you hit / you get a result like this:

["Hello James!","Hello Andrew!","Hello Chris!","Hello David!"]

At this point, our commit graph looks like the following:

The git commit graph after the initial commits

Rebasing the feature branch

We now want to rebase the my_feature branch on top of main instead of base. We start by creating a "backup" branch, to make it easy to revert if we run into any issues, by running:

git checkout my_feature
git branch my_feature_bak # Create a backup pointing to the same location

Creating a backup branch

For the rebase, we can run the following:

git rebase base --onto main --no-update-refs

Note the use of --no-update-refs here so that we don't accidentally rebase my_feature_bak at the same time. This is only necessary if you enable --update-refs by default.
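
If you want to rehearse the backup-then-rebase flow somewhere safe first, here's a minimal sketch in a throwaway repository where the rebase applies cleanly. The file names are made up, and I'm assuming base is a branch marking the commit where my_feature forked from main. I've left out --no-update-refs because it needs a fairly recent git and only matters if you've enabled --update-refs by default:

```shell
set -eu
repo=$(mktemp -d)   # throwaway repository, hypothetical contents
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main
git config user.name "Example"
git config user.email "example@example.com"

echo "app" > app.txt
git add app.txt
git commit -q -m "Initial Commit"
git branch base                 # assumption: base marks the fork point

# Work on the feature branch
git checkout -q -b my_feature
echo "record" > Example.txt
git add Example.txt
git commit -q -m "Commit 1"

# Unrelated work on main (a different file, so no conflicts)
git checkout -q main
echo "names" > names.txt
git add names.txt
git commit -q -m "work on main"

# Back up the branch, then rebase it onto main
git checkout -q my_feature
git branch my_feature_bak
git rebase -q --onto main base

# Compare the old stack with the rebased stack
out=$(git range-diff -s base..my_feature_bak main..my_feature)
echo "$out"
```

Because nothing conflicted here, the old and new versions of the commit should be reported as equal, which is the "boring" result you'd hope for after a clean rebase.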

Unfortunately, we have a bunch of merge conflicts to contend with. There are the easy conflicts, where we're adding logically distinct endpoints but in conflicting locations in the file. Then there are the difficult merge conflicts, where we actually have modified the same logical code. That's primarily the Hello world endpoint which we initially modified in the my_feature branch and then (partially) reverted.

Once we've fixed all the conflicts, we'll have a commit tree that looks something like the following:

The final commits

The final code in my_feature looks like this:

using System.Collections.Concurrent;

var builder = WebApplication.CreateBuilder(args);
var app = builder.Build();
var names = new ConcurrentDictionary<string, string>();

app.MapGet("/", () => names.Keys.Select(n => $"Hello {n}!"));

app.MapPost("/{name}", (string name) => names[name] = name);
app.MapGet("/example", () => new Example("Example 1", "The first example"));

app.Run();

So now we come to the important part—what does git range-diff make of it?

Verifying the merge conflict resolution with `git range-diff`

To compare the branch prior to the rebase with the branch after the rebase, we can use a command like the following:

git range-diff base..my_feature_bak main..my_feature

This produces an output like the following:

1:  3070585 = 1:  ebd4946 Commit 1
2:  76df723 < -:  ------- Commit 2
3:  e526ca2 < -:  ------- Commit 3
4:  64856f3 < -:  ------- Commit 4
5:  c96df0b < -:  ------- Commit 5
-:  ------- > 2:  edbe245 Commit 2
-:  ------- > 3:  e06f56e Commit 5

or as a colourised image:

The range-diff for the sample app

The result was somewhat surprising to me. The only commit that git thinks is "equal" is Commit 1, which adds the Example record in a separate file. It's particularly interesting that Commit 2 is not recognized as matching, given that the diff prior to the rebase was:

 var app = builder.Build();
 
 app.MapGet("/", () => "Hello World!");
+app.MapGet("/example", () => "Example 1");
 
 app.Run();

while post rebase it was:

 app.MapGet("/", () => names.Keys.Select(n => $"Hello {n}!"));
 
 app.MapPost("/{name}", (string name) => names[name] = name);
+app.MapGet("/example", () => "Example 1");
 
 app.Run();

Pretty similar, no!?

The important thing to understand about range-diff is that it doesn't just diff the -/+ lines, it's diffing the whole diff patch, including the context lines. If you look back at the previous two diffs, you can see that the existing changes on main mean that the unchanged lines in the diffs look significantly different from one another.

The algorithm git uses has a "fudge factor" to determine how similar two diffs must be for them to be matched up. You can tweak this value with the --creation-factor option, which takes a percentage (the default is 60). The higher the value, the more likely git is to find matches.
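
To get a feel for the fudge factor in isolation, the sketch below builds two one-commit ranges that add the same five lines but with completely different surrounding context, and then compares them at the default factor and at 100. Everything here (the before and after branches, app.txt, the write_app helper, and the line contents) is hypothetical, and whether a pair matches will depend on the relative sizes of the patches, so your own results may vary:

```shell
set -eu
repo=$(mktemp -d)   # throwaway repository, hypothetical contents
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main
git config user.name "Example"
git config user.email "example@example.com"

# Hypothetical helper: writes each argument as one line of app.txt
write_app() { printf '%s\n' "$@" > app.txt; }

write_app a1 a2 a3 a4 a5 a6
git add app.txt
git commit -q -m "base"
git tag base

# Before: the feature lines are added in the middle of the original file
git checkout -q -b before
write_app a1 a2 a3 f1 f2 f3 f4 f5 a4 a5 a6
git commit -q -am "add feature lines"

# main rewrites every surrounding line
git checkout -q main
write_app b1 b2 b3 b4 b5 b6
git commit -q -am "rewrite on main"

# After: the same feature lines, but now surrounded by main's content
git checkout -q -b after
write_app b1 b2 b3 f1 f2 f3 f4 f5 b4 b5 b6
git commit -q -am "add feature lines"

default_out=$(git range-diff -s base..before main..after)
high_out=$(git range-diff -s --creation-factor=100 base..before main..after)

echo "Default creation factor:"
echo "$default_out"
echo "Creation factor 100:"
echo "$high_out"
```

With this particular setup I'd expect the default run to treat the commit as removed and re-added (< and >), and the higher factor to pair the two commits up as changed (!), even though the added lines are identical.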

Running the same comparison with --creation-factor=90 gives a very different range-diff (note that I'm hiding the diff patches here for brevity; the subsequent image shows the diff in all its glory):

>  git range-diff base..my_feature_bak main..my_feature --creation-factor=90 -s
1:  3070585 = 1:  ebd4946 Commit 1
2:  76df723 ! 2:  edbe245 Commit 2
3:  e526ca2 < -:  ------- Commit 3
4:  64856f3 < -:  ------- Commit 4
5:  c96df0b ! 3:  e06f56e Commit 5

The results here look much closer to what we actually expect; Commit 2 is matched in both branches for example. I think the full diff is still somewhat confusing however:

The range-diff with --creation-factor=90

Take a moment to try to parse this output. Both of the commit diffs highlight that there are differences related to the / and /{name} endpoints. That's expected, because those changes were introduced in the main branch, and so are present in the rebased branch, but not in the "prior" scenario.

Where things look a bit strange is on the first lines in the diff:

-@@ Program.cs: var builder = WebApplication.CreateBuilder(args);
- var app = builder.Build();
+@@ Program.cs: var names = new ConcurrentDictionary<string, string>();
+ app.MapGet("/", () => names.Keys.Select(n => $"Hello {n}!"));

When I first looked at that diff, it looked like it was saying that the lines var app = builder.Build(); etc. have been removed from the rebased commit stack, which would obviously be an error. However, it's more complicated than that. Remember git does a very crude git diff of the commit patch, which means it's also diffing the context. That's what we have here: differences in the context.

Due to the changes in main, the context around the changed lines (/example) is fundamentally different, and that's what's showing up here.

Unfortunately, I'm not sure there's a good fix for that. This is integral to how git range-diff works, so I think the only answer is getting comfortable with the confusing output. The main thing I wonder is whether I'd be able to spot a genuine merge-conflict error amongst this noise. I guess time will tell, as I try this in realistic scenarios. Given that the Linux kernel uses it, I think it's safe to say it's certainly possible to use it successfully!

Summary

In this post I looked at the git range-diff feature. I discussed the scenarios it's designed to help with, and how it works as a diff-of-diffs. Next I explained the output format it uses, which can be difficult to parse because it shows a diff of diff patches. Finally I tried out the feature on a small toy sample in which I rebased a branch and resolved merge conflicts. In my test example, the result was very dependent on the "fudge factor" parameter, and it was difficult to distinguish genuine changes from changes in the surrounding context. I suspect it may be primarily a case of needing practice to read the output on my part.

