Understand diff in Unix

diff is an important Unix utility. It compares two files and reports the differences between them, and it is the foundation of source code version control. If you type:

$ diff <file_before_change> <file_after_change>

diff tells you how the two files differ. The output is not always easy to read, so this article explains how to interpret it.

1. Three formats of diff

For historical reasons, diff has three output formats:

normal diff
context diff
unified diff

2. Demo files

For demonstration, we create two demo files.

The first file, f1, contains 7 lines, each consisting of the single letter a.

The second file, f2, is identical except that line 4 is b.
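The two demo files can be created by hand, or with a few lines of Python. The following is only a minimal sketch; the helper name write_demo_files and its directory parameter are made up for this illustration and are not part of diff.

```python
from pathlib import Path


def write_demo_files(directory="."):
    """Create f1 (seven lines of 'a') and f2 (the same, but line 4 is 'b')."""
    base = Path(directory)
    f1_lines = ["a"] * 7
    f2_lines = f1_lines.copy()
    f2_lines[3] = "b"  # line 4 of the file is index 3 of the list
    (base / "f1").write_text("\n".join(f1_lines) + "\n")
    (base / "f2").write_text("\n".join(f2_lines) + "\n")
    return base / "f1", base / "f2"
```

Calling write_demo_files() with no argument writes f1 and f2 into the current directory, ready for the diff commands in the following sections.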

3. Normal diff

Now we compare the two files:

$ diff f1 f2

The output in normal format is:

4c4
< a
---
> b

The first line is an indicator that tells where the change happens:

4c4

It has three parts: the leading "4" means that line 4 of f1 is affected; the middle "c" is the type of change, here a change of content (the other types are "a" for addition and "d" for deletion); the trailing "4" means that the result corresponds to line 4 of f2.
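The three parts of the indicator can be pulled apart mechanically. The sketch below is an illustration only: the function name parse_change_command is invented here, and the regular expression also accepts line ranges such as "2,3d1", a form that diff can print but that goes beyond the single-line example in this article.

```python
import re

# Matches indicators such as "4c4", "2,3d1" or "5a6,8".
CHANGE_COMMAND = re.compile(r"^(\d+)(?:,(\d+))?([acd])(\d+)(?:,(\d+))?$")

ACTIONS = {"a": "addition", "c": "change", "d": "deletion"}


def parse_change_command(line):
    """Split a normal-diff indicator into f1 lines, action and f2 lines."""
    m = CHANGE_COMMAND.match(line.strip())
    if m is None:
        raise ValueError(f"not a normal-diff indicator: {line!r}")
    start1, end1, action, start2, end2 = m.groups()
    return {
        # A single number means a range of exactly one line.
        "f1_lines": (int(start1), int(end1 or start1)),
        "action": ACTIONS[action],
        "f2_lines": (int(start2), int(end2 or start2)),
    }


print(parse_change_command("4c4"))
# {'f1_lines': (4, 4), 'action': 'change', 'f2_lines': (4, 4)}
```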

The second line has two parts:

< a

The less-than sign means that the line is removed from f1, and the "a" after it is the content of the removed line.

The third line separates the lines of f1 from the lines of f2:

---

The fourth line is similar to the second:

> b

The greater-than sign means that the line is added in f2, and "b" is the content of the added line.

The earliest Unix (AT&T Unix) used this diff format.

4. Context diff

In the 1980s, the BSD version of Unix introduced a new format, on the grounds that the normal diff output was too terse: showing the surrounding context makes it easier to see where a change occurs. This is the context diff.

To get it, add the -c option to the diff command:

$ diff -c f1 f2

The result will be

*** f1 2012-08-29 16:45:41.000000000 +0800
--- f2 2012-08-29 16:45:51.000000000 +0800
***************
*** 1,7 ****
  a
  a
  a
! a
  a
  a
  a
--- 1,7 ----
  a
  a
  a
! b
  a
  a
  a

The output has four parts.

The first part is the first two lines, which give basic file information: the file name and the timestamp.

*** f1 2012-08-29 16:45:41.000000000 +0800
--- f2 2012-08-29 16:45:51.000000000 +0800

"***" marks the file before the change; "---" marks the file after the change.

The second part is a line of 15 asterisks, which separates the file information from the changed content.

***************

The third part shows the file before the change:

*** 1,7 ****
  a
  a
  a
! a
  a
  a
  a

This part shows not only the changed line but also the 3 lines before it and the 3 lines after it, 7 lines in total. The header "*** 1,7 ****" means that the excerpt runs from line 1 to line 7.

Each line also begins with a mark character. A blank means the line is unchanged, "!" means the line was changed, "-" means the line was deleted, and "+" means the line was added.

The fourth part shows the file after the change:

--- 1,7 ----
  a
  a
  a
! b
  a
  a
  a

It is read in the same way as the third part.
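The mark characters of the third and fourth parts can be read programmatically. The sketch below is a hypothetical helper, not part of diff; it assumes that every body line starts with the mark followed by one space, which is how GNU diff prints context lines.

```python
CONTEXT_MARKS = {" ": "unchanged", "!": "changed", "-": "deleted", "+": "added"}


def read_context_part(lines):
    """Return (meaning of mark, content) pairs for the body lines of one part."""
    result = []
    for line in lines:
        mark, content = line[0], line[2:]  # mark, one space, then the content
        result.append((CONTEXT_MARKS[mark], content))
    return result


before_part = ["  a", "  a", "  a", "! a", "  a", "  a", "  a"]
after_part = ["  a", "  a", "  a", "! b", "  a", "  a", "  a"]

for name, part in (("f1", before_part), ("f2", after_part)):
    # Both range headers start at line 1, so numbering starts at 1.
    changed = [
        (number, text)
        for number, (kind, text) in enumerate(read_context_part(part), start=1)
        if kind == "changed"
    ]
    print(name, changed)
# f1 [(4, 'a')]
# f2 [(4, 'b')]
```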

5. Unified diff

If the two files are very similar, the context diff repeats a lot of content. In 1990, GNU diff adopted the unified diff, which merges the context of f1 and f2 into a single listing.

To get it, add the -u option to the diff command:

$ diff -u f1 f2

The result

--- f1 2012-08-29 16:45:41.000000000 +0800
+++ f2 2012-08-29 16:45:51.000000000 +0800
@@ -1,7 +1,7 @@
 a
 a
 a
-a
+b
 a
 a
 a

The first part is again the file information:

--- f1 2012-08-29 16:45:41.000000000 +0800
+++ f2 2012-08-29 16:45:51.000000000 +0800

"---" marks the file before the change; "+++" marks the file after the change.

The second part is enclosed in a pair of "@@" markers and gives the location of the change:

@@ -1,7 +1,7 @@

The "-1,7" has three components: "-" refers to the first file (f1), "1" is the starting line, and "7" is the number of consecutive lines. In other words, what follows covers 7 lines of the first file, starting at line 1.

Likewise, "+1,7" means 7 lines of the second file, starting at line 1.

The third part shows the actual changes:

 a
 a
 a
-a
+b
 a
 a
 a

Besides the changed lines, it shows 3 lines of context before and after the change. The context of the two files is merged into one listing, which is why the format is called unified diff. Each line begins with a mark character: a blank means the line is unchanged, "-" means the line was deleted, and "+" means the line was added.
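Because the unified format merges both files into one listing, the "before" and "after" excerpts can be recovered from it. The sketch below illustrates this with two hypothetical helpers, parse_hunk_header and split_unified_body. It makes one assumption that goes beyond this article's example: when the line count is left out of the header, it is taken to be 1.

```python
import re

HUNK_HEADER = re.compile(r"^@@ -(\d+)(?:,(\d+))? \+(\d+)(?:,(\d+))? @@")


def parse_hunk_header(line):
    """Return ((f1_start, f1_count), (f2_start, f2_count)) for an @@ line."""
    m = HUNK_HEADER.match(line)
    if m is None:
        raise ValueError(f"not a unified-diff location line: {line!r}")
    start1, count1, start2, count2 = m.groups()
    # Assumption: an omitted count stands for a single line.
    return (int(start1), int(count1 or 1)), (int(start2), int(count2 or 1))


def split_unified_body(lines):
    """Rebuild the before/after excerpts from the marked lines."""
    before, after = [], []
    for line in lines:
        mark, content = line[0], line[1:]
        if mark == " ":      # unchanged: belongs to both files
            before.append(content)
            after.append(content)
        elif mark == "-":    # deleted: only in the first file
            before.append(content)
        elif mark == "+":    # added: only in the second file
            after.append(content)
        else:
            raise ValueError(f"unexpected mark in line: {line!r}")
    return before, after


listing = ["@@ -1,7 +1,7 @@", " a", " a", " a", "-a", "+b", " a", " a", " a"]
(f1_start, f1_count), (f2_start, f2_count) = parse_hunk_header(listing[0])
before, after = split_unified_body(listing[1:])
print(f1_start, f1_count, before)
print(f2_start, f2_count, after)
```

The number of lines recovered for each file (7 and 7) matches the counts announced in the "@@" line.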

6. Git diff

The version control system git uses an extended form of the unified diff.

$ git diff

The result

diff --git a/f1 b/f1
index 6f8a38c..449b072 100644
--- a/f1
+++ b/f1
@@ -1,7 +1,7 @@
 a
 a
 a
-a
+b
 a
 a
 a

The first line shows that the output is in git's diff format:

diff --git a/f1 b/f1

It says that the a version of f1 (before the change) is compared with the b version of f1 (after the change).

The second line gives the hash values of the two versions of f1 (6f8a38c and 449b072). The six digits at the end are the object mode: 100 denotes a regular file and 644 is the file permission.
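As a small illustration, the second line can be split into its fields as follows. This is only a sketch: parse_index_line is a hypothetical helper, and it handles only the shape of the line shown in this article (two abbreviated hashes followed by a six-digit mode).

```python
import re

# Two abbreviated hashes separated by "..", then a six-digit mode.
INDEX_LINE = re.compile(r"^index ([0-9a-f]+)\.\.([0-9a-f]+) (\d{3})(\d{3})$")


def parse_index_line(line):
    """Split an 'index' line into hashes, file type digits and permissions."""
    m = INDEX_LINE.match(line.strip())
    if m is None:
        raise ValueError(f"unexpected index line: {line!r}")
    old_hash, new_hash, file_type, permissions = m.groups()
    return {
        "old_hash": old_hash,
        "new_hash": new_hash,
        "file_type": file_type,      # first three digits of the mode
        "permissions": permissions,  # last three digits of the mode
    }


print(parse_index_line("index 6f8a38c..449b072 100644"))
# {'old_hash': '6f8a38c', 'new_hash': '449b072', 'file_type': '100', 'permissions': '644'}
```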

The next two lines name the files being compared:

--- a/f1
+++ b/f1

The remaining lines have the same meaning as in the standard unified diff.

7. Reading materials

* diff - Wikipedia
* How to read a patch or diff
* How to work with diff representation in git

Original author: 阮一峰 (Ruan Yifeng). Source: http://www.ruanyifeng.com/blog/2012/08/how_to_read_diff.html