115

On Linux, how could I generate a diff between two XML files?

Ideally, I would like to be able to configure it to be strict about some things and lenient about others, such as whitespace or attribute order.

Often I only care whether the files are functionally the same, and plain diff is annoying for that, especially when an XML file has few line breaks.

For example, these two snippets should count as equivalent:

<tag att1="one" att2="two">
  content
</tag>

<tag att2="two" att1="one">
  content
</tag>
slhck
qedi

10 Answers

132

One approach would be to first turn both XML files into Canonical XML, and compare the results using diff. For example, xmllint can be used to canonicalize XML.

$ xmllint --c14n one.xml > 1.xml
$ xmllint --c14n two.xml > 2.xml
$ diff 1.xml 2.xml

Or as a one-liner.

$ diff <(xmllint --c14n one.xml) <(xmllint --c14n two.xml)
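If xmllint isn't available, the same canonicalize-then-diff idea can be sketched with Python's standard library (Python 3.9+ assumed). `xml.etree.ElementTree.canonicalize()` implements C14N 2.0, which writes attributes in a fixed, sorted order; `c14n_diff` is a hypothetical helper name, not part of any tool mentioned here.

```python
import difflib
import xml.etree.ElementTree as ET


def c14n_diff(xml_a: str, xml_b: str,
              name_a: str = "one.xml", name_b: str = "two.xml") -> list[str]:
    """Canonicalize both documents (C14N 2.0) and return unified-diff lines."""
    # canonicalize() normalizes attribute order, quoting, namespace
    # declarations and empty-element syntax, but keeps text whitespace.
    canon_a = ET.canonicalize(xml_a)
    canon_b = ET.canonicalize(xml_b)
    return list(difflib.unified_diff(canon_a.splitlines(), canon_b.splitlines(),
                                     fromfile=name_a, tofile=name_b, lineterm=""))
```

An empty list means the canonical forms match. Like the xmllint approach, this keeps whitespace inside text nodes; `canonicalize()` also accepts `strip_text=True` to trim leading and trailing whitespace in text.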
36

Jukka's answer did not work for me, but it did point me to Canonical XML. Neither --c14n nor --c14n11 sorted the attributes, but I found that the --exc-c14n switch does sort them. --exc-c14n is not listed in the man page, but the command-line help describes it as "W3C exclusive canonical format".

$ xmllint --exc-c14n one.xml > 1.xml
$ xmllint --exc-c14n two.xml > 2.xml
$ diff 1.xml 2.xml

$ xmllint 2>&1 | grep c14
    --c14n : save in W3C canonical format v1.0 (with comments)
    --c14n11 : save in W3C canonical format v1.1 (with comments)
    --exc-c14n : save in W3C exclusive canonical format (with comments)

$ rpm -qf /usr/bin/xmllint
libxml2-2.7.6-14.el6.x86_64
libxml2-2.7.6-14.el6.i686

$ cat /etc/system-release
CentOS release 6.5 (Final)

Warning: --exc-c14n strips out the XML declaration, whereas --c14n prepends one if it is not there.
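One way to sidestep the declaration issue in this warning is to compare canonical forms that never contain a declaration. A minimal Python sketch (3.9+ assumed; `canonical_equal` is a hypothetical helper): the C14N 2.0 output of `ElementTree.canonicalize()` omits the XML declaration and sorts attributes, so a file with a declaration and one without compare equal.

```python
import xml.etree.ElementTree as ET


def canonical_equal(xml_a: str, xml_b: str) -> bool:
    """Compare two documents by their C14N 2.0 form (hypothetical helper)."""
    # The XML declaration is not part of canonical output, so a
    # '<?xml version="1.0"?>' on only one side does not cause a mismatch.
    return ET.canonicalize(xml_a) == ET.canonicalize(xml_b)
```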

rjt
23

I tried @Jukka Matilainen's answer but had problems with whitespace (one of the files was a huge one-liner). Using --format helps skip whitespace differences.

xmllint --format one.xml > 1.xml
xmllint --format two.xml > 2.xml
diff 1.xml 2.xml

Note: use vimdiff for a side-by-side comparison of the XML files.
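The reformat-then-diff approach can be sketched in Python as well (3.9+ assumed, for `ElementTree.indent()`); the helper names are hypothetical. `indent()` replaces whitespace-only text between elements with fresh indentation, so a one-liner and a hand-indented file line up. This sketch does not reorder attributes.

```python
import difflib
import xml.etree.ElementTree as ET


def pretty_lines(xml_text: str) -> list[str]:
    """Re-indent a document so every element starts on its own line."""
    root = ET.fromstring(xml_text)
    ET.indent(root, space="  ")  # overwrites whitespace-only text and tails
    return ET.tostring(root, encoding="unicode").splitlines()


def formatted_diff(xml_a: str, xml_b: str) -> list[str]:
    """Unified diff of the re-indented documents; empty if they match."""
    return list(difflib.unified_diff(pretty_lines(xml_a), pretty_lines(xml_b),
                                     lineterm=""))
```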

GuruM
8

If you also want to ignore the order of child elements, I wrote a simple Python tool for this called xmldiffs:

Compare two XML files, ignoring element and attribute order.

Usage: xmldiffs [OPTION] FILE1 FILE2

Any extra options are passed to the diff command.

Get it at https://github.com/joh/xmldiffs
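The idea behind ignoring child order can be sketched like this. It is not xmldiffs's code, only a minimal Python illustration (3.9+ assumed) with hypothetical helper names: each element's attributes and children are sorted recursively, using an already-normalized child's serialization as the sort key, and whitespace around text is trimmed. Use it only where sibling order is truly insignificant; trimming text also alters mixed content.

```python
import xml.etree.ElementTree as ET


def normalize(elem: ET.Element) -> None:
    """Recursively sort attributes and children in place (hypothetical helper)."""
    elem.text = (elem.text or "").strip() or None  # drop insignificant whitespace
    elem.tail = None
    items = sorted(elem.attrib.items())  # serialization keeps insertion order
    elem.attrib.clear()
    elem.attrib.update(items)
    for child in elem:
        normalize(child)
    # Children are already normalized, so their serialization is a stable key.
    elem[:] = sorted(elem, key=lambda e: ET.tostring(e, encoding="unicode"))


def same_ignoring_order(xml_a: str, xml_b: str) -> bool:
    roots = [ET.fromstring(x) for x in (xml_a, xml_b)]
    for root in roots:
        normalize(root)
    return ET.tostring(roots[0]) == ET.tostring(roots[1])
```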

joh
7

Diffxml gets the basic functionality correct, though it doesn't seem to offer many options for configuration.

Edit: the Diffxml project moved to GitHub in 2013.

dsolimano
1

My Python script xdiff.py for comparing XML files ignores differences in whitespace and attribute order (but not in element order).

In order to compare two files 1.xml and 2.xml, you would run the script as follows:

xdiff.py 1.xml 2.xml

In the OP's example, it would output nothing and return exit status 0 (for no structural or textual differences).

In cases where 1.xml and 2.xml differ structurally, it mimics the unified output of GNU diff and returns exit status 1. There are various options for controlling the output, such as -a for outputting all context, -n for outputting no context, and -q for suppressing output altogether (while still returning the exit status).
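A rough Python sketch of the GNU-diff-style exit status described above. It compares canonical forms rather than using xdiff.py's own structural comparison, so it is not that script's algorithm; `xml_diff` and `main` are hypothetical names, and Python 3.9+ is assumed.

```python
import difflib
import xml.etree.ElementTree as ET


def xml_diff(xml_a, xml_b, name_a="1.xml", name_b="2.xml", quiet=False) -> int:
    """Print a unified diff of canonical forms; return 0 if equal, 1 if not."""
    lines = []
    for data in (xml_a, xml_b):  # str or bytes
        # strip_text=True trims whitespace around text; then each tag goes on
        # its own line ('<' is always escaped inside canonical text/attributes).
        canon = ET.canonicalize(data, strip_text=True)
        lines.append(canon.replace("><", ">\n<").splitlines())
    diff = list(difflib.unified_diff(lines[0], lines[1], name_a, name_b, lineterm=""))
    if diff and not quiet:
        print("\n".join(diff))
    return 1 if diff else 0


def main(argv: list[str]) -> int:
    """Usage: main(["-q", "1.xml", "2.xml"]); '-q' suppresses output."""
    quiet = "-q" in argv
    path_a, path_b = [arg for arg in argv if arg != "-q"]
    with open(path_a, "rb") as fa, open(path_b, "rb") as fb:
        return xml_diff(fa.read(), fb.read(), path_a, path_b, quiet)
```

To use it from a shell, call `sys.exit(main(sys.argv[1:]))` so the exit status reaches the caller, as with diff.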

1

If you are a developer, you can also check XML differences programmatically using XMLUnit in C# or Java.

To see how it reports differences, you can try this online XML difference checker tool.

C# sample code to check for differences:

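using NUnit.Framework;      // Assert.IsFalse (test framework assumed)
using Org.XmlUnit.Builder;  // DiffBuilder, Input (XMLUnit.NET)

string control = "<a><b attr=\"abc\"></b></a>";
string test = "<a><b attr=\"xyz\"></b></a>";

// Chain .IgnoreWhitespace() before .Build() to ignore formatting differences.
var myDiff = DiffBuilder.Compare(Input.FromString(control))
    .WithTest(Input.FromString(test)).Build();

// Fails here because attr differs; the message lists each difference found.
Assert.IsFalse(myDiff.HasDifferences(), myDiff.ToString());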

Jyoti
-1

Not sure whether depending on an online tool counts as a solution, but for what it's worth, I got good results with this online XML comparison tool. It simply works.

RayLuo
-1

Our SD Smart Differencer compares documents based on structure as opposed to actual layout.

There is an XML Smart Differencer. For XML, that means it compares the order of tags and their content. It should report that the text string in the specific fragment you showed was different. It presently doesn't understand XML attributes (such as xml:space) that indicate whether whitespace is normalized or significant.

-1

I use Beyond Compare to compare all kinds of text-based files. It is available for both Windows and Linux.

Alan