I have a large XML file (about 37 MB—it’s the underlying XML file in a Word document of about 350 pages) that I am trying to search through with XPath. I’m doing this ‘manually’ rather than programmatically, by opening the file in an XML editor and searching there.
I’ve tried Xmplify, QXmlEdit, Sublime Text with the XPath extension, etc., and they all suffer from the same problem: just opening the file is ridiculously slow and hogs an awful lot of memory, and doing an XPath search is nigh impossible.
As an example, I just tried opening the file in Xmplify. That took about three minutes, and with no other documents open Xmplify’s memory usage rose to about 1 GB.
Then I tried to perform this XPath query (I’m looking for all tracked insertions whose text is exactly the string ‘en’):
//w:ins[w:r/w:t = 'en']
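For context, that query is meant to match tracked-insertion elements along these lines (a simplified sketch of the WordprocessingML markup; the attribute values here are just illustrative):

    <w:ins w:id="42" w:author="Editor" w:date="2017-01-01T00:00:00Z">
      <w:r>
        <w:t>en</w:t>
      </w:r>
    </w:ins>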
That gave me a SPOD for a good while. After about 15 minutes of going at around 100% CPU, Xmplify was now using 60 GB of memory, and my OS was telling me that I had run out of application memory and needed to start force-quitting stuff.
That seems rather excessive to me for a single XPath query on a single file, even if it is a fairly big file. The other applications I tried were not as egregiously bad, but opening the document and running any kind of XPath query still took minutes, and their memory usage also ran into the gigabytes, so it’s not just Xmplify being inefficient.
What is the reason for this? Why is XPath (apparently) so resource-intensive? Does it differ between OSes (mine is macOS Sierra)?
I debated whether to post this here or on Stack Overflow, but since I’m specifically not doing this programmatically, I decided this was probably the better place. Feel free to migrate if there’s a better Stack for it.