Find-replace within hierarchy of XML files

Question

I have around 350 XML files spread throughout the directory tree under /abc. I would like to find all instances where the value of the alt attribute is exactly 'blah blah':

<image alt="blah blah" src="../webcontent/filename.png">
    <caption>
        Figure 1.1: Typical Components of Blah Blah
    </caption>
</image>

and replace the value of the alt attribute with the text enclosed by the caption element, with the newlines and surrounding indentation removed:

<image alt="Figure 1.1: Typical Components of Blah Blah" src="../webcontent/filename.png">
    <caption>
        Figure 1.1: Typical Components of Blah Blah
    </caption>
</image>

I'm open to running a script on Ubuntu or Windows, or using any text editing tool.

It is not safe to assume that newlines and indentation are consistent. Also, not all images have a caption. All XML documents in the path are well-formed.

Is there a simple way to script this replacement in place? I'd be open to something that works for a single file; I can extend it to run recursively.

Michael Kay · Accepted Answer · 2016-06-18T08:42:52.057

For a single file, the following XSLT stylesheet will do the job:

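<t:transform version="1.0" xmlns:t="http://www.w3.org/1999/XSL/Transform">
  <!-- Identity template: copy every node and attribute unchanged -->
  <t:template match="node()|@*">
    <t:copy>
      <t:apply-templates select="node()|@*"/>
    </t:copy>
  </t:template>
  <!-- Rewrite alt="blah blah" only on images that have a caption;
       images without a caption keep their alt value -->
  <t:template match="image[caption]/@alt[. = 'blah blah']">
    <!-- XSLT 1.0 has no select attribute on xsl:attribute, so use value-of -->
    <t:attribute name="alt">
      <t:value-of select="normalize-space(../caption)"/>
    </t:attribute>
  </t:template>
</t:transform>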

To process multiple files, you can invoke the stylesheet once per file from a shell script, an Ant script, or similar (or look at xmlsh). Alternatively, if you're using an XSLT 2.0 processor such as Saxon, you can do the whole job within XSLT itself using the collection() function.
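If no XSLT processor is available, the following is a minimal Python sketch (3.8+, standard library only) that walks the tree recursively and applies the same rule in place. It swaps the stylesheet for xml.etree.ElementTree, so it is a re-implementation of the rule rather than a way of running the XSLT above; the script name fix_alt.py and all function names are hypothetical. Back up /abc first: ElementTree rewrites each changed file as UTF-8, drops any DOCTYPE and any comments outside the root element, and may rename namespace prefixes.

```python
"""Recursively apply the alt/caption rewrite to every *.xml file under a root.

Sketch: re-implements the stylesheet's rule with xml.etree.ElementTree
instead of running an XSLT processor.
"""
import sys
import xml.etree.ElementTree as ET
from pathlib import Path

OLD_ALT = "blah blah"


def normalize_space(text: str) -> str:
    # Approximates XPath normalize-space(): trim and collapse whitespace runs.
    # (str.split() also splits on non-ASCII whitespace, which XPath does not.)
    return " ".join(text.split())


def fix_tree(root: ET.Element) -> int:
    """Rewrite alt="blah blah" on <image> elements that have a <caption>."""
    changed = 0
    for image in root.iter("image"):
        caption = image.find("caption")  # first <caption> child, like ../caption
        if image.get("alt") == OLD_ALT and caption is not None:
            # itertext() gives the caption's full string value, as in XPath
            image.set("alt", normalize_space("".join(caption.itertext())))
            changed += 1
    return changed


def fix_file(path: Path) -> int:
    # Keep comments and processing instructions inside the root element.
    builder = ET.TreeBuilder(insert_comments=True, insert_pis=True)
    tree = ET.parse(path, ET.XMLParser(target=builder))
    changed = fix_tree(tree.getroot())
    if changed:  # only rewrite files that actually changed
        tree.write(path, encoding="utf-8", xml_declaration=True)
    return changed


def main(root_dir: str) -> None:
    for path in sorted(Path(root_dir).rglob("*.xml")):
        n = fix_file(path)
        if n:
            print(f"{path}: {n} image(s) updated")


if __name__ == "__main__" and len(sys.argv) > 1:
    main(sys.argv[1])
```

Run it as python3 fix_alt.py /abc; without an argument the script does nothing, which makes it safe to import and test fix_tree on its own.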

score 1 · Answer 2 · answered Jun 21 '16 at 08:54


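You could also use xmlstarlet. The -L option edits the file in place, the [caption] predicate leaves images without a caption alone, and file.xml stands for the file to process:

xmlstarlet ed -L -u '//image[caption]/@alt[. = "blah blah"]' -x 'normalize-space(../caption)' file.xml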
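To run that one-liner over the whole /abc hierarchy, something like the following Python sketch (3.9+) would invoke it once per file. It assumes xmlstarlet is installed and on the PATH, and it restricts the XPath to images that have a caption; the script name run_xmlstarlet.py and the helper names are hypothetical.

```python
"""Run the xmlstarlet in-place edit on every *.xml file under a root."""
import shutil
import subprocess
import sys
from pathlib import Path

# Same edit as the one-liner, limited to images that actually have a caption.
XPATH = '//image[caption]/@alt[. = "blah blah"]'
EXPR = "normalize-space(../caption)"


def build_command(path: Path) -> list[str]:
    # -L (--inplace) makes "xmlstarlet ed" overwrite the input file;
    # -x evaluates EXPR relative to each matched alt attribute.
    return ["xmlstarlet", "ed", "-L", "-u", XPATH, "-x", EXPR, str(path)]


def main(root_dir: str) -> None:
    if shutil.which("xmlstarlet") is None:
        sys.exit("xmlstarlet not found on PATH")
    for path in sorted(Path(root_dir).rglob("*.xml")):
        # Passing an argument list (no shell) avoids quoting problems.
        subprocess.run(build_command(path), check=True)


if __name__ == "__main__" and len(sys.argv) > 1:
    main(sys.argv[1])
```

Run it as python3 run_xmlstarlet.py /abc; check=True stops at the first file xmlstarlet fails on, so you can see which one it was.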

Michael Vehrs
