I have around 350 XML files spread throughout the /abc directory. I would like to find all instances where the value of the alt attribute is exactly 'blah blah':
<image alt="blah blah" src="../webcontent/filename.png">
<caption>
Figure 1.1: Typical Components of Blah Blah
</caption>
</image>
and replace the value of the alt attribute with the contents enclosed by the caption (removing newlines)
<image alt="Figure 1.1: Typical Components of Blah Blah" src="../webcontent/filename.png">
<caption>
Figure 1.1: Typical Components of Blah Blah
</caption>
</image>
I'm open to running a script on Ubuntu or Windows, or using any text editing tool.
It is not safe to assume that newlines and indentation are consistent. Also, not all images have a caption. All XML documents in the path are well-formed.
Is there a simple way to script this replacement in-place? I'd be open to something that works for a single file; I can extend it to run recursively.