I'm generating a large XML file in MediaWiki's export format, which I want to import into MediaWiki.
The file is done, but the content inside <text>content</text> still contains stray < and > characters that I must encode first.
I'd like to do the encoding step with a regex (I'm on Windows, using editors such as Sublime Text, EditPad or Vim). I should also be able to run a PHP script.
Using ({{word)(.*?)(?=</text>) I was able to select all the regions to be processed (I don't want to encode the XML markup itself), but I don't know how to do the hard part, i.e. how to replace every < and > inside those selected regions.
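An editor's find-and-replace can select the regions, but it cannot easily rewrite only the characters inside each match. A script that runs a regex with a replacement callback can: the regex finds each <text> element, and the callback escapes only its body. Below is a minimal Python sketch of that idea (PHP's preg_replace_callback works the same way). It assumes that every <text ...> has its own </text> (self-closing <text ... /> tags are skipped), that no body contains a literal "</text>", and that the file is UTF-8. It also escapes "&" by default, because XML requires that too; pass escape_amp=False if your file already encodes ampersands. The function names are hypothetical.

```python
import re

# Group 1: the opening <text ...> tag (the (?<!/) lookbehind rejects a
# self-closing <text ... />). Group 2: the raw wikitext body. Group 3: </text>.
# DOTALL lets "." cross newlines; the lazy ".*?" stops at the first </text>.
TEXT_ELEMENT = re.compile(r"(<text\b[^>]*(?<!/)>)(.*?)(</text>)", re.DOTALL)


def escape_body(body: str, escape_amp: bool = True) -> str:
    """Encode characters that must not appear raw in XML character data."""
    if escape_amp:
        # Must run first, otherwise the "&" of "&lt;" would be escaped again.
        body = body.replace("&", "&amp;")
    return body.replace("<", "&lt;").replace(">", "&gt;")


def escape_text_elements(xml: str, escape_amp: bool = True) -> str:
    """Escape <, > (and optionally &) inside every <text> element only."""
    return TEXT_ELEMENT.sub(
        lambda m: m.group(1) + escape_body(m.group(2), escape_amp) + m.group(3),
        xml,
    )


def convert_file(src: str, dst: str) -> None:
    """Hypothetical helper: read src, escape the <text> bodies, write dst."""
    # newline="" on both sides keeps the original line endings unchanged.
    with open(src, encoding="utf-8", newline="") as f:
        data = f.read()
    with open(dst, "w", encoding="utf-8", newline="") as f:
        f.write(escape_text_elements(data))
```

At about 30 MB the whole file fits comfortably in memory, so reading it in one go and calling convert_file("input.xml", "output.xml") (placeholder paths) is simpler than streaming it. Wikitext comments such as <!-----------encyclo----------> are escaped as well, which is expected: MediaWiki decodes the entities on import, so the comments come back as real wikitext comments.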
For clarity, here is a short extract showing what the content that needs a few characters encoded looks like (there are 50,000 more pages like this one in a 30 MB file):
  <page>
    <title>Title:75002</title>
    <ns>510</ns>
    <id>21</id>
    <revision>
      <id></id>
      <parentid></parentid>
      <timestamp>2015-1-5T14:49:09Z</timestamp>
      <contributor>
        <ip>0:0:0:0:0:0:0:1</ip>
      </contributor>
      <text xmlspace="preserve" bytes="345">{{word
| vedette             ={{{vedette}}}
| id            ={{ROOTPAGENAME}}
| vedette           =boutique, with forbidden > and 
 evil < multiline
<!-----------encyclo---------->
| étymologie        = still have sometimes a messing > 
and maybe a < more.
<!-----------relations-------->
| synonyme          ={{AutoLienSyno | }}
}}</text>
      <sha1></sha1>
      <model>wikitext</model>
      <format>text/x-wiki</format>
    </revision>
  </page>
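Before importing, it may help to check that the converted file is well-formed XML. The sketch below parses it with Python's standard xml.etree.ElementTree and counts the <page> elements, so the total can be compared with the expected number of pages. It assumes that the real file wraps all pages in a single root element (MediaWiki exports use <mediawiki>); the extract above shows only one <page>. Note that a raw ">" is actually legal in XML character data, so the parser flags leftover "<" and "&" but not ">"; escaping ">" is still harmless. The function name count_pages is hypothetical.

```python
import xml.etree.ElementTree as ET


def count_pages(xml: str) -> int:
    """Parse the whole document and return the number of <page> elements.

    ET.fromstring raises ET.ParseError, reporting the line and column, at the
    first raw "<" or "&" left in a <text> body.
    """
    root = ET.fromstring(xml)
    # Compare local names so the check also works when the root element
    # declares the MediaWiki export namespace ("{uri}page" -> "page").
    return sum(1 for el in root.iter() if el.tag.rsplit("}", 1)[-1] == "page")
```

For a file on disk, ET.parse(path).getroot() can replace ET.fromstring(xml) in the same way.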
Thank you.
 
    