I have English sentences in which some words are wrapped in XML tags, for example:
<XXX>word1</XXX> word2 word3 <YYY>word4 word5 word6</YYY> word7 word8 word9 word10 <ZZZ>word11 word12</ZZZ>.
Only these three XML tags can occur, as the sentence shows (<XXX>, <YYY>, <ZZZ>). The number of words inside any of these tags is unbounded.
I need to split the sentences at whitespace, ignoring any whitespace inside those XML tags. The code looks like:
String mySentence = "<XXX>word1</XXX> word2 word3 <YYY>word4 word5 word6</YYY> word7 word8 word9 word10 <ZZZ>word11 word12</ZZZ>.";
String[] mySentenceSplit = mySentence.split("someUnknownRegex");
for (int i = 0; i < mySentenceSplit.length; i++) {
    System.out.println(mySentenceSplit[i]);
}
For the example above, the output should be:
mySentenceSplit[0] = <XXX>word1</XXX>
mySentenceSplit[1] = word2
mySentenceSplit[2] = word3
mySentenceSplit[3] = <YYY>word4 word5 word6</YYY>
mySentenceSplit[4] = word7
mySentenceSplit[5] = word8
mySentenceSplit[6] = word9
mySentenceSplit[7] = word10
mySentenceSplit[8] = <ZZZ>word11 word12</ZZZ>.
What do I have to put in place of "someUnknownRegex" to achieve this?
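For reference, here is a minimal sketch of one pattern that might work, not a definitive answer. It splits on whitespace unless the next angle bracket after that whitespace begins a closing tag ("</"), which would mean the whitespace is still inside an element. It assumes the tags are never nested and that the plain text contains no stray '<' or '>' characters. The names TagAwareSplit, splitOutsideTags and OUTSIDE_TAGS are made up for illustration.

```java
public class TagAwareSplit {

    // \s+           : one or more whitespace characters (the split point)
    // (?![^<>]*</)  : negative lookahead; refuse to split if, scanning ahead
    //                 over non-bracket characters, the first bracket found
    //                 starts a closing tag. In that case we are inside a tag.
    // Assumption: no nested tags and no stray '<' or '>' in the text.
    private static final String OUTSIDE_TAGS = "\\s+(?![^<>]*</)";

    public static String[] splitOutsideTags(String sentence) {
        return sentence.split(OUTSIDE_TAGS);
    }

    public static void main(String[] args) {
        String mySentence = "<XXX>word1</XXX> word2 word3 "
                + "<YYY>word4 word5 word6</YYY> word7 word8 word9 word10 "
                + "<ZZZ>word11 word12</ZZZ>.";
        String[] mySentenceSplit = splitOutsideTags(mySentence);
        for (int i = 0; i < mySentenceSplit.length; i++) {
            System.out.println("mySentenceSplit[" + i + "] = " + mySentenceSplit[i]);
        }
    }
}
```

Because the lookahead only inspects the next angle bracket, the pattern does not depend on the tag names. If the input could contain nested tags or literal '<' characters, a regex split is probably the wrong tool and a real XML parser would be safer.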