I am parsing some well-structured strings (HTML format) to extract data. The format looks like this (newlines added for readability):
<span><h2>Category 1</h2>
<p><strong><u>Entry 1</u></strong></p>
<ul><li>Some Data</li></ul>
<h2>Category 2</h2>
<p><strong><u>Entry 2</u></strong></p>
<ul><li>Some Data</li></ul>
</span>
I want to find every block that starts with an <h2> heading and extract the text that follows its </h2>. The search pattern is /<h2>Category.*?<\/h2>(.*?)(<h2>|<\/span>)/g. However, each match ends with the <h2> that opens the next category, so that tag is consumed and the next category is not matched. The third category is matched again because the search resumes after the consumed tag.
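To make the skipping behaviour concrete, here is a minimal sketch that reproduces it. It assumes JavaScript (suggested by the /…/g literal), a heading text of "Category" as in the sample, and a hypothetical third category added to the input so the effect is visible.

```javascript
// Sample input joined without newlines. "Category 3" is a hypothetical
// extra block added only to show which categories get skipped.
const html =
  '<span><h2>Category 1</h2>' +
  '<p><strong><u>Entry 1</u></strong></p>' +
  '<ul><li>Some Data</li></ul>' +
  '<h2>Category 2</h2>' +
  '<p><strong><u>Entry 2</u></strong></p>' +
  '<ul><li>Some Data</li></ul>' +
  '<h2>Category 3</h2>' +
  '<p><strong><u>Entry 3</u></strong></p>' +
  '</span>';

// The terminator (<h2> or </span>) sits in a capturing group,
// so it becomes part of the match and is consumed.
const consuming = /<h2>Category.*?<\/h2>(.*?)(<h2>|<\/span>)/g;

// Collect the full text of every match.
const found = [...html.matchAll(consuming)].map(m => m[0]);

console.log(found.length);          // number of categories matched
console.log(found.map(s => s.slice(0, 14))); // start of each match
```

The first match swallows the <h2> that opens Category 2, so the next search begins at "Category 2</h2>…" and can only find Category 3.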
Then I tried to greedily match a substring that does not contain <h2>. The pattern is h2>Category.*?<\/h2>(^(h2).)*, but it does not work.
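One likely reason the second pattern fails is that ^ outside a character class is a start-of-input anchor, not a negation. Two possible ways to express the intent are sketched below, assuming JavaScript and an input without newlines (the s flag would be needed otherwise). The sample string, including its third category, and all variable names are hypothetical.

```javascript
// Sample input joined without newlines. "Category 3" is a hypothetical
// extra block so that the </span> terminator is exercised too.
const html =
  '<span><h2>Category 1</h2>' +
  '<p><strong><u>Entry 1</u></strong></p>' +
  '<ul><li>Some Data</li></ul>' +
  '<h2>Category 2</h2>' +
  '<p><strong><u>Entry 2</u></strong></p>' +
  '<ul><li>Some Data</li></ul>' +
  '<h2>Category 3</h2>' +
  '<p><strong><u>Entry 3</u></strong></p>' +
  '</span>';

// Option 1: a lookahead (?=...) requires the terminator to follow
// but does not consume it, so the next search starts at that <h2>.
const withLookahead = /<h2>(Category.*?)<\/h2>(.*?)(?=<h2>|<\/span>)/g;

// Option 2: (?!...) before each character means "the terminator does
// not start here", so the greedy group stops right before <h2> or </span>.
const tempered = /<h2>(Category.*?)<\/h2>((?:(?!<h2>|<\/span>).)*)/g;

// Turn each match into a { title, body } pair.
const extract = (re) =>
  [...html.matchAll(re)].map(m => ({ title: m[1], body: m[2] }));

const blocks = extract(withLookahead);
const blocksTempered = extract(tempered);

console.log(blocks.map(b => b.title));
```

Both patterns leave the terminating tag in place, so every category should be picked up in order. The lookahead version is closer to the original pattern; the second one is the "does not contain <h2>" idea written with a negative lookahead.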