$pattern='`<a\s+[^>]*(href=([\'\"]).*\\2)[^>]*>([^<]*)</a>`isU';
I want to change the ([^<]*) part so that it matches everything up to </a>, not just up to the next <, because an <img> tag could be inside the <a> tag.
Can anyone help? I'm lousy at regex.
You can use an HTML parser in PHP to do this. I wouldn't use regex at all.
You can try: http://simplehtmldom.sourceforge.net/
Although I think PHP has a DOM parser built in.
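For what it's worth, here is a minimal sketch of the parser route using PHP's built-in DOMDocument class. The sample HTML and the $links array are made up for illustration, since the question doesn't include any input, so treat this as a starting point rather than a drop-in solution.

```php
<?php
// Hypothetical sample input; the original question does not show any.
$html = '<p><a href="page.html"><img src="pic.png" alt="Pic"> Home</a> '
      . '<a class="x" href=\'other.html\'>Other</a></p>';

$doc = new DOMDocument();
// Collect libxml warnings internally instead of emitting them for sloppy HTML.
libxml_use_internal_errors(true);
$doc->loadHTML($html);
libxml_clear_errors();

$links = [];
foreach ($doc->getElementsByTagName('a') as $a) {
    // Rebuild the link's inner HTML by serializing each child node.
    $inner = '';
    foreach ($a->childNodes as $child) {
        $inner .= $doc->saveHTML($child);
    }
    $links[] = ['href' => $a->getAttribute('href'), 'inner' => $inner];
}

print_r($links);
```

The parser doesn't care whether the link contains an <img> or any other markup, which is the case the original pattern trips over.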
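Changing ([^<]*) to an ungreedy match-all might do the trick. Since your pattern already has the U modifier, which flips greediness, that would be (.*) rather than (.*?).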
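One thing to watch with the ungreedy approach: the pattern in the question ends in the U modifier, and in PCRE that modifier inverts greediness, so a plain (.*) is lazy and (.*?) turns greedy. A rough sketch of the difference follows; the sample string and variable names are invented for the demo.

```php
<?php
// Hypothetical sample input with two links, the first one wrapping an <img>.
$html = '<a href="page.html"><img src="pic.png" alt="Pic"> Home</a> and '
      . '<a class="x" href=\'other.html\'>Other</a>';

// Under the U modifier, (.*) is lazy: it stops at the first </a>.
$lazy = '`<a\s+[^>]*(href=([\'\"]).*\\2)[^>]*>(.*)</a>`isU';
preg_match_all($lazy, $html, $lazyMatches, PREG_SET_ORDER);

// Under the U modifier, (.*?) is greedy: it runs on to the last </a>.
$greedy = '`<a\s+[^>]*(href=([\'\"]).*\\2)[^>]*>(.*?)</a>`isU';
preg_match_all($greedy, $html, $greedyMatches, PREG_SET_ORDER);

echo count($lazyMatches), " link(s) with (.*)\n";
echo count($greedyMatches), " link(s) with (.*?)\n";
```

With this input the lazy version finds both links separately, while the greedy one swallows everything between the first opening tag and the last </a> as a single match.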
([^<]*) could be changed to ((?:[^<]|<(?!/a>))*), which uses a negative lookahead to match either non-< characters or < characters that are not followed by /a>.
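Dropped into the pattern from the question, the lookahead version would look roughly like this. The sample string is made up for the demo, and the rest of the pattern is left exactly as posted.

```php
<?php
// Hypothetical sample input; the first link contains an <img> tag.
$html = '<a href="page.html"><img src="pic.png" alt="Pic"> Home</a> and '
      . '<a class="x" href=\'other.html\'>Other</a>';

// Original pattern with ([^<]*) swapped for ((?:[^<]|<(?!/a>))*):
// group 1 = the whole href attribute, group 2 = the quote character,
// group 3 = everything between the opening tag and the next </a>.
$pattern = '`<a\s+[^>]*(href=([\'\"]).*\\2)[^>]*>((?:[^<]|<(?!/a>))*)</a>`isU';

preg_match_all($pattern, $html, $matches, PREG_SET_ORDER);

foreach ($matches as $m) {
    echo $m[1], ' => ', $m[3], "\n";
}
```

The two alternatives inside group 3 can never match the same character, so the engine has little to backtrack over there.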
HOWEVER, as stated many times over already, this is not a good way to parse HTML. Firstly, it's horribly inefficient, and secondly, what happens if you have nested tags, such as <a><a></a></a>? The pattern would stop at the first </a> and leave the outer element unbalanced. While this may not happen with hyperlinks, it's common among many other HTML elements.