I have the following code, which extracts the href URLs of <a> tags from an XML string and works correctly:
Pattern p = Pattern.compile("<a[^>]+href\\s*=\\s*['\"]([^'\"]+)['\"][^>]*>");
Matcher m = p.matcher(xmlString);
while (m.find()) {
    imagesURLs.add(m.group(1));
}
The XML contains elements like this:
<a href="http://...">some text</a>
The code above gives me <a href="http://..."> in m.group(0) and http://... in m.group(1).
I also want to get the full <a href="http://...">some text</a>.
How can I achieve this by modifying the regex?
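
To make the goal concrete, here is a rough sketch of the behaviour I am after. It extends the same pattern with a lazy (.*?) followed by </a>, so that m.group(0) would span the whole element. The class name AnchorExtractor, the method extract, and the constant ANCHOR are hypothetical names made up for this illustration, and the sketch assumes <a> elements are never nested. I am not sure this is the right approach, and an XML parser may well be more robust than a regex here.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, for illustration only.
final class AnchorExtractor {

    // Same opening-tag pattern as in the question, extended with a lazy
    // "(.*?)</a>" so the match runs through the first closing tag.
    // DOTALL lets '.' cross line breaks inside the link text.
    // Assumption: <a> elements are not nested.
    private static final Pattern ANCHOR = Pattern.compile(
            "<a[^>]+href\\s*=\\s*['\"]([^'\"]+)['\"][^>]*>(.*?)</a>",
            Pattern.DOTALL | Pattern.CASE_INSENSITIVE);

    // Returns one String[3] per link:
    // [0] the full element, [1] the href value, [2] the inner text.
    static List<String[]> extract(String xmlString) {
        List<String[]> results = new ArrayList<>();
        Matcher m = ANCHOR.matcher(xmlString);
        while (m.find()) {
            results.add(new String[] { m.group(0), m.group(1), m.group(2) });
        }
        return results;
    }
}
```

With this sketch, m.group(1) is still the URL, so the existing imagesURLs.add(m.group(1)) call would not need to change.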