I am trying to find the article tag and all its content in an HTML string using Regex.
I can successfully match the opening tag with its attributes: <article[^>]*>
I'm having trouble matching the contents: (.*?) is not working for me.
Please help.
You cannot use regular expressions to parse HTML in general. However, in constrained scenarios (i.e. when the input follows a rigid structure), you might be able to get away with it. In your case, you can use the following regex, provided that:
- <article> tags are not self-closing
- <article> elements do not contain other <article> descendants
- <article and </article> do not appear as literals in your HTML.

Code:
var matches = Regex.Matches(html, @"<article.*?</article>", RegexOptions.Singleline);
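If you also need the inner content on its own, a capture group can be added to the same idea. The following is only a sketch under the constraints listed above; the class name `ArticleExtractor`, the method `ExtractContents`, and the group name `content` are made up for illustration, and the `\b` and `RegexOptions.IgnoreCase` are my additions rather than part of the regex above. Note that `RegexOptions.Singleline` is what makes `.` match newlines; without it, `(.*?)` stops at the first line break, which is a likely reason it did not work for you.

```csharp
using System;
using System.Text.RegularExpressions;

public static class ArticleExtractor
{
    // Singleline lets '.' match '\n', so the lazy group can span multiple lines.
    // \b keeps the pattern from matching tags such as <articles>.
    private static readonly Regex ArticlePattern = new Regex(
        @"<article\b[^>]*>(?<content>.*?)</article>",
        RegexOptions.Singleline | RegexOptions.IgnoreCase);

    // Returns the inner content of every <article> element found in html.
    // Assumes no nested or self-closing <article> tags.
    public static string[] ExtractContents(string html)
    {
        MatchCollection matches = ArticlePattern.Matches(html);
        var contents = new string[matches.Count];
        for (int i = 0; i < matches.Count; i++)
        {
            contents[i] = matches[i].Groups["content"].Value;
        }
        return contents;
    }

    public static void Main()
    {
        string html =
            "<div>\n<article class=\"post\">\n<p>First</p>\n</article>\n" +
            "<article><p>Second</p></article>\n</div>";

        foreach (string content in ExtractContents(html))
        {
            Console.WriteLine(content.Trim());
        }
    }
}
```

`matches[i].Value` would still give you the whole element including the tags, while `Groups["content"]` gives only what is between them.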