I have some HTML source from which I am trying to remove some tags. I know that using regular expressions to remove tags is not advised, but I figured this would be the easiest route to take.
Specifically, I need to remove all img tags and all a tags (along with the contents of the a tags), but only those inside a p tag. I am unsure how to do this with a regular expression.
For example, given this input:
<p><img src="center.jpg"><a href="?center">center</a>TEXT<img src="right.jpg"><a href="?rightspan">right</a> MORE TEXT<img src="another.jpg"></p>
The output should be the following, with all a tags (including their contents) and all img tags removed:
<p>TEXT MORE TEXT</p>
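Since regular expressions are easy to get wrong on HTML, here is a rough alternative sketch that uses Python's standard-library html.parser.HTMLParser instead of a regex (a swapped-in technique). It tracks whether it is currently inside a p element and drops img tags and a elements (with their text) only there. The class name PTagStripper and the function strip_a_and_img_in_p are hypothetical; the sketch assumes p and a tags are explicitly closed, and it re-emits end tags in lowercase, so the output can differ cosmetically from the input.

```python
from html.parser import HTMLParser


class PTagStripper(HTMLParser):
    """Re-emits HTML, dropping <img> tags and <a> elements (with their content)
    only while inside a <p> element. Sketch: assumes <p> and <a> are explicitly
    closed; end tags are re-emitted in lowercase."""

    def __init__(self):
        super().__init__(convert_charrefs=False)  # report entities so they can be re-emitted
        self.out = []
        self.p_depth = 0  # number of currently open <p> elements
        self.a_skip = 0   # number of open <a> elements whose content is being dropped

    def _emit(self, text):
        # Nothing is written while we are inside a dropped <a> element.
        if self.a_skip == 0:
            self.out.append(text)

    def handle_starttag(self, tag, attrs):
        if self.p_depth and tag == 'a':
            self.a_skip += 1  # start dropping this <a> and everything in it
            return
        if self.p_depth and tag == 'img':
            return  # drop <img> inside <p>
        if tag == 'p':
            self.p_depth += 1
        self._emit(self.get_starttag_text())  # original text of the start tag

    def handle_startendtag(self, tag, attrs):
        # Self-closing tags such as <img ... />; avoids the default start+end pair.
        if self.p_depth and tag in ('a', 'img'):
            return
        self._emit(self.get_starttag_text())

    def handle_endtag(self, tag):
        if tag == 'a' and self.a_skip:
            self.a_skip -= 1  # end of a dropped <a>; its </a> is dropped too
            return
        self._emit(f'</{tag}>')
        if tag == 'p' and self.p_depth:
            self.p_depth -= 1

    def handle_data(self, data):
        self._emit(data)

    def handle_entityref(self, name):
        self._emit(f'&{name};')

    def handle_charref(self, name):
        self._emit(f'&#{name};')

    def handle_comment(self, data):
        self._emit(f'<!--{data}-->')

    def handle_decl(self, decl):
        self._emit(f'<!{decl}>')


def strip_a_and_img_in_p(html: str) -> str:
    parser = PTagStripper()
    parser.feed(html)
    parser.close()  # flush any buffered trailing text
    return ''.join(parser.out)


print(strip_a_and_img_in_p('<p><img src="center.jpg"><a href="?center">center</a>TEXT</p>'))
# <p>TEXT</p>
```

Unlike a single regex pass, the parser knows which element it is in, so a and img tags outside p elements are passed through unchanged.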
The problem is that, as I said, I'm not sure how to do this: my regular expression removes all of the a and img tags in the source, not just the ones inside a p tag.
import re
re.sub(r'<(img|a).*?>|</a>', '', text)  # current attempt: applies to the whole document, not only <p> elements
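One way to scope the substitution is to match each p element first and run the tag removal only on the matched text, by passing a replacement function to re.sub. This is a minimal sketch that assumes simple markup: p elements are explicitly closed and not nested, a elements are not nested, and "</p>" never appears inside attribute values or comments. The helper name strip_a_and_img_in_p is hypothetical. The \b after the tag name keeps the patterns from also matching tags such as abbr, area or pre, which the original <(img|a).*?> pattern would catch.

```python
import re

# One <p ...>...</p> element. Assumes <p> elements are explicitly closed and not
# nested, and that "</p>" never appears inside attribute values or comments.
P_ELEMENT = re.compile(r'<p\b[^>]*>.*?</p\s*>', re.DOTALL | re.IGNORECASE)

# Inside a <p>: an <a ...>...</a> element including its content, or an <img ...> tag.
# \b keeps <abbr>, <area>, etc. from matching; assumes <a> elements are not nested.
A_OR_IMG = re.compile(r'<a\b[^>]*>.*?</a\s*>|<img\b[^>]*>', re.DOTALL | re.IGNORECASE)


def strip_a_and_img_in_p(html: str) -> str:
    """Hypothetical helper: remove <a> elements and <img> tags only inside <p>."""
    # The replacement function receives each <p> match and returns it cleaned.
    return P_ELEMENT.sub(lambda m: A_OR_IMG.sub('', m.group(0)), html)


text = ('<div><a href="?home">home</a></div>'
        '<p><img src="center.jpg"><a href="?center">center</a>TEXT'
        '<img src="right.jpg"><a href="?rightspan">right</a> MORE TEXT'
        '<img src="another.jpg"></p>')
print(strip_a_and_img_in_p(text))
# <div><a href="?home">home</a></div><p>TEXT MORE TEXT</p>
```

The a element outside the p is left alone, while the one inside is removed together with its text. If the assumptions above do not hold for the real source (nested or unclosed p tags, for example), an HTML parser is the safer choice.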