I am cleaning RSS feed data that I pulled using feedparser. I managed to remove all special characters, but I cannot remove the "p" left over from the <p> tag. How can I remove it?
I tried this code:
import re
def clean_text(text):
    # Lowercase each whitespace-separated word and keep only a-z and 0-9
    return [re.sub('[^a-z0-9]', '', w.lower()) for w in text.strip().split()]

news_df['clean_body'] = news_df['summary'].apply(clean_text)
The code runs without errors, but the <p> tag is not fully removed: the angle brackets are stripped as special characters, while the letter "p" remains in the output.
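
One possible direction, as a minimal sketch rather than a definitive fix: strip the tags before stripping special characters, so that "<p>" is removed as a whole instead of being reduced to "p". The pattern name TAG_RE is hypothetical, and the regex is a naive assumption that every tag looks like `<...>`. It does not handle HTML entities such as `&amp;` or malformed markup. Unlike the code above, this version also drops tokens that end up empty.

```python
import re

# Assumption: a tag is anything between "<" and the next ">".
# This is a naive pattern, not a real HTML parser.
TAG_RE = re.compile(r'<[^>]+>')

def clean_text(text):
    # Replace tags with a space first, so "<p>" does not degrade to "p"
    # and words on either side of a tag are not glued together.
    no_tags = TAG_RE.sub(' ', text)
    # Then apply the same per-word cleanup as before.
    words = [re.sub('[^a-z0-9]', '', w.lower()) for w in no_tags.split()]
    # Drop tokens that became empty (for example, pure punctuation).
    return [w for w in words if w]

if __name__ == '__main__':
    print(clean_text('<p>Hello, World!</p>'))
```

If the summaries contain more complex HTML, a proper parser is probably safer than a regex, for example `BeautifulSoup(text, 'html.parser').get_text()` from the bs4 package.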