I have this piece of HTML code stored as a string in a variable:
<p style="text-align: center;">
    <span style="font-size: small;font-family: 'comic sans ms', sans-serif;">
        <strong>
            word1 
            <span style="line-height: 1.5;">
                word2 
            </span>
            <span style="line-height: 1.5;">
                word3 
            </span>
            <span style="line-height:1.5;"></span>
        </strong>
    </span>
</p>
I want to extract only word1, word2 and word3. What is the easiest and most time-efficient way to do this?
I was thinking that a > character not immediately preceded by < could serve as an index from which to start extracting my data.
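For illustration, here is a minimal sketch of one possible approach. It assumes Python, since the question does not name a language, and it uses the standard library's html.parser module to collect the text between tags instead of scanning for > by hand. The names TextCollector, extract_text and html_string are hypothetical, and the sample markup is a shortened copy of the snippet above.

```python
from html.parser import HTMLParser


class TextCollector(HTMLParser):
    """Collects the non-empty text nodes found between tags (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.words = []

    def handle_data(self, data):
        # handle_data receives the raw text between tags, including
        # whitespace-only runs caused by indentation, so strip it and
        # skip anything that ends up empty.
        text = data.strip()
        if text:
            self.words.append(text)


def extract_text(html):
    """Return the stripped text nodes of an HTML string, in document order."""
    parser = TextCollector()
    parser.feed(html)
    parser.close()
    return parser.words


# Shortened version of the markup from the question (assumed input).
html_string = """
<p style="text-align: center;">
    <span style="font-size: small;">
        <strong>
            word1
            <span style="line-height: 1.5;">
                word2
            </span>
            <span style="line-height: 1.5;">
                word3
            </span>
            <span style="line-height:1.5;"></span>
        </strong>
    </span>
</p>
"""

print(extract_text(html_string))  # ['word1', 'word2', 'word3']
```

The parser makes a single pass over the string and keeps track of tag boundaries itself, so attribute values that happen to contain > or < should not throw off the extraction the way a manual character scan could.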