First, get rid of any leading or trailing space:
.trim()
Then get rid of HTML entities (&...;):
.replaceAll("&.*?;", "")
& and ; are literal chars in Regex, and .*? is the non-greedy version of "any character, any number of times".
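To see why the non-greedy form matters here, the following minimal sketch compares it with the greedy `.*`. The class and method names (`EntityDemo`, `stripLazy`, `stripGreedy`) are made up for illustration.

```java
public class EntityDemo {
    // Non-greedy: each match stops at the first ';' after the '&'.
    static String stripLazy(String s) {
        return s.replaceAll("&.*?;", "");
    }

    // Greedy: a match runs from the first '&' to the last ';' on the line.
    static String stripGreedy(String s) {
        return s.replaceAll("&.*;", "");
    }

    public static void main(String[] args) {
        String s = "a &lt; b &gt; c";
        System.out.println(stripLazy(s));   // only the two entities are removed, "b" survives
        System.out.println(stripGreedy(s)); // everything between the first '&' and last ';' is removed, including "b"
    }
}
```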
Next get rid of tags and their contents:
.replaceAll("<(.*?)>.*?</\\1>", "")
< and > are taken literally again, .*? is explained above, (...) defines a capturing group, and \\1 (the Java string escape for the regex \1) refers back to whatever that group matched.
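A small sketch of how the backreference behaves may help. The names `TagDemo` and `stripTags` are hypothetical. Because \1 must repeat the captured text exactly, a tag with attributes is left untouched: the closing tag does not repeat the attributes.

```java
public class TagDemo {
    // Removes <name>...</name> pairs where the closing tag repeats
    // exactly the text captured between '<' and '>'.
    static String stripTags(String s) {
        return s.replaceAll("<(.*?)>.*?</\\1>", "");
    }

    public static void main(String[] args) {
        // Plain tag: group 1 is "b", so "</b>" closes the match.
        System.out.println(stripTags("x <b>bold</b> y"));

        // Tag with an attribute: group 1 would have to be "a href=\"x\"",
        // but no "</a href=\"x\">" exists, so nothing is removed.
        System.out.println(stripTags("<a href=\"x\">link</a>"));
    }
}
```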
And finally, split on any sequence of non-letters:
.split("[^a-zA-Z]+")
[a-zA-Z] matches any character from a to z or A to Z, a ^ at the start of the brackets negates the class, and + means "once or more".
So everything together would be:
String[] words = str.trim().replaceAll("&.*?;", "").replaceAll("<(.*?)>.*?</\\1>", "").split("[^a-zA-Z]+");
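For reference, here is the same chain wrapped in a runnable class. This is only a sketch: the class name `WordExtractor`, the method name `extractWords`, and the sample input are invented for illustration.

```java
import java.util.Arrays;

public class WordExtractor {
    // Applies the steps described above, in the same order.
    static String[] extractWords(String str) {
        return str.trim()                              // drop leading/trailing whitespace
                  .replaceAll("&.*?;", "")             // drop HTML entities such as &amp;
                  .replaceAll("<(.*?)>.*?</\\1>", "")  // drop simple tags and their contents
                  .split("[^a-zA-Z]+");                // split on runs of non-letters
    }

    public static void main(String[] args) {
        String input = "  Hello &amp; <b>bold</b> world  ";
        System.out.println(Arrays.toString(extractWords(input))); // prints [Hello, world]
    }
}
```

One thing to watch: if the cleaned-up string starts with a non-letter (for example a digit), split returns an empty string as the first element, so you may want to filter that out.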
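Note that this doesn't handle self-closing tags like <img src="a.png" />, nor tags with attributes like <a href="x">...</a>, because \\1 has to repeat everything between < and > in the closing tag.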
Also note that if you need full HTML parsing, you should consider letting a real HTML parser handle it, as parsing HTML with Regex is a bad idea.