Using re.findall(), I'm trying to find all occurrences in a string of each term from a list of terms.
If a term contains special characters (e.g. a '+'), either no match is found or an error is raised. Using re.escape() avoids the errors, but the terms with special characters are still not found in the string.
import re

my_list = ['java', 'c++', 'c#', '.net']
my_string = ' python javascript c++ c++ c# .net java .net'
matches = []
for term in my_list:
    # Escape terms containing regex metacharacters so they are matched literally
    if any(x in term for x in ['+', '#', '.']):
        term = re.escape(term)
    print("\nlooking for term '%s'" % term)
    # Wrap the term in word boundaries so 'java' does not match inside 'javascript'
    match = re.findall(r"\b" + term + r"\b", my_string, flags=re.IGNORECASE)
    matches.append(match)
The above code only finds 'java' within the string. Any suggestions on how to find the terms with special characters as well?
Caveat: I cannot change 'my_list' manually, because I don't know in advance what terms it will contain.
Update - it appears that the problem lies with the word-boundary specifiers (the "\b") in the regex. "\b" only matches between a word character and a non-word character, so a term that begins or ends with a non-alphanumeric character (such as '+', '#' or '.') cannot satisfy it when the neighbouring character is whitespace. It's unclear how to solve this in a clean and straightforward way, however.
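To illustrate the word-boundary behaviour described in the update, here is a minimal sketch using the same string. The first two searches only demonstrate the diagnosis. The helper find_term is a hypothetical name, not part of the original code, and it shows one possible direction rather than a confirmed fix: it swaps "\b" for the lookarounds (?<!\w) and (?!\w), on the assumption that a term counts as a match whenever it is not directly adjacent to a word character.

```python
import re

my_string = ' python javascript c++ c++ c# .net java .net'

# "\b" needs a word character on exactly one side. In ' c++ ' the last '+'
# is followed by a space, so both sides are non-word characters and the
# trailing "\b" cannot match.
with_boundaries = re.findall(r"\b" + re.escape('c++') + r"\b", my_string)
print(with_boundaries)      # []

# Without the boundaries, the escaped term is found as expected.
without_boundaries = re.findall(re.escape('c++'), my_string)
print(without_boundaries)   # ['c++', 'c++']


def find_term(term, text):
    """Hypothetical helper: match `term` only when it is not touching a
    word character on either side (an assumption about the intended rule)."""
    pattern = r"(?<!\w)" + re.escape(term) + r"(?!\w)"
    return re.findall(pattern, text, flags=re.IGNORECASE)


results = {term: find_term(term, my_string)
           for term in ['java', 'c++', 'c#', '.net']}
print(results['java'])      # ['java']
print(results['.net'])      # ['.net', '.net']
```

Note that the lookarounds are looser than "\b" in one respect: a term followed by further punctuation (for example 'c++' inside 'c+++') would still match, which may or may not be acceptable depending on the data.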
Edit - this question is not a duplicate of this - it already incorporates the most applicable solution from that post.