I have a sentence:
'hi how <unk> are you'
I need to remove <unk> from it.
Here is my code:
re.sub(r'\b{}\b'.format('<unk>'), '', 'agent transcript str <unk> with chunks for key phrases')
Why doesn't my RegEx work for <...>?
There is no word boundary between a space and < or >. You could instead try:
re.sub(r'(\s*)<unk>(\s*)', r'\1\2', your_string)
Or, if you don't want to be left with two spaces, you may try:
re.sub(r'(\s*)<unk>\s+', r'\1', your_string)
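To compare the two substitutions above, here is a minimal sketch run against the sentence from the question. The variable names (sentence, kept_both, kept_leading) are just placeholders for illustration.

```python
import re

sentence = 'hi how <unk> are you'

# First pattern: both whitespace groups are put back, so a double space remains.
kept_both = re.sub(r'(\s*)<unk>(\s*)', r'\1\2', sentence)

# Second pattern: only the leading whitespace is put back,
# the whitespace after <unk> is consumed by \s+ and dropped.
kept_leading = re.sub(r'(\s*)<unk>\s+', r'\1', sentence)

print(repr(kept_both))     # 'hi how  are you'
print(repr(kept_leading))  # 'hi how are you'
```

Note that the second pattern requires at least one whitespace character after <unk>, so it will not remove a <unk> that sits at the very end of the string.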
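\b matches at a word boundary: a position between a non-word character (\W, i.e. [^\w]) and a word character (\w, which for ASCII text is [A-Za-z0-9_]). In your original string, the pattern needs a boundary between a space and <, and another between > and a space. Both sides of each position are non-word characters, so \b does not match there and nothing is replaced.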
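The boundary behaviour can be checked directly. This is a small sketch, with the string 'a<unk>b' made up purely to show a case where \b does match next to < and >.

```python
import re

sentence = 'hi how <unk> are you'

# A space and '<' are both non-word characters, so \b cannot match
# between them and the string comes back unchanged.
unchanged = re.sub(r'\b<unk>\b', '', sentence)
print(unchanged == sentence)  # True

# Hypothetical input: here 'a' and 'b' are word characters next to
# '<' and '>', so both \b assertions succeed and <unk> is removed.
glued = re.sub(r'\b<unk>\b', '', 'a<unk>b')
print(glued)  # ab
```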