I have the following pandas series:
arr = pd.Series(['C', 'A', 'T', 'G', 'CC', 'KEEP', 'ATC', 'CACACAC', 'CCCCCCCCACAGTTTATGTAG', 'C(2', 'Cor CC', 'AC or ACC'])
From it, I want to remove the elements 'C(2', 'Cor CC' and 'AC or ACC' using a regex.
So the criteria that I am trying to match are:
- Start with a capital letter: ^[A-Z]
- Exclude any element that has a parenthesis in it: [^\(]
- Exclude any element that has the string "or"
arr.str.contains(r'^[A-Z][\(]') will match 'C(2', whereas I can match 'Cor CC' and 'AC or ACC' with arr.str.contains(r'\w*or.\w*').
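For reference, here is a minimal sketch of the two-step approach described above, assuming the two boolean masks are combined with | and inverted with ~. The names has_paren, has_or and kept are only illustrative.

```python
import pandas as pd

arr = pd.Series(['C', 'A', 'T', 'G', 'CC', 'KEEP', 'ATC', 'CACACAC',
                 'CCCCCCCCACAGTTTATGTAG', 'C(2', 'Cor CC', 'AC or ACC'])

# Mask 1: a capital letter immediately followed by an opening parenthesis.
has_paren = arr.str.contains(r'^[A-Z][\(]')

# Mask 2: the string "or" followed by at least one more character.
has_or = arr.str.contains(r'\w*or.\w*')

# Keep everything that matches neither mask (illustrative name).
kept = arr[~(has_paren | has_or)]
print(kept.tolist())
```

This works, but it needs two patterns plus a boolean inversion rather than one expression that selects the wanted elements directly.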
I can then pop these elements out of my list, but I am trying to keep the elements of interest (i.e. everything except 'C(2', 'Cor CC' and 'AC or ACC') directly with a regular expression.
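One pattern that might express all three criteria at once uses a negative lookahead. This is only a sketch: it assumes "or" should be rejected anywhere in the element and that only an opening parenthesis needs excluding, and it is aimed at the sample series above rather than arbitrary input. The names pattern and kept_single are illustrative.

```python
import pandas as pd

arr = pd.Series(['C', 'A', 'T', 'G', 'CC', 'KEEP', 'ATC', 'CACACAC',
                 'CCCCCCCCACAGTTTATGTAG', 'C(2', 'Cor CC', 'AC or ACC'])

# ^          anchor at the start of the element
# (?!.*or)   negative lookahead: reject if "or" appears anywhere
# [A-Z]      first character must be a capital letter
# [^(]*$     the rest, up to the end, must not contain "("
pattern = r'^(?!.*or)[A-Z][^(]*$'

# Select the matching elements directly, with no inversion needed.
kept_single = arr[arr.str.contains(pattern)]
print(kept_single.tolist())
```

Note that the lookahead is case-sensitive, so an element containing "OR" in capitals would still be kept under this sketch.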