I have a DataFrame from which I would like to filter out "bad data" with a regex. In my use case, any number in column B that has 4 identical digits in a row is considered "bad".
Here is my code:
import pandas as pd
import numpy as np
df1 = pd.DataFrame({'A' : [np.nan,np.nan,3,4,5,5,3,1,5,np.nan],
                    'B' : [1111111,1234567,2222,55555,0,0,np.nan,9,0,0],
                    'E' : ['Assign','Unassign','Assign','Ugly','Appreciate','Undo','Assign','Unicycle','Assign','Unicorn']})
print(df1)
bad_data = df1[df1['B'].astype(str).str.contains(r'(\d)\1{3,}')]
print(bad_data)
     A          B       E
0  NaN  1111111.0  Assign
2  3.0     2222.0  Assign
3  4.0    55555.0    Ugly
My code works, but I get this warning: UserWarning: This pattern has match groups. To actually get the groups, use str.extract.
This was discussed here. Following that example, I changed my regex to use a non-capturing group (?:...):
bad_data = df1[df1['B'].astype(str).str.contains(r'(?:(\d))\1{3,}')] 
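As a sanity check of my own (not from the linked post), I counted the capturing groups in both patterns with the standard re module. Both report one group, because the backreference \1 still needs a capturing group to refer to. I am assuming that this group count is what triggers the warning:

```python
import re

# Original pattern: one capturing group, referenced by \1
original = re.compile(r'(\d)\1{3,}')

# Rewritten pattern: the outer group is non-capturing, but the inner (\d) still captures
rewritten = re.compile(r'(?:(\d))\1{3,}')

print(original.groups, rewritten.groups)  # prints: 1 1
```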
I still receive the UserWarning, no matter where or how many non-capturing groups I try. I could filter out the warning as in the other link, but is there something I am doing wrong, or could be doing better, that would keep the warning from popping up?
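For reference, the fallback I have in mind is roughly the following: a minimal sketch that silences only this warning around the one call. The message pattern '.*has match groups' is my own guess at a filter that matches the warning text, and the smaller DataFrame is just for illustration:

```python
import warnings

import numpy as np
import pandas as pd

# Reduced version of column B from the example above
df1 = pd.DataFrame({'B': [1111111, 1234567, 2222, 55555, 0, np.nan, 9]})

with warnings.catch_warnings():
    # Ignore only the "match groups" UserWarning, and only inside this block
    warnings.filterwarnings('ignore', message=r'.*has match groups',
                            category=UserWarning)
    mask = df1['B'].astype(str).str.contains(r'(\d)\1{3,}')

bad_data = df1[mask]
print(bad_data)
```

This keeps the regex unchanged, but it hides the warning rather than removing its cause, which is why I would prefer a cleaner approach.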
 
    