I have a problem. I have a free-text column, and a regex should recognize which element (pattern) appears in it. Unfortunately, some elements also appear as abbreviations, so I built an abbreviation dict. Is there a way to also loop through the dict, so that when an element has an entry in the dict, its abbreviation matches too?
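A minimal sketch of the idea, assuming an abbreviation should only count when it appears as a whole word (so `ca` does not fire inside words like `scan`): build one regex per element from the element itself plus the values of its entry in `abbreviation`. The helper name `element_pattern` is hypothetical, not part of any library.

```python
import re

abbreviation = {
    "cat": {
        "abbrev1": "ca",
    },
}


def element_pattern(element, abbreviations):
    """Return a case-insensitive regex that matches `element` or any of its
    abbreviations as a whole word (hypothetical helper)."""
    # Elements without an entry in the dict just match themselves.
    terms = [element, *abbreviations.get(element, {}).values()]
    # Escape each term so characters like '.' are matched literally.
    alternation = "|".join(re.escape(term) for term in terms)
    return re.compile(rf"\b(?:{alternation})\b", re.IGNORECASE)


cat = element_pattern("cat", abbreviation)
print(bool(cat.search("This is a ca")))
print(bool(cat.search("Something with Cat")))
print(bool(cat.search("Hello agian")))
```

The three checks print `True`, `True`, `False`: the abbreviation `ca` now counts as a hit for `cat`, while unrelated text still does not match.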
Dataframe
customerId text element code
0 1 Something with Cat cat 0
1 3 That is a huge dog dog 1
2 3 Hello agian mouse 2
3 3 This is a ca cat 0
Code
import pandas as pd
import copy
import re
d = {
"customerId": [1, 3, 3, 3],
"text": ["Something with Cat", "That is a huge dog", "Hello agian", 'This is a ca'],
"element": ['cat', 'dog', 'mouse', 'cat'],
"code": [9,8,7, 9]
}
df = pd.DataFrame(data=d)
df['code'] = df['element'].astype('category').cat.codes
print(df)
abbreviation = {
"cat": {
"abbrev1": "ca",
},
}
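# %%time  (Jupyter cell magic that times the cell; only valid as the first line of a notebook cell)
elements = df['element'].unique()

def f(x):
    match = 999  # default when no element is found in the text
    for element in elements:
        # Case-insensitive search for the element anywhere in the text
        y = bool(re.search(element, x['text'], re.IGNORECASE))
        # ^ here: this is where the abbreviations of `element` should also be checked
        if y:
            match = x['code']
            break
    x['test'] = match
    return x

df['test'] = None
df = df.apply(f, axis=1)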
What I have
customerId text element code test
0 1 Something with Cat cat 0 0
1 3 That is a huge dog dog 1 1
2 3 Hello agian mouse 2 999
3 3 This is a ca cat 0 999
What I want
customerId text element code test
0 1 Something with Cat cat 0 0
1 3 That is a huge dog dog 1 1
2 3 Hello agian mouse 2 999
3 3 This is a ca cat 0 0
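A sketch that should produce the "What I want" table, assuming the same rule as `f` (if any element or abbreviation occurs in the text, write the row's own `code`, otherwise 999) plus whole-word matching. Instead of looping row by row with `apply`, it collects every element and its abbreviations into one regex and uses the vectorized `Series.str.contains`; the names `terms`, `pattern`, `mask` and `NO_MATCH` are illustrative only.

```python
import re

import pandas as pd

d = {
    "customerId": [1, 3, 3, 3],
    "text": ["Something with Cat", "That is a huge dog", "Hello agian", "This is a ca"],
    "element": ["cat", "dog", "mouse", "cat"],
}
df = pd.DataFrame(data=d)
df["code"] = df["element"].astype("category").cat.codes

abbreviation = {
    "cat": {
        "abbrev1": "ca",
    },
}

NO_MATCH = 999

# Every search term: each element plus any abbreviations listed for it.
terms = []
for element in df["element"].unique():
    terms.append(element)
    terms.extend(abbreviation.get(element, {}).values())

# One alternation, whole words only; the group is non-capturing so
# str.contains does not warn about match groups.
pattern = r"\b(?:" + "|".join(re.escape(t) for t in terms) + r")\b"

# Same rule as f(): keep the row's own code if any term occurs, else 999.
mask = df["text"].str.contains(pattern, flags=re.IGNORECASE, regex=True)
df["test"] = df["code"].astype(int).where(mask, NO_MATCH)
print(df)
```

With the sample data the `test` column comes out as 0, 1, 999, 0. The `\b` anchors are a design choice: unlike the original substring search, `cat` will no longer match inside `cats`, so drop them if partial words should count.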