I have a function called postprocess that I apply to each dataframe row. It uses a while loop to walk backwards from start_index for as long as it finds word characters, apostrophes, or hyphens. postprocess looks like this:
import re

def postprocess(description, start_index, end_index):
    # Move start_index left while the previous and the current character
    # are both word characters, apostrophes, or hyphens.
    if 0 < start_index < len(description):
        while re.match(r"\w|\'|-", description[start_index - 1]) and re.match(
            r"\w|\'|-", description[start_index]
        ):
            start_index = start_index - 1
            if start_index == 0:
                break
    # end_index is used as an exclusive slice bound.
    return description[start_index:end_index]
For example, say the description is credit payment velvet-burger and start_index and end_index point at the first and last characters of burger (22 and 27 in this string). description[start_index] is then the b in burger. The while loop traces backwards from that b to recover the target substring, because burger on its own is incomplete: we also want the velvet- in front of it.
After running postprocess we will get velvet-burger.
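To make the example concrete, here is a minimal sketch of the same leftward expansion on the sample string. Counting from the start of the full string, the b of burger is at index 22 and the final r at index 27. The helper name expand_left is made up for illustration, and it swaps the character-by-character loop for a single anchored regex search, so treat it as one possible variant rather than the function above.

```python
import re

# Characters treated as part of a token: word characters, apostrophes, hyphens.
TOKEN_CHAR = r"[\w'-]"
# Longest run of token characters that reaches the end of the searched string.
TOKEN_TAIL = re.compile(TOKEN_CHAR + r"*$")


def expand_left(description, start_index, end_index):
    """Hypothetical helper: move start_index left to the start of its token.

    end_index is an exclusive slice bound.
    """
    # Only expand when start_index points at a token character inside the string.
    if 0 < start_index < len(description) and re.match(
        TOKEN_CHAR, description[start_index]
    ):
        # The match always succeeds (it may be empty), and its start is the
        # first character of the token that ends right before start_index.
        start_index = TOKEN_TAIL.search(description[:start_index]).start()
    return description[start_index:end_index]


description = "credit payment velvet-burger"
result = expand_left(description, 22, 27 + 1)  # +1 so the last r is included
print(result)  # velvet-burger
```

The expansion stops at the space before velvet because a space is not a token character, which is the same stopping rule as the while loop.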
The complete code looks like this:
df["target_substring"] = df.apply(
    lambda x: postprocess(x["description"], x["start_index"], x["end_index"] + 1),
    axis=1,
)
Is there a better way to write this code?