I have written a custom function that cleans up a large body of text with regular expressions in Python 3.7. I am running it in Jupyter Notebook 6.0.3.
import numpy as np
import pandas as pd
import re
import string
def pre_process(arr):
    legal_chars = string.ascii_letters + string.punctuation + string.digits + string.whitespace + "äÄöÖüÜ"
    while "  " in arr:  # collapse repeated spaces into one
        arr = arr.replace("  ", " ")
    while "\n\n" in arr:  # collapse repeated newlines into one
        arr = arr.replace("\n\n", "\n")
    for char in arr:  # remove illegal characters
        # (the loop iterates over the original string; rebinding arr does not affect it)
        if char not in legal_chars:
            arr = arr.replace(char, "")
    pattern4 = r"[\d]+\W[\d]+"  # long numbers separated by a non-digit
    pattern4_1 = r"[\d]+\W[\d]+"  # same pattern again; the second pass removes matches the first substitution can create
    arr = re.sub(pattern4, '1', arr)
    arr = re.sub(pattern4_1, '', arr)
    pattern5 = r"\W[\d]+\W[\d]+\W"  # long numbers enclosed by non-digits
    pattern6 = r"\W[\d]+\W"
    arr = re.sub(pattern5, '.', arr)
    arr = re.sub(pattern6, '', arr)
    pattern1 = r"\d{5,}"  # remove remaining long numbers
    arr = re.sub(pattern1, '', arr)
    return arr
When I run it directly on the relevant column of my smaller test dataframe with .apply, it returns the expected results and the text is cleaned.
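For reference, the working single-process call is just the usual pandas apply on the column (the column is named "Text", as in the multiprocessing snippet below):

df_t["Text"] = df_t["Text"].apply(pre_process)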
However, I need to apply this to a much larger dataframe, so I wanted to try speeding things up with the multiprocessing package.
I used:
import multiprocessing as mp
with mp.Pool() as pool:
    df_t["Text"] = pool.map(pre_process,df_t["Text"])
I have used multiprocessing successfully on the same dataframe with built-in functions, but when I run it with my custom function, nothing happens: the kernel just freezes. I tried pool.apply() as well, with no results.
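For completeness, the pool.apply() attempt was along these lines (a sketch; since pool.apply() blocks on each call, I passed the rows one at a time):

with mp.Pool() as pool:
    df_t["Text"] = [pool.apply(pre_process, args=(text,)) for text in df_t["Text"]]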
Could the problem be in my function, or am I implementing multiprocessing in the wrong way?
I tried applying the suggestions from multiprocessing.Pool: When to use apply, apply_async or map?, but nothing changed.
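Following that answer, the apply_async variant I tried was roughly this (a sketch; the results are collected with .get() inside the with block so the pool is not terminated first):

with mp.Pool() as pool:
    results = [pool.apply_async(pre_process, args=(text,)) for text in df_t["Text"]]
    df_t["Text"] = [r.get() for r in results]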