I'm trying to handle multiprocessing in Python, but I think I haven't understood it properly.
To start with, I have a dataframe containing texts as strings, on which I want to perform some regex cleaning. The code looks as follows:
import os
import re
from threading import Thread

import multiprocess

# data is the pandas DataFrame with a "qa" text column, created elsewhere

def clean_qa():
    # strip "-----...-----" blocks, "[...]" tags and remaining punctuation from every row
    for index, row in data.iterrows():
        data.loc[index, "qa"] = re.sub(
            r"(\-{5,}).{1,100}(\-{5,})|(\[.{1,50}\])|[^\w\s]",
            "",
            str(data.loc[index, "qa"]))

# Variant 1: threads
if __name__ == '__main__':
    threads = []

    for i in range(os.cpu_count()):
        threads.append(Thread(target=clean_qa))

    for thread in threads:
        thread.start()

    for thread in threads:
        thread.join()

# Variant 2: processes
if __name__ == '__main__':
    processes = []
    for i in range(os.cpu_count()):
        processes.append(multiprocess.Process(target=clean_qa))

    for process in processes:
        process.start()

    for process in processes:
        process.join()
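For context, "data" is just a pandas DataFrame with one text column called "qa"; a toy example (the real texts are of course longer and come from elsewhere) would be:

import pandas as pd

data = pd.DataFrame({
    "qa": [
        "----- some header ----- What is [tag] Python?",
        "Another [note] text, with punctuation!",
    ]
})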
    
When I run "clean_qa" not as a function but simply by executing the for loop directly, everything works fine and it takes about 3 minutes.
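By "executing the for loop directly" I mean just running the body of "clean_qa" at module level, i.e.:

for index, row in data.iterrows():
    data.loc[index, "qa"] = re.sub(r"(\-{5,}).{1,100}(\-{5,})|(\[.{1,50}\])|[^\w\s]", "", str(data.loc[index, "qa"]))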
However, when I use multiprocessing or threading, the execution takes about 10 minutes, and the text is not cleaned, so the dataframe looks exactly as before.
So my question: what did I do wrong? Why does it take longer, and why does nothing happen to the dataframe?
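For what it's worth, what I'm ultimately aiming for is something along these lines: split the rows over the cores, clean each chunk, and put the results back together. This is only a sketch of the intent, not code I have verified, and "clean_chunk" is a made-up helper name:

import os

import multiprocess
import numpy as np
import pandas as pd

def clean_chunk(chunk):
    # clean one piece of the dataframe and return it
    chunk = chunk.copy()
    chunk["qa"] = chunk["qa"].astype(str).str.replace(
        r"(\-{5,}).{1,100}(\-{5,})|(\[.{1,50}\])|[^\w\s]", "", regex=True)
    return chunk

if __name__ == '__main__':
    chunks = np.array_split(data, os.cpu_count())
    with multiprocess.Pool(os.cpu_count()) as pool:
        data = pd.concat(pool.map(clean_chunk, chunks))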
Thank you very much!
 
    