I have a DataFrame of companies called slave (<class 'pandas.core.frame.DataFrame'>):
Date    ID  Name    Company     Email
2018    1   AB      Apple       apple@apple.com
2017    2   BC      MicroSoft   ms@ms.com
2016    3   CD      Amazon      amz@amz.com
The Company column has dtype object, not string.
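For reference, a small stand-in for slave can be built like this (the real frame is loaded from my data source; the values are just the sample above):

import pandas as pd

# small illustrative copy of slave; Company ends up as dtype object
slave = pd.DataFrame({
    'Date': [2018, 2017, 2016],
    'ID': [1, 2, 3],
    'Name': ['AB', 'BC', 'CD'],
    'Company': ['Apple', 'MicroSoft', 'Amazon'],
    'Email': ['apple@apple.com', 'ms@ms.com', 'amz@amz.com'],
})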
I also have another, much larger DataFrame (about 2 GB) of master data, <class 'pandas.core.frame.DataFrame'>:
code    company_name    product
123     MicroSoft       Windows
456     Apple           iphone
789     Amazon          cloud
The company_name column is also dtype object, the same as Company, not string.
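A matching stand-in for master (the real one is 20M rows and about 2 GB):

# small illustrative copy of master; company_name is dtype object
master = pd.DataFrame({
    'code': [123, 456, 789],
    'company_name': ['MicroSoft', 'Apple', 'Amazon'],
    'product': ['Windows', 'iphone', 'cloud'],
})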
I want to look up every value of Company in the company_name column of the second DataFrame and collect the matching code values:
def finder(slave, master):
    finalList = []
    for company in slave['Company']:
        # only try to match values of the relevant type
        # (some_specific_type is a placeholder for the actual type check)
        if isinstance(company, some_specific_type):
            # linear scan over all of master for every company -- the slow part
            for info, code in zip(master['company_name'], master['code']):
                if company in info:
                    finalList.append(code)
                    break
        else:
            # keep the original value when it is not of that type
            finalList.append(company)
    return finalList
How can the lookup be made much faster in this situation? The master table has 20M rows, and the current loop is extremely slow.
slave['newCode'] = finder(slave, master)  # very slow
The result should be:
Date    ID  Name    Company     Email               newCode
2018    1   AB      Apple       apple@apple.com     456
2017    2   BC      MicroSoft   ms@ms.com           123
2016    3   CD      Amazon      amz@amz.com         789
I am checking the type of each value, so this is not a simple merge operation. The slave and master DataFrames do not have the same number of rows: master has 20M rows, whereas slave has just 1K. I want to compare against the company_name column but take the value from the corresponding code column and store it as a new column in the slave table.
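For what it's worth, a plain exact-match join could look like the sketch below, but it ignores the type check and the substring ("company in info") matching, which is why I wrote the loop:

# exact-match left join; does not reproduce the type check or the
# substring matching from finder()
merged = slave.merge(master[['company_name', 'code']],
                     left_on='Company', right_on='company_name', how='left')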
My goal: how can I do this with NumPy instead, using vectorization over NumPy arrays?
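Something along these lines is what I have in mind (assuming for the sketch that Company holds only strings and that an exact match is enough; the type check and the substring matching from finder() are left out), but I am not sure it is correct or actually faster:

import numpy as np

# sort master's names once so each Company value can be binary-searched
names = master['company_name'].to_numpy()
codes = master['code'].to_numpy()
order = np.argsort(names)
names_sorted = names[order]
codes_sorted = codes[order]

# look up every Company value against the sorted names
targets = slave['Company'].to_numpy()
pos = np.searchsorted(names_sorted, targets)
pos = np.clip(pos, 0, len(names_sorted) - 1)
hit = names_sorted[pos] == targets

# keep the original Company value where there is no exact match,
# mirroring the else-branch of finder()
slave['newCode'] = np.where(hit, codes_sorted[pos], targets)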
