I have two dataframes that I want to merge using the 'PH' Column in brandmap data and 'product_hierarchy' in the temp data. The 'PH' column have substrings of length (4,7,11,and 15) of the strings in 'product_hierarchy'. How do I merge these two data frames using these columns and a substring match?
            Asked
            
        
        
            Active
            
        
            Viewed 983 times
        
    1 Answers
0
            
            
        A little bit complicated but it will work in combination with str.extract
import pandas as pd
df_ref = pd.DataFrame({"PH":["XXST", "XX7T"], "ValA": [1,2], "ValB": ["foo","bar"]})
df = pd.DataFrame({"product_hierarchy":["XXSTSDASD", "XX7TDSADASDASD", "XXSTHD", "XX7TDFDF"], 
                   "Val":["foo", "bar", "baz", "bar"]})
str_match = "({})".format("|".join(df_ref.PH))
df.merge(df_ref, left_on=df.product_hierarchy.str.extract(str_match)[0], right_on="PH")
Output:
    product_hierarchy   Val     PH   ValA   ValB
0   XXSTSDASD           foo     XXST    1   foo
1   XXSTHD              baz     XXST    1   foo
2   XX7TDSADASDASD      bar     XX7T    2   bar
3   XX7TDFDF            bar     XX7T    2   bar
 
    
    
        Fourier
        
- 2,795
- 3
- 25
- 39
- 
                    I have more than 62,000 rows in both dataframes so when I run this similar code I get a memory error. Any idea of how to fix that? – vraka0723 Jan 22 '20 at 19:56
- 
                    @vraka0723 You can do it in chunks,https://stackoverflow.com/questions/25962114/how-to-read-a-6-gb-csv-file-with-pandas – Fourier Jan 23 '20 at 08:39
