I have the following two databases:
url='https://raw.githubusercontent.com/108michael/ms_thesis/master/rgdp_catcode.merge'
df=pd.read_csv(url, index_col=0)
df.head(1)
    naics   catcode                                        GeoName  Description     ComponentName   year    GDP     state
0   22  E1600',\t'E1620',\t'A4000',\t'E5000',\t'E3000'...   Alabama     Utilities   Real GDP by state   2004    5205    AL
url='https://raw.githubusercontent.com/108michael/ms_thesis/master/mpl.Bspons.merge'
df1=pd.read_csv(url, index_col=0)
df1.head(1)    
    state   year    unemployment    log_diff_unemployment   id.thomas   party   type    date    bills   id.fec  years_exp   session     name    disposition     catcode
0   AK  2006    6.6     -0.044452   1440    Republican  sen     2006-05-01  s2686-109   S2AK00010   39  109     National Cable & Telecommunications Association     support     C4500
Regarding df, I had to manually input the catcode values. I think that is why the formatting is off. What I would like is to simply have the values without the \t prefix. I want to merge the dfs on catcode, state, year. I made a test earlier wherein a df1.catcode with only one value per cell was matched with the values in another df.catcode that had more than one value per cell and it worked.
So technically, all I need to do is lose the \t before each consecutive value in df.catcode, but additionally, if anyone has ever done a merge of this sort before, any 'caveats' learned through experience would be appreciated. My merge code looks like this:
mplmerge=pd.merge(df1,df, on=(['catcode', 'state', 'year']), how='left' )
I think this can be done with the regex method, I'm looking at the documentation now.
 
     
    