I have observations disappear after I merge two dataframes.
I have two dataframes that look like this:
df_1 
       text      user
bla bla bla     user1
ga ga ga ga     user1
bur bur bur     user2
     .            .
df_2
       user       url
      user1   asd.com
      user2   dsa.com
     .            .  
I use the unique list of users from the first one to web scrape data on them and construct the second one. I would like to merge them so they look like this:
df_merged
       text      user       url
bla bla bla     user1   asd.com
ga ga ga ga     user1   asd.com
bur bur bur     user2   dsa.com
     .            .        .
I merge them by using:
df_merged = df_1.merge(df_2, on = 'user', validate = "m:m")
The problem is that after the merge observations disappear randomly, for example:
len(df_1['user'].drop_duplicates())
returns 11115
len(df_2['user'])
returns 11115
len(df_merged['user'].drop_duplicates())
returns 7076
df_1 contains about 70k observations, while df_merged contains about 30k
Does anyone know what's going on?
 
    