I have two datasets (in CSV format) of different sizes, as follows:
df_old:
index category  text
 0    spam      you win much money
 1    spam      you are the winner of the game
 2    not_spam  the weather in Chicago is nice
 3    not_spam  pizza is an Italian food
 4    neutral   we have a party now
 5    neutral   they are driving to downtown
 
df_new:
index category  text
 0    spam      you win much money
 14   spam      London is the capital of Canada
 15   not_spam  no more raining in winter
 25   not_spam  the soccer game plays on HBO
 4    neutral   we have a party now
 31   neutral   construction will be done
 
I am using code that concatenates df_new to df_old so that, within each category, the rows of df_new go on top of the rows of df_old.
The code is:
(pd.concat([df_new,df_old], sort=False).sort_values('category', ascending=False, kind='mergesort')) 
The problem is that rows whose index, category, and text are all identical (for example [0, spam, you win much money]) end up duplicated in the result, and I want to avoid this.
The expected output should be:
df_concat:
index category  text
 14   spam      London is the capital of Canada
 0    spam      you win much money
 1    spam      you are the winner of the game
 15   not_spam  no more raining in winter
 25   not_spam  the soccer game plays on HBO
 2    not_spam  the weather in Chicago is nice
 3    not_spam  pizza is an Italian food
 31   neutral   construction will be done
 4    neutral   we have a party now
 5    neutral   they are driving to downtown    
I tried this and this, but both remove either the category or the text.
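
For reference, here is a minimal sketch of one way this could be approached. It assumes that `index` is the DataFrame index rather than a regular column, and that a row counts as a duplicate only when index, category, and text all match. `drop_duplicates` ignores the index, so the sketch first moves it into a column with `reset_index`. The frames are rebuilt by hand from the tables above instead of being read from CSV.

```python
import pandas as pd

# Rebuild the two example frames from the tables above.
df_old = pd.DataFrame(
    {
        "category": ["spam", "spam", "not_spam", "not_spam", "neutral", "neutral"],
        "text": [
            "you win much money",
            "you are the winner of the game",
            "the weather in Chicago is nice",
            "pizza is an Italian food",
            "we have a party now",
            "they are driving to downtown",
        ],
    },
    index=[0, 1, 2, 3, 4, 5],
)
df_new = pd.DataFrame(
    {
        "category": ["spam", "spam", "not_spam", "not_spam", "neutral", "neutral"],
        "text": [
            "you win much money",
            "London is the capital of Canada",
            "no more raining in winter",
            "the soccer game plays on HBO",
            "we have a party now",
            "construction will be done",
        ],
    },
    index=[0, 14, 15, 25, 4, 31],
)

df_concat = (
    pd.concat([df_new, df_old], sort=False)
    # Move the index into a column named "index" so it takes part
    # in the duplicate check together with category and text.
    .reset_index()
    # keep="last" keeps the df_old copy of a duplicated row, so the
    # duplicate stays in its df_old position within the category.
    .drop_duplicates(subset=["index", "category", "text"], keep="last")
    .set_index("index")
    # A stable sort preserves the concat order inside each category.
    .sort_values("category", ascending=False, kind="mergesort")
)

print(df_concat)
```

With `keep="last"` the rows come out in the same order as the expected output above. Using `keep="first"` would keep the df_new copy instead, which moves rows 0 and 4 to the top of their categories.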
 
     
     
     
    