I have two csv files and I am trying to combine them in one based on a condition.
they both have the 'Import_ID' column which gives unique id for each row. What I need to do is to check if the Import_ID from df1 already exist in df2, if not add this row, else skip it. It would be best if it would be done without the loops or iterrows().
df1
 Import_ID            State                  ProductName
0     7027514279_100    716        TRANSCEND ESD220C 240GB SSD extern
1   7027514279_200.1    716        INTELLINET 48,3cm 19Zoll Steckdosenleist
2   7027514279_200.2    716        INTELLINET 48,3cm 19Zoll Steckdosenleist
3   7027514279_200.3    716        INTELLINET 48,3cm 19Zoll Steckdosenleist
4   7027514279_200.4    716        INTELLINET 48,3cm 19Zoll Steckdosenleist
and df2
   Import_ID          State                  ProductName
0     7027329459_100    716        BROTHER MFCJ6530DW MULTI-FUNCTION (AT)
1   7027514279_200.2    716        INTELLINET 48,3cm 19Zoll Steckdosenleist
2   7027514279_200.3    716        INTELLINET 48,3cm 19Zoll Steckdosenleist
3     7027329459_400    716        BROTHER LC-3219XLY Tinte Gelb (1500 S)
4     7027329459_500  20010        BROTHER LC-3219XLBK Tinte Schwarz
what I would like it to be is that df1 rows are only then imported when their Import_ID doesn't exist in the df2
df2 updated:
 Import_ID          State                  ProductName
0     7027329459_100    716        BROTHER MFCJ6530DW MULTI-FUNCTION (AT)
1   7027514279_200.2    716        INTELLINET 48,3cm 19Zoll Steckdosenleist
2   7027514279_200.3    716        INTELLINET 48,3cm 19Zoll Steckdosenleist
3     7027329459_400    716        BROTHER LC-3219XLY Tinte Gelb (1500 S)
4     7027329459_500  20010        BROTHER LC-3219XLBK Tinte Schwarz
5     7027514279_100    716        TRANSCEND ESD220C 240GB SSD extern
6   7027514279_200.1    716        INTELLINET 48,3cm 19Zoll Steckdosenleist
7   7027514279_200.4    716        INTELLINET 48,3cm 19Zoll Steckdosenleist
so in the example the df1 rows 1 nad 2 are not copied, because they already exist.
I tried by creating a temp file where I would store all of the rows from df1 that don't exist in df2 but it didn't work.
temp = numpy.where(file_in['Import_ID'] not in finalData['Import_ID'], file_in)
I don't know what else I should try , since I am pretty new to python.
 
    