I have two gzipped CSV files, IMFBOP2017_1.csv.gz and IMFBOP2017_2.csv.gz, with the same columns in both files: "Location, Indicator, Measure, Unit, Frequency, Date".
The total row count is over 60 million.
I want to compare the two files and display the rows of IMFBOP2017_1 that are not present in IMFBOP2017_2.
My plan is to import both files into dataframes, add an extra column "compare" to each dataframe, fill it with all fields joined together, like
Location|Indicator|Measure|Unit|Frequency|Date, and then do a NOT IN operation on that column.
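A minimal sketch of that plan, assuming pandas and that every column is read as a string. The helper name `rows_only_in_first` and the sample rows are made up for illustration; the real files would be loaded with `pd.read_csv` as noted in the comments.

```python
import pandas as pd

COLUMNS = ["Location", "Indicator", "Measure", "Unit", "Frequency", "Date"]


def rows_only_in_first(df1, df2):
    """Return rows of df1 whose pipe-joined key does not occur in df2."""
    # Build the "compare" column: all fields joined with "|".
    key1 = df1[COLUMNS].astype(str).apply("|".join, axis=1)
    key2 = df2[COLUMNS].astype(str).apply("|".join, axis=1)
    # NOT IN: keep df1 rows whose key is absent from df2's keys.
    return df1[~key1.isin(key2)]


# Made-up sample rows standing in for the real data.
# For the real files (assuming they have a header row), something like:
#   df1 = pd.read_csv("IMFBOP2017_1.csv.gz", dtype=str)
#   df2 = pd.read_csv("IMFBOP2017_2.csv.gz", dtype=str)
df1 = pd.DataFrame(
    [
        ["US", "BOP", "Value", "USD", "Q", "2017-Q1"],
        ["FR", "BOP", "Value", "EUR", "Q", "2017-Q1"],
    ],
    columns=COLUMNS,
)
df2 = pd.DataFrame(
    [["US", "BOP", "Value", "USD", "Q", "2017-Q1"]],
    columns=COLUMNS,
)

missing = rows_only_in_first(df1, df2)
print(missing)
```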
I think this is a costly process for this many rows. Is there a simpler solution?