Based on the columns Articlenbr and amount, I need to check for duplicates and extract those duplicates into another dataframe. In the example below, I want to extract the first two rows, save them in another dataframe, and delete them from the original dataframe. How can this be done in PySpark?
Does this answer your question? [Remove pandas rows with duplicate indices](https://stackoverflow.com/questions/13035764/remove-pandas-rows-with-duplicate-indices) or https://stackoverflow.com/questions/14657241/how-do-i-get-a-list-of-all-the-duplicate-items-using-pandas-in-python – Emma Nov 18 '22 at 16:52
1 Answer
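Try this:
# Count non-null values per Articlenbr group
dups = df.groupby('Articlenbr').count()
# Keep the Articlenbr values whose amount count is greater than 1
dups = dups[dups['amount'] > 1].index.values
# Select the rows whose Articlenbr is duplicated
df[df['Articlenbr'].isin(dups)]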
gtomer
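The answer above is pandas code, and it groups only on `Articlenbr`, so rows that share an article number but have different amounts are also selected. Below is a minimal pandas sketch that keys on the combination of both columns and also performs the split the question asks for (duplicates into one dataframe, the rest kept in the original). It relies on `DataFrame.duplicated(subset=..., keep=False)`, which flags every occurrence of a repeated key rather than only the later ones. The helper name `split_duplicates` and the sample data are hypothetical, since the question's example table was not preserved.

```python
import pandas as pd


def split_duplicates(df: pd.DataFrame, cols: list) -> tuple:
    """Hypothetical helper: return (duplicates, remainder) keyed on `cols`."""
    # keep=False marks ALL rows whose (Articlenbr, amount) pair repeats,
    # including the first occurrence
    mask = df.duplicated(subset=cols, keep=False)
    return df[mask], df[~mask]


# Hypothetical sample data; 103 shares amount 5.0 with 101 but has a
# different Articlenbr, so it is not a duplicate on the combined key
df = pd.DataFrame({
    "Articlenbr": [101, 101, 102, 103],
    "amount": [5.0, 5.0, 7.0, 5.0],
})

# Duplicates go to dups_df; df keeps only the non-duplicated rows
dups_df, df = split_duplicates(df, ["Articlenbr", "amount"])
print(dups_df)
print(df)
```

In PySpark, a comparable approach is to add a column `F.count('*').over(Window.partitionBy('Articlenbr', 'amount'))` and then filter the rows where that count is greater than 1 (the duplicates) or equal to 1 (the remainder).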



