I have found quite a few posts that explain how to drop rows with specific column values; however, I've not been able to find one (I know a post might be out there) that addresses how to drop rows from a dataframe based on specific values across many columns (34 in this case).
- How to drop rows of Pandas DataFrame whose value in certain columns is NaN
- Drop Rows by Multiple Column Criteria in DataFrame
- Drop rows in pandas dataframe based on columns value
baddata
zip      age    item1    item2    item3    item4    item5    item6    item7    item34
12345     10        1        0        1        1        0        0        1         0
23456     20       10      111       11        1        0        1        9         8
45678     60        1        0        1        1        0        1        0         1
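For reproducibility, the sample above can be rebuilt roughly like this (only item1 through item7 and item34 are included, matching the table; the real data has all 34 item columns):

import pandas as pd

# small reconstruction of the 'baddata' sample shown above
baddata = pd.DataFrame({
    'zip':    [12345, 23456, 45678],
    'age':    [10, 20, 60],
    'item1':  [1, 10, 1],
    'item2':  [0, 111, 0],
    'item3':  [1, 11, 1],
    'item4':  [1, 1, 1],
    'item5':  [0, 0, 0],
    'item6':  [0, 1, 1],
    'item7':  [1, 9, 0],
    'item34': [0, 8, 1],
})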
I want to retain only the rows where every one of the 34 item columns holds a '1' or a '0' (i.e. drop any row in which any of those 34 columns has a value other than '1' or '0'). This is what I have tried so far:
baddata = pd.DataFrame(data=dirtydata, columns=['zip','age','item1','item2'...'item34'])
gooddata=baddata.dropna() # some rows have NaN; drops rows with NaN values
option-1:
gooddata[gooddata[['item1','item2'...'item34']].isin([0,1])] #this makes values for zip and age NaN; not sure why?
option-2:
gooddata[(gooddata[['item1','item2'...'item34']].map(len) < 2).any(axis=1)] # also tried replacing 'any' with 'all'; did not work
option-3:
cols_of_interest = ['item1','item2'...'item34']
gooddata[gooddata.drop(gooddata[cols_of_interest].map(len) < 2)] # doubtful about the syntax and usage of functions
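For what it's worth, the direction I think I should be going is something like the sketch below, assuming the 34 columns really are named item1 through item34 consecutively and that the values are numeric (isin(['0', '1']) would presumably be needed if they are strings), but I'm not confident it's correct or idiomatic:

cols_of_interest = ['item' + str(i) for i in range(1, 35)]  # item1 ... item34

# boolean row mask: True where every item column holds 0 or 1
mask = gooddata[cols_of_interest].isin([0, 1]).all(axis=1)
gooddata = gooddata[mask]

If I understand boolean indexing correctly, filtering with a row-level mask like this should leave zip and age untouched (unlike option-1, which, as far as I can tell, masks the whole frame and is why zip and age come back as NaN), but I'd like confirmation or a better approach.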