Currently I am using this statement to find all columns in a dataframe that have no missing values. It works fine, but I'm wondering if there is a more concise (and ideally more efficient) way to do the same thing?
df.columns[ np.sum(df.isnull()) == 0 ]
You can use this:
df.isna().any()               # one boolean per column: True if the column HAS missing values, False if it has none
df.columns[df.isna().any()]   # only the column names WITH missing values
df.columns[~df.isna().any()]  # the tilde negates the condition: column names with NO missing values
df.columns[~df.isna().any()].tolist()  # .tolist() converts the result to a plain list, if you wish
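For example, applied to a small dataframe (the column names and values below are made up purely for illustration):
import numpy as np
import pandas as pd

df = pd.DataFrame({'a': [1, 2, 3], 'b': [4, np.nan, 6], 'c': [7, 8, 9]})

print(df.columns[~df.isna().any()].tolist())
# ['a', 'c']  -- only column 'b' contains a missing value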
 
    
To better answer the question, one would need access to the dataframe in question.
Without it, there are various methods one can use.
Let's consider the following dataframe as an example
import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.randint(0, 100, size=(100, 4)), columns=list('ABCD'))
df.iloc[0:10, 0] = np.nan
[Out]:
    A   B   C   D
0 NaN  89  63  41
1 NaN  12  47   8
2 NaN  79  76  67
3 NaN  87  61  38
4 NaN  28  31  30
Method 1 - As OP indicated (we will use it as the reference)
df.columns[ np.sum(df.isnull()) == 0 ]
Method 2 - Similar to Method 1, with numpy.sum and pandas.isnull, but using a lambda function
df.columns[ df.apply(lambda x: np.sum(x.isnull()) == 0) ]
Method 3 - Using numpy.all and pandas.DataFrame.notnull
columns = df.columns[ np.all(df.notnull(), axis=0) ]
Method 4 - Using only pandas built-in methods
columns = df.columns[ df.isnull().sum() == 0 ]
Method 5 - Using pandas.DataFrame.isna (the same method used in the answer above)
columns = df.columns[ df.isna().any() == False ]
The output of all methods is the one OP wants, more specifically
Index(['B', 'C', 'D'], dtype='object')
If one times each of the methods with time.perf_counter() (there are other ways to measure execution time; a minimal harness is sketched at the end of this answer), one gets the following (times in seconds)
     method          time
0  method 1  2.999996e-07
1  method 2  3.000005e-07
2  method 3  2.000006e-07
3  method 4  6.000000e-07
4  method 5  3.999994e-07
Again, this might change depending on the dataframe that one uses. Also, depending on the requirements (hardware and business constraints), there might be other ways to achieve the same goal.
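For reference, one way to reproduce such a comparison with time.perf_counter() could look like the sketch below. It times a single run per method, so the numbers will fluctuate from run to run; the methods dictionary simply wraps the snippets shown above.
import time

import numpy as np
import pandas as pd

# Same example dataframe as above
df = pd.DataFrame(np.random.randint(0, 100, size=(100, 4)), columns=list('ABCD'))
df.iloc[0:10, 0] = np.nan

# Each entry wraps one of the snippets shown above
methods = {
    'method 1': lambda: df.columns[np.sum(df.isnull()) == 0],
    'method 2': lambda: df.columns[df.apply(lambda x: np.sum(x.isnull()) == 0)],
    'method 3': lambda: df.columns[np.all(df.notnull(), axis=0)],
    'method 4': lambda: df.columns[df.isnull().sum() == 0],
    'method 5': lambda: df.columns[df.isna().any() == False],
}

rows = []
for name, func in methods.items():
    start = time.perf_counter()
    func()
    rows.append({'method': name, 'time (s)': time.perf_counter() - start})

print(pd.DataFrame(rows))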
