Why does pd.Series([np.nan]) | pd.Series([True]) evaluate to False?

Question

Why does the following code return False?

>>> pd.Series([np.nan]) | pd.Series([True])
0    False
dtype: bool

Looks like a bug, since the commuted expression (operands swapped) yields `True`. You should open an issue on their GitHub. — rafaelc, May 08 '20 at 21:01
This is interesting. Note, `np.nan or True` evaluates to `nan`, basically, `nan` will propagate in your operations. What is *super* weird is that *actually* `bool(np.nan)` will be `True`, and even more strangely, `pd.Series([np.nan],dtype=np.bool)` gives you a series with a single `True` — juanpa.arrivillaga, May 08 '20 at 21:01
@juanpa.arrivillaga To make the story more interesting, `pd.NA` (as opposed to `np.nan`) does not propagate. — rafaelc, May 08 '20 at 21:03
[Here](https://github.com/pandas-dev/pandas/issues/6528)'s a related discussion from pandas GitHub page. — ayhan, May 08 '20 at 21:23
Related thread here: https://stackoverflow.com/questions/37131462/comparing-logical-values-to-nan-in-pandas-numpy — Ji Wei, May 22 '20 at 03:30

Reuben · Answer 1 · 2020-05-25T14:48:15.183

4

I think this is because np.nan is an ordinary Python float, and float's __bool__ returns True for any value that is not zero, NaN included:

np.nan.__bool__() == True

In the same way:

>>> np.nan or None
nan

A solution in pandas would be:

pd.Series([np.nan]).fillna(False) | pd.Series([True])

EDIT ***

For clarity: in pandas 0.24.1, the function _bool_method_SERIES (line 1816 of .../pandas/core/ops.py) contains this assignment:

    fill_bool = lambda x: x.fillna(False).astype(bool)

which is where the behaviour you are describing comes from. That is, pandas was deliberately designed to treat np.nan as a False value whenever it performs an or operation.
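To see what that lambda does in isolation, here is a minimal sketch. The lambda is copied from the line quoted above; the names `left`, `right`, `filled` and `combined` are only illustrative, and this does not reproduce the rest of the pandas internals around it.

```python
import numpy as np
import pandas as pd

# The lambda quoted from pandas 0.24.1, copied as-is.
fill_bool = lambda x: x.fillna(False).astype(bool)

# Illustrative input: one missing value, one non-zero value, one zero.
left = pd.Series([np.nan, 1.0, 0.0])

# NaN is replaced by False first, then everything is cast to bool.
filled = fill_bool(left)

# With both operands already boolean, | is an ordinary element-wise OR.
right = pd.Series([True, True, True])
combined = filled | right

print(filled.tolist())
print(combined.tolist())
```

Filling explicitly like this before the `|` removes the dependence on which side the NaN is on.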

edited May 25 '20 at 14:48

answered May 25 '20 at 09:04


*"...so that `np.nan` is treated like a `False` value (whenever doing an or operation)"* - **no**, `np.nan` is not treated as something different; try `np.nan or True` yourself and you will see that the result is `np.nan`. – MarianD May 25 '20 at 17:16
@MarianD - hey, I think I referenced that above; but my point is that `pandas` fills `np.nan` with `False` during `__or__` operations - hope that helps. – Reuben May 25 '20 at 21:18
**1.** Sorry, you referenced nothing (no links in your answer; BTW why version 0.24.1?). **2.** If, as you state, “pandas fills `np.nan` with `False`”, why does `False or True` give `False` (as in the OP's example)? – MarianD May 26 '20 at 07:55
BTW, you could be more specific directly in your answer. – MarianD May 26 '20 at 08:06

MarianD · Answer 2 · 2020-05-26T07:59:03.460

Compare your case (with the explicit dtype to emphasize the inferred one):

In[11]: pd.Series([np.nan], dtype=float) | pd.Series([True])

Out[11]: 
0    False
dtype: bool

with a similar one (only dtype is now bool):

In[12]: pd.Series([np.nan], dtype=bool) | pd.Series([True])

Out[12]: 
0    True
dtype: bool

Do you see the difference?

The explanation:

In the first case (yours), np.nan propagates through the logical or operation (under the hood):
```
In[13]: np.nan or True
Out[13]: nan
```
and pandas then treats np.nan as False when it appears in the result of a boolean operation.
In the second case the output is unambiguous, because the first series already holds a boolean value (True, since all non-zero values are considered True, including np.nan, although that does not matter here):
```
In[14]: pd.Series([np.nan], dtype=bool)
Out[14]:
0    True
dtype: bool
```
and True or True gives True, of course:
```
In[15]: True or True
Out[15]: True
```
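One detail worth keeping in mind when reading the examples above: Python's `or` keyword and the `|` operator used on the Series are different operations. The sketch below uses only the standard library (`float("nan")` stands in for `np.nan`, which is the same kind of plain float); the variable names are just for illustration.

```python
import math

nan = float("nan")  # stand-in for np.nan, an ordinary Python float

# NaN is not equal to zero, so it counts as truthy.
nan_is_truthy = bool(nan)

# `or` returns its first truthy operand unchanged, so NaN comes back.
or_result = nan or True

# With the operands swapped, `or` stops at True and never looks at NaN.
reversed_result = True or nan

# `|` is not defined for plain floats at all, so it raises TypeError.
try:
    nan | True
    bitwise_error = None
except TypeError as exc:
    bitwise_error = exc

print(nan_is_truthy, or_result, reversed_result, type(bitwise_error).__name__)
```

So `np.nan or True` shows how the scalar `or` behaves, but the element-wise `|` on a Series has to go through pandas' own handling of missing values.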
