Why does the following code return False?
>>> pd.Series([np.nan]) | pd.Series([True])
0 False
dtype: bool
I think this is because np.nan is a float, and any non-zero float (NaN included) is truthy:
np.nan.__bool__() == True
In the same way:
>>> np.nan or None
nan
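As a quick check of the truthiness claim, here is a minimal sketch in plain Python. It uses math.nan instead of np.nan on the assumption that both are ordinary Python floats; the variable names are only for illustration.

```python
import math

nan = math.nan  # an ordinary Python float, like np.nan

# NaN is not equal to zero, so it is truthy.
nan_is_truthy = bool(nan)

# `or` returns its first truthy operand, so NaN comes back unchanged.
or_with_none = nan or None
or_with_true = nan or True

# With a falsy left operand, `or` returns the right operand, here NaN.
false_or_nan = False or nan

print(nan_is_truthy, or_with_none, or_with_true, false_or_nan)
```

So plain Python never turns NaN into False; the False in the question has to come from pandas itself.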
A solution in pandas would be:
pd.Series([np.nan]).fillna(False) | pd.Series([True])
EDIT ***
For clarity: in pandas 0.24.1, the method _bool_method_SERIES (line 1816 of .../pandas/core/ops.py) contains this assignment:
fill_bool = lambda x: x.fillna(False).astype(bool)
which is where the behaviour you are describing comes from. In other words, pandas was deliberately designed to treat np.nan as False whenever it performs an or operation.
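To make the effect of that lambda concrete, here is a minimal sketch that applies the same fillna(False).astype(bool) step by hand before combining two Series. It is not the pandas implementation: the helper name or_nan_as_false and the sample data are hypothetical, and some pandas versions may emit a FutureWarning from fillna on a float Series.

```python
import numpy as np
import pandas as pd

# Same expression as the lambda quoted above.
fill_bool = lambda x: x.fillna(False).astype(bool)

def or_nan_as_false(left, right):
    """Hypothetical helper: element-wise OR with NaN counted as False."""
    return fill_bool(left) | fill_bool(right)

left = pd.Series([np.nan, 1.0, 0.0])
right = pd.Series([True, False, False])

# NaN -> False, 1.0 -> True, 0.0 -> False on the left-hand side.
left_as_bool = fill_bool(left)
result = or_nan_as_false(left, right)

print(left_as_bool.tolist())
print(result.tolist())
```

Filling both operands explicitly keeps the outcome independent of how a given pandas version handles NaN inside the | operator.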
Compare your case (with the explicit dtype to emphasize the inferred one):
In[11]: pd.Series([np.nan], dtype=float) | pd.Series([True])
Out[11]:
0 False
dtype: bool
with a similar one (only dtype is now bool):
In[12]: pd.Series([np.nan], dtype=bool) | pd.Series([True])
Out[12]:
0 True
dtype: bool
Do you see the difference?
The explanation:
In the first case (yours), np.nan propagates through the logical or operation (under the hood):
In[13]: np.nan or True
Out[13]: nan
and pandas then treats np.nan as False in the result of a boolean operation.
In the second case the output is unambiguous, because the first series already holds a boolean value. That value is True, since all non-zero values are considered True, np.nan included (although that detail doesn't matter here):
In[14]: pd.Series([np.nan], dtype=bool)
Out[14]:
0 True
dtype: bool
and True or True gives True, of course:
In[15]: True or True
Out[15]: True
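The second case can also be reproduced with NumPy alone. This is a minimal sketch, assuming that requesting dtype=bool amounts to a NumPy-style cast of the float values; that is an inference from the output above, not something taken from the pandas source.

```python
import numpy as np

# Casting to bool maps every non-zero value to True, and NaN is non-zero.
nan_as_bool = np.array([np.nan]).astype(bool)

# For contrast, zero is the only float that casts to False.
zero_as_bool = np.array([0.0]).astype(bool)

# Once both operands are real booleans, | is an ordinary element-wise OR.
combined = nan_as_bool | np.array([True])

print(nan_as_bool, zero_as_bool, combined)
```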