Comparison on None works element-wise but not in dataframe

Question

I am trying to see if any values change in my dataframe, from row to row. One column in particular has a lot of None values.
If I check for inequality between None values, element-wise, it gives the answer I expect (that is, None != None returns False), but not when I do that same comparison on the column.

Here's a minimal working example:

>>> example_data = [None]*3 + ['a','b','b'] + [None]*4
>>> df = pd.DataFrame(example_data, columns = ['col1'])
>>> print df

    col1
0   None
1   None
2   None
3   a
4   b
5   b
6   None
7   None
8   None
9   None

To see if the value changed from one line to the next, I'm shifting everything and comparing the values:

>>> did_it_change = (df != df.shift(1))
>>> print did_it_change

    col1
0   True
1   True
2   True
3   True
4   True
5   False
6   True
7   True
8   True
9   True

So long as None is not involved in the comparison, it's accurate (e.g. row 5 changed the value to b from its previous value of b, so it did not change).

If I check a single element, it works as expected:

>>> df.loc[1,'col1'] != df.loc[2,'col1']
False

But that's not the answer that appears in the full column: did_it_change.loc[1,:] is True.

I do get the expected outcome if I use "" instead of None, but replacing None with empty string like in this question seems arbitrary and superfluous.

This is the outcome I'd expect.

>>> example_data = [""]*3 + ['a','b','b'] + [""]*4
>>> df = pd.DataFrame(example_data, columns = ['col1']) 
>>> did_it_change = (df != df.shift(1))
>>> print did_it_change

    col1
0   True
1   False
2   False
3   True
4   True
5   False
6   True
7   False
8   False
9   False

What's going on here? Why does the comparison with None give different answers if I'm calling it on a dataframe column instead of its elements?

I suspect that `None` is getting converted to `np.nan` underneath the hood when doing DataFrame level operations, which would explain the output as `np.nan != np.nan` returns `True` by definition. See the output of `df == df`. — root, Mar 21 '17 at 20:44
Good theory! I just ran `df == df` and sure enough, the `None` values are not equal to themselves. At least now I know why it's happening. I still think this is undesired behavior... — Aturen, Mar 21 '17 at 20:55

Comparison on None works element-wise but not in dataframe

0 Answers0